Re: Review Request 26469: Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26469/#review55979 --- Ship it! Ship It! - Bill Farner On Oct. 9, 2014, 12:15 a.m., Joshua Cohen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26469/ --- (Updated Oct. 9, 2014, 12:15 a.m.) Review request for Aurora, Bill Farner and Zameer Manji. Repository: aurora Description --- Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly. Diffs - build.gradle 8f7eed0b58a00f8e5e0521512975087d5788b5b6 src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java eaf63382f689a045f837847736ef24fa75dee874 src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java 62154045f49c5b23949dc739d735c3e5d3680b89 Diff: https://reviews.apache.org/r/26469/diff/ Testing --- ./gradlew build -Pq $ curl -I http://192.168.33.7:8081/apibeta/help/method/setQuota.html HTTP/1.1 200 OK Content-Type: text/html Vary: Accept-Encoding Content-Length: 0 Server: Jetty(7.6.15.v20140411) Thanks, Joshua Cohen
Re: Review Request 26394: Deprecating Identity struct (renaming fields).
On Oct. 9, 2014, 12:05 a.m., Bill Farner wrote: Thanks again for leading this - i'm very happy to see momentum on removing `Identity`. Stepping back - i wonder if we should re-evaluate the way we do field deprecations now that we've established a protocol with JIRA and releases. This might mean we don't need to do the `DEPRECATED` mangling. What do you think? Maxim Khutornenko wrote: I still think visual code reminder is quite beneficial in avoiding the use of deprecated fields. The between-release time is just too long to remember what needs to be avoided when coding far away from thrift schema. Besides, it helps validate the concept and will facilitate later removal. Right, they're basically our poor-man's deprecation warnings. However, this field and file (api.thrift) makes up part of our API, and any consumers of the thrift definition, or `/apibeta` will be broken by it. I don't think we can get around a formal deprecation process for field renames. - Bill --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26394/#review55907 --- On Oct. 8, 2014, 11:39 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26394/ --- (Updated Oct. 8, 2014, 11:39 p.m.) Review request for Aurora, David McLaughlin, Mark Chu-Carroll, and Bill Farner. Bugs: AURORA-84 https://issues.apache.org/jira/browse/AURORA-84 Repository: aurora Description --- Part 1 of Identity struct deprecation: renaming fields. Diffs - examples/jobs/hello_world.aurora fc7877c3a60e56e301d9ee1fabd73446afca7236 src/main/java/org/apache/aurora/scheduler/TaskIdGenerator.java 5c75cc8cae53edfa069c85c37ebad34774682081 src/main/java/org/apache/aurora/scheduler/async/Preemptor.java e9f251508257cd7287ff00773e0073a3cd130df8 src/main/java/org/apache/aurora/scheduler/base/JobKeys.java a76c3fac71b35115064fba6644cff0066fd9e630 src/main/java/org/apache/aurora/scheduler/base/Query.java eded7a59eb394748b93d7fbc085a1bdf64b043cc src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java 865742171c11fbe5cf1469a69dd7258ec1be28c2 src/main/java/org/apache/aurora/scheduler/http/Utilization.java a0cb7bf56aeb7edd92b25d8d69a739d87452777a src/main/java/org/apache/aurora/scheduler/storage/StorageBackfill.java 8c20ab6f2bebf1d1c0f91fed3f1e48361cdf45d6 src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 37176237fac336413267f3c8bb4e1b9a6255150c src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 5dcae4a6132026504cf02093ad4c68ab521e4ab8 src/main/python/apache/aurora/client/api/instance_watcher.py b390aa8993205f1f6938f8c295e3c16a0bf4df6d src/main/python/apache/aurora/client/api/sla.py b9b64680b15f5395ed6aca681b9b1c30ffe2822c src/main/python/apache/aurora/client/cli/task.py c41484bdc27266443bc4e139e1ebb362a59be0f9 src/main/python/apache/aurora/client/commands/admin.py deee0250f3ba9837feeb92acc654f5b3b68b4e0f src/main/python/apache/aurora/client/commands/core.py 58f419e674f1a9a0ae9da6faa2e39c8167bab597 src/main/python/apache/aurora/client/commands/ssh.py d2b8bf675556b924d3d63b545d036dc48a081486 src/main/python/apache/aurora/config/thrift.py 288fb40f65629c8fd4eb7d92c8bf02369237de3b src/main/python/apache/aurora/executor/aurora_executor.py 2c6423d096656f426a4385f4edef6875ebad7049 src/main/python/apache/aurora/executor/common/announcer.py 74b2114d1ede7a4a4a68b78144f9b08bc994bd87 src/main/python/apache/aurora/executor/thermos_task_runner.py 7b346e253677ee9b42c57782f7f67ff63b6a0083 src/main/resources/scheduler/assets/js/controllers.js 7e9037ee921b009dc2b7c5adcf057bedebb01632 src/main/resources/scheduler/assets/js/services.js b744f375411e09b7f776e4a05ee5961227143439 src/main/thrift/org/apache/aurora/gen/api.thrift 8794731f4b3f1033588bdfa33c292e4796319a2a src/test/java/org/apache/aurora/scheduler/MesosTaskFactoryImplTest.java e96974764844b5d1a3a05f6996075fccee209594 src/test/java/org/apache/aurora/scheduler/TaskVarsTest.java 371ae87f5954fa5f092db1f6d21e2291d7576173 src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 606c4434b7158220ccf1403b6deac939021fee31 src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java f2d153f446247032ad9d8d173fb70870dbfdcca1 src/test/java/org/apache/aurora/scheduler/async/PreemptorImplTest.java 8ee84cda8670d117e2efa2d1a114da6f0d8315d6 src/test/java/org/apache/aurora/scheduler/async/RescheduleCalculatorImplTest.java 131bd826dfe47f40f3c27f29c095ed42953e316c
Re: Review Request 26376: Implementing non-prod MTTA/R SLA metrics.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26376/#review55984 --- Ship it! src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java https://reviews.apache.org/r/26376/#comment96329 Can you instead export these values into a MapString, AtomicLong and validate the full map at the end? That would firm up the value checking here. Something like: expect(...makeGauge(..)).andAnswer( new IAnswer..() { ... AtomicLong value = new AtomicLong(). map.put(key, value). return new StatImpl()...; } - Bill Farner On Oct. 6, 2014, 7:10 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26376/ --- (Updated Oct. 6, 2014, 7:10 p.m.) Review request for Aurora and Bill Farner. Bugs: AURORA-774 https://issues.apache.org/jira/browse/AURORA-774 Repository: aurora Description --- Implementing non-prod MTTA/R SLA metrics. Diffs - src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java dca680415426be2bc760875d5774c9e9399ea94b src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 33b6bbe4b39d516595276128bbfa0686d0bb9cbd src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java aeb90bbb822254cbe4691e45092b9581596ad800 src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 96a04389e6b14c4d1c0bdb4d06abc046e7ea2151 Diff: https://reviews.apache.org/r/26376/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 26383: Health Check Disabler
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/#review55985 --- src/main/python/apache/aurora/executor/common/health_checker.py https://reviews.apache.org/r/26383/#comment96330 FWIW i actually meant to suggest that the snooze has no concept of time at all. If the file is there, don't perform health checks. When you want to re-enable health checks, delete the file. Happy to hear what others think about that. - Bill Farner On Oct. 9, 2014, 1:56 a.m., David Pan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/ --- (Updated Oct. 9, 2014, 1:56 a.m.) Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji. Repository: aurora Description --- The health check disabler allows health checks for a job to be snoozed temporarily by touching a snooze file in the job's sandbox. The appropriate unit tests were modified/added. The corresponding JIRA ticket is the following: https://issues.apache.org/jira/browse/AURORA-795 Diffs - src/main/python/apache/aurora/executor/common/health_checker.py 4980411c847d12655cbb363404707ebd9f0bd163 src/test/python/apache/aurora/executor/common/BUILD c7f7a003c865d479ba6e3cd7b5349322f884f653 src/test/python/apache/aurora/executor/common/test_health_checker.py aa36415fa891fc523a3a376ffeca5d3cd5ceabec Diff: https://reviews.apache.org/r/26383/diff/ Testing --- On vagrant in ~/aurora, I ran ./pants src/test/python/apache/aurora/executor:: Thanks, David Pan
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/#review55986 --- src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java https://reviews.apache.org/r/26478/#comment96339 Method interceptors should work fine for package-priviate methods [1]. Just for my understanding - are you going with protected because package-private class + protected method is slightly more restrictive? [1] https://github.com/google/guice/wiki/AOP#limitations src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java https://reviews.apache.org/r/26478/#comment96331 First sentence contains mixed thoughts for an odd sentence, rewrite? src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java https://reviews.apache.org/r/26478/#comment96332 Please make this more verbose to explain what exactly is backwards in compatible. Don't worry about being too wordy. I'd also like to see a TODO to remove this flag, since it should become the default. Perhaps this should be included in the 0.7.0 deprecations list. src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96334 Tasks.SCHEDULED_TO_INFO src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96335 Do you have numbers on how much time this routine saves when compared to a full deep copy and unsetting the field you're trying to clear? Unless it's a significant contributor to overall snapshot performance, i'd prefer the more concise code of the latter approach. My hunch is that this one might save O(100 ms), but the ones below are noise and not worth the code. src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96336 Inline this on :114. src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96337 It would be nice to see a brief comment here to give the reader an overview of what's going on in the routine. src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96338 remove newline src/main/thrift/org/apache/aurora/gen/storage.thrift https://reviews.apache.org/r/26478/#comment96340 The mention of taskConfigId is vague. There should be a doc somewhere indicating that `taskConfigId` is referencing the _index_ of an entry in `taskConfigs`. src/main/thrift/org/apache/aurora/gen/storage.thrift https://reviews.apache.org/r/26478/#comment96341 remove newline src/main/thrift/org/apache/aurora/gen/storage.thrift https://reviews.apache.org/r/26478/#comment96342 Please doc. src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java https://reviews.apache.org/r/26478/#comment96343 The assertion error will be much more useful if you do `assertEquals(emptySet, b)` rather than `assertEquals(n, b.size())`. src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java https://reviews.apache.org/r/26478/#comment96344 Isn't this check redundant to the one below? src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java https://reviews.apache.org/r/26478/#comment96345 Can you try for `assertEquals` here as well, with an expected list of `DeduplicatedScheduledTask` objects? It will catch classes of bugs missed with this check (e.g. extra entries in the `partialTasks` list), and make it easier to diagnose a failed test. - Bill Farner On Oct. 9, 2014, 2:39 a.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 9, 2014, 2:39 a.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464
Re: Review Request 26469: Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly.
On Oct. 9, 2014, 2:32 p.m., Bill Farner wrote: Ship It! Hmm, i'm now getting a different coverage error when preparing to submit this: * What went wrong: Execution failed for task ':jacocoTestReport'. Thanks for adding the first test coverage to: org/apache/aurora/scheduler/async/OfferQueue$OfferQueueImpl$2 please remove it from the legacyClassesWithoutCoverage list. Expression: (coverage == 0). Values: coverage = 6 This remains after cleaning the reports directory. Do you see that one as well? - Bill --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26469/#review55979 --- On Oct. 9, 2014, 12:15 a.m., Joshua Cohen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26469/ --- (Updated Oct. 9, 2014, 12:15 a.m.) Review request for Aurora, Bill Farner and Zameer Manji. Repository: aurora Description --- Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly. Diffs - build.gradle 8f7eed0b58a00f8e5e0521512975087d5788b5b6 src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java eaf63382f689a045f837847736ef24fa75dee874 src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java 62154045f49c5b23949dc739d735c3e5d3680b89 Diff: https://reviews.apache.org/r/26469/diff/ Testing --- ./gradlew build -Pq $ curl -I http://192.168.33.7:8081/apibeta/help/method/setQuota.html HTTP/1.1 200 OK Content-Type: text/html Vary: Accept-Encoding Content-Length: 0 Server: Jetty(7.6.15.v20140411) Thanks, Joshua Cohen
Re: Review Request 26425: Fixing quota checking for updates.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26425/#review55991 --- Ship it! Ship It! - Bill Farner On Oct. 8, 2014, 10:58 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26425/ --- (Updated Oct. 8, 2014, 10:58 p.m.) Review request for Aurora and Bill Farner. Bugs: AURORA-802 https://issues.apache.org/jira/browse/AURORA-802 Repository: aurora Description --- Rolling up prod consumption does not work for quota checking during job update intitiation where per-instance filtering is required. Splitting the QuotaManager interface into 2 use cases: - createJob/addInstances - startJobUpdate Reverting the IResourceAggregate abstraction in QuotaManager in favor of task/update objects to simplify handling. Diffs - src/main/java/org/apache/aurora/scheduler/quota/QuotaManager.java 2442b06f318c8d0cefe60296950baa1b916b42f7 src/main/java/org/apache/aurora/scheduler/quota/QuotaUtil.java d197dd515eb646cb4babadf22e9cf6480a919060 src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 5dcae4a6132026504cf02093ad4c68ab521e4ab8 src/test/java/org/apache/aurora/scheduler/quota/QuotaManagerImplTest.java b58c8f363e6e7c72accaf590b2a7cb7bf24275ea src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java 02cd8f712fff3d283abf8e3eb1b4dcab1e762ac2 Diff: https://reviews.apache.org/r/26425/diff/ Testing --- ./gradlew -Pq build Also, tested various update scenarios in vagrant. Thanks, Maxim Khutornenko
Re: Review Request 26383: Health Check Disabler
On Oct. 9, 2014, 7:53 a.m., Bill Farner wrote: src/main/python/apache/aurora/executor/common/health_checker.py, line 66 https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66 FWIW i actually meant to suggest that the snooze has no concept of time at all. If the file is there, don't perform health checks. When you want to re-enable health checks, delete the file. Happy to hear what others think about that. Doesn't this mean user error can disable health checks forever? I think we should treat disabling health checking as an exceptional state (since the proceses has opted in to health checking) and therefore require user action (increasing mtime) to continue to stay in this state. - Zameer --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/#review55985 --- On Oct. 8, 2014, 6:56 p.m., David Pan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/ --- (Updated Oct. 8, 2014, 6:56 p.m.) Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji. Repository: aurora Description --- The health check disabler allows health checks for a job to be snoozed temporarily by touching a snooze file in the job's sandbox. The appropriate unit tests were modified/added. The corresponding JIRA ticket is the following: https://issues.apache.org/jira/browse/AURORA-795 Diffs - src/main/python/apache/aurora/executor/common/health_checker.py 4980411c847d12655cbb363404707ebd9f0bd163 src/test/python/apache/aurora/executor/common/BUILD c7f7a003c865d479ba6e3cd7b5349322f884f653 src/test/python/apache/aurora/executor/common/test_health_checker.py aa36415fa891fc523a3a376ffeca5d3cd5ceabec Diff: https://reviews.apache.org/r/26383/diff/ Testing --- On vagrant in ~/aurora, I ran ./pants src/test/python/apache/aurora/executor:: Thanks, David Pan
Re: Review Request 26383: Health Check Disabler
On Oct. 9, 2014, 7:53 a.m., Bill Farner wrote: src/main/python/apache/aurora/executor/common/health_checker.py, line 66 https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66 FWIW i actually meant to suggest that the snooze has no concept of time at all. If the file is there, don't perform health checks. When you want to re-enable health checks, delete the file. Happy to hear what others think about that. Zameer Manji wrote: Doesn't this mean user error can disable health checks forever? I think we should treat disabling health checking as an exceptional state (since the proceses has opted in to health checking) and therefore require user action (increasing mtime) to continue to stay in this state. Presumably we can trust the user here - health checking is after all opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch greatly simplifies the code on our end. - Kevin --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/#review55985 --- On Oct. 8, 2014, 6:56 p.m., David Pan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/ --- (Updated Oct. 8, 2014, 6:56 p.m.) Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji. Repository: aurora Description --- The health check disabler allows health checks for a job to be snoozed temporarily by touching a snooze file in the job's sandbox. The appropriate unit tests were modified/added. The corresponding JIRA ticket is the following: https://issues.apache.org/jira/browse/AURORA-795 Diffs - src/main/python/apache/aurora/executor/common/health_checker.py 4980411c847d12655cbb363404707ebd9f0bd163 src/test/python/apache/aurora/executor/common/BUILD c7f7a003c865d479ba6e3cd7b5349322f884f653 src/test/python/apache/aurora/executor/common/test_health_checker.py aa36415fa891fc523a3a376ffeca5d3cd5ceabec Diff: https://reviews.apache.org/r/26383/diff/ Testing --- On vagrant in ~/aurora, I ran ./pants src/test/python/apache/aurora/executor:: Thanks, David Pan
Review Request 26504: Restore JDK8 compatibility
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26504/ --- Review request for Aurora, Maxim Khutornenko and Zameer Manji. Repository: aurora Description --- I misconfigured my IDE and noticed that we'd stopped compiling on JDK8. Diffs - src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 7ecb05ae87058e7a25d88959cf629b2da17dc520 src/test/java/org/apache/aurora/scheduler/state/LockManagerImplTest.java d3c90416e48e25d83f28f966b21c018ee2a1ac27 Diff: https://reviews.apache.org/r/26504/diff/ Testing --- ./gradlew -Pq build With ``` diff --git a/build.gradle b/build.gradle index 8f7eed0..af3d4ba 100644 --- a/build.gradle +++ b/build.gradle @@ -38,8 +38,8 @@ def httpAssetsPath = 'scheduler/assets' def thriftVersion = '0.9.1' compileJava { - sourceCompatibility = 1.7 - targetCompatibility = 1.7 + sourceCompatibility = 1.8 + targetCompatibility = 1.8 } ``` Thanks, Kevin Sweeney
Re: Review Request 26504: Restore JDK8 compatibility
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26504/#review56003 --- Ship it! Ship It! - Maxim Khutornenko On Oct. 9, 2014, 4:50 p.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26504/ --- (Updated Oct. 9, 2014, 4:50 p.m.) Review request for Aurora, Maxim Khutornenko and Zameer Manji. Repository: aurora Description --- I misconfigured my IDE and noticed that we'd stopped compiling on JDK8. Diffs - src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 7ecb05ae87058e7a25d88959cf629b2da17dc520 src/test/java/org/apache/aurora/scheduler/state/LockManagerImplTest.java d3c90416e48e25d83f28f966b21c018ee2a1ac27 Diff: https://reviews.apache.org/r/26504/diff/ Testing --- ./gradlew -Pq build With ``` diff --git a/build.gradle b/build.gradle index 8f7eed0..af3d4ba 100644 --- a/build.gradle +++ b/build.gradle @@ -38,8 +38,8 @@ def httpAssetsPath = 'scheduler/assets' def thriftVersion = '0.9.1' compileJava { - sourceCompatibility = 1.7 - targetCompatibility = 1.7 + sourceCompatibility = 1.8 + targetCompatibility = 1.8 } ``` Thanks, Kevin Sweeney
Re: Review Request 26469: Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly.
Nope, it's passing for me, but it's similar to the behavior I was seeing with ApiBeta$2; even after cleaning the reports directory I was *not* getting the failure about it missing coverage even though it actually was. On Thu, Oct 9, 2014 at 8:32 AM, Bill Farner wfar...@apache.org wrote: This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26469/ On October 9th, 2014, 2:32 p.m. UTC, *Bill Farner* wrote: Ship It! Hmm, i'm now getting a different coverage error when preparing to submit this: * What went wrong: Execution failed for task ':jacocoTestReport'. Thanks for adding the first test coverage to: org/apache/aurora/scheduler/async/OfferQueue$OfferQueueImpl$2 please remove it from the legacyClassesWithoutCoverage list. Expression: (coverage == 0). Values: coverage = 6 This remains after cleaning the reports directory. Do you see that one as well? - Bill On October 9th, 2014, 12:15 a.m. UTC, Joshua Cohen wrote: Review request for Aurora, Bill Farner and Zameer Manji. By Joshua Cohen. *Updated Oct. 9, 2014, 12:15 a.m.* *Repository: * aurora Description Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly. Testing ./gradlew build -Pq $ curl -I http://192.168.33.7:8081/apibeta/help/method/setQuota.html HTTP/1.1 200 OK Content-Type: text/html Vary: Accept-Encoding Content-Length: 0 Server: Jetty(7.6.15.v20140411) Diffs - build.gradle (8f7eed0b58a00f8e5e0521512975087d5788b5b6) - src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java (eaf63382f689a045f837847736ef24fa75dee874) - src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java (62154045f49c5b23949dc739d735c3e5d3680b89) View Diff https://reviews.apache.org/r/26469/diff/
Re: Review Request 26469: Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly.
I can also verify that running the entire test suite in a debugger, a breakpoint set in OfferQueue$OfferQueueImpl$2 is not hit. On Thu, Oct 9, 2014 at 10:05 AM, Joshua Cohen jco...@twopensource.com wrote: Nope, it's passing for me, but it's similar to the behavior I was seeing with ApiBeta$2; even after cleaning the reports directory I was *not* getting the failure about it missing coverage even though it actually was. On Thu, Oct 9, 2014 at 8:32 AM, Bill Farner wfar...@apache.org wrote: This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26469/ On October 9th, 2014, 2:32 p.m. UTC, *Bill Farner* wrote: Ship It! Hmm, i'm now getting a different coverage error when preparing to submit this: * What went wrong: Execution failed for task ':jacocoTestReport'. Thanks for adding the first test coverage to: org/apache/aurora/scheduler/async/OfferQueue$OfferQueueImpl$2 please remove it from the legacyClassesWithoutCoverage list. Expression: (coverage == 0). Values: coverage = 6 This remains after cleaning the reports directory. Do you see that one as well? - Bill On October 9th, 2014, 12:15 a.m. UTC, Joshua Cohen wrote: Review request for Aurora, Bill Farner and Zameer Manji. By Joshua Cohen. *Updated Oct. 9, 2014, 12:15 a.m.* *Repository: * aurora Description Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly. Testing ./gradlew build -Pq $ curl -I http://192.168.33.7:8081/apibeta/help/method/setQuota.html HTTP/1.1 200 OK Content-Type: text/html Vary: Accept-Encoding Content-Length: 0 Server: Jetty(7.6.15.v20140411) Diffs - build.gradle (8f7eed0b58a00f8e5e0521512975087d5788b5b6) - src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java (eaf63382f689a045f837847736ef24fa75dee874) - src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java (62154045f49c5b23949dc739d735c3e5d3680b89) View Diff https://reviews.apache.org/r/26469/diff/
Re: Review Request 26504: Restore JDK8 compatibility
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26504/#review56011 --- Ship it! Ship It! - Joshua Cohen On Oct. 9, 2014, 4:50 p.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26504/ --- (Updated Oct. 9, 2014, 4:50 p.m.) Review request for Aurora, Maxim Khutornenko and Zameer Manji. Repository: aurora Description --- I misconfigured my IDE and noticed that we'd stopped compiling on JDK8. Diffs - src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 7ecb05ae87058e7a25d88959cf629b2da17dc520 src/test/java/org/apache/aurora/scheduler/state/LockManagerImplTest.java d3c90416e48e25d83f28f966b21c018ee2a1ac27 Diff: https://reviews.apache.org/r/26504/diff/ Testing --- ./gradlew -Pq build With ``` diff --git a/build.gradle b/build.gradle index 8f7eed0..af3d4ba 100644 --- a/build.gradle +++ b/build.gradle @@ -38,8 +38,8 @@ def httpAssetsPath = 'scheduler/assets' def thriftVersion = '0.9.1' compileJava { - sourceCompatibility = 1.7 - targetCompatibility = 1.7 + sourceCompatibility = 1.8 + targetCompatibility = 1.8 } ``` Thanks, Kevin Sweeney
Re: Review Request 26504: Restore JDK8 compatibility
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26504/#review56012 --- Ship it! Ship It! - Zameer Manji On Oct. 9, 2014, 9:50 a.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26504/ --- (Updated Oct. 9, 2014, 9:50 a.m.) Review request for Aurora, Maxim Khutornenko and Zameer Manji. Repository: aurora Description --- I misconfigured my IDE and noticed that we'd stopped compiling on JDK8. Diffs - src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 7ecb05ae87058e7a25d88959cf629b2da17dc520 src/test/java/org/apache/aurora/scheduler/state/LockManagerImplTest.java d3c90416e48e25d83f28f966b21c018ee2a1ac27 Diff: https://reviews.apache.org/r/26504/diff/ Testing --- ./gradlew -Pq build With ``` diff --git a/build.gradle b/build.gradle index 8f7eed0..af3d4ba 100644 --- a/build.gradle +++ b/build.gradle @@ -38,8 +38,8 @@ def httpAssetsPath = 'scheduler/assets' def thriftVersion = '0.9.1' compileJava { - sourceCompatibility = 1.7 - targetCompatibility = 1.7 + sourceCompatibility = 1.8 + targetCompatibility = 1.8 } ``` Thanks, Kevin Sweeney
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/#review56010 --- src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96362 This javadoc would highly benefit from some details about the source of duplication and a proposed solution. It's not obvious for a newcomer why TaskConfigs are duplicated. src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96365 Why result field here? src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96360 There is a remote possibility of numOutputTasks to be zero in a snapshot for an empty cluster. src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java https://reviews.apache.org/r/26478/#comment96363 Inverse log message of a hydration ratio would be useful here along with a Starting redupulication. src/main/thrift/org/apache/aurora/gen/storage.thrift https://reviews.apache.org/r/26478/#comment96364 Please, document fields. What is taskConfigId here? src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java https://reviews.apache.org/r/26478/#comment96368 How about a roundtrip test with no tasks in a snapshot? - Maxim Khutornenko On Oct. 9, 2014, 2:39 a.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 9, 2014, 2:39 a.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Re: Review Request 26383: Health Check Disabler
On Oct. 9, 2014, 7:53 a.m., Bill Farner wrote: src/main/python/apache/aurora/executor/common/health_checker.py, line 66 https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66 FWIW i actually meant to suggest that the snooze has no concept of time at all. If the file is there, don't perform health checks. When you want to re-enable health checks, delete the file. Happy to hear what others think about that. Zameer Manji wrote: Doesn't this mean user error can disable health checks forever? I think we should treat disabling health checking as an exceptional state (since the proceses has opted in to health checking) and therefore require user action (increasing mtime) to continue to stay in this state. Kevin Sweeney wrote: Presumably we can trust the user here - health checking is after all opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch greatly simplifies the code on our end. I'm very worried about introducing a feature which can allow unhealthy instances live forever. Furthermore this information isn't exposed on the Aurora UI or Observer UI so there isn't an easy way to check if an instance has health checking disabled or not. A user can be completely oblivious that health checking of their instance has been disabled. - Zameer --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/#review55985 --- On Oct. 8, 2014, 6:56 p.m., David Pan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/ --- (Updated Oct. 8, 2014, 6:56 p.m.) Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji. Repository: aurora Description --- The health check disabler allows health checks for a job to be snoozed temporarily by touching a snooze file in the job's sandbox. The appropriate unit tests were modified/added. The corresponding JIRA ticket is the following: https://issues.apache.org/jira/browse/AURORA-795 Diffs - src/main/python/apache/aurora/executor/common/health_checker.py 4980411c847d12655cbb363404707ebd9f0bd163 src/test/python/apache/aurora/executor/common/BUILD c7f7a003c865d479ba6e3cd7b5349322f884f653 src/test/python/apache/aurora/executor/common/test_health_checker.py aa36415fa891fc523a3a376ffeca5d3cd5ceabec Diff: https://reviews.apache.org/r/26383/diff/ Testing --- On vagrant in ~/aurora, I ran ./pants src/test/python/apache/aurora/executor:: Thanks, David Pan
Re: Review Request 26469: Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly.
Applying the patch, cleaning my reports directory and then running `./gradlew clean build` produces no error for me. I can share my xml report if needed. On Thu, Oct 9, 2014 at 10:42 AM, Bill Farner wfar...@apache.org wrote: Can you post your xml report to a ticket? It might help get to the bottom of this flakiness. On Thursday, October 9, 2014, Joshua Cohen jco...@twopensource.com wrote: I can also verify that running the entire test suite in a debugger, a breakpoint set in OfferQueue$OfferQueueImpl$2 is not hit. On Thu, Oct 9, 2014 at 10:05 AM, Joshua Cohen jco...@twopensource.com wrote: Nope, it's passing for me, but it's similar to the behavior I was seeing with ApiBeta$2; even after cleaning the reports directory I was *not* getting the failure about it missing coverage even though it actually was. On Thu, Oct 9, 2014 at 8:32 AM, Bill Farner wfar...@apache.org wrote: This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26469/ On October 9th, 2014, 2:32 p.m. UTC, *Bill Farner* wrote: Ship It! Hmm, i'm now getting a different coverage error when preparing to submit this: * What went wrong: Execution failed for task ':jacocoTestReport'. Thanks for adding the first test coverage to: org/apache/aurora/scheduler/async/OfferQueue$OfferQueueImpl$2 please remove it from the legacyClassesWithoutCoverage list. Expression: (coverage == 0). Values: coverage = 6 This remains after cleaning the reports directory. Do you see that one as well? - Bill On October 9th, 2014, 12:15 a.m. UTC, Joshua Cohen wrote: Review request for Aurora, Bill Farner and Zameer Manji. By Joshua Cohen. *Updated Oct. 9, 2014, 12:15 a.m.* *Repository: * aurora Description Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly. Testing ./gradlew build -Pq $ curl -I http://192.168.33.7:8081/apibeta/help/method/setQuota.html HTTP/1.1 200 OK Content-Type: text/html Vary: Accept-Encoding Content-Length: 0 Server: Jetty(7.6.15.v20140411) Diffs - build.gradle (8f7eed0b58a00f8e5e0521512975087d5788b5b6) - src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java (eaf63382f689a045f837847736ef24fa75dee874) - src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java (62154045f49c5b23949dc739d735c3e5d3680b89) View Diff https://reviews.apache.org/r/26469/diff/ -- -=Bill -- Zameer Manji
Re: Review Request 26383: Health Check Disabler
On Oct. 9, 2014, 2:53 p.m., Bill Farner wrote: src/main/python/apache/aurora/executor/common/health_checker.py, line 66 https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66 FWIW i actually meant to suggest that the snooze has no concept of time at all. If the file is there, don't perform health checks. When you want to re-enable health checks, delete the file. Happy to hear what others think about that. Zameer Manji wrote: Doesn't this mean user error can disable health checks forever? I think we should treat disabling health checking as an exceptional state (since the proceses has opted in to health checking) and therefore require user action (increasing mtime) to continue to stay in this state. Kevin Sweeney wrote: Presumably we can trust the user here - health checking is after all opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch greatly simplifies the code on our end. Zameer Manji wrote: I'm very worried about introducing a feature which can allow unhealthy instances live forever. Furthermore this information isn't exposed on the Aurora UI or Observer UI so there isn't an easy way to check if an instance has health checking disabled or not. A user can be completely oblivious that health checking of their instance has been disabled. The motivation behind the health check disabler is for another project I am working on, which is the JVM Heap Dumper. Long story short, the JVM Heap Dumper needs to disable health checks for the task that is being heap dumped. Under normal circumstances, the heap dumper can re-enable the health checks. However, if the heap dumper dies unexpectedly, the health checks will remain disabled forever if we don't implement time control. We don't want the disabling and enabling of health checks to be a manual process. - David --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/#review55985 --- On Oct. 9, 2014, 1:56 a.m., David Pan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/ --- (Updated Oct. 9, 2014, 1:56 a.m.) Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji. Repository: aurora Description --- The health check disabler allows health checks for a job to be snoozed temporarily by touching a snooze file in the job's sandbox. The appropriate unit tests were modified/added. The corresponding JIRA ticket is the following: https://issues.apache.org/jira/browse/AURORA-795 Diffs - src/main/python/apache/aurora/executor/common/health_checker.py 4980411c847d12655cbb363404707ebd9f0bd163 src/test/python/apache/aurora/executor/common/BUILD c7f7a003c865d479ba6e3cd7b5349322f884f653 src/test/python/apache/aurora/executor/common/test_health_checker.py aa36415fa891fc523a3a376ffeca5d3cd5ceabec Diff: https://reviews.apache.org/r/26383/diff/ Testing --- On vagrant in ~/aurora, I ran ./pants src/test/python/apache/aurora/executor:: Thanks, David Pan
Re: Review Request 26469: Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly.
Filed https://issues.apache.org/jira/browse/AURORA-822 and attached my report. On Thu, Oct 9, 2014 at 11:29 AM, Zameer Manji zma...@twopensource.com wrote: Applying the patch, cleaning my reports directory and then running `./gradlew clean build` produces no error for me. I can share my xml report if needed. On Thu, Oct 9, 2014 at 10:42 AM, Bill Farner wfar...@apache.org wrote: Can you post your xml report to a ticket? It might help get to the bottom of this flakiness. On Thursday, October 9, 2014, Joshua Cohen jco...@twopensource.com wrote: I can also verify that running the entire test suite in a debugger, a breakpoint set in OfferQueue$OfferQueueImpl$2 is not hit. On Thu, Oct 9, 2014 at 10:05 AM, Joshua Cohen jco...@twopensource.com wrote: Nope, it's passing for me, but it's similar to the behavior I was seeing with ApiBeta$2; even after cleaning the reports directory I was *not* getting the failure about it missing coverage even though it actually was. On Thu, Oct 9, 2014 at 8:32 AM, Bill Farner wfar...@apache.org wrote: This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26469/ On October 9th, 2014, 2:32 p.m. UTC, *Bill Farner* wrote: Ship It! Hmm, i'm now getting a different coverage error when preparing to submit this: * What went wrong: Execution failed for task ':jacocoTestReport'. Thanks for adding the first test coverage to: org/apache/aurora/scheduler/async/OfferQueue$OfferQueueImpl$2 please remove it from the legacyClassesWithoutCoverage list. Expression: (coverage == 0). Values: coverage = 6 This remains after cleaning the reports directory. Do you see that one as well? - Bill On October 9th, 2014, 12:15 a.m. UTC, Joshua Cohen wrote: Review request for Aurora, Bill Farner and Zameer Manji. By Joshua Cohen. *Updated Oct. 9, 2014, 12:15 a.m.* *Repository: * aurora Description Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly. Testing ./gradlew build -Pq $ curl -I http://192.168.33.7:8081/apibeta/help/method/setQuota.html HTTP/1.1 200 OK Content-Type: text/html Vary: Accept-Encoding Content-Length: 0 Server: Jetty(7.6.15.v20140411) Diffs - build.gradle (8f7eed0b58a00f8e5e0521512975087d5788b5b6) - src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java (eaf63382f689a045f837847736ef24fa75dee874) - src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java (62154045f49c5b23949dc739d735c3e5d3680b89) View Diff https://reviews.apache.org/r/26469/diff/ -- -=Bill -- Zameer Manji
Re: Review Request 26383: Health Check Disabler
On Oct. 9, 2014, 7:53 a.m., Bill Farner wrote: src/main/python/apache/aurora/executor/common/health_checker.py, line 66 https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66 FWIW i actually meant to suggest that the snooze has no concept of time at all. If the file is there, don't perform health checks. When you want to re-enable health checks, delete the file. Happy to hear what others think about that. Zameer Manji wrote: Doesn't this mean user error can disable health checks forever? I think we should treat disabling health checking as an exceptional state (since the proceses has opted in to health checking) and therefore require user action (increasing mtime) to continue to stay in this state. Kevin Sweeney wrote: Presumably we can trust the user here - health checking is after all opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch greatly simplifies the code on our end. Zameer Manji wrote: I'm very worried about introducing a feature which can allow unhealthy instances live forever. Furthermore this information isn't exposed on the Aurora UI or Observer UI so there isn't an easy way to check if an instance has health checking disabled or not. A user can be completely oblivious that health checking of their instance has been disabled. David Pan wrote: The motivation behind the health check disabler is for another project I am working on, which is the JVM Heap Dumper. Long story short, the JVM Heap Dumper needs to disable health checks for the task that is being heap dumped. Under normal circumstances, the heap dumper can re-enable the health checks. However, if the heap dumper dies unexpectedly, the health checks will remain disabled forever if we don't implement time control. We don't want the disabling and enabling of health checks to be a manual process. I still agree with Bill here - the simpler implementation (skip health check if the file exists) is easier to reason about and a cleaner API. A process running in the sandbox could implement some form of timeout logic itself (for example, the application could expose an endpoint that forks the equivalent of touch .healthchecksnooze; sleep 60; rm .healthchecksnooze) but I like the idea of keeping the executor API here simple to explain and reason about. - Kevin --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/#review55985 --- On Oct. 8, 2014, 6:56 p.m., David Pan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/ --- (Updated Oct. 8, 2014, 6:56 p.m.) Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji. Repository: aurora Description --- The health check disabler allows health checks for a job to be snoozed temporarily by touching a snooze file in the job's sandbox. The appropriate unit tests were modified/added. The corresponding JIRA ticket is the following: https://issues.apache.org/jira/browse/AURORA-795 Diffs - src/main/python/apache/aurora/executor/common/health_checker.py 4980411c847d12655cbb363404707ebd9f0bd163 src/test/python/apache/aurora/executor/common/BUILD c7f7a003c865d479ba6e3cd7b5349322f884f653 src/test/python/apache/aurora/executor/common/test_health_checker.py aa36415fa891fc523a3a376ffeca5d3cd5ceabec Diff: https://reviews.apache.org/r/26383/diff/ Testing --- On vagrant in ~/aurora, I ran ./pants src/test/python/apache/aurora/executor:: Thanks, David Pan
Re: Review Request 26376: Implementing non-prod MTTA/R SLA metrics.
On Oct. 9, 2014, 2:46 p.m., Bill Farner wrote: src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java, line 59 https://reviews.apache.org/r/26376/diff/1/?file=714221#file714221line59 Can you instead export these values into a MapString, AtomicLong and validate the full map at the end? That would firm up the value checking here. Something like: expect(...makeGauge(..)).andAnswer( new IAnswer..() { ... AtomicLong value = new AtomicLong(). map.put(key, value). return new StatImpl()...; } Thought about that but it would require hardcoding all 27 of currently exposed metrics along with their values. I have modified a bit to fully validate metric names instead. That's the only validation suitable for this test. Values are asserted in SlaAlgorithmTest. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26376/#review55984 --- On Oct. 6, 2014, 7:10 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26376/ --- (Updated Oct. 6, 2014, 7:10 p.m.) Review request for Aurora and Bill Farner. Bugs: AURORA-774 https://issues.apache.org/jira/browse/AURORA-774 Repository: aurora Description --- Implementing non-prod MTTA/R SLA metrics. Diffs - src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java dca680415426be2bc760875d5774c9e9399ea94b src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 33b6bbe4b39d516595276128bbfa0686d0bb9cbd src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java aeb90bbb822254cbe4691e45092b9581596ad800 src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 96a04389e6b14c4d1c0bdb4d06abc046e7ea2151 Diff: https://reviews.apache.org/r/26376/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 26376: Implementing non-prod MTTA/R SLA metrics.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26376/ --- (Updated Oct. 9, 2014, 9:04 p.m.) Review request for Aurora and Bill Farner. Changes --- CR comments. Bugs: AURORA-774 https://issues.apache.org/jira/browse/AURORA-774 Repository: aurora Description --- Implementing non-prod MTTA/R SLA metrics. Diffs (updated) - src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java dca680415426be2bc760875d5774c9e9399ea94b src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 33b6bbe4b39d516595276128bbfa0686d0bb9cbd src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java aeb90bbb822254cbe4691e45092b9581596ad800 src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 96a04389e6b14c4d1c0bdb4d06abc046e7ea2151 Diff: https://reviews.apache.org/r/26376/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 26383: Health Check Disabler
On Oct. 9, 2014, 2:53 p.m., Bill Farner wrote: src/main/python/apache/aurora/executor/common/health_checker.py, line 66 https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66 FWIW i actually meant to suggest that the snooze has no concept of time at all. If the file is there, don't perform health checks. When you want to re-enable health checks, delete the file. Happy to hear what others think about that. Zameer Manji wrote: Doesn't this mean user error can disable health checks forever? I think we should treat disabling health checking as an exceptional state (since the proceses has opted in to health checking) and therefore require user action (increasing mtime) to continue to stay in this state. Kevin Sweeney wrote: Presumably we can trust the user here - health checking is after all opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch greatly simplifies the code on our end. Zameer Manji wrote: I'm very worried about introducing a feature which can allow unhealthy instances live forever. Furthermore this information isn't exposed on the Aurora UI or Observer UI so there isn't an easy way to check if an instance has health checking disabled or not. A user can be completely oblivious that health checking of their instance has been disabled. David Pan wrote: The motivation behind the health check disabler is for another project I am working on, which is the JVM Heap Dumper. Long story short, the JVM Heap Dumper needs to disable health checks for the task that is being heap dumped. Under normal circumstances, the heap dumper can re-enable the health checks. However, if the heap dumper dies unexpectedly, the health checks will remain disabled forever if we don't implement time control. We don't want the disabling and enabling of health checks to be a manual process. Kevin Sweeney wrote: I still agree with Bill here - the simpler implementation (skip health check if the file exists) is easier to reason about and a cleaner API. A process running in the sandbox could implement some form of timeout logic itself (for example, the application could expose an endpoint that forks the equivalent of touch .healthchecksnooze; sleep 60; rm .healthchecksnooze) but I like the idea of keeping the executor API here simple to explain and reason about. Overall, wouldn't it be simpler for the executor API to handle the time control, rather than requiring an identical process in every task's sandbox to handle the time control? - David --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/#review55985 --- On Oct. 9, 2014, 1:56 a.m., David Pan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/ --- (Updated Oct. 9, 2014, 1:56 a.m.) Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji. Repository: aurora Description --- The health check disabler allows health checks for a job to be snoozed temporarily by touching a snooze file in the job's sandbox. The appropriate unit tests were modified/added. The corresponding JIRA ticket is the following: https://issues.apache.org/jira/browse/AURORA-795 Diffs - src/main/python/apache/aurora/executor/common/health_checker.py 4980411c847d12655cbb363404707ebd9f0bd163 src/test/python/apache/aurora/executor/common/BUILD c7f7a003c865d479ba6e3cd7b5349322f884f653 src/test/python/apache/aurora/executor/common/test_health_checker.py aa36415fa891fc523a3a376ffeca5d3cd5ceabec Diff: https://reviews.apache.org/r/26383/diff/ Testing --- On vagrant in ~/aurora, I ran ./pants src/test/python/apache/aurora/executor:: Thanks, David Pan
Re: Review Request 26383: Health Check Disabler
I feel like we should err on the side of correctness here, rather than simplicity? The dangers of someone accidentally leaving health checks disabled indefinitely (on a service that has opted in to health checks) are not insignificant. On Thu, Oct 9, 2014 at 1:34 PM, Kevin Sweeney kevi...@apache.org wrote: On Oct. 9, 2014, 7:53 a.m., Bill Farner wrote: src/main/python/apache/aurora/executor/common/health_checker.py, line 66 https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66 FWIW i actually meant to suggest that the snooze has no concept of time at all. If the file is there, don't perform health checks. When you want to re-enable health checks, delete the file. Happy to hear what others think about that. Zameer Manji wrote: Doesn't this mean user error can disable health checks forever? I think we should treat disabling health checking as an exceptional state (since the proceses has opted in to health checking) and therefore require user action (increasing mtime) to continue to stay in this state. Kevin Sweeney wrote: Presumably we can trust the user here - health checking is after all opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch greatly simplifies the code on our end. Zameer Manji wrote: I'm very worried about introducing a feature which can allow unhealthy instances live forever. Furthermore this information isn't exposed on the Aurora UI or Observer UI so there isn't an easy way to check if an instance has health checking disabled or not. A user can be completely oblivious that health checking of their instance has been disabled. David Pan wrote: The motivation behind the health check disabler is for another project I am working on, which is the JVM Heap Dumper. Long story short, the JVM Heap Dumper needs to disable health checks for the task that is being heap dumped. Under normal circumstances, the heap dumper can re-enable the health checks. However, if the heap dumper dies unexpectedly, the health checks will remain disabled forever if we don't implement time control. We don't want the disabling and enabling of health checks to be a manual process. I still agree with Bill here - the simpler implementation (skip health check if the file exists) is easier to reason about and a cleaner API. A process running in the sandbox could implement some form of timeout logic itself (for example, the application could expose an endpoint that forks the equivalent of touch .healthchecksnooze; sleep 60; rm .healthchecksnooze) but I like the idea of keeping the executor API here simple to explain and reason about. - Kevin --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/#review55985 --- On Oct. 8, 2014, 6:56 p.m., David Pan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/ --- (Updated Oct. 8, 2014, 6:56 p.m.) Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji. Repository: aurora Description --- The health check disabler allows health checks for a job to be snoozed temporarily by touching a snooze file in the job's sandbox. The appropriate unit tests were modified/added. The corresponding JIRA ticket is the following: https://issues.apache.org/jira/browse/AURORA-795 Diffs - src/main/python/apache/aurora/executor/common/health_checker.py 4980411c847d12655cbb363404707ebd9f0bd163 src/test/python/apache/aurora/executor/common/BUILD c7f7a003c865d479ba6e3cd7b5349322f884f653 src/test/python/apache/aurora/executor/common/test_health_checker.py aa36415fa891fc523a3a376ffeca5d3cd5ceabec Diff: https://reviews.apache.org/r/26383/diff/ Testing --- On vagrant in ~/aurora, I ran ./pants src/test/python/apache/aurora/executor:: Thanks, David Pan
Re: Review Request 26383: Health Check Disabler
On Oct. 9, 2014, 2:53 p.m., Bill Farner wrote: src/main/python/apache/aurora/executor/common/health_checker.py, line 66 https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66 FWIW i actually meant to suggest that the snooze has no concept of time at all. If the file is there, don't perform health checks. When you want to re-enable health checks, delete the file. Happy to hear what others think about that. Zameer Manji wrote: Doesn't this mean user error can disable health checks forever? I think we should treat disabling health checking as an exceptional state (since the proceses has opted in to health checking) and therefore require user action (increasing mtime) to continue to stay in this state. Kevin Sweeney wrote: Presumably we can trust the user here - health checking is after all opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch greatly simplifies the code on our end. Zameer Manji wrote: I'm very worried about introducing a feature which can allow unhealthy instances live forever. Furthermore this information isn't exposed on the Aurora UI or Observer UI so there isn't an easy way to check if an instance has health checking disabled or not. A user can be completely oblivious that health checking of their instance has been disabled. David Pan wrote: The motivation behind the health check disabler is for another project I am working on, which is the JVM Heap Dumper. Long story short, the JVM Heap Dumper needs to disable health checks for the task that is being heap dumped. Under normal circumstances, the heap dumper can re-enable the health checks. However, if the heap dumper dies unexpectedly, the health checks will remain disabled forever if we don't implement time control. We don't want the disabling and enabling of health checks to be a manual process. Kevin Sweeney wrote: I still agree with Bill here - the simpler implementation (skip health check if the file exists) is easier to reason about and a cleaner API. A process running in the sandbox could implement some form of timeout logic itself (for example, the application could expose an endpoint that forks the equivalent of touch .healthchecksnooze; sleep 60; rm .healthchecksnooze) but I like the idea of keeping the executor API here simple to explain and reason about. David Pan wrote: Overall, wouldn't it be simpler for the executor API to handle the time control, rather than requiring an identical process in every task's sandbox to handle the time control? I feel like we should err on the side of correctness here, rather than simplicity? The dangers of someone accidentally leaving health checks disabled indefinitely (on a service that has opted in to health checks) are not insignificant. - Joshua --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/#review55985 --- On Oct. 9, 2014, 1:56 a.m., David Pan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26383/ --- (Updated Oct. 9, 2014, 1:56 a.m.) Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji. Repository: aurora Description --- The health check disabler allows health checks for a job to be snoozed temporarily by touching a snooze file in the job's sandbox. The appropriate unit tests were modified/added. The corresponding JIRA ticket is the following: https://issues.apache.org/jira/browse/AURORA-795 Diffs - src/main/python/apache/aurora/executor/common/health_checker.py 4980411c847d12655cbb363404707ebd9f0bd163 src/test/python/apache/aurora/executor/common/BUILD c7f7a003c865d479ba6e3cd7b5349322f884f653 src/test/python/apache/aurora/executor/common/test_health_checker.py aa36415fa891fc523a3a376ffeca5d3cd5ceabec Diff: https://reviews.apache.org/r/26383/diff/ Testing --- On vagrant in ~/aurora, I ran ./pants src/test/python/apache/aurora/executor:: Thanks, David Pan
Re: Review Request 26478: Add a flag to deduplicate storage snapshots
On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote: src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java, line 133 https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line133 There is a remote possibility of numOutputTasks to be zero in a snapshot for an empty cluster. hmm - there is, and that would result in NaN appearing in the log. Which is technically accurate - Kevin --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/#review56010 --- On Oct. 8, 2014, 7:39 p.m., Kevin Sweeney wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26478/ --- (Updated Oct. 8, 2014, 7:39 p.m.) Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji. Bugs: AURORA-722 https://issues.apache.org/jira/browse/AURORA-722 Repository: aurora Description --- Add a new format for deduplicated storage snapshots. Microbenchmarks show a 10x deduplication ratio on Twitter's production snapshots. This format is backwards-incompatible, so this patch introduces a flag to control its use (defaulting off). This only changes the format used to write to the replicated log (where time is of the essence since all writes are done holding the global storage lock) - the format of backups written to disk is unchanged, as backups don't hold the lock. Diffs - build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 65e986eaa2c4193431ca048425a1ed3ab60f5882 src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 7239a6a5eb5479e395e16423c83fdf80a77e5a83 src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 4b50e2069407dc263b4fc93f1827d3a8836253bf src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java f806297d1d0700155c976743f936b2b8a3a390fb src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 769348e6b8a5c701734afff391b1c77de35222c6 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java PRE-CREATION src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 22db80eaf34fe736fa5a3a9289836c9ac9e59906 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a src/main/thrift/org/apache/aurora/gen/storage.thrift 5350ec945fbe028ee4641683815a068ce00b5efc src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 39729b374fe4e383f9b5ada7d016923766df9af7 src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 7a8c3b882633376a1bf6a78616d55aaa7401d13f src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java PRE-CREATION Diff: https://reviews.apache.org/r/26478/diff/ Testing --- ./gradlew -Pq build Thanks, Kevin Sweeney
Re: Review Request 26431: Moving post_drain script execution into host_maintenance.py
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26431/#review56076 --- Ping, Mark. - Maxim Khutornenko On Oct. 8, 2014, 11:46 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26431/ --- (Updated Oct. 8, 2014, 11:46 p.m.) Review request for Aurora, Joe Smith and Mark Chu-Carroll. Bugs: AURORA-806 https://issues.apache.org/jira/browse/AURORA-806 Repository: aurora Description --- Moving post_drain script functionality into host_maintenance.py to support per batch execution. Diffs - src/main/python/apache/aurora/admin/host_maintenance.py 9c2a9f77109791da574e1624d27b6b7096a2678e src/main/python/apache/aurora/client/commands/maintenance.py e465d973e9f764076e18491e1691d44303c0f388 src/test/python/apache/aurora/admin/test_host_maintenance.py 40228df59e43bc6034f2dc651c166a0c4b78aea8 Diff: https://reviews.apache.org/r/26431/diff/ Testing --- ./pants src/test/python:all Thanks, Maxim Khutornenko
Review Request 26531: Defining schema for the heartbeat RPC.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26531/ --- Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Bugs: AURORA-690 https://issues.apache.org/jira/browse/AURORA-690 Repository: aurora Description --- Defining the flag and the schema for the update coordination (aka heartbeats). Diffs - src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 137f97d33decd14bf2f6dcdd9cd18c3db2b7c89c src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml 77032d84f91b149c01ce4ac62da7ca331a2b6445 src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql db6a8719d18fee720f74ebcd2079ad36201cc831 src/main/thrift/org/apache/aurora/gen/api.thrift 8794731f4b3f1033588bdfa33c292e4796319a2a src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java 3871dae68fcdc6402cb61a7244b46114617eecff src/test/java/org/apache/aurora/scheduler/thrift/aop/ForwardingThrift.java 93f79d7cfed2ba3d56548b43b926ce7ddec16c9e Diff: https://reviews.apache.org/r/26531/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 26531: Defining schema for the heartbeat RPC.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26531/#review56080 --- Ship it! src/main/thrift/org/apache/aurora/gen/api.thrift https://reviews.apache.org/r/26531/#comment96435 s/at/within/ src/main/thrift/org/apache/aurora/gen/api.thrift https://reviews.apache.org/r/26531/#comment96436 I think we decided not to do auto-resume once a job is paused due to lack of heartbeat, but I can't remember why. Maybe someone else remembers? - David McLaughlin On Oct. 9, 2014, 11:02 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26531/ --- (Updated Oct. 9, 2014, 11:02 p.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Bugs: AURORA-690 https://issues.apache.org/jira/browse/AURORA-690 Repository: aurora Description --- Defining the flag and the schema for the update coordination (aka heartbeats). Diffs - src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 137f97d33decd14bf2f6dcdd9cd18c3db2b7c89c src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml 77032d84f91b149c01ce4ac62da7ca331a2b6445 src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql db6a8719d18fee720f74ebcd2079ad36201cc831 src/main/thrift/org/apache/aurora/gen/api.thrift 8794731f4b3f1033588bdfa33c292e4796319a2a src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java 3871dae68fcdc6402cb61a7244b46114617eecff src/test/java/org/apache/aurora/scheduler/thrift/aop/ForwardingThrift.java 93f79d7cfed2ba3d56548b43b926ce7ddec16c9e Diff: https://reviews.apache.org/r/26531/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 26531: Defining schema for the heartbeat RPC.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26531/ --- (Updated Oct. 9, 2014, 11:23 p.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Changes --- Adding SessionKey into the heartbeat RPC and adding python test. Bugs: AURORA-690 https://issues.apache.org/jira/browse/AURORA-690 Repository: aurora Description --- Defining the flag and the schema for the update coordination (aka heartbeats). Diffs (updated) - src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 137f97d33decd14bf2f6dcdd9cd18c3db2b7c89c src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml 77032d84f91b149c01ce4ac62da7ca331a2b6445 src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql db6a8719d18fee720f74ebcd2079ad36201cc831 src/main/thrift/org/apache/aurora/gen/api.thrift 8794731f4b3f1033588bdfa33c292e4796319a2a src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java 3871dae68fcdc6402cb61a7244b46114617eecff src/test/java/org/apache/aurora/scheduler/thrift/aop/ForwardingThrift.java 93f79d7cfed2ba3d56548b43b926ce7ddec16c9e src/test/python/apache/aurora/client/api/test_scheduler_client.py d78e7dca28d67997bc6c98cff619ab94a257c7dc Diff: https://reviews.apache.org/r/26531/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 26531: Defining schema for the heartbeat RPC.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26531/#review56083 --- Ship it! Ship It! - David McLaughlin On Oct. 9, 2014, 11:23 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26531/ --- (Updated Oct. 9, 2014, 11:23 p.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Bugs: AURORA-690 https://issues.apache.org/jira/browse/AURORA-690 Repository: aurora Description --- Defining the flag and the schema for the update coordination (aka heartbeats). Diffs - src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 137f97d33decd14bf2f6dcdd9cd18c3db2b7c89c src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml 77032d84f91b149c01ce4ac62da7ca331a2b6445 src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql db6a8719d18fee720f74ebcd2079ad36201cc831 src/main/thrift/org/apache/aurora/gen/api.thrift 8794731f4b3f1033588bdfa33c292e4796319a2a src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java 3871dae68fcdc6402cb61a7244b46114617eecff src/test/java/org/apache/aurora/scheduler/thrift/aop/ForwardingThrift.java 93f79d7cfed2ba3d56548b43b926ce7ddec16c9e src/test/python/apache/aurora/client/api/test_scheduler_client.py d78e7dca28d67997bc6c98cff619ab94a257c7dc Diff: https://reviews.apache.org/r/26531/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 26531: Defining schema for the heartbeat RPC.
On Oct. 9, 2014, 11:20 p.m., David McLaughlin wrote: src/main/thrift/org/apache/aurora/gen/api.thrift, line 619 https://reviews.apache.org/r/26531/diff/1/?file=717117#file717117line619 I think we decided not to do auto-resume once a job is paused due to lack of heartbeat, but I can't remember why. Maybe someone else remembers? I don't quite see a difference between user pausing/resuming an update or the HB service doing so. Differentiating between the two would require a pause reason stored somewhere. I'd rather avoid extra complexity unless absolutely necessary. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26531/#review56080 --- On Oct. 9, 2014, 11:23 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26531/ --- (Updated Oct. 9, 2014, 11:23 p.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Bugs: AURORA-690 https://issues.apache.org/jira/browse/AURORA-690 Repository: aurora Description --- Defining the flag and the schema for the update coordination (aka heartbeats). Diffs - src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 137f97d33decd14bf2f6dcdd9cd18c3db2b7c89c src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml 77032d84f91b149c01ce4ac62da7ca331a2b6445 src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql db6a8719d18fee720f74ebcd2079ad36201cc831 src/main/thrift/org/apache/aurora/gen/api.thrift 8794731f4b3f1033588bdfa33c292e4796319a2a src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java 3871dae68fcdc6402cb61a7244b46114617eecff src/test/java/org/apache/aurora/scheduler/thrift/aop/ForwardingThrift.java 93f79d7cfed2ba3d56548b43b926ce7ddec16c9e src/test/python/apache/aurora/client/api/test_scheduler_client.py d78e7dca28d67997bc6c98cff619ab94a257c7dc Diff: https://reviews.apache.org/r/26531/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko