Re: Review Request 26469: Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly.

2014-10-09 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26469/#review55979
---

Ship it!


Ship It!

- Bill Farner


On Oct. 9, 2014, 12:15 a.m., Joshua Cohen wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26469/
 ---
 
 (Updated Oct. 9, 2014, 12:15 a.m.)
 
 
 Review request for Aurora, Bill Farner and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Kill code to serve ApiBeta help pages that's no longer used now that the 
 content is served directly.
 
 
 Diffs
 -
 
   build.gradle 8f7eed0b58a00f8e5e0521512975087d5788b5b6 
   src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java 
 eaf63382f689a045f837847736ef24fa75dee874 
   src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java 
 62154045f49c5b23949dc739d735c3e5d3680b89 
 
 Diff: https://reviews.apache.org/r/26469/diff/
 
 
 Testing
 ---
 
 ./gradlew build -Pq
 
 $ curl -I http://192.168.33.7:8081/apibeta/help/method/setQuota.html
 HTTP/1.1 200 OK
 Content-Type: text/html
 Vary: Accept-Encoding
 Content-Length: 0
 Server: Jetty(7.6.15.v20140411)
 
 
 Thanks,
 
 Joshua Cohen
 




Re: Review Request 26394: Deprecating Identity struct (renaming fields).

2014-10-09 Thread Bill Farner


 On Oct. 9, 2014, 12:05 a.m., Bill Farner wrote:
  Thanks again for leading this - i'm very happy to see momentum on removing 
  `Identity`.
  
  Stepping back - i wonder if we should re-evaluate the way we do field 
  deprecations now that we've established a protocol with JIRA and releases.  
  This might mean we don't need to do the `DEPRECATED` mangling.  What do you 
  think?
 
 Maxim Khutornenko wrote:
 I still think visual code reminder is quite beneficial in avoiding the 
 use of deprecated fields. The between-release time is just too long to 
 remember what needs to be avoided when coding far away from thrift schema. 
 Besides, it helps validate the concept and will facilitate later removal.

Right, they're basically our poor-man's deprecation warnings.  However, this 
field and file (api.thrift) makes up part of our API, and any consumers of the 
thrift definition, or `/apibeta` will be broken by it.  I don't think we can 
get around a formal deprecation process for field renames.


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26394/#review55907
---


On Oct. 8, 2014, 11:39 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26394/
 ---
 
 (Updated Oct. 8, 2014, 11:39 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Mark Chu-Carroll, and Bill 
 Farner.
 
 
 Bugs: AURORA-84
 https://issues.apache.org/jira/browse/AURORA-84
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Part 1 of Identity struct deprecation: renaming fields.
 
 
 Diffs
 -
 
   examples/jobs/hello_world.aurora fc7877c3a60e56e301d9ee1fabd73446afca7236 
   src/main/java/org/apache/aurora/scheduler/TaskIdGenerator.java 
 5c75cc8cae53edfa069c85c37ebad34774682081 
   src/main/java/org/apache/aurora/scheduler/async/Preemptor.java 
 e9f251508257cd7287ff00773e0073a3cd130df8 
   src/main/java/org/apache/aurora/scheduler/base/JobKeys.java 
 a76c3fac71b35115064fba6644cff0066fd9e630 
   src/main/java/org/apache/aurora/scheduler/base/Query.java 
 eded7a59eb394748b93d7fbc085a1bdf64b043cc 
   
 src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
  865742171c11fbe5cf1469a69dd7258ec1be28c2 
   src/main/java/org/apache/aurora/scheduler/http/Utilization.java 
 a0cb7bf56aeb7edd92b25d8d69a739d87452777a 
   src/main/java/org/apache/aurora/scheduler/storage/StorageBackfill.java 
 8c20ab6f2bebf1d1c0f91fed3f1e48361cdf45d6 
   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
 37176237fac336413267f3c8bb4e1b9a6255150c 
   
 src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
  5dcae4a6132026504cf02093ad4c68ab521e4ab8 
   src/main/python/apache/aurora/client/api/instance_watcher.py 
 b390aa8993205f1f6938f8c295e3c16a0bf4df6d 
   src/main/python/apache/aurora/client/api/sla.py 
 b9b64680b15f5395ed6aca681b9b1c30ffe2822c 
   src/main/python/apache/aurora/client/cli/task.py 
 c41484bdc27266443bc4e139e1ebb362a59be0f9 
   src/main/python/apache/aurora/client/commands/admin.py 
 deee0250f3ba9837feeb92acc654f5b3b68b4e0f 
   src/main/python/apache/aurora/client/commands/core.py 
 58f419e674f1a9a0ae9da6faa2e39c8167bab597 
   src/main/python/apache/aurora/client/commands/ssh.py 
 d2b8bf675556b924d3d63b545d036dc48a081486 
   src/main/python/apache/aurora/config/thrift.py 
 288fb40f65629c8fd4eb7d92c8bf02369237de3b 
   src/main/python/apache/aurora/executor/aurora_executor.py 
 2c6423d096656f426a4385f4edef6875ebad7049 
   src/main/python/apache/aurora/executor/common/announcer.py 
 74b2114d1ede7a4a4a68b78144f9b08bc994bd87 
   src/main/python/apache/aurora/executor/thermos_task_runner.py 
 7b346e253677ee9b42c57782f7f67ff63b6a0083 
   src/main/resources/scheduler/assets/js/controllers.js 
 7e9037ee921b009dc2b7c5adcf057bedebb01632 
   src/main/resources/scheduler/assets/js/services.js 
 b744f375411e09b7f776e4a05ee5961227143439 
   src/main/thrift/org/apache/aurora/gen/api.thrift 
 8794731f4b3f1033588bdfa33c292e4796319a2a 
   src/test/java/org/apache/aurora/scheduler/MesosTaskFactoryImplTest.java 
 e96974764844b5d1a3a05f6996075fccee209594 
   src/test/java/org/apache/aurora/scheduler/TaskVarsTest.java 
 371ae87f5954fa5f092db1f6d21e2291d7576173 
   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
 606c4434b7158220ccf1403b6deac939021fee31 
   src/test/java/org/apache/aurora/scheduler/async/GcExecutorLauncherTest.java 
 f2d153f446247032ad9d8d173fb70870dbfdcca1 
   src/test/java/org/apache/aurora/scheduler/async/PreemptorImplTest.java 
 8ee84cda8670d117e2efa2d1a114da6f0d8315d6 
   
 src/test/java/org/apache/aurora/scheduler/async/RescheduleCalculatorImplTest.java
  131bd826dfe47f40f3c27f29c095ed42953e316c 
   

Re: Review Request 26376: Implementing non-prod MTTA/R SLA metrics.

2014-10-09 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26376/#review55984
---

Ship it!



src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java
https://reviews.apache.org/r/26376/#comment96329

Can you instead export these values into a MapString, AtomicLong and 
validate the full map at the end?  That would firm up the value checking here.

Something like:

expect(...makeGauge(..)).andAnswer(
new IAnswer..() {
  ...
AtomicLong value = new AtomicLong().
map.put(key, value).
return new StatImpl()...;
}


- Bill Farner


On Oct. 6, 2014, 7:10 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26376/
 ---
 
 (Updated Oct. 6, 2014, 7:10 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-774
 https://issues.apache.org/jira/browse/AURORA-774
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Implementing non-prod MTTA/R SLA metrics.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
 dca680415426be2bc760875d5774c9e9399ea94b 
   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
 33b6bbe4b39d516595276128bbfa0686d0bb9cbd 
   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
 aeb90bbb822254cbe4691e45092b9581596ad800 
   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
 96a04389e6b14c4d1c0bdb4d06abc046e7ea2151 
 
 Diff: https://reviews.apache.org/r/26376/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 26383: Health Check Disabler

2014-10-09 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26383/#review55985
---



src/main/python/apache/aurora/executor/common/health_checker.py
https://reviews.apache.org/r/26383/#comment96330

FWIW i actually meant to suggest that the snooze has no concept of time at 
all.  If the file is there, don't perform health checks.  When you want to 
re-enable health checks, delete the file.  Happy to hear what others think 
about that.


- Bill Farner


On Oct. 9, 2014, 1:56 a.m., David Pan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26383/
 ---
 
 (Updated Oct. 9, 2014, 1:56 a.m.)
 
 
 Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 The health check disabler allows health checks for a job to be snoozed 
 temporarily by touching a snooze file in the job's sandbox.  The appropriate 
 unit tests were modified/added.
 
 The corresponding JIRA ticket is the following:
 https://issues.apache.org/jira/browse/AURORA-795
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/executor/common/health_checker.py 
 4980411c847d12655cbb363404707ebd9f0bd163 
   src/test/python/apache/aurora/executor/common/BUILD 
 c7f7a003c865d479ba6e3cd7b5349322f884f653 
   src/test/python/apache/aurora/executor/common/test_health_checker.py 
 aa36415fa891fc523a3a376ffeca5d3cd5ceabec 
 
 Diff: https://reviews.apache.org/r/26383/diff/
 
 
 Testing
 ---
 
 On vagrant in ~/aurora, I ran
 ./pants src/test/python/apache/aurora/executor::
 
 
 Thanks,
 
 David Pan
 




Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-09 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/#review55986
---



src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java
https://reviews.apache.org/r/26478/#comment96339

Method interceptors should work fine for package-priviate methods [1].  
Just for my understanding - are you going with protected because 
package-private class + protected method is slightly more restrictive?

[1] https://github.com/google/guice/wiki/AOP#limitations



src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java
https://reviews.apache.org/r/26478/#comment96331

First sentence contains mixed thoughts for an odd sentence, rewrite?



src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java
https://reviews.apache.org/r/26478/#comment96332

Please make this more verbose to explain what exactly is backwards in 
compatible.  Don't worry about being too wordy.

I'd also like to see a TODO to remove this flag, since it should become the 
default.  Perhaps this should be included in the 0.7.0 deprecations list.



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96334

Tasks.SCHEDULED_TO_INFO



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96335

Do you have numbers on how much time this routine saves when compared to a 
full deep copy and unsetting the field you're trying to clear?  Unless it's a 
significant contributor to overall snapshot performance, i'd prefer the more 
concise code of the latter approach.

My hunch is that this one might save O(100 ms), but the ones below are 
noise and not worth the code.



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96336

Inline this on :114.



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96337

It would be nice to see a brief comment here to give the reader an overview 
of what's going on in the routine.



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96338

remove newline



src/main/thrift/org/apache/aurora/gen/storage.thrift
https://reviews.apache.org/r/26478/#comment96340

The mention of taskConfigId is vague.  There should be a doc somewhere 
indicating that `taskConfigId` is referencing the _index_ of an entry in 
`taskConfigs`.



src/main/thrift/org/apache/aurora/gen/storage.thrift
https://reviews.apache.org/r/26478/#comment96341

remove newline



src/main/thrift/org/apache/aurora/gen/storage.thrift
https://reviews.apache.org/r/26478/#comment96342

Please doc.



src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
https://reviews.apache.org/r/26478/#comment96343

The assertion error will be much more useful if you do 
`assertEquals(emptySet, b)` rather than `assertEquals(n, b.size())`.



src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
https://reviews.apache.org/r/26478/#comment96344

Isn't this check redundant to the one below?



src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
https://reviews.apache.org/r/26478/#comment96345

Can you try for `assertEquals` here as well, with an expected list of 
`DeduplicatedScheduledTask` objects?  It will catch classes of bugs missed with 
this check (e.g. extra entries in the `partialTasks` list), and make it easier 
to diagnose a failed test.


- Bill Farner


On Oct. 9, 2014, 2:39 a.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26478/
 ---
 
 (Updated Oct. 9, 2014, 2:39 a.m.)
 
 
 Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
 
 
 Bugs: AURORA-722
 https://issues.apache.org/jira/browse/AURORA-722
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Add a new format for deduplicated storage snapshots. Microbenchmarks show a 
 10x deduplication ratio on Twitter's production snapshots.
 
 This format is backwards-incompatible, so this patch introduces a flag to 
 control its use (defaulting off).
 
 This only changes the format used to write to the replicated log (where time 
 is of the essence since all writes are done holding the global storage lock) 
 - the format of backups written to disk is unchanged, as backups don't hold 
 the lock.
 
 
 Diffs
 -
 
   build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 
   

Re: Review Request 26469: Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly.

2014-10-09 Thread Bill Farner


 On Oct. 9, 2014, 2:32 p.m., Bill Farner wrote:
  Ship It!

Hmm, i'm now getting a different coverage error when preparing to submit this:

* What went wrong:
Execution failed for task ':jacocoTestReport'.
 Thanks for adding the first test coverage to: 
org/apache/aurora/scheduler/async/OfferQueue$OfferQueueImpl$2 please remove it 
from the legacyClassesWithoutCoverage list. Expression: (coverage == 0). 
Values: coverage = 6

This remains after cleaning the reports directory.  Do you see that one as well?


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26469/#review55979
---


On Oct. 9, 2014, 12:15 a.m., Joshua Cohen wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26469/
 ---
 
 (Updated Oct. 9, 2014, 12:15 a.m.)
 
 
 Review request for Aurora, Bill Farner and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Kill code to serve ApiBeta help pages that's no longer used now that the 
 content is served directly.
 
 
 Diffs
 -
 
   build.gradle 8f7eed0b58a00f8e5e0521512975087d5788b5b6 
   src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java 
 eaf63382f689a045f837847736ef24fa75dee874 
   src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java 
 62154045f49c5b23949dc739d735c3e5d3680b89 
 
 Diff: https://reviews.apache.org/r/26469/diff/
 
 
 Testing
 ---
 
 ./gradlew build -Pq
 
 $ curl -I http://192.168.33.7:8081/apibeta/help/method/setQuota.html
 HTTP/1.1 200 OK
 Content-Type: text/html
 Vary: Accept-Encoding
 Content-Length: 0
 Server: Jetty(7.6.15.v20140411)
 
 
 Thanks,
 
 Joshua Cohen
 




Re: Review Request 26425: Fixing quota checking for updates.

2014-10-09 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26425/#review55991
---

Ship it!


Ship It!

- Bill Farner


On Oct. 8, 2014, 10:58 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26425/
 ---
 
 (Updated Oct. 8, 2014, 10:58 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-802
 https://issues.apache.org/jira/browse/AURORA-802
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Rolling up prod consumption does not work for quota checking during job 
 update intitiation where per-instance filtering is required. Splitting the 
 QuotaManager interface into 2 use cases:
 - createJob/addInstances
 - startJobUpdate
 
 Reverting the IResourceAggregate abstraction in QuotaManager in favor of 
 task/update objects to simplify handling.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/quota/QuotaManager.java 
 2442b06f318c8d0cefe60296950baa1b916b42f7 
   src/main/java/org/apache/aurora/scheduler/quota/QuotaUtil.java 
 d197dd515eb646cb4babadf22e9cf6480a919060 
   
 src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
  5dcae4a6132026504cf02093ad4c68ab521e4ab8 
   src/test/java/org/apache/aurora/scheduler/quota/QuotaManagerImplTest.java 
 b58c8f363e6e7c72accaf590b2a7cb7bf24275ea 
   
 src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
  02cd8f712fff3d283abf8e3eb1b4dcab1e762ac2 
 
 Diff: https://reviews.apache.org/r/26425/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 Also, tested various update scenarios in vagrant.
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 26383: Health Check Disabler

2014-10-09 Thread Zameer Manji


 On Oct. 9, 2014, 7:53 a.m., Bill Farner wrote:
  src/main/python/apache/aurora/executor/common/health_checker.py, line 66
  https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66
 
  FWIW i actually meant to suggest that the snooze has no concept of time 
  at all.  If the file is there, don't perform health checks.  When you want 
  to re-enable health checks, delete the file.  Happy to hear what others 
  think about that.

Doesn't this mean user error can disable health checks forever? I think we 
should treat disabling health checking as an exceptional state (since the 
proceses has opted in to health checking) and therefore require user action 
(increasing mtime) to continue to stay in this state.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26383/#review55985
---


On Oct. 8, 2014, 6:56 p.m., David Pan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26383/
 ---
 
 (Updated Oct. 8, 2014, 6:56 p.m.)
 
 
 Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 The health check disabler allows health checks for a job to be snoozed 
 temporarily by touching a snooze file in the job's sandbox.  The appropriate 
 unit tests were modified/added.
 
 The corresponding JIRA ticket is the following:
 https://issues.apache.org/jira/browse/AURORA-795
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/executor/common/health_checker.py 
 4980411c847d12655cbb363404707ebd9f0bd163 
   src/test/python/apache/aurora/executor/common/BUILD 
 c7f7a003c865d479ba6e3cd7b5349322f884f653 
   src/test/python/apache/aurora/executor/common/test_health_checker.py 
 aa36415fa891fc523a3a376ffeca5d3cd5ceabec 
 
 Diff: https://reviews.apache.org/r/26383/diff/
 
 
 Testing
 ---
 
 On vagrant in ~/aurora, I ran
 ./pants src/test/python/apache/aurora/executor::
 
 
 Thanks,
 
 David Pan
 




Re: Review Request 26383: Health Check Disabler

2014-10-09 Thread Kevin Sweeney


 On Oct. 9, 2014, 7:53 a.m., Bill Farner wrote:
  src/main/python/apache/aurora/executor/common/health_checker.py, line 66
  https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66
 
  FWIW i actually meant to suggest that the snooze has no concept of time 
  at all.  If the file is there, don't perform health checks.  When you want 
  to re-enable health checks, delete the file.  Happy to hear what others 
  think about that.
 
 Zameer Manji wrote:
 Doesn't this mean user error can disable health checks forever? I think 
 we should treat disabling health checking as an exceptional state (since the 
 proceses has opted in to health checking) and therefore require user action 
 (increasing mtime) to continue to stay in this state.

Presumably we can trust the user here - health checking is after all opt-in via 
{{thermos.ports[health]}}. Making it a simple on/off switch greatly simplifies 
the code on our end.


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26383/#review55985
---


On Oct. 8, 2014, 6:56 p.m., David Pan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26383/
 ---
 
 (Updated Oct. 8, 2014, 6:56 p.m.)
 
 
 Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 The health check disabler allows health checks for a job to be snoozed 
 temporarily by touching a snooze file in the job's sandbox.  The appropriate 
 unit tests were modified/added.
 
 The corresponding JIRA ticket is the following:
 https://issues.apache.org/jira/browse/AURORA-795
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/executor/common/health_checker.py 
 4980411c847d12655cbb363404707ebd9f0bd163 
   src/test/python/apache/aurora/executor/common/BUILD 
 c7f7a003c865d479ba6e3cd7b5349322f884f653 
   src/test/python/apache/aurora/executor/common/test_health_checker.py 
 aa36415fa891fc523a3a376ffeca5d3cd5ceabec 
 
 Diff: https://reviews.apache.org/r/26383/diff/
 
 
 Testing
 ---
 
 On vagrant in ~/aurora, I ran
 ./pants src/test/python/apache/aurora/executor::
 
 
 Thanks,
 
 David Pan
 




Review Request 26504: Restore JDK8 compatibility

2014-10-09 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26504/
---

Review request for Aurora, Maxim Khutornenko and Zameer Manji.


Repository: aurora


Description
---

I misconfigured my IDE and noticed that we'd stopped compiling on JDK8.


Diffs
-

  src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 
7ecb05ae87058e7a25d88959cf629b2da17dc520 
  src/test/java/org/apache/aurora/scheduler/state/LockManagerImplTest.java 
d3c90416e48e25d83f28f966b21c018ee2a1ac27 

Diff: https://reviews.apache.org/r/26504/diff/


Testing
---

./gradlew -Pq build

With

```
diff --git a/build.gradle b/build.gradle
index 8f7eed0..af3d4ba 100644
--- a/build.gradle
+++ b/build.gradle
@@ -38,8 +38,8 @@ def httpAssetsPath = 'scheduler/assets'
 def thriftVersion = '0.9.1'

 compileJava {
-  sourceCompatibility = 1.7
-  targetCompatibility = 1.7
+  sourceCompatibility = 1.8
+  targetCompatibility = 1.8
 }
```


Thanks,

Kevin Sweeney



Re: Review Request 26504: Restore JDK8 compatibility

2014-10-09 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26504/#review56003
---

Ship it!


Ship It!

- Maxim Khutornenko


On Oct. 9, 2014, 4:50 p.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26504/
 ---
 
 (Updated Oct. 9, 2014, 4:50 p.m.)
 
 
 Review request for Aurora, Maxim Khutornenko and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 I misconfigured my IDE and noticed that we'd stopped compiling on JDK8.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 
 7ecb05ae87058e7a25d88959cf629b2da17dc520 
   src/test/java/org/apache/aurora/scheduler/state/LockManagerImplTest.java 
 d3c90416e48e25d83f28f966b21c018ee2a1ac27 
 
 Diff: https://reviews.apache.org/r/26504/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 With
 
 ```
 diff --git a/build.gradle b/build.gradle
 index 8f7eed0..af3d4ba 100644
 --- a/build.gradle
 +++ b/build.gradle
 @@ -38,8 +38,8 @@ def httpAssetsPath = 'scheduler/assets'
  def thriftVersion = '0.9.1'
 
  compileJava {
 -  sourceCompatibility = 1.7
 -  targetCompatibility = 1.7
 +  sourceCompatibility = 1.8
 +  targetCompatibility = 1.8
  }
 ```
 
 
 Thanks,
 
 Kevin Sweeney
 




Re: Review Request 26469: Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly.

2014-10-09 Thread Joshua Cohen
Nope, it's passing for me, but it's similar to the behavior I was seeing
with ApiBeta$2; even after cleaning the reports directory I was *not*
getting the failure about it missing coverage even though it actually was.

On Thu, Oct 9, 2014 at 8:32 AM, Bill Farner wfar...@apache.org wrote:

This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26469/

 On October 9th, 2014, 2:32 p.m. UTC, *Bill Farner* wrote:

 Ship It!

  Hmm, i'm now getting a different coverage error when preparing to submit 
 this:

 * What went wrong:
 Execution failed for task ':jacocoTestReport'. Thanks for adding the first 
 test coverage to: 
 org/apache/aurora/scheduler/async/OfferQueue$OfferQueueImpl$2 please remove 
 it from the legacyClassesWithoutCoverage list. Expression: (coverage == 0). 
 Values: coverage = 6

  This remains after cleaning the reports directory. Do you see that one
 as well?


 - Bill

 On October 9th, 2014, 12:15 a.m. UTC, Joshua Cohen wrote:
   Review request for Aurora, Bill Farner and Zameer Manji.
 By Joshua Cohen.

 *Updated Oct. 9, 2014, 12:15 a.m.*
  *Repository: * aurora
 Description

 Kill code to serve ApiBeta help pages that's no longer used now that the 
 content is served directly.

   Testing

 ./gradlew build -Pq

 $ curl -I http://192.168.33.7:8081/apibeta/help/method/setQuota.html

 HTTP/1.1 200 OK

 Content-Type: text/html

 Vary: Accept-Encoding

 Content-Length: 0

 Server: Jetty(7.6.15.v20140411)

   Diffs

- build.gradle (8f7eed0b58a00f8e5e0521512975087d5788b5b6)
- src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java
(eaf63382f689a045f837847736ef24fa75dee874)
- src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java
(62154045f49c5b23949dc739d735c3e5d3680b89)

 View Diff https://reviews.apache.org/r/26469/diff/



Re: Review Request 26469: Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly.

2014-10-09 Thread Joshua Cohen
I can also verify that running the entire test suite in a debugger, a
breakpoint set in OfferQueue$OfferQueueImpl$2 is not hit.

On Thu, Oct 9, 2014 at 10:05 AM, Joshua Cohen jco...@twopensource.com
wrote:

 Nope, it's passing for me, but it's similar to the behavior I was seeing
 with ApiBeta$2; even after cleaning the reports directory I was *not*
 getting the failure about it missing coverage even though it actually was.

 On Thu, Oct 9, 2014 at 8:32 AM, Bill Farner wfar...@apache.org wrote:

This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26469/

 On October 9th, 2014, 2:32 p.m. UTC, *Bill Farner* wrote:

 Ship It!

  Hmm, i'm now getting a different coverage error when preparing to submit 
 this:

 * What went wrong:
 Execution failed for task ':jacocoTestReport'. Thanks for adding the first 
 test coverage to: 
 org/apache/aurora/scheduler/async/OfferQueue$OfferQueueImpl$2 please remove 
 it from the legacyClassesWithoutCoverage list. Expression: (coverage == 0). 
 Values: coverage = 6

  This remains after cleaning the reports directory. Do you see that one
 as well?


 - Bill

 On October 9th, 2014, 12:15 a.m. UTC, Joshua Cohen wrote:
   Review request for Aurora, Bill Farner and Zameer Manji.
 By Joshua Cohen.

 *Updated Oct. 9, 2014, 12:15 a.m.*
  *Repository: * aurora
 Description

 Kill code to serve ApiBeta help pages that's no longer used now that the 
 content is served directly.

   Testing

 ./gradlew build -Pq

 $ curl -I http://192.168.33.7:8081/apibeta/help/method/setQuota.html

 HTTP/1.1 200 OK

 Content-Type: text/html

 Vary: Accept-Encoding

 Content-Length: 0

 Server: Jetty(7.6.15.v20140411)

   Diffs

- build.gradle (8f7eed0b58a00f8e5e0521512975087d5788b5b6)
- src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java
(eaf63382f689a045f837847736ef24fa75dee874)
- src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java
(62154045f49c5b23949dc739d735c3e5d3680b89)

 View Diff https://reviews.apache.org/r/26469/diff/





Re: Review Request 26504: Restore JDK8 compatibility

2014-10-09 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26504/#review56011
---

Ship it!


Ship It!

- Joshua Cohen


On Oct. 9, 2014, 4:50 p.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26504/
 ---
 
 (Updated Oct. 9, 2014, 4:50 p.m.)
 
 
 Review request for Aurora, Maxim Khutornenko and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 I misconfigured my IDE and noticed that we'd stopped compiling on JDK8.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 
 7ecb05ae87058e7a25d88959cf629b2da17dc520 
   src/test/java/org/apache/aurora/scheduler/state/LockManagerImplTest.java 
 d3c90416e48e25d83f28f966b21c018ee2a1ac27 
 
 Diff: https://reviews.apache.org/r/26504/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 With
 
 ```
 diff --git a/build.gradle b/build.gradle
 index 8f7eed0..af3d4ba 100644
 --- a/build.gradle
 +++ b/build.gradle
 @@ -38,8 +38,8 @@ def httpAssetsPath = 'scheduler/assets'
  def thriftVersion = '0.9.1'
 
  compileJava {
 -  sourceCompatibility = 1.7
 -  targetCompatibility = 1.7
 +  sourceCompatibility = 1.8
 +  targetCompatibility = 1.8
  }
 ```
 
 
 Thanks,
 
 Kevin Sweeney
 




Re: Review Request 26504: Restore JDK8 compatibility

2014-10-09 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26504/#review56012
---

Ship it!


Ship It!

- Zameer Manji


On Oct. 9, 2014, 9:50 a.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26504/
 ---
 
 (Updated Oct. 9, 2014, 9:50 a.m.)
 
 
 Review request for Aurora, Maxim Khutornenko and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 I misconfigured my IDE and noticed that we'd stopped compiling on JDK8.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 
 7ecb05ae87058e7a25d88959cf629b2da17dc520 
   src/test/java/org/apache/aurora/scheduler/state/LockManagerImplTest.java 
 d3c90416e48e25d83f28f966b21c018ee2a1ac27 
 
 Diff: https://reviews.apache.org/r/26504/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 With
 
 ```
 diff --git a/build.gradle b/build.gradle
 index 8f7eed0..af3d4ba 100644
 --- a/build.gradle
 +++ b/build.gradle
 @@ -38,8 +38,8 @@ def httpAssetsPath = 'scheduler/assets'
  def thriftVersion = '0.9.1'
 
  compileJava {
 -  sourceCompatibility = 1.7
 -  targetCompatibility = 1.7
 +  sourceCompatibility = 1.8
 +  targetCompatibility = 1.8
  }
 ```
 
 
 Thanks,
 
 Kevin Sweeney
 




Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-09 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/#review56010
---



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96362

This javadoc would highly benefit from some details about the source of 
duplication and a proposed solution. It's not obvious for a newcomer why 
TaskConfigs are duplicated.



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96365

Why result field here?



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96360

There is a remote possibility of numOutputTasks to be zero in a snapshot 
for an empty cluster.



src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
https://reviews.apache.org/r/26478/#comment96363

Inverse log message of a hydration ratio would be useful here along with a 
Starting redupulication.



src/main/thrift/org/apache/aurora/gen/storage.thrift
https://reviews.apache.org/r/26478/#comment96364

Please, document fields. What is taskConfigId here?



src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
https://reviews.apache.org/r/26478/#comment96368

How about a roundtrip test with no tasks in a snapshot?


- Maxim Khutornenko


On Oct. 9, 2014, 2:39 a.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26478/
 ---
 
 (Updated Oct. 9, 2014, 2:39 a.m.)
 
 
 Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
 
 
 Bugs: AURORA-722
 https://issues.apache.org/jira/browse/AURORA-722
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Add a new format for deduplicated storage snapshots. Microbenchmarks show a 
 10x deduplication ratio on Twitter's production snapshots.
 
 This format is backwards-incompatible, so this patch introduces a flag to 
 control its use (defaulting off).
 
 This only changes the format used to write to the replicated log (where time 
 is of the essence since all writes are done holding the global storage lock) 
 - the format of backups written to disk is unchanged, as backups don't hold 
 the lock.
 
 
 Diffs
 -
 
   build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 
   src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
 65e986eaa2c4193431ca048425a1ed3ab60f5882 
   src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
 7239a6a5eb5479e395e16423c83fdf80a77e5a83 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
 4b50e2069407dc263b4fc93f1827d3a8836253bf 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
 f806297d1d0700155c976743f936b2b8a3a390fb 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
 769348e6b8a5c701734afff391b1c77de35222c6 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
  PRE-CREATION 
   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
 22db80eaf34fe736fa5a3a9289836c9ac9e59906 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
 e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
   src/main/thrift/org/apache/aurora/gen/storage.thrift 
 5350ec945fbe028ee4641683815a068ce00b5efc 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
 39729b374fe4e383f9b5ada7d016923766df9af7 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
 7a8c3b882633376a1bf6a78616d55aaa7401d13f 
   
 src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/26478/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Kevin Sweeney
 




Re: Review Request 26383: Health Check Disabler

2014-10-09 Thread Zameer Manji


 On Oct. 9, 2014, 7:53 a.m., Bill Farner wrote:
  src/main/python/apache/aurora/executor/common/health_checker.py, line 66
  https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66
 
  FWIW i actually meant to suggest that the snooze has no concept of time 
  at all.  If the file is there, don't perform health checks.  When you want 
  to re-enable health checks, delete the file.  Happy to hear what others 
  think about that.
 
 Zameer Manji wrote:
 Doesn't this mean user error can disable health checks forever? I think 
 we should treat disabling health checking as an exceptional state (since the 
 proceses has opted in to health checking) and therefore require user action 
 (increasing mtime) to continue to stay in this state.
 
 Kevin Sweeney wrote:
 Presumably we can trust the user here - health checking is after all 
 opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch 
 greatly simplifies the code on our end.

I'm very worried about introducing a feature which can allow unhealthy 
instances live forever. Furthermore this information isn't exposed on the 
Aurora UI or Observer UI so there isn't an easy way to check if an instance has 
health checking disabled or not. A user can be completely oblivious that health 
checking of their instance has been disabled.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26383/#review55985
---


On Oct. 8, 2014, 6:56 p.m., David Pan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26383/
 ---
 
 (Updated Oct. 8, 2014, 6:56 p.m.)
 
 
 Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 The health check disabler allows health checks for a job to be snoozed 
 temporarily by touching a snooze file in the job's sandbox.  The appropriate 
 unit tests were modified/added.
 
 The corresponding JIRA ticket is the following:
 https://issues.apache.org/jira/browse/AURORA-795
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/executor/common/health_checker.py 
 4980411c847d12655cbb363404707ebd9f0bd163 
   src/test/python/apache/aurora/executor/common/BUILD 
 c7f7a003c865d479ba6e3cd7b5349322f884f653 
   src/test/python/apache/aurora/executor/common/test_health_checker.py 
 aa36415fa891fc523a3a376ffeca5d3cd5ceabec 
 
 Diff: https://reviews.apache.org/r/26383/diff/
 
 
 Testing
 ---
 
 On vagrant in ~/aurora, I ran
 ./pants src/test/python/apache/aurora/executor::
 
 
 Thanks,
 
 David Pan
 




Re: Review Request 26469: Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly.

2014-10-09 Thread Zameer Manji
Applying the patch, cleaning my reports directory and then running
`./gradlew clean build` produces no error for me. I can share my xml report
if needed.

On Thu, Oct 9, 2014 at 10:42 AM, Bill Farner wfar...@apache.org wrote:

 Can you post your xml report to a ticket? It might help get to the bottom
 of this flakiness.


 On Thursday, October 9, 2014, Joshua Cohen jco...@twopensource.com
 wrote:

 I can also verify that running the entire test suite in a debugger, a
 breakpoint set in OfferQueue$OfferQueueImpl$2 is not hit.

 On Thu, Oct 9, 2014 at 10:05 AM, Joshua Cohen jco...@twopensource.com
 wrote:

 Nope, it's passing for me, but it's similar to the behavior I was seeing
 with ApiBeta$2; even after cleaning the reports directory I was *not*
 getting the failure about it missing coverage even though it actually was.

 On Thu, Oct 9, 2014 at 8:32 AM, Bill Farner wfar...@apache.org wrote:

This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26469/

 On October 9th, 2014, 2:32 p.m. UTC, *Bill Farner* wrote:

 Ship It!

  Hmm, i'm now getting a different coverage error when preparing to submit 
 this:

 * What went wrong:
 Execution failed for task ':jacocoTestReport'. Thanks for adding the 
 first test coverage to: 
 org/apache/aurora/scheduler/async/OfferQueue$OfferQueueImpl$2 please 
 remove it from the legacyClassesWithoutCoverage list. Expression: 
 (coverage == 0). Values: coverage = 6

  This remains after cleaning the reports directory. Do you see that
 one as well?


 - Bill

 On October 9th, 2014, 12:15 a.m. UTC, Joshua Cohen wrote:
   Review request for Aurora, Bill Farner and Zameer Manji.
 By Joshua Cohen.

 *Updated Oct. 9, 2014, 12:15 a.m.*
  *Repository: * aurora
 Description

 Kill code to serve ApiBeta help pages that's no longer used now that the 
 content is served directly.

   Testing

 ./gradlew build -Pq

 $ curl -I http://192.168.33.7:8081/apibeta/help/method/setQuota.html

 HTTP/1.1 200 OK

 Content-Type: text/html

 Vary: Accept-Encoding

 Content-Length: 0

 Server: Jetty(7.6.15.v20140411)

   Diffs

- build.gradle (8f7eed0b58a00f8e5e0521512975087d5788b5b6)
- src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java
(eaf63382f689a045f837847736ef24fa75dee874)
- src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java
(62154045f49c5b23949dc739d735c3e5d3680b89)

 View Diff https://reviews.apache.org/r/26469/diff/





 --
 -=Bill




-- 
Zameer Manji


Re: Review Request 26383: Health Check Disabler

2014-10-09 Thread David Pan


 On Oct. 9, 2014, 2:53 p.m., Bill Farner wrote:
  src/main/python/apache/aurora/executor/common/health_checker.py, line 66
  https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66
 
  FWIW i actually meant to suggest that the snooze has no concept of time 
  at all.  If the file is there, don't perform health checks.  When you want 
  to re-enable health checks, delete the file.  Happy to hear what others 
  think about that.
 
 Zameer Manji wrote:
 Doesn't this mean user error can disable health checks forever? I think 
 we should treat disabling health checking as an exceptional state (since the 
 proceses has opted in to health checking) and therefore require user action 
 (increasing mtime) to continue to stay in this state.
 
 Kevin Sweeney wrote:
 Presumably we can trust the user here - health checking is after all 
 opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch 
 greatly simplifies the code on our end.
 
 Zameer Manji wrote:
 I'm very worried about introducing a feature which can allow unhealthy 
 instances live forever. Furthermore this information isn't exposed on the 
 Aurora UI or Observer UI so there isn't an easy way to check if an instance 
 has health checking disabled or not. A user can be completely oblivious that 
 health checking of their instance has been disabled.

The motivation behind the health check disabler is for another project I am 
working on, which is the JVM Heap Dumper.  Long story short, the JVM Heap 
Dumper needs to disable health checks for the task that is being heap dumped.  
Under normal circumstances, the heap dumper can re-enable the health checks.  
However, if the heap dumper dies unexpectedly, the health checks will remain 
disabled forever if we don't implement time control.  We don't want the 
disabling and enabling of health checks to be a manual process.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26383/#review55985
---


On Oct. 9, 2014, 1:56 a.m., David Pan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26383/
 ---
 
 (Updated Oct. 9, 2014, 1:56 a.m.)
 
 
 Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 The health check disabler allows health checks for a job to be snoozed 
 temporarily by touching a snooze file in the job's sandbox.  The appropriate 
 unit tests were modified/added.
 
 The corresponding JIRA ticket is the following:
 https://issues.apache.org/jira/browse/AURORA-795
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/executor/common/health_checker.py 
 4980411c847d12655cbb363404707ebd9f0bd163 
   src/test/python/apache/aurora/executor/common/BUILD 
 c7f7a003c865d479ba6e3cd7b5349322f884f653 
   src/test/python/apache/aurora/executor/common/test_health_checker.py 
 aa36415fa891fc523a3a376ffeca5d3cd5ceabec 
 
 Diff: https://reviews.apache.org/r/26383/diff/
 
 
 Testing
 ---
 
 On vagrant in ~/aurora, I ran
 ./pants src/test/python/apache/aurora/executor::
 
 
 Thanks,
 
 David Pan
 




Re: Review Request 26469: Kill code to serve ApiBeta help pages that's no longer used now that the content is served directly.

2014-10-09 Thread Joshua Cohen
Filed https://issues.apache.org/jira/browse/AURORA-822 and attached my
report.

On Thu, Oct 9, 2014 at 11:29 AM, Zameer Manji zma...@twopensource.com
wrote:

 Applying the patch, cleaning my reports directory and then running
 `./gradlew clean build` produces no error for me. I can share my xml report
 if needed.

 On Thu, Oct 9, 2014 at 10:42 AM, Bill Farner wfar...@apache.org wrote:

 Can you post your xml report to a ticket? It might help get to the bottom
 of this flakiness.


 On Thursday, October 9, 2014, Joshua Cohen jco...@twopensource.com
 wrote:

 I can also verify that running the entire test suite in a debugger, a
 breakpoint set in OfferQueue$OfferQueueImpl$2 is not hit.

 On Thu, Oct 9, 2014 at 10:05 AM, Joshua Cohen jco...@twopensource.com
 wrote:

 Nope, it's passing for me, but it's similar to the behavior I was
 seeing with ApiBeta$2; even after cleaning the reports directory I was
 *not* getting the failure about it missing coverage even though it actually
 was.

 On Thu, Oct 9, 2014 at 8:32 AM, Bill Farner wfar...@apache.org wrote:

This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26469/

 On October 9th, 2014, 2:32 p.m. UTC, *Bill Farner* wrote:

 Ship It!

  Hmm, i'm now getting a different coverage error when preparing to submit 
 this:

 * What went wrong:
 Execution failed for task ':jacocoTestReport'. Thanks for adding the 
 first test coverage to: 
 org/apache/aurora/scheduler/async/OfferQueue$OfferQueueImpl$2 please 
 remove it from the legacyClassesWithoutCoverage list. Expression: 
 (coverage == 0). Values: coverage = 6

  This remains after cleaning the reports directory. Do you see that
 one as well?


 - Bill

 On October 9th, 2014, 12:15 a.m. UTC, Joshua Cohen wrote:
   Review request for Aurora, Bill Farner and Zameer Manji.
 By Joshua Cohen.

 *Updated Oct. 9, 2014, 12:15 a.m.*
  *Repository: * aurora
 Description

 Kill code to serve ApiBeta help pages that's no longer used now that the 
 content is served directly.

   Testing

 ./gradlew build -Pq

 $ curl -I http://192.168.33.7:8081/apibeta/help/method/setQuota.html

 HTTP/1.1 200 OK

 Content-Type: text/html

 Vary: Accept-Encoding

 Content-Length: 0

 Server: Jetty(7.6.15.v20140411)

   Diffs

- build.gradle (8f7eed0b58a00f8e5e0521512975087d5788b5b6)
- src/main/java/org/apache/aurora/scheduler/http/api/ApiBeta.java
(eaf63382f689a045f837847736ef24fa75dee874)
- src/test/java/org/apache/aurora/scheduler/http/api/ApiBetaTest.java
(62154045f49c5b23949dc739d735c3e5d3680b89)

 View Diff https://reviews.apache.org/r/26469/diff/





 --
 -=Bill




 --
 Zameer Manji



Re: Review Request 26383: Health Check Disabler

2014-10-09 Thread Kevin Sweeney


 On Oct. 9, 2014, 7:53 a.m., Bill Farner wrote:
  src/main/python/apache/aurora/executor/common/health_checker.py, line 66
  https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66
 
  FWIW i actually meant to suggest that the snooze has no concept of time 
  at all.  If the file is there, don't perform health checks.  When you want 
  to re-enable health checks, delete the file.  Happy to hear what others 
  think about that.
 
 Zameer Manji wrote:
 Doesn't this mean user error can disable health checks forever? I think 
 we should treat disabling health checking as an exceptional state (since the 
 proceses has opted in to health checking) and therefore require user action 
 (increasing mtime) to continue to stay in this state.
 
 Kevin Sweeney wrote:
 Presumably we can trust the user here - health checking is after all 
 opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch 
 greatly simplifies the code on our end.
 
 Zameer Manji wrote:
 I'm very worried about introducing a feature which can allow unhealthy 
 instances live forever. Furthermore this information isn't exposed on the 
 Aurora UI or Observer UI so there isn't an easy way to check if an instance 
 has health checking disabled or not. A user can be completely oblivious that 
 health checking of their instance has been disabled.
 
 David Pan wrote:
 The motivation behind the health check disabler is for another project I 
 am working on, which is the JVM Heap Dumper.  Long story short, the JVM Heap 
 Dumper needs to disable health checks for the task that is being heap dumped. 
  Under normal circumstances, the heap dumper can re-enable the health checks. 
  However, if the heap dumper dies unexpectedly, the health checks will remain 
 disabled forever if we don't implement time control.  We don't want the 
 disabling and enabling of health checks to be a manual process.

I still agree with Bill here - the simpler implementation (skip health check if 
the file exists) is easier to reason about and a cleaner API.

A process running in the sandbox could implement some form of timeout logic 
itself (for example, the application could expose an endpoint that forks the 
equivalent of touch .healthchecksnooze; sleep 60; rm .healthchecksnooze) but 
I like the idea of keeping the executor API here simple to explain and reason 
about.


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26383/#review55985
---


On Oct. 8, 2014, 6:56 p.m., David Pan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26383/
 ---
 
 (Updated Oct. 8, 2014, 6:56 p.m.)
 
 
 Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 The health check disabler allows health checks for a job to be snoozed 
 temporarily by touching a snooze file in the job's sandbox.  The appropriate 
 unit tests were modified/added.
 
 The corresponding JIRA ticket is the following:
 https://issues.apache.org/jira/browse/AURORA-795
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/executor/common/health_checker.py 
 4980411c847d12655cbb363404707ebd9f0bd163 
   src/test/python/apache/aurora/executor/common/BUILD 
 c7f7a003c865d479ba6e3cd7b5349322f884f653 
   src/test/python/apache/aurora/executor/common/test_health_checker.py 
 aa36415fa891fc523a3a376ffeca5d3cd5ceabec 
 
 Diff: https://reviews.apache.org/r/26383/diff/
 
 
 Testing
 ---
 
 On vagrant in ~/aurora, I ran
 ./pants src/test/python/apache/aurora/executor::
 
 
 Thanks,
 
 David Pan
 




Re: Review Request 26376: Implementing non-prod MTTA/R SLA metrics.

2014-10-09 Thread Maxim Khutornenko


 On Oct. 9, 2014, 2:46 p.m., Bill Farner wrote:
  src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java, 
  line 59
  https://reviews.apache.org/r/26376/diff/1/?file=714221#file714221line59
 
  Can you instead export these values into a MapString, AtomicLong and 
  validate the full map at the end?  That would firm up the value checking 
  here.
  
  Something like:
  
  expect(...makeGauge(..)).andAnswer(
  new IAnswer..() {
...
  AtomicLong value = new AtomicLong().
  map.put(key, value).
  return new StatImpl()...;
  }

Thought about that but it would require hardcoding all 27 of currently exposed 
metrics along with their values. I have modified a bit to fully validate metric 
names instead. That's the only validation suitable for this test. Values are 
asserted in SlaAlgorithmTest.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26376/#review55984
---


On Oct. 6, 2014, 7:10 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26376/
 ---
 
 (Updated Oct. 6, 2014, 7:10 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-774
 https://issues.apache.org/jira/browse/AURORA-774
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Implementing non-prod MTTA/R SLA metrics.
 
 
 Diffs
 -
 
   src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
 dca680415426be2bc760875d5774c9e9399ea94b 
   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
 33b6bbe4b39d516595276128bbfa0686d0bb9cbd 
   src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
 aeb90bbb822254cbe4691e45092b9581596ad800 
   src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
 96a04389e6b14c4d1c0bdb4d06abc046e7ea2151 
 
 Diff: https://reviews.apache.org/r/26376/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 26376: Implementing non-prod MTTA/R SLA metrics.

2014-10-09 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26376/
---

(Updated Oct. 9, 2014, 9:04 p.m.)


Review request for Aurora and Bill Farner.


Changes
---

CR comments.


Bugs: AURORA-774
https://issues.apache.org/jira/browse/AURORA-774


Repository: aurora


Description
---

Implementing non-prod MTTA/R SLA metrics.


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/sla/MetricCalculator.java 
dca680415426be2bc760875d5774c9e9399ea94b 
  src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
33b6bbe4b39d516595276128bbfa0686d0bb9cbd 
  src/test/java/org/apache/aurora/scheduler/sla/MetricCalculatorTest.java 
aeb90bbb822254cbe4691e45092b9581596ad800 
  src/test/java/org/apache/aurora/scheduler/sla/SlaTestUtil.java 
96a04389e6b14c4d1c0bdb4d06abc046e7ea2151 

Diff: https://reviews.apache.org/r/26376/diff/


Testing
---

./gradlew -Pq build


Thanks,

Maxim Khutornenko



Re: Review Request 26383: Health Check Disabler

2014-10-09 Thread David Pan


 On Oct. 9, 2014, 2:53 p.m., Bill Farner wrote:
  src/main/python/apache/aurora/executor/common/health_checker.py, line 66
  https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66
 
  FWIW i actually meant to suggest that the snooze has no concept of time 
  at all.  If the file is there, don't perform health checks.  When you want 
  to re-enable health checks, delete the file.  Happy to hear what others 
  think about that.
 
 Zameer Manji wrote:
 Doesn't this mean user error can disable health checks forever? I think 
 we should treat disabling health checking as an exceptional state (since the 
 proceses has opted in to health checking) and therefore require user action 
 (increasing mtime) to continue to stay in this state.
 
 Kevin Sweeney wrote:
 Presumably we can trust the user here - health checking is after all 
 opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch 
 greatly simplifies the code on our end.
 
 Zameer Manji wrote:
 I'm very worried about introducing a feature which can allow unhealthy 
 instances live forever. Furthermore this information isn't exposed on the 
 Aurora UI or Observer UI so there isn't an easy way to check if an instance 
 has health checking disabled or not. A user can be completely oblivious that 
 health checking of their instance has been disabled.
 
 David Pan wrote:
 The motivation behind the health check disabler is for another project I 
 am working on, which is the JVM Heap Dumper.  Long story short, the JVM Heap 
 Dumper needs to disable health checks for the task that is being heap dumped. 
  Under normal circumstances, the heap dumper can re-enable the health checks. 
  However, if the heap dumper dies unexpectedly, the health checks will remain 
 disabled forever if we don't implement time control.  We don't want the 
 disabling and enabling of health checks to be a manual process.
 
 Kevin Sweeney wrote:
 I still agree with Bill here - the simpler implementation (skip health 
 check if the file exists) is easier to reason about and a cleaner API.
 
 A process running in the sandbox could implement some form of timeout 
 logic itself (for example, the application could expose an endpoint that 
 forks the equivalent of touch .healthchecksnooze; sleep 60; rm 
 .healthchecksnooze) but I like the idea of keeping the executor API here 
 simple to explain and reason about.

Overall, wouldn't it be simpler for the executor API to handle the time 
control, rather than requiring an identical process in every task's sandbox to 
handle the time control?


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26383/#review55985
---


On Oct. 9, 2014, 1:56 a.m., David Pan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26383/
 ---
 
 (Updated Oct. 9, 2014, 1:56 a.m.)
 
 
 Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 The health check disabler allows health checks for a job to be snoozed 
 temporarily by touching a snooze file in the job's sandbox.  The appropriate 
 unit tests were modified/added.
 
 The corresponding JIRA ticket is the following:
 https://issues.apache.org/jira/browse/AURORA-795
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/executor/common/health_checker.py 
 4980411c847d12655cbb363404707ebd9f0bd163 
   src/test/python/apache/aurora/executor/common/BUILD 
 c7f7a003c865d479ba6e3cd7b5349322f884f653 
   src/test/python/apache/aurora/executor/common/test_health_checker.py 
 aa36415fa891fc523a3a376ffeca5d3cd5ceabec 
 
 Diff: https://reviews.apache.org/r/26383/diff/
 
 
 Testing
 ---
 
 On vagrant in ~/aurora, I ran
 ./pants src/test/python/apache/aurora/executor::
 
 
 Thanks,
 
 David Pan
 




Re: Review Request 26383: Health Check Disabler

2014-10-09 Thread Joshua Cohen
I feel like we should err on the side of correctness here, rather than
simplicity? The dangers of someone accidentally leaving health checks
disabled indefinitely (on a service that has opted in to health checks) are
not insignificant.

On Thu, Oct 9, 2014 at 1:34 PM, Kevin Sweeney kevi...@apache.org wrote:



  On Oct. 9, 2014, 7:53 a.m., Bill Farner wrote:
   src/main/python/apache/aurora/executor/common/health_checker.py, line
 66
   
 https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66
  
   FWIW i actually meant to suggest that the snooze has no concept of
 time at all.  If the file is there, don't perform health checks.  When you
 want to re-enable health checks, delete the file.  Happy to hear what
 others think about that.
 
  Zameer Manji wrote:
  Doesn't this mean user error can disable health checks forever? I
 think we should treat disabling health checking as an exceptional state
 (since the proceses has opted in to health checking) and therefore require
 user action (increasing mtime) to continue to stay in this state.
 
  Kevin Sweeney wrote:
  Presumably we can trust the user here - health checking is after all
 opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch
 greatly simplifies the code on our end.
 
  Zameer Manji wrote:
  I'm very worried about introducing a feature which can allow
 unhealthy instances live forever. Furthermore this information isn't
 exposed on the Aurora UI or Observer UI so there isn't an easy way to check
 if an instance has health checking disabled or not. A user can be
 completely oblivious that health checking of their instance has been
 disabled.
 
  David Pan wrote:
  The motivation behind the health check disabler is for another
 project I am working on, which is the JVM Heap Dumper.  Long story short,
 the JVM Heap Dumper needs to disable health checks for the task that is
 being heap dumped.  Under normal circumstances, the heap dumper can
 re-enable the health checks.  However, if the heap dumper dies
 unexpectedly, the health checks will remain disabled forever if we don't
 implement time control.  We don't want the disabling and enabling of health
 checks to be a manual process.

 I still agree with Bill here - the simpler implementation (skip health
 check if the file exists) is easier to reason about and a cleaner API.

 A process running in the sandbox could implement some form of timeout
 logic itself (for example, the application could expose an endpoint that
 forks the equivalent of touch .healthchecksnooze; sleep 60; rm
 .healthchecksnooze) but I like the idea of keeping the executor API here
 simple to explain and reason about.


 - Kevin


 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26383/#review55985
 ---


 On Oct. 8, 2014, 6:56 p.m., David Pan wrote:
 
  ---
  This is an automatically generated e-mail. To reply, visit:
  https://reviews.apache.org/r/26383/
  ---
 
  (Updated Oct. 8, 2014, 6:56 p.m.)
 
 
  Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji.
 
 
  Repository: aurora
 
 
  Description
  ---
 
  The health check disabler allows health checks for a job to be snoozed
 temporarily by touching a snooze file in the job's sandbox.  The
 appropriate unit tests were modified/added.
 
  The corresponding JIRA ticket is the following:
  https://issues.apache.org/jira/browse/AURORA-795
 
 
  Diffs
  -
 
src/main/python/apache/aurora/executor/common/health_checker.py
 4980411c847d12655cbb363404707ebd9f0bd163
src/test/python/apache/aurora/executor/common/BUILD
 c7f7a003c865d479ba6e3cd7b5349322f884f653
src/test/python/apache/aurora/executor/common/test_health_checker.py
 aa36415fa891fc523a3a376ffeca5d3cd5ceabec
 
  Diff: https://reviews.apache.org/r/26383/diff/
 
 
  Testing
  ---
 
  On vagrant in ~/aurora, I ran
  ./pants src/test/python/apache/aurora/executor::
 
 
  Thanks,
 
  David Pan
 
 




Re: Review Request 26383: Health Check Disabler

2014-10-09 Thread Joshua Cohen


 On Oct. 9, 2014, 2:53 p.m., Bill Farner wrote:
  src/main/python/apache/aurora/executor/common/health_checker.py, line 66
  https://reviews.apache.org/r/26383/diff/3/?file=716355#file716355line66
 
  FWIW i actually meant to suggest that the snooze has no concept of time 
  at all.  If the file is there, don't perform health checks.  When you want 
  to re-enable health checks, delete the file.  Happy to hear what others 
  think about that.
 
 Zameer Manji wrote:
 Doesn't this mean user error can disable health checks forever? I think 
 we should treat disabling health checking as an exceptional state (since the 
 proceses has opted in to health checking) and therefore require user action 
 (increasing mtime) to continue to stay in this state.
 
 Kevin Sweeney wrote:
 Presumably we can trust the user here - health checking is after all 
 opt-in via {{thermos.ports[health]}}. Making it a simple on/off switch 
 greatly simplifies the code on our end.
 
 Zameer Manji wrote:
 I'm very worried about introducing a feature which can allow unhealthy 
 instances live forever. Furthermore this information isn't exposed on the 
 Aurora UI or Observer UI so there isn't an easy way to check if an instance 
 has health checking disabled or not. A user can be completely oblivious that 
 health checking of their instance has been disabled.
 
 David Pan wrote:
 The motivation behind the health check disabler is for another project I 
 am working on, which is the JVM Heap Dumper.  Long story short, the JVM Heap 
 Dumper needs to disable health checks for the task that is being heap dumped. 
  Under normal circumstances, the heap dumper can re-enable the health checks. 
  However, if the heap dumper dies unexpectedly, the health checks will remain 
 disabled forever if we don't implement time control.  We don't want the 
 disabling and enabling of health checks to be a manual process.
 
 Kevin Sweeney wrote:
 I still agree with Bill here - the simpler implementation (skip health 
 check if the file exists) is easier to reason about and a cleaner API.
 
 A process running in the sandbox could implement some form of timeout 
 logic itself (for example, the application could expose an endpoint that 
 forks the equivalent of touch .healthchecksnooze; sleep 60; rm 
 .healthchecksnooze) but I like the idea of keeping the executor API here 
 simple to explain and reason about.
 
 David Pan wrote:
 Overall, wouldn't it be simpler for the executor API to handle the time 
 control, rather than requiring an identical process in every task's sandbox 
 to handle the time control?

I feel like we should err on the side of correctness here, rather than 
simplicity? The dangers of someone accidentally leaving health checks disabled 
indefinitely (on a service that has opted in to health checks) are not 
insignificant.


- Joshua


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26383/#review55985
---


On Oct. 9, 2014, 1:56 a.m., David Pan wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26383/
 ---
 
 (Updated Oct. 9, 2014, 1:56 a.m.)
 
 
 Review request for Aurora, Joe Smith, Brian Wickman, and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 The health check disabler allows health checks for a job to be snoozed 
 temporarily by touching a snooze file in the job's sandbox.  The appropriate 
 unit tests were modified/added.
 
 The corresponding JIRA ticket is the following:
 https://issues.apache.org/jira/browse/AURORA-795
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/executor/common/health_checker.py 
 4980411c847d12655cbb363404707ebd9f0bd163 
   src/test/python/apache/aurora/executor/common/BUILD 
 c7f7a003c865d479ba6e3cd7b5349322f884f653 
   src/test/python/apache/aurora/executor/common/test_health_checker.py 
 aa36415fa891fc523a3a376ffeca5d3cd5ceabec 
 
 Diff: https://reviews.apache.org/r/26383/diff/
 
 
 Testing
 ---
 
 On vagrant in ~/aurora, I ran
 ./pants src/test/python/apache/aurora/executor::
 
 
 Thanks,
 
 David Pan
 




Re: Review Request 26478: Add a flag to deduplicate storage snapshots

2014-10-09 Thread Kevin Sweeney


 On Oct. 9, 2014, 10:30 a.m., Maxim Khutornenko wrote:
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java,
   line 133
  https://reviews.apache.org/r/26478/diff/2/?file=716380#file716380line133
 
  There is a remote possibility of numOutputTasks to be zero in a 
  snapshot for an empty cluster.

hmm - there is, and that would result in NaN appearing in the log. Which is 
technically accurate


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26478/#review56010
---


On Oct. 8, 2014, 7:39 p.m., Kevin Sweeney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26478/
 ---
 
 (Updated Oct. 8, 2014, 7:39 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
 
 
 Bugs: AURORA-722
 https://issues.apache.org/jira/browse/AURORA-722
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Add a new format for deduplicated storage snapshots. Microbenchmarks show a 
 10x deduplication ratio on Twitter's production snapshots.
 
 This format is backwards-incompatible, so this patch introduces a flag to 
 control its use (defaulting off).
 
 This only changes the format used to write to the replicated log (where time 
 is of the essence since all writes are done holding the global storage lock) 
 - the format of backups written to disk is unchanged, as backups don't hold 
 the lock.
 
 
 Diffs
 -
 
   build.gradle 2e1bf78d7797f17afd51a94a22eff80e00aba464 
   src/main/java/org/apache/aurora/codec/ThriftBinaryCodec.java 
 65e986eaa2c4193431ca048425a1ed3ab60f5882 
   src/main/java/org/apache/aurora/scheduler/storage/log/EntrySerializer.java 
 7239a6a5eb5479e395e16423c83fdf80a77e5a83 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogManager.java 
 4b50e2069407dc263b4fc93f1827d3a8836253bf 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 
 f806297d1d0700155c976743f936b2b8a3a390fb 
   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
 769348e6b8a5c701734afff391b1c77de35222c6 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
  PRE-CREATION 
   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java 
 22db80eaf34fe736fa5a3a9289836c9ac9e59906 
   
 src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java 
 e5cfbf5cf43bf5bbc38c42fe685a7e9f0d03af2a 
   src/main/thrift/org/apache/aurora/gen/storage.thrift 
 5350ec945fbe028ee4641683815a068ce00b5efc 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 
 39729b374fe4e383f9b5ada7d016923766df9af7 
   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 
 7a8c3b882633376a1bf6a78616d55aaa7401d13f 
   
 src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicatorImplTest.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/26478/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Kevin Sweeney
 




Re: Review Request 26431: Moving post_drain script execution into host_maintenance.py

2014-10-09 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26431/#review56076
---


Ping, Mark.

- Maxim Khutornenko


On Oct. 8, 2014, 11:46 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26431/
 ---
 
 (Updated Oct. 8, 2014, 11:46 p.m.)
 
 
 Review request for Aurora, Joe Smith and Mark Chu-Carroll.
 
 
 Bugs: AURORA-806
 https://issues.apache.org/jira/browse/AURORA-806
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Moving post_drain script functionality into host_maintenance.py to support 
 per batch execution.
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/admin/host_maintenance.py 
 9c2a9f77109791da574e1624d27b6b7096a2678e 
   src/main/python/apache/aurora/client/commands/maintenance.py 
 e465d973e9f764076e18491e1691d44303c0f388 
   src/test/python/apache/aurora/admin/test_host_maintenance.py 
 40228df59e43bc6034f2dc651c166a0c4b78aea8 
 
 Diff: https://reviews.apache.org/r/26431/diff/
 
 
 Testing
 ---
 
 ./pants src/test/python:all
 
 
 Thanks,
 
 Maxim Khutornenko
 




Review Request 26531: Defining schema for the heartbeat RPC.

2014-10-09 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26531/
---

Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.


Bugs: AURORA-690
https://issues.apache.org/jira/browse/AURORA-690


Repository: aurora


Description
---

Defining the flag and the schema for the update coordination (aka heartbeats).


Diffs
-

  
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
137f97d33decd14bf2f6dcdd9cd18c3db2b7c89c 
  
src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
 77032d84f91b149c01ce4ac62da7ca331a2b6445 
  src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 
db6a8719d18fee720f74ebcd2079ad36201cc831 
  src/main/thrift/org/apache/aurora/gen/api.thrift 
8794731f4b3f1033588bdfa33c292e4796319a2a 
  
src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java 
3871dae68fcdc6402cb61a7244b46114617eecff 
  src/test/java/org/apache/aurora/scheduler/thrift/aop/ForwardingThrift.java 
93f79d7cfed2ba3d56548b43b926ce7ddec16c9e 

Diff: https://reviews.apache.org/r/26531/diff/


Testing
---

./gradlew -Pq build


Thanks,

Maxim Khutornenko



Re: Review Request 26531: Defining schema for the heartbeat RPC.

2014-10-09 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26531/#review56080
---

Ship it!



src/main/thrift/org/apache/aurora/gen/api.thrift
https://reviews.apache.org/r/26531/#comment96435

s/at/within/



src/main/thrift/org/apache/aurora/gen/api.thrift
https://reviews.apache.org/r/26531/#comment96436

I think we decided not to do auto-resume once a job is paused due to lack 
of heartbeat, but I can't remember why. Maybe someone else remembers?


- David McLaughlin


On Oct. 9, 2014, 11:02 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26531/
 ---
 
 (Updated Oct. 9, 2014, 11:02 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
 
 
 Bugs: AURORA-690
 https://issues.apache.org/jira/browse/AURORA-690
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Defining the flag and the schema for the update coordination (aka heartbeats).
 
 
 Diffs
 -
 
   
 src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
  137f97d33decd14bf2f6dcdd9cd18c3db2b7c89c 
   
 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
  77032d84f91b149c01ce4ac62da7ca331a2b6445 
   src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 
 db6a8719d18fee720f74ebcd2079ad36201cc831 
   src/main/thrift/org/apache/aurora/gen/api.thrift 
 8794731f4b3f1033588bdfa33c292e4796319a2a 
   
 src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java
  3871dae68fcdc6402cb61a7244b46114617eecff 
   src/test/java/org/apache/aurora/scheduler/thrift/aop/ForwardingThrift.java 
 93f79d7cfed2ba3d56548b43b926ce7ddec16c9e 
 
 Diff: https://reviews.apache.org/r/26531/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 26531: Defining schema for the heartbeat RPC.

2014-10-09 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26531/
---

(Updated Oct. 9, 2014, 11:23 p.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.


Changes
---

Adding SessionKey into the heartbeat RPC and adding python test.


Bugs: AURORA-690
https://issues.apache.org/jira/browse/AURORA-690


Repository: aurora


Description
---

Defining the flag and the schema for the update coordination (aka heartbeats).


Diffs (updated)
-

  
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java 
137f97d33decd14bf2f6dcdd9cd18c3db2b7c89c 
  
src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
 77032d84f91b149c01ce4ac62da7ca331a2b6445 
  src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 
db6a8719d18fee720f74ebcd2079ad36201cc831 
  src/main/thrift/org/apache/aurora/gen/api.thrift 
8794731f4b3f1033588bdfa33c292e4796319a2a 
  
src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java 
3871dae68fcdc6402cb61a7244b46114617eecff 
  src/test/java/org/apache/aurora/scheduler/thrift/aop/ForwardingThrift.java 
93f79d7cfed2ba3d56548b43b926ce7ddec16c9e 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
d78e7dca28d67997bc6c98cff619ab94a257c7dc 

Diff: https://reviews.apache.org/r/26531/diff/


Testing
---

./gradlew -Pq build


Thanks,

Maxim Khutornenko



Re: Review Request 26531: Defining schema for the heartbeat RPC.

2014-10-09 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26531/#review56083
---

Ship it!


Ship It!

- David McLaughlin


On Oct. 9, 2014, 11:23 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26531/
 ---
 
 (Updated Oct. 9, 2014, 11:23 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
 
 
 Bugs: AURORA-690
 https://issues.apache.org/jira/browse/AURORA-690
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Defining the flag and the schema for the update coordination (aka heartbeats).
 
 
 Diffs
 -
 
   
 src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
  137f97d33decd14bf2f6dcdd9cd18c3db2b7c89c 
   
 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
  77032d84f91b149c01ce4ac62da7ca331a2b6445 
   src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 
 db6a8719d18fee720f74ebcd2079ad36201cc831 
   src/main/thrift/org/apache/aurora/gen/api.thrift 
 8794731f4b3f1033588bdfa33c292e4796319a2a 
   
 src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java
  3871dae68fcdc6402cb61a7244b46114617eecff 
   src/test/java/org/apache/aurora/scheduler/thrift/aop/ForwardingThrift.java 
 93f79d7cfed2ba3d56548b43b926ce7ddec16c9e 
   src/test/python/apache/aurora/client/api/test_scheduler_client.py 
 d78e7dca28d67997bc6c98cff619ab94a257c7dc 
 
 Diff: https://reviews.apache.org/r/26531/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 26531: Defining schema for the heartbeat RPC.

2014-10-09 Thread Maxim Khutornenko


 On Oct. 9, 2014, 11:20 p.m., David McLaughlin wrote:
  src/main/thrift/org/apache/aurora/gen/api.thrift, line 619
  https://reviews.apache.org/r/26531/diff/1/?file=717117#file717117line619
 
  I think we decided not to do auto-resume once a job is paused due to 
  lack of heartbeat, but I can't remember why. Maybe someone else remembers?

I don't quite see a difference between user pausing/resuming an update or the 
HB service doing so. Differentiating between the two would require a pause 
reason stored somewhere. I'd rather avoid extra complexity unless absolutely 
necessary.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26531/#review56080
---


On Oct. 9, 2014, 11:23 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/26531/
 ---
 
 (Updated Oct. 9, 2014, 11:23 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
 
 
 Bugs: AURORA-690
 https://issues.apache.org/jira/browse/AURORA-690
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Defining the flag and the schema for the update coordination (aka heartbeats).
 
 
 Diffs
 -
 
   
 src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
  137f97d33decd14bf2f6dcdd9cd18c3db2b7c89c 
   
 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
  77032d84f91b149c01ce4ac62da7ca331a2b6445 
   src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 
 db6a8719d18fee720f74ebcd2079ad36201cc831 
   src/main/thrift/org/apache/aurora/gen/api.thrift 
 8794731f4b3f1033588bdfa33c292e4796319a2a 
   
 src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java
  3871dae68fcdc6402cb61a7244b46114617eecff 
   src/test/java/org/apache/aurora/scheduler/thrift/aop/ForwardingThrift.java 
 93f79d7cfed2ba3d56548b43b926ce7ddec16c9e 
   src/test/python/apache/aurora/client/api/test_scheduler_client.py 
 d78e7dca28d67997bc6c98cff619ab94a257c7dc 
 
 Diff: https://reviews.apache.org/r/26531/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko