Re: Review Request 68078: Updating aurora-packaging for compatibility with Pants 1.6.0 and to reflect Mesos 1.5.0 change

2018-07-27 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68078/#review206535
---


Ship it!




Thanks!

- Stephan Erb


On Juli 27, 2018, 4:35 vorm., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68078/
> ---
> 
> (Updated Juli 27, 2018, 4:35 vorm.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Repository: aurora-packaging
> 
> 
> Description
> ---
> 
> * Upgrading Trusty Box with python2.7 version that is has hmac.compare_digest 
> support.
> * Upgradinb Virtualenv to 16.0.0
> * Upgrading to Mesos 1.5.0 on all test boxes. 
> * Upgrading Centos 7 box from 7.4 to 7.5
> * Migrating from deprecated Pants --config-override to --config-files as per 
> https://github.com/pantsbuild/pants/commit/e0d5108ff75a41421a11321c73c817332f0a1c86
> 
> 
> Diffs
> -
> 
>   build-support/virtualenv d6484f58fbffd33ef61d6052c869c55153ec7313 
>   builder/deb/ubuntu-trusty/Dockerfile 
> b2b32c6be6f2b8fb61321bf700192ce4578ed2f8 
>   specs/debian/control 83b20b67518ecfb03380839e63d20fe94d2765aa 
>   specs/debian/rules 8e338b83fa9b6c4cad07e29a4bfeac2808a13c44 
>   specs/rpm/aurora.spec 035f049b07794722551035a99dfd014152e3396e 
>   test/deb/debian-jessie/provision.sh 
> e381613198b1a914ccc1bf8ca724bf5947d4bdb3 
>   test/deb/ubuntu-trusty/provision.sh 
> 1a622889c177105fa54fac6fc8c2fb66983eef87 
>   test/deb/ubuntu-xenial/provision.sh 
> c55af0c1e4f3f87c93357a04ce1ca14123ce03e2 
>   test/rpm/centos-7/Vagrantfile bea36d18e9b7aa1dac2d51edfcc789b9217550ce 
>   test/rpm/centos-7/provision.sh 103937610740332a9ce14253a0e3edc7a00b1710 
> 
> 
> Diff: https://reviews.apache.org/r/68078/diff/1/
> 
> 
> Testing
> ---
> 
> ./test/test-artifact.sh on all boxes and their respective releases.
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206214
---


Ship it!




Ship It!

- Stephan Erb


On July 18, 2018, 10:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 10:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle execptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown expection that is thrown when starting
> a runner leads to task failure and get marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Stephan Erb


> On July 18, 2018, 11:21 p.m., Stephan Erb wrote:
> > src/main/python/apache/aurora/executor/aurora_executor.py
> > Lines 159 (patched)
> > <https://reviews.apache.org/r/67967/diff/1/?file=2061542#file2061542line159>
> >
> > Should we use TASK_LOST here instead? Most users interpret TASK_FAILED 
> > as their responsibility whereas TASK_LOST is more of a misshap of 
> > Aurora/Mesos/Thermos. I would think an unknown exception in the runner is 
> > part of the latter category.
> 
> Santhosh Kumar Shanmugham wrote:
> Hmm. Then we can argue that failure to create sandbox or fork the process 
> etc also should be treated as TASK_LOST? At Twitter this is really not going 
> to help us, since we have platform wrapper that cause TASK_FAILED and it is 
> already hard to differentiate user configuration failures against platform 
> dependency failures.
> 
> I wanted to keep this consistent with the rest. If TASK_LOST makes more 
> sense for you I can update it.

You make a good point. Let's keep it at FAILED. If really necessary, I could 
always come back later with a more complete proposal.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206209
---


On July 18, 2018, 10:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 10:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle execptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown expection that is thrown when starting
> a runner leads to task failure and get marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67696: Enable SLA-aware updates

2018-07-18 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67696/#review206212
---


Ship it!




Ship It!

- Stephan Erb


On July 18, 2018, 12:15 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67696/
> ---
> 
> (Updated July 18, 2018, 12:15 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Renan DelValle, 
> Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This patch enables SLA-aware updates.
> 
> Following https://reviews.apache.org/r/66716/, tasks may now specify custom 
> SLA policies that will be respected by the scheduler during maintenance. This 
> patch integrates into the same system to allow users to specify if they want 
> their updates to also respect SLA. Please see 
> https://docs.google.com/document/d/1lCoDyoX26qrGrptrgO7vJHqYR_L2CBRGFIywsAd8uQo/edit?usp=sharing
>  for a more detailed description.
> 
> This patch adds two optional Thrift fields, `slaAware` to `JobUpdateSettings` 
> and `message` to `JobInstanceUpdateEvent`. These should be forward and 
> backwards compatible.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md edc081f502370190597ad028f3275cdfd572f5ca 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 7265b11103aa12743c42355163ae64e98e965d7f 
>   docs/features/job-updates.md b52eb35a1de9da40f8a00ef0b905df30069029d3 
>   docs/reference/configuration.md acab4c58d9ab3c04d156fed3636e77aed6d1faf4 
>   docs/reference/scheduler-configuration.md 
> 805e516689be019101f7c220c89fd9c391bb93b3 
>   src/main/java/org/apache/aurora/scheduler/base/Tasks.java 
> 2e13aacf576e648d9fffe989e4fc05c8954e72d8 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 9aa51c3637df74cca088bd65c5539e1ebb8e5f0d 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  6a28bc274acdd6d3ac239166771ef2d45648d60f 
>   
> src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java 
> 9fa68b2dd55b4e4f5436356c1b94af1393967679 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> a002d955c3bc7b7c39da5e130e8c10c536bdcebd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  ec577cccb86914ebd679ca235103f79dd7e7b79d 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
> f2d33fb9ab6bd2c3ff199ab03dc75b1d6d618f3a 
>   src/main/java/org/apache/aurora/scheduler/updater/SlaKillController.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> 3992aa77fc305adc390a4aaeb1d3939d6241ddbd 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 
> 74ee1745e1fd7c308e2bdfa46aeb18a7ecfe14d2 
>   src/main/java/org/apache/aurora/scheduler/updater/Updates.java 
> f949fd54f524780672167e12fcadf268da08e679 
>   src/main/python/apache/aurora/client/api/updater_util.py 
> 4e3986220aaa4c9b138394b962120b176185af12 
>   src/main/python/apache/aurora/config/schema/base.py 
> 7baded79acdf863670afc183d740dcad602490c2 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> abee0951ca998894b29ee32c5362ef30da6421c7 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> dcf58896f1e866c0369ba1b78060236e98d9d46b 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractJobUpdateStoreTest.java
>  3a06a451da4ef3acccb33b5495b9fae141557148 
>   
> src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 
> 0aea369d8a8f75291de9691b6d61f3d48895507c 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  aa1cb2b287642e87d787e160e04a17ad0e4690d9 
>   src/test/java/org/apache/aurora/scheduler/updater/AddTaskTest.java 
> 43f857d893a54e19e71b36f2f06fef3a3ef6e874 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 5667a1b59681a6de87149d7161d760bff5da3818 
>   src/test/java/org/apache/aurora/scheduler/updater/KillTaskTest.java 
> 2c27ec7136ff22a3570a4ec278c73f7ee310f628 
>   
> src/test/java/org/apache/aurora/scheduler/updater/SlaKillControllerTest.java 
> PRE-CREATION 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 
> 2baba2aa55865ec298a4c9e5af3952b56cb9a910 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobInstanceUpdateEvent
>  48902d39b9d2cbeae1a52

Re: Review Request 67696: Enable SLA-aware updates

2018-07-18 Thread Stephan Erb


> On July 17, 2018, 10:52 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
> > Lines 413-417 (patched)
> > <https://reviews.apache.org/r/67696/diff/9/?file=2059744#file2059744line414>
> >
> > I am a bit confused about this. Can you please elaborate a bit more 
> > when/how you observed this problem?
> > 
> > I am mostly concerned about race conditions here. 
> > 
> > For example, say I trigger a job update from config A to B. Shortly 
> > before I update an instance the instance fails or transitions to LOST. 
> > Concurrently we might end up hopping into the `instanceChanged` method 
> > here, but bail due to the new guard without ever handling the switch from 
> > config A to B.
> > 
> > Could be that I am missing something entirely here. Would be great if 
> > you could elaborate.
> 
> Jordan Ly wrote:
> Good question.
> 
> When we determine what action we should perform on an instance, we use 
> the config/state we get from these task events. If we are using stale 
> information, we may be producing inaccurate actions to perform. This is a 
> rare scenario but it relies on tons of async events happening at the same 
> time so there is a large delay in event processing. The scenario that I 
> encountered that lead me to introduce this guard was the following:
> 
> 1. Task "ABC-old-1" is being scheduled and goes from ASSIGNED -> RUNNING 
> (RUNNING is now on queue to be evaluated as an update)
> 2. A different previous task finishes updating so instance 1 is added to 
> the working group to be updated. The initial evaluation causes "ABC-old-1" to 
> be moved to KILLING.
> 3. "ABC-old-1" is KILLED and "ABC-new-1" is created and running.
> 4. RUNNING from the async queue is finally evaluated so we see that this 
> instance number needs to be updated (since it is "ABC-old-1") so we kill the 
> task at 1 (which is now "ABC-new-1").
> 5. Before this patch, this would mean the task needs to be killed and 
> added again. Now, since there is an async aspect to killing, this could mean 
> an instance # is permanently killed post-update.
> 
> I am ensuring this condition is met within the evaluation of state 
> changes. From `StateEvaluator`:
> 
> ```
>* It is the responsibility of the caller to ensure that the {@code 
> actualState} is the latest
>* value.  Note: the caller should avoid calling this when a terminal 
> task is moving to another
>* terminal state.  It should also suppress deletion events for tasks 
> that have been replaced by
>* an active task.
> ```
> 
> In the scenario you provide, the condition will work as intended and the 
> update should still move to completion successfully. A write lock governs all 
> actions dealing with updates so we do not have race conditions.
> 
> 1. Instance A needs to be updated.
> 2. Instance A fails before being evaluated for an update.
> 3. Since A failed, it will be rescheduled into A'.
>   - If A is evaluated for the update before being rescheduled (e.g. we 
> evaluate the state change into PENDING or FAILED before rescheduling), then 
> we kill that task and update the configuration
>   - If A is rescheduled before being evaluated for the update, then we 
> will kill A' and update the configuration
> 
> Lots of moving parts for updates so hopefully what I said makes sense :/

Thanks for the detailed writeup. I can follow your argument and it makes sense 
to me. However, I also have to admit that I still haven't wrapped my head 
around all the details of the update procedure 100%. I trust your judgment and 
analysis though :)


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67696/#review206149
---


On July 18, 2018, 12:15 a.m., Jordan Ly wrote:
> 
> -------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67696/
> ---
> 
> (Updated July 18, 2018, 12:15 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Renan DelValle, 
> Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This patch enables SLA-aware updates.
> 
> Following https://reviews.apache.org/r/66716/, tasks 

Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206209
---




src/main/python/apache/aurora/executor/aurora_executor.py
Lines 159 (patched)
<https://reviews.apache.org/r/67967/#comment289101>

Should we use TASK_LOST here instead? Most users interpret TASK_FAILED as 
their responsibility whereas TASK_LOST is more of a misshap of 
Aurora/Mesos/Thermos. I would think an unknown exception in the runner is part 
of the latter category.


- Stephan Erb


On July 18, 2018, 10:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 10:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle execptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown expection that is thrown when starting
> a runner leads to task failure and get marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67696: Enable SLA-aware updates

2018-07-17 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67696/#review206149
---



Looks good, thanks!

I have just one question that is currently confusing me and a few smaller 
nitpicks. (I would even be okay if you punt on the nitpicks since the patch has 
already seen quite a few iterations)


src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java
Line 145 (original), 145 (patched)
<https://reviews.apache.org/r/67696/#comment289081>

Nitpick: Please switch to the logger-builtin formatting. This reduces a bit 
of overhead for messages that never end up being logged.



src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java
Lines 178-184 (patched)
<https://reviews.apache.org/r/67696/#comment289051>

Nitpick: We normally give the parameters a consistent indentation 
indepentent of the length of the method/function name. This leads to code that 
is a bit less jagged. For an example, see `getReevaluationDelay` in this file. 

The same remark also applies to a few other places, e.g. 
`SlaKillController`.

Reference 
https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/styleguide.md#indent-style
 and 
https://github.com/twitter/commons/blob/master/src/python/twitter/common/styleguide.md#best-practices



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
Lines 413-417 (patched)
<https://reviews.apache.org/r/67696/#comment289052>

I am a bit confused about this. Can you please elaborate a bit more 
when/how you observed this problem?

I am mostly concerned about race conditions here. 

For example, say I trigger a job update from config A to B. Shortly before 
I update an instance the instance fails or transitions to LOST. Concurrently we 
might end up hopping into the `instanceChanged` method here, but bail due to 
the new guard without ever handling the switch from config A to B.

Could be that I am missing something entirely here. Would be great if you 
could elaborate.


- Stephan Erb


On July 17, 2018, 8:21 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67696/
> ---
> 
> (Updated July 17, 2018, 8:21 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Renan DelValle, 
> Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This patch enables SLA-aware updates.
> 
> Following https://reviews.apache.org/r/66716/, tasks may now specify custom 
> SLA policies that will be respected by the scheduler during maintenance. This 
> patch integrates into the same system to allow users to specify if they want 
> their updates to also respect SLA. Please see 
> https://docs.google.com/document/d/1lCoDyoX26qrGrptrgO7vJHqYR_L2CBRGFIywsAd8uQo/edit?usp=sharing
>  for a more detailed description.
> 
> This patch adds two optional Thrift fields, `slaAware` to `JobUpdateSettings` 
> and `message` to `JobInstanceUpdateEvent`. These should be forward and 
> backwards compatible.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md edc081f502370190597ad028f3275cdfd572f5ca 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 7265b11103aa12743c42355163ae64e98e965d7f 
>   docs/features/job-updates.md b52eb35a1de9da40f8a00ef0b905df30069029d3 
>   docs/reference/configuration.md acab4c58d9ab3c04d156fed3636e77aed6d1faf4 
>   docs/reference/scheduler-configuration.md 
> 805e516689be019101f7c220c89fd9c391bb93b3 
>   src/main/java/org/apache/aurora/scheduler/base/Tasks.java 
> 2e13aacf576e648d9fffe989e4fc05c8954e72d8 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 9aa51c3637df74cca088bd65c5539e1ebb8e5f0d 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  6a28bc274acdd6d3ac239166771ef2d45648d60f 
>   
> src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java 
> 9fa68b2dd55b4e4f5436356c1b94af1393967679 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> a002d955c3bc7b7c39da5e130e8c10c536bdcebd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  ec577cccb86914ebd679ca235103f79dd7e7b79d 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
> f2d33fb9ab6bd2c3ff199ab03dc75b1d6d618f3a 
>   src/main/java/org/apache/aurora/scheduler/updater/SlaKillController.java 
> PRE-CREATION 
>   src/main/java/org/apach

Re: Review Request 67584: Update to Mesos 1.5

2018-06-15 Thread Stephan Erb


> On Juni 15, 2018, 8 nachm., David McLaughlin wrote:
> > Vagrantfile
> > Line 20 (original), 20 (patched)
> > <https://reviews.apache.org/r/67584/diff/1/?file=2040319#file2040319line20>
> >
> > Why this downgrade?

That is just the minimal version and it works fine with 2.0.2 as well. 

For super unfortunate reason I am stuck with 2.0.2 on a workstation of mine :-/


> On Juni 15, 2018, 8 nachm., David McLaughlin wrote:
> > Vagrantfile
> > Line 26 (original), 26 (patched)
> > <https://reviews.apache.org/r/67584/diff/1/?file=2040319#file2040319line26>
> >
> > Why this upgrade? :)

This is a new build of our vagrant box with Mesos 1.5.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67584/#review204855
-----------


On Juni 14, 2018, 12:08 vorm., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67584/
> ---
> 
> (Updated Juni 14, 2018, 12:08 vorm.)
> 
> 
> Review request for Aurora, David McLaughlin and Renan DelValle.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is an upgrade to Mesos 1.5.0. While 1.5.1 is already released, there are 
> no Debian
> packages and jars available yet. 
> 
> The mesos.interface Python package has a requirement on a newer protobuf 
> version.
> I applied the same update to Java for consistency.
> 
> * Mesos 1.5 changelog: https://mesos.apache.org/blog/mesos-1-5-0-released/
> * Mesos update instructions: 
> https://mesos.apache.org/documentation/latest/upgrades/#upgrading-from-1-4-x-to-1-5-x
> * Protobuf changelog: 
> https://github.com/google/protobuf/blob/3.5.x/CHANGES.txt
> 
> 
> Diffs
> -
> 
>   3rdparty/python/BUILD f06c3f346033ca8351d01a17836c6f3cb2a5fc08 
>   3rdparty/python/requirements.txt 72be82f5ff47d7989a604984dcd0d348e0cc76f8 
>   RELEASE-NOTES.md 0ef75d601ef7c170c81f574aad12de2c8afce7b0 
>   Vagrantfile 5b59dba6d305bcbcb6285e8cdfee337f4a584f99 
>   build-support/packer/aurora.json da279174bb7a48b51a67ee825dfbdca409648f79 
>   build-support/packer/build.sh 7c36f74ee8d498df9c282a8891e720f60cfd576a 
>   build.gradle 57355dceaec41cd97256d954337ecb3ca0c50ba1 
> 
> 
> Diff: https://reviews.apache.org/r/67584/diff/1/
> 
> 
> Testing
> ---
> 
> Finished successfully:
> 
> * `./build-support/jenkins/build.sh`
> * `./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh` against Mesos 
> 1.5.0 (new vagrant box)
> * `./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh` against Mesos 
> 1.4.0 (old vagrant box)
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 66139: Speedup regular Thermos observer checkpoint refresh

2018-06-14 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66139/#review204770
---



As you have agreed with this patch in general, I will merge it now and follow 
up with an additional performance patch set in the future (hopefully in a week 
or two).

- Stephan Erb


On March 20, 2018, 11:41 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66139/
> ---
> 
> (Updated March 20, 2018, 11:41 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Renan DelValle, and Reza Motamedi.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Profiling indicates that a significant part of the refresh time os spend in 
> `os.path.realpath`. 
> This was introduced in https://reviews.apache.org/r/35580/ to properly handle 
> the `latest`
> symlink in the Mesos folder layout. 
> 
> This patch takes a slightly different approach to solve this problem based on 
> `os.path.islink`.
> The latter is faster as it just needs to look at a single folder rather than 
> an entire path.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/common/path_detector.py 
> ed264d74ef5a5a7aa681a56b340f9b16504a88ad 
>   src/test/python/apache/aurora/executor/common/test_path_detector.py 
> 7b5ef0cf552d22d4cfbf3357071de036551026dc 
> 
> 
> Diff: https://reviews.apache.org/r/66139/diff/2/
> 
> 
> Testing
> ---
> 
> I have tested this build on a node with 55 running tasks and 2004 finished 
> ones.
> 
> Before this patch:
> 
> D0320 22:20:44.887248 25771 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.92s
> D0320 22:20:50.746316 25771 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.93s
> D0320 22:20:56.590157 25771 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.89s
>  
> With this patch:
> 
> D0320 22:18:53.545236 16250 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.48s
> D0320 22:18:59.031919 16250 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.49s
> D0320 22:19:04.512358 16250 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.48s
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 66139: Speedup regular Thermos observer checkpoint refresh

2018-06-14 Thread Stephan Erb


> On March 21, 2018, 6:47 p.m., Santhosh Kumar Shanmugham wrote:
> > Ship It!
> 
> Santhosh Kumar Shanmugham wrote:
> I am a little confused about the performance testing results that are 
> posted here, since the pervious results indicated gains from 2secs to 
> 0.2secs, while the current one is much lesser. Can you add a little bit of 
> context regarding the results? Thanks.

Yes definitely. I somehow missed that this one here was still open.

* Code on master, give list of filesystem paths: Parse each path into its 
components, do some basic validation, join components back to a path string. 
Afterwards filter all paths that point to the same sandbox. What is expensive 
here is the parsing/splitting and the realpath computation needed for the 
duplication check.

* First version of the patch: Eliminates both sources of a slowdown and removes 
some code that was now used in the Aurora repository. 

* Second version of the patch: Just removes the slowdown of realpath by 
switching to a simpler symlink check. The parsing/splitting is still there. I 
decided to keep it in there as Reza indicated that you are still using this.

What came on top was that my bechmarking environment was not stable so the 
overall time varied across both patches.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66139/#review199681
---


On March 20, 2018, 11:41 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66139/
> ---
> 
> (Updated March 20, 2018, 11:41 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Renan DelValle, and Reza Motamedi.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Profiling indicates that a significant part of the refresh time os spend in 
> `os.path.realpath`. 
> This was introduced in https://reviews.apache.org/r/35580/ to properly handle 
> the `latest`
> symlink in the Mesos folder layout. 
> 
> This patch takes a slightly different approach to solve this problem based on 
> `os.path.islink`.
> The latter is faster as it just needs to look at a single folder rather than 
> an entire path.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/common/path_detector.py 
> ed264d74ef5a5a7aa681a56b340f9b16504a88ad 
>   src/test/python/apache/aurora/executor/common/test_path_detector.py 
> 7b5ef0cf552d22d4cfbf3357071de036551026dc 
> 
> 
> Diff: https://reviews.apache.org/r/66139/diff/2/
> 
> 
> Testing
> ---
> 
> I have tested this build on a node with 55 running tasks and 2004 finished 
> ones.
> 
> Before this patch:
> 
> D0320 22:20:44.887248 25771 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.92s
> D0320 22:20:50.746316 25771 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.93s
> D0320 22:20:56.590157 25771 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.89s
>  
> With this patch:
> 
> D0320 22:18:53.545236 16250 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.48s
> D0320 22:18:59.031919 16250 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.49s
> D0320 22:19:04.512358 16250 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.48s
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 67584: Update to Mesos 1.5

2018-06-13 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67584/
---

(Updated June 14, 2018, 12:08 a.m.)


Review request for Aurora, David McLaughlin and Renan DelValle.


Repository: aurora


Description
---

This is an upgrade to Mesos 1.5.0. While 1.5.1 is already released, there are 
no Debian
packages and jars available yet. 

The mesos.interface Python package has a requirement on a newer protobuf 
version.
I applied the same update to Java for consistency.

* Mesos 1.5 changelog: https://mesos.apache.org/blog/mesos-1-5-0-released/
* Mesos update instructions: 
https://mesos.apache.org/documentation/latest/upgrades/#upgrading-from-1-4-x-to-1-5-x
* Protobuf changelog: https://github.com/google/protobuf/blob/3.5.x/CHANGES.txt


Diffs
-

  3rdparty/python/BUILD f06c3f346033ca8351d01a17836c6f3cb2a5fc08 
  3rdparty/python/requirements.txt 72be82f5ff47d7989a604984dcd0d348e0cc76f8 
  RELEASE-NOTES.md 0ef75d601ef7c170c81f574aad12de2c8afce7b0 
  Vagrantfile 5b59dba6d305bcbcb6285e8cdfee337f4a584f99 
  build-support/packer/aurora.json da279174bb7a48b51a67ee825dfbdca409648f79 
  build-support/packer/build.sh 7c36f74ee8d498df9c282a8891e720f60cfd576a 
  build.gradle 57355dceaec41cd97256d954337ecb3ca0c50ba1 


Diff: https://reviews.apache.org/r/67584/diff/1/


Testing (updated)
---

Finished successfully:

* `./build-support/jenkins/build.sh`
* `./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh` against Mesos 1.5.0 
(new vagrant box)
* `./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh` against Mesos 1.4.0 
(old vagrant box)


Thanks,

Stephan Erb



Review Request 67584: Update to Mesos 1.5

2018-06-13 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67584/
---

Review request for Aurora, David McLaughlin and Renan DelValle.


Repository: aurora


Description
---

This is an upgrade to Mesos 1.5.0. While 1.5.1 is already released, there are 
no Debian
packages and jars available yet. 

The mesos.interface Python package has a requirement on a newer protobuf 
version.
I applied the same update to Java for consistency.

* Mesos 1.5 changelog: https://mesos.apache.org/blog/mesos-1-5-0-released/
* Mesos update instructions: 
https://mesos.apache.org/documentation/latest/upgrades/#upgrading-from-1-4-x-to-1-5-x
* Protobuf changelog: https://github.com/google/protobuf/blob/3.5.x/CHANGES.txt


Diffs
-

  3rdparty/python/BUILD f06c3f346033ca8351d01a17836c6f3cb2a5fc08 
  3rdparty/python/requirements.txt 72be82f5ff47d7989a604984dcd0d348e0cc76f8 
  RELEASE-NOTES.md 0ef75d601ef7c170c81f574aad12de2c8afce7b0 
  Vagrantfile 5b59dba6d305bcbcb6285e8cdfee337f4a584f99 
  build-support/packer/aurora.json da279174bb7a48b51a67ee825dfbdca409648f79 
  build-support/packer/build.sh 7c36f74ee8d498df9c282a8891e720f60cfd576a 
  build.gradle 57355dceaec41cd97256d954337ecb3ca0c50ba1 


Diff: https://reviews.apache.org/r/67584/diff/1/


Testing
---

Finished successfully:
* `./build-support/jenkins/build.sh`
* `./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh` against Mesos 1.5.0 
(new vagrant box)

Running as we speak:
* `./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh` against Mesos 1.4.0 
(old vagrant box)


Thanks,

Stephan Erb



Re: Review Request 67326: Update Pants to 1.6.0 and Virtualenv to 16.0.0

2018-06-13 Thread Stephan Erb


> On June 13, 2018, 9:19 p.m., Santhosh Kumar Shanmugham wrote:
> > Ship It!
> 
> Santhosh Kumar Shanmugham wrote:
> Actullay minor comment - fix the commit message "Virutalenv version is 
> mis-spelled"

Oh good catch, thanks!


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67326/#review204723
---


On June 13, 2018, 9:41 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67326/
> ---
> 
> (Updated June 13, 2018, 9:41 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Beyond a regular version bump, this fixes the build on older versions of 
> MacOS.
> 
> 
> Diffs
> -
> 
>   build-support/jenkins/build.sh a5975398929d01268841fa4c02aa360b309f6114 
>   build-support/python/checkstyle-check 
> 7e65dd97687d3bd5b586cec163f973d08decab6e 
>   build-support/thrift/thriftw 26b4f9c8087214222100a46d83b47dbce01a7471 
>   build-support/virtualenv d6484f58fbffd33ef61d6052c869c55153ec7313 
>   pants 312dd2035a5ad2e65a1fb3f52d1c36693c2624f0 
>   pants.ini 8c71b144619f437175727dd2027f702ee749df11 
>   rbt 7531fcb2ed21d125bbd2adf5611db82a1a727545 
>   src/main/python/apache/aurora/executor/BUILD 
> 486230db34a22ea5dd0f68da911c0afb1afbcac0 
> 
> 
> Diff: https://reviews.apache.org/r/67326/diff/1/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 67326: Update Pants to 1.6.0 and Virtualenv to 16.0.0

2018-06-13 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67326/
---

(Updated June 13, 2018, 9:41 p.m.)


Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.


Summary (updated)
-

Update Pants to 1.6.0 and Virtualenv to 16.0.0


Repository: aurora


Description
---

Beyond a regular version bump, this fixes the build on older versions of MacOS.


Diffs
-

  build-support/jenkins/build.sh a5975398929d01268841fa4c02aa360b309f6114 
  build-support/python/checkstyle-check 
7e65dd97687d3bd5b586cec163f973d08decab6e 
  build-support/thrift/thriftw 26b4f9c8087214222100a46d83b47dbce01a7471 
  build-support/virtualenv d6484f58fbffd33ef61d6052c869c55153ec7313 
  pants 312dd2035a5ad2e65a1fb3f52d1c36693c2624f0 
  pants.ini 8c71b144619f437175727dd2027f702ee749df11 
  rbt 7531fcb2ed21d125bbd2adf5611db82a1a727545 
  src/main/python/apache/aurora/executor/BUILD 
486230db34a22ea5dd0f68da911c0afb1afbcac0 


Diff: https://reviews.apache.org/r/67326/diff/1/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Stephan Erb



Re: Review Request 67326: Update Pants to 1.6.0 and Virtualenv to 16.2.0

2018-06-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67326/#review204633
---



Friendly ping :)

- Stephan Erb


On May 25, 2018, 4:58 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67326/
> ---
> 
> (Updated May 25, 2018, 4:58 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Beyond a regular version bump, this fixes the build on older versions of 
> MacOS.
> 
> 
> Diffs
> -
> 
>   build-support/jenkins/build.sh a5975398929d01268841fa4c02aa360b309f6114 
>   build-support/python/checkstyle-check 
> 7e65dd97687d3bd5b586cec163f973d08decab6e 
>   build-support/thrift/thriftw 26b4f9c8087214222100a46d83b47dbce01a7471 
>   build-support/virtualenv d6484f58fbffd33ef61d6052c869c55153ec7313 
>   pants 312dd2035a5ad2e65a1fb3f52d1c36693c2624f0 
>   pants.ini 8c71b144619f437175727dd2027f702ee749df11 
>   rbt 7531fcb2ed21d125bbd2adf5611db82a1a727545 
>   src/main/python/apache/aurora/executor/BUILD 
> 486230db34a22ea5dd0f68da911c0afb1afbcac0 
> 
> 
> Diff: https://reviews.apache.org/r/67326/diff/1/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-25 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203807
---


Ship it!




LGTM! Especially thanks for the thorough documentation.

I have just a bunch of questions below. Feel free to address those as you see 
fit. I am away for the next few days I don't want to block the patch any 
further.


docs/features/sla-requirements.md
Lines 115 (patched)
<https://reviews.apache.org/r/66716/#comment286124>

If we are unsure about some details of the interface, we could consider 
calling it experimental for now.



docs/features/sla-requirements.md
Lines 130 (patched)
<https://reviews.apache.org/r/66716/#comment286236>

Having a GET request with a body is somewhat strange (see 
https://stackoverflow.com/questions/978061/http-get-with-request-body). I am 
wondering if we should make this a POST instead?



docs/features/sla-requirements.md
Lines 131 (patched)
<https://reviews.apache.org/r/66716/#comment286237>

Do we really need the task parameter if everything is encoded in the json 
anyway?



examples/vagrant/systemd/aurora-scheduler.service
Lines 63 (patched)
<https://reviews.apache.org/r/66716/#comment286239>

Is this change related to the patch?



src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
Lines 237 (patched)
<https://reviews.apache.org/r/66716/#comment286240>

The `preferred` tier exists in the default `tiers.json` but might not be 
available in general. 

Maybe just say that "Tier X does not support SlaPolicy".



src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
Lines 395-406 (patched)
<https://reviews.apache.org/r/66716/#comment286243>

From a perspective of "single level of abstraction" it looks weird that 
`askCoordinatorThenAct` handles many exceptions but not the 
`InterruptedException`.  

I guess you had a reason for that. What am I missing?



src/main/python/apache/aurora/admin/host_maintenance.py
Line 173 (original), 173 (patched)
<https://reviews.apache.org/r/66716/#comment286244>

Should we already call out the old maintenance mechanism as deprecated in 
the RELEASE-NOTES?


- Stephan Erb


On May 25, 2018, 2:24 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 25, 2018, 2:24 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ff48000d613ceef3e03586b94944d13275fb127c 
>   docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 
>   docs/features/sla-requirements.md PRE-CREATION 
>   docs/operations/configuration.md 85a6fab54e03d52e42ba7d4ff47ab93f5b8293ee 
>   docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 
>   docs/reference/scheduler-configuration.md 
> a659cfac974059b04ef5593286011decbb7f9110 
>   examples/vagrant/systemd/aurora-scheduler.service 
> 57e4bba858672f8da94eaa0499f8e5f3347ab982 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.j

Re: Review Request 66502: Update python virtualenv

2018-05-25 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66502/#review203882
---



There is an even newer virtualenv version available now. I have filed 
https://reviews.apache.org/r/67326/ as a replacement and will close this patch 
now. Sorry for letting this one go stale.

- Stephan Erb


On April 9, 2018, 11:10 a.m., se choi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66502/
> ---
> 
> (Updated April 9, 2018, 11:10 a.m.)
> 
> 
> Review request for Aurora.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Update python virtualenv
> 
> 
> Diffs
> -
> 
>   pants 312dd2035a5ad2e65a1fb3f52d1c36693c2624f0 
> 
> 
> Diff: https://reviews.apache.org/r/66502/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> se choi
> 
>



Review Request 67326: Update Pants to 1.6.0 and Virtualenv to 16.2.0

2018-05-25 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67326/
---

Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.


Repository: aurora


Description
---

Beyond a regular version bump, this fixes the build on older versions of MacOS.


Diffs
-

  build-support/jenkins/build.sh a5975398929d01268841fa4c02aa360b309f6114 
  build-support/python/checkstyle-check 
7e65dd97687d3bd5b586cec163f973d08decab6e 
  build-support/thrift/thriftw 26b4f9c8087214222100a46d83b47dbce01a7471 
  build-support/virtualenv d6484f58fbffd33ef61d6052c869c55153ec7313 
  pants 312dd2035a5ad2e65a1fb3f52d1c36693c2624f0 
  pants.ini 8c71b144619f437175727dd2027f702ee749df11 
  rbt 7531fcb2ed21d125bbd2adf5611db82a1a727545 
  src/main/python/apache/aurora/executor/BUILD 
486230db34a22ea5dd0f68da911c0afb1afbcac0 


Diff: https://reviews.apache.org/r/67326/diff/1/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Stephan Erb



Re: Review Request 67077: Remove resource properties from ResourceAggregate

2018-05-24 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67077/#review203818
---


Fix it, then Ship it!




Thanks! Looks good to me overall.

I raise a concern about the place of the validation logic below. I don't feel 
super strongly about this, so I leave this decision up to you.


api/src/main/thrift/org/apache/aurora/gen/api.thrift
Lines 290-298 (original)
<https://reviews.apache.org/r/67077/#comment286135>

Please add a changelog entry that explains which fields were removed from 
the public API.



src/main/java/org/apache/aurora/scheduler/quota/QuotaManager.java
Lines 171-172 (original), 172-176 (patched)
<https://reviews.apache.org/r/67077/#comment286131>

The previous check ensured _all_ resources were non-negative. The new check 
only ensures that no negative value is in the list. It could however be the 
case that one is missing entirely.

Thinking about it, this is probably prevented by the validation in the 
ThriftBackfill class.



src/main/java/org/apache/aurora/scheduler/storage/durability/ThriftBackfill.java
Lines 127-152 (original), 114-127 (patched)
<https://reviews.apache.org/r/67077/#comment286133>

Technically we are no longer doing a backfill of deprecated fields. Ideally 
we would move the validation to somwhere else now (e.g. 
`QuotaManagerImpl.saveQuota`).

I am happy about the refactoring overall. It is therefore fine with me to 
let this one slide.


- Stephan Erb


On May 11, 2018, 5:18 a.m., Jing Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67077/
> ---
> 
> (Updated May 11, 2018, 5:18 a.m.)
> 
> 
> Review request for Aurora, Renan DelValle and Stephan Erb.
> 
> 
> Bugs: AURORA-1975
> https://issues.apache.org/jira/browse/AURORA-1975
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Remove resource properties from ResourceAggregate
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   src/main/java/org/apache/aurora/scheduler/http/Quotas.java 
> aa68c446af7cf52f37cb72d0f115b1be39457988 
>   src/main/java/org/apache/aurora/scheduler/quota/QuotaManager.java 
> e2750d7343a1ed83feb9f96014e9e61dc6dda1f0 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/ThriftBackfill.java
>  4425d025b92425b19eead30ceac0bc2466bc606a 
>   src/main/python/apache/aurora/client/api/__init__.py 
> f6fd1dd6d7c2bdd5bca3037f501b36badab78c75 
>   src/main/python/apache/aurora/config/resource.py 
> b2ebd399ccf069b9914ac4bc624685b2cc3679b0 
>   src/test/java/org/apache/aurora/scheduler/resources/ResourceTestUtil.java 
> 567586fcfb4dda58dc6308fe3d7fe2b7b433db98 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
>  3dd9ce4039b223cb6156462d089f7062a1cde772 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/ThriftBackfillTest.java
>  219576baf331c44535a4e2f95fde5c906bdabe4f 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  2cf66d8154ad3795989ee9026e45af1be509f244 
>   src/test/python/apache/aurora/admin/test_admin.py 
> ebe89b504be57a4258eeea7e3edd053daaa30ce9 
>   src/test/python/apache/aurora/client/cli/test_quota.py 
> b56629671035ce6bf1c3bf45c9adeb2c4f108ee8 
>   src/test/python/apache/aurora/config/test_resources.py 
> 3ac54909991a123b73e00c59d7269431036cda06 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveQuota
>  6b0d8003a23b9a0e9de78eed42aed14c6b7dd492 
> 
> 
> Diff: https://reviews.apache.org/r/67077/diff/2/
> 
> 
> Testing
> ---
> 
> ./pants test src/test/python/apache/aurora/config
> ./pants test src/test/python/apache/aurora/client/cli:cli
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Jing Chen
> 
>



Re: Review Request 67141: Introduce structs to enable specifying custom SLA.

2018-05-21 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67141/#review203528
---


Ship it!




Ship It!

- Stephan Erb


On May 17, 2018, 4:07 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67141/
> ---
> 
> (Updated May 17, 2018, 4:07 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1977
> https://issues.apache.org/jira/browse/AURORA-1977
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add `SlaPolicy` and `HostMaintenanceRequest` structs
> to the thrift definition and introduce a new `HostMaintenanceStore`
> for tracking maintenance requests. These changes will be used in
> https://reviews.apache.org/r/66716 for implementing custom SLA
> and scheduler driven maintenance.
> 
> This RB splits the storage related changes from 
> https://reviews.apache.org/r/66716
> for better rollback story.
> 
> Tested rollback on the vagrant.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda4cc695c84a2a850bdd7e1d65dec 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 778148a7c033cba9004954cabc33a2b1d003dccf 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
>  PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
>  31f9545d83a950064df646ef6ba8a95234cf89ec 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
>  3dd9ce4039b223cb6156462d089f7062a1cde772 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
>  27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
> be07361a27afefa21cc2ba76ce82531a418d9814 
>   
> src/test/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStoreTest.java
>  PRE-CREATION 
>   
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
>  d59118be13342da9003b0bcb97e12e477d9edf8f 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 
> e4f43d0573c7862adc9bc679f4cea40cc76eac38 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
>  88e1c36a1aa2d192b95963f7aa36e243a447e4af 
>   
> src/test/resources/

Re: Review Request 66192: [WIP] Variable group size updates

2018-05-16 Thread Stephan Erb
checkstyle] [ERROR] 
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:109:50:
> >  '-' is not followed by whitespace. [WhitespaceAround]
> > [ant:checkstyle] [ERROR] 
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:117:7:
> >  'if' is not followed by whitespace. [WhitespaceAround]
> > [ant:checkstyle] [ERROR] 
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:127:
> >  Line is longer than 100 characters (found 101). [LineLength]
> >  FAILED
> > 
> > FAILURE: Build failed with an exception.
> > 
> > * What went wrong:
> > Execution failed for task ':checkstyleMain'.
> > > Checkstyle rule violations were found. See the report at: 
> > > file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/checkstyle/main.html
> > 
> > * Try:
> > Run with --stacktrace option to get the stack trace. Run with --info or 
> > --debug option to get more log output.
> > 
> > * Get more help at https://help.gradle.org
> > 
> > BUILD FAILED in 5m 13s
> > 32 actionable tasks: 26 executed, 6 up-to-date
> > 
> > 
> > I will refresh this build result if you post a review containing 
> > "@ReviewBot retry"
> 
> Santhosh Kumar Shanmugham wrote:
> Can you clean up the style issues?
> 
> Also, will this be configurable via the pystachio config?
> 
> Renan DelValle wrote:
> Definitely, just wanted to get the latest version out to get feedback on 
> the Thrift Schema changes. This will indeed be configurable via pystachio, 
> just wanted to settle on the Thrift Schema before making those changes.
> 
> Santhosh Kumar Shanmugham wrote:
> Ack.

The changes in `api.thrift` look good to me!


- Stephan


-----------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66192/#review203093
---


On May 15, 2018, 4:19 a.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66192/
> ---
> 
> (Updated May 15, 2018, 4:19 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Santhosh Kumar 
> Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adding support for variable group sizes when executing an update.
> 
> Design doc for this change is here: 
> https://docs.google.com/document/d/1xGk4ueH8YlmJCk6hQJh85u4to4M1VQD0l630IOchvgY/edit#heading=h.lg3hty82f5cz
> 
> I opted for the path of least resistance with regards to the Thrift changes 
> as I didn't see any benefit in making the larger changes required to make the 
> interfaces a bit more flexible.
> 
> Requesting feedback on these changes and the approach from the community 
> before I proceed.
> 
> Tests will be added after the community approves of the direciton and 
> approach.
> 
> Note to reviewers: Changes made in ActiveLimitedStrategy.java were made to 
> move towards getting rid of FluentIterable. I figured since I was touching 
> that code, it wouldn't hurt to test the Java 8 equivalent of it. I can get 
> rid of the change here and make it in a separate patch if desired.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> 3992aa77fc305adc390a4aaeb1d3939d6241ddbd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/ActiveLimitedStrategy.java
>  855ea9c20788b51695b7eff5ac0970f0d52a9546 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66192/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 66192: [WIP] Variable group size updates

2018-05-16 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66192/#review203292
---




src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
Lines 784-786 (original), 788-828 (patched)
<https://reviews.apache.org/r/66192/#comment285401>

Doing such a conversion and validation in `SchedulerThriftInterface` will 
only adapt newly submitted tasks. Tasks read from storage will still have the 
old configuration layout - this adds unnecessary complexity.

Please have a look at the `validateAndPopulate` method and the 
`ThriftBackfill` class used in there for guidance of how we normally handle 
this.



src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java
Lines 122 (patched)
<https://reviews.apache.org/r/66192/#comment285402>

With the current backfill mechanism, this will break for tasks read from 
storage.


- Stephan Erb


On May 15, 2018, 4:19 a.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66192/
> ---
> 
> (Updated May 15, 2018, 4:19 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Santhosh Kumar 
> Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adding support for variable group sizes when executing an update.
> 
> Design doc for this change is here: 
> https://docs.google.com/document/d/1xGk4ueH8YlmJCk6hQJh85u4to4M1VQD0l630IOchvgY/edit#heading=h.lg3hty82f5cz
> 
> I opted for the path of least resistance with regards to the Thrift changes 
> as I didn't see any benefit in making the larger changes required to make the 
> interfaces a bit more flexible.
> 
> Requesting feedback on these changes and the approach from the community 
> before I proceed.
> 
> Tests will be added after the community approves of the direciton and 
> approach.
> 
> Note to reviewers: Changes made in ActiveLimitedStrategy.java were made to 
> move towards getting rid of FluentIterable. I figured since I was touching 
> that code, it wouldn't hurt to test the Java 8 equivalent of it. I can get 
> rid of the change here and make it in a separate patch if desired.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> 3992aa77fc305adc390a4aaeb1d3939d6241ddbd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/ActiveLimitedStrategy.java
>  855ea9c20788b51695b7eff5ac0970f0d52a9546 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66192/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 67141: Introduce structs to enable specifying custom SLA.

2018-05-16 Thread Stephan Erb


> On May 16, 2018, 10:22 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
> > Lines 257-258 (patched)
> > <https://reviews.apache.org/r/67141/diff/1/?file=2023580#file2023580line257>
> >
> > If we use a feature toggle here operators can enable the backwards 
> > incompatible change after they have vetted the release (i.e. once they are 
> > sure they don't need to do a rollback for unrelated issues).
> > 
> > We can then simply enable the feature toggle by default after next 
> > release.
> > 
> > @Jordan Ly, would this address your backwards incompatibility concerns?
> 
> Jordan Ly wrote:
> I realize that they can just remove all `HostMaintenanceRequest` and 
> perform a snapshot if they want to rollback to the previous version (kinda 
> like what they did for 0.14 GPU resources 
> https://github.com/apache/aurora/blob/master/RELEASE-NOTES.md#0140). I don't 
> have any concerns going forward with what is present.

Ah good catch!


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67141/#review203269
---


On May 15, 2018, 11:15 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67141/
> ---
> 
> (Updated May 15, 2018, 11:15 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1977
> https://issues.apache.org/jira/browse/AURORA-1977
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add `SlaPolicy` and `HostMaintenanceRequest` structs
> to the thrift definition and introduce a new `HostMaintenanceStore`
> for tracking maintenance requests. These changes will be used in
> https://reviews.apache.org/r/66716 for implementing custom SLA
> and scheduler driven maintenance.
> 
> This RB splits the storage related changes from 
> https://reviews.apache.org/r/66716
> for better rollback story.
> 
> Tested rollback on the vagrant.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda4cc695c84a2a850bdd7e1d65dec 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 778148a7c033cba9004954cabc33a2b1d003dccf 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
>  PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
>  31f9545d83a950064df646ef6ba8a95234cf89ec 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
>  3dd9ce4039

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-16 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203276
---



I have done a first quick pass. I will have a second closer look once the 
storage patch has landed.


src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
Lines 74 (patched)
<https://reviews.apache.org/r/66716/#comment285381>

I think we should make this value configurable for operators.



src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
Lines 248-259 (patched)
<https://reviews.apache.org/r/66716/#comment285385>

Both this validation and the other `instance`-based validation below have a 
conceptual problem: Users can freely adjust the number of available instances 
via `kill` and `addInstance`. This implies that the policy can become invalid 
long after we have checked it here.

I think the most robust way would be if we just print warnings in the 
client, and change the scheduler so that it can ignore invalid policies, 
similar to how it ignores them after a timeout.



src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
Lines 276 (patched)
<https://reviews.apache.org/r/66716/#comment285383>

`"CountSlaPolicy: count=5 must be less than 3"` can be hard to understand 
as it lacks context. 

If you add that the second number refers to the number of instances, it it 
will be easier to users to reason about the error.



src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
Lines 280 (patched)
<https://reviews.apache.org/r/66716/#comment285399>

Would you be open to the idea of passing the host and (health?) port of the 
instance here as well? I have a usecase in mind that would be simplified by 
this quite a bit as I can query a running instance for additional information 
without having to query the the service discover ZK first.

In addition, have you considered just passing each information individually 
(job, environment, role,...) rather than as a joined string? Many usescases 
will probably require string splitting within the coordinator otherwise.



src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
Lines 303 (patched)
<https://reviews.apache.org/r/66716/#comment285393>

What usecase do you have in mind for the statuskey?

I am mostly wondering if a protocol purely based on HTTP status codes would 
be sufficient: e.g. `200 OK` means ready to drain and `428 PRECONDITION 
REQUIRED` asks us to come back again later.



src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
Lines 384 (patched)
<https://reviews.apache.org/r/66716/#comment285387>

isProduction is deprecated. You will need to check the appropriate tier 
config here.


- Stephan Erb


On May 15, 2018, 7:16 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 15, 2018, 7:16 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c

Re: Review Request 67141: Introduce structs to enable specifying custom SLA.

2018-05-16 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67141/#review203269
---



In general, this looks good to me.


api/src/main/thrift/org/apache/aurora/gen/api.thrift
Lines 263 (patched)
<https://reviews.apache.org/r/67141/#comment285372>

Just by looking at this it is not clear what `statusKey` means. Maybe you 
can extend the doc string?



api/src/main/thrift/org/apache/aurora/gen/api.thrift
Lines 901 (patched)
<https://reviews.apache.org/r/67141/#comment285373>

In most other places in the API timestamps are called out as such. Maybe 
for consistency we should rename `creationTimeMs` to `createdTimestampMs` (e.g. 
as in `JobUpdateState`)? 

(I don't feel strongly about this though)



src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java
Lines 57-65 (original), 58-66 (patched)
<https://reviews.apache.org/r/67141/#comment285375>

This is missing the `HostMaintenance` store



src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
Lines 257-258 (patched)
<https://reviews.apache.org/r/67141/#comment285378>

If we use a feature toggle here operators can enable the backwards 
incompatible change after they have vetted the release (i.e. once they are sure 
they don't need to do a rollback for unrelated issues).

We can then simply enable the feature toggle by default after next release.

@Jordan Ly, would this address your backwards incompatibility concerns?


- Stephan Erb


On May 15, 2018, 11:15 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67141/
> ---
> 
> (Updated May 15, 2018, 11:15 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1977
> https://issues.apache.org/jira/browse/AURORA-1977
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add `SlaPolicy` and `HostMaintenanceRequest` structs
> to the thrift definition and introduce a new `HostMaintenanceStore`
> for tracking maintenance requests. These changes will be used in
> https://reviews.apache.org/r/66716 for implementing custom SLA
> and scheduler driven maintenance.
> 
> This RB splits the storage related changes from 
> https://reviews.apache.org/r/66716
> for better rollback story.
> 
> Tested rollback on the vagrant.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda4cc695c84a2a850bdd7e1d65dec 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 778148a7c033cba9004954cabc33a2b1d003dccf 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
>  PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> ba03ff94bb5f

Re: Review Request 66186: Upgrade to psutil with optimized Process.children()

2018-04-12 Thread Stephan Erb


> On April 12, 2018, 1:31 a.m., Reza Motamedi wrote:
> > Did you notice any real performance improvements?

No, not really :-/. But I guess that updating still does not harm.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66186/#review200944
---


On April 12, 2018, 12:13 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66186/
> ---
> 
> (Updated April 12, 2018, 12:13 a.m.)
> 
> 
> Review request for Aurora, Reza Motamedi and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The changelog claims: `Process.children() is 2x faster on UNIX and 2.4x 
> faster on Linux.`
> 
> This is needed for all stats retrieved via `ProcessTreeCollector`. An update 
> therefore
> seems worthwhile. 
> 
> https://github.com/giampaolo/psutil/blob/master/HISTORY.rst
> 
> 
> Diffs
> -
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
>   src/test/python/apache/thermos/monitoring/test_process_collector_psutil.py 
> 93ff878be578fa7a63d25b65e7d915790dc9ccc6 
> 
> 
> Diff: https://reviews.apache.org/r/66186/diff/1/
> 
> 
> Testing
> ---
> 
> Successfully verified in vagrant that CPU and memory are reported as expected.
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 66573: Add initial interval before searching for preemption slots

2018-04-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66573/#review200974
---


Ship it!




Ship It!

- Stephan Erb


On April 12, 2018, 3:05 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66573/
> ---
> 
> (Updated April 12, 2018, 3:05 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Between failovers, tasks that normally would not require preemption could be 
> in a PENDING state for an extended period of time and become eligible for 
> preemption. Thus, when the scheduler starts, offers could not have been 
> processed yet and the tasks can preempt other tasks needlessly.
> 
> Added an initial delay to preemption slot searching on scheduler startup so 
> PENDING tasks have a chance to be scheduled before preempting.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorModule.java 
> 7618efc2c0cb46e96119accd2c7962ea8ee7a05e 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 0a16d3c95d3f262686936330ac7d7dc332d759d5 
> 
> 
> Diff: https://reviews.apache.org/r/66573/diff/1/
> 
> 
> Testing
> ---
> 
> `./gradlew test`
> 
> Will deploy on a cluster to ensure preemption does not start for the initial 
> interval.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 66186: Upgrade to psutil with optimized Process.children()

2018-04-11 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66186/
---

(Updated April 12, 2018, 12:13 a.m.)


Review request for Aurora, Reza Motamedi and Santhosh Kumar Shanmugham.


Repository: aurora


Description
---

The changelog claims: `Process.children() is 2x faster on UNIX and 2.4x faster 
on Linux.`

This is needed for all stats retrieved via `ProcessTreeCollector`. An update 
therefore
seems worthwhile. 

https://github.com/giampaolo/psutil/blob/master/HISTORY.rst


Diffs
-

  3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
  src/test/python/apache/thermos/monitoring/test_process_collector_psutil.py 
93ff878be578fa7a63d25b65e7d915790dc9ccc6 


Diff: https://reviews.apache.org/r/66186/diff/1/


Testing (updated)
---

Successfully verified in vagrant that CPU and memory are reported as expected.


Thanks,

Stephan Erb



Re: Review Request 66502: Update python virtualenv

2018-04-11 Thread Stephan Erb


> On April 10, 2018, 11:37 a.m., Stephan Erb wrote:
> > Thanks for the patch! We also have another virtulenv version specified 
> > here: 
> > https://github.com/apache/aurora/blob/master/build-support/virtualenv#L17. 
> > Do you think you can bump that one as well?
> 
> se choi wrote:
> Homebrew has 5.2.0, wondering if it will work in 5.1.0?

Aurora will sometimes need a virtualenv and it will use that script above to 
download the virtualenv in the specified version. Updating to 15.2.0 should be 
safe. Homebrew should not be concerned.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66502/#review200803
---


On April 9, 2018, 11:10 a.m., se choi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66502/
> ---
> 
> (Updated April 9, 2018, 11:10 a.m.)
> 
> 
> Review request for Aurora.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Update python virtualenv
> 
> 
> Diffs
> -
> 
>   pants 312dd2035a5ad2e65a1fb3f52d1c36693c2624f0 
> 
> 
> Diff: https://reviews.apache.org/r/66502/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> se choi
> 
>



Re: Review Request 66536: Add more preemption metrics (jobs preempted, preemptors) and logging statements

2018-04-11 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66536/#review200925
---


Ship it!




Looks useful, thanks!


src/main/java/org/apache/aurora/scheduler/preemptor/Preemptor.java
Line 86 (original), 88 (patched)
<https://reviews.apache.org/r/66536/#comment281825>

Witht the removal of the iterator this is a bit outdated.


- Stephan Erb


On April 11, 2018, 12:47 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66536/
> ---
> 
> (Updated April 11, 2018, 12:47 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Added additional metrics:
> ```
> 1. preemptor_tasks_preempted_[JOB_NAME] - The number of times [JOB_NAME] has 
> been preempted for another task.
> 2. preemptor_tasks_preemptor_[JOB_NAME] - The number of times [JOB_NAME] has 
> preempted another task.
> 3. preemptor_slot_search_[success|failed]_for_[JOB_NAME] - The number of 
> times [JOB_NAME] has or hasn't found a slot for preemption.
> 4. preemptor_slot_validation_[success|failed]_for_[JOB_NAME] - The number of 
> times [JOB_NAME] succeeded to or failed to validate a slot before preemption.
> ```
> 
> Additionally, added some `LOG.info` statements for better visibility into 
> preemption/preemption slot finding.
> 
> Did a little bit of code refactoring as well.
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java 
> ef06471d007b1d36300eea30cdea059c1ba231b0 
>   
> src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilter.java
>  569cfe6b04e6b7bf0dca7625b00698e9d8e47daf 
>   src/main/java/org/apache/aurora/scheduler/preemptor/Preemptor.java 
> 293d106eee383dd5352a629780b897d58c9dd439 
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorMetrics.java 
> 87305774db0ce6fb7ebed060ab4dc99be6c2df4c 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImpl.java 
> edab03dfd7fdbb24891565ba755212f03d6ea3b8 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessorTest.java
>  ba775f4688dc57504e2def0dc4b5dcd00da448e1 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  b3ffb0d4fc9b9b52bb49225765bd14fb8105169a 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 0ef29d598784ce529bcaac7017dc0f2cc5055938 
> 
> 
> Diff: https://reviews.apache.org/r/66536/diff/1/
> 
> 
> Testing
> ---
> 
> Added unit tests, `./gradlew test` passes.
> Manually ensured new metrics are exported.
> Tested at scale.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 66502: Update python virtualenv

2018-04-10 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66502/#review200803
---



Thanks for the patch! We also have another virtulenv version specified here: 
https://github.com/apache/aurora/blob/master/build-support/virtualenv#L17. Do 
you think you can bump that one as well?

- Stephan Erb


On April 9, 2018, 11:10 a.m., se choi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66502/
> ---
> 
> (Updated April 9, 2018, 11:10 a.m.)
> 
> 
> Review request for Aurora.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Update python virtualenv
> 
> 
> Diffs
> -
> 
>   pants 312dd2035a5ad2e65a1fb3f52d1c36693c2624f0 
> 
> 
> Diff: https://reviews.apache.org/r/66502/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> se choi
> 
>



Re: Review Request 66491: Chaning default reviewers as well as reflecting the new address of gorealis.

2018-04-07 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66491/#review200700
---


Ship it!




Ship It!

- Stephan Erb


On April 7, 2018, 2:18 vorm., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66491/
> ---
> 
> (Updated April 7, 2018, 2:18 vorm.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Chaning default reviewers as well as reflecting the new address of gorealis.
> 
> 
> Diffs
> -
> 
>   CONTRIBUTING.md bdd27822e265bbd43d0d8378559b6b3080210fd9 
>   docs/additional-resources/tools.md 678e5504d00c657e563437032e7210ac1e8cd227 
> 
> 
> Diff: https://reviews.apache.org/r/66491/diff/1/
> 
> 
> Testing
> ---
> 
> Doc change only.
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 66482: Fixing issue where OpenJDK PPA would not install in Ubuntu Trusty in packaging tests.

2018-04-06 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66482/#review200649
---


Ship it!




Ship it!

Btw, what is our plan for trusty? The end-of-live is near 
https://www.ubuntu.com/info/release-end-of-life, so we could ask on the 
mailinglist who is using it and stop building packages for it after 0.20.

- Stephan Erb


On April 6, 2018, 1:58 a.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66482/
> ---
> 
> (Updated April 6, 2018, 1:58 a.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Repository: aurora-packaging
> 
> 
> Description
> ---
> 
> * Reinstalled certificates in Ubuntu trusty upon provisioning in order to fix 
> an issue where the OpenJDK PPA could not be installed.
> * Updated RBT to lastest version 0.7.11
> 
> 
> Diffs
> -
> 
>   rbt 5b94e53357e8934df040e603efc9ad2378ff6575 
>   test/deb/ubuntu-trusty/provision.sh 
> 2d886bf3e3e1e46870e87df8eec7c2018f5bd4e2 
> 
> 
> Diff: https://reviews.apache.org/r/66482/diff/1/
> 
> 
> Testing
> ---
> 
> Tested Trusty packages using the modified provisioning script.
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 66186: Upgrade to psutil with optimized Process.children()

2018-03-31 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66186/#review200265
---



@ReviewBot retry

- Stephan Erb


On March 21, 2018, 12:15 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66186/
> ---
> 
> (Updated March 21, 2018, 12:15 a.m.)
> 
> 
> Review request for Aurora, Reza Motamedi and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The changelog claims: `Process.children() is 2x faster on UNIX and 2.4x 
> faster on Linux.`
> 
> This is needed for all stats retrieved via `ProcessTreeCollector`. An update 
> therefore
> seems worthwhile. 
> 
> https://github.com/giampaolo/psutil/blob/master/HISTORY.rst
> 
> 
> Diffs
> -
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
>   src/test/python/apache/thermos/monitoring/test_process_collector_psutil.py 
> 93ff878be578fa7a63d25b65e7d915790dc9ccc6 
> 
> 
> Diff: https://reviews.apache.org/r/66186/diff/1/
> 
> 
> Testing
> ---
> 
> TODO: I still want to double check this on a real running observer
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 66269: End to end tests misc. fixes

2018-03-26 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66269/#review199955
---


Ship it!




*** OK (All tests passed) ***

Great job!

- Stephan Erb


On March 26, 2018, 4:59 a.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66269/
> ---
> 
> (Updated March 26, 2018, 4:59 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Stephan Erb.
> 
> 
> Bugs: AURORA-1974
> https://issues.apache.org/jira/browse/AURORA-1974
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Excluding kerberos unit file from being copied on provision as it's later 
> copied and deleted by the end to end test.
> 
> Bypass leader redirect changed from upstart to systemd. This test wasn't 
> being run because the kerberos test was failing.
> 
> Fixing kerberos end to end test. Previous version had it's signing key 
> revoked resulting in the test failing.
> 
> Chaning docker image to slim-stretch in docker aurora tests to address 
> AURORA-1974.
> 
> Added daemon-reload to aurorabuild whenever the daemons are restarted.
> 
> 
> Diffs
> -
> 
>   examples/jobs/hello_docker_engine.aurora 
> 99d99a26844f2f2f473626b16cfbf91aa70031ff 
>   examples/jobs/hello_docker_image.aurora 
> 049a147749876f795636827ea5e5485fa72a0930 
>   examples/vagrant/aurorabuild.sh c39388f46ea4718117889a5c67aec9afcc7f5d2e 
>   examples/vagrant/provision-dev-cluster.sh 
> fe3281f6b1f6adee021e534b230221efb86a5d3c 
>   examples/vagrant/systemd/aurora-scheduler-kerberos.service 
> 10e4f2c355c10b8204518af0a49c9d90be4f6ef8 
>   src/test/sh/org/apache/aurora/e2e/test_bypass_leader_redirect_end_to_end.sh 
> 5c0f12b56a30eef35c1903d5f4a96591d3c74471 
>   src/test/sh/org/apache/aurora/e2e/test_kerberos_end_to_end.sh 
> 646c213ea105e32f2d37df29832aa1009481b6d1 
> 
> 
> Diff: https://reviews.apache.org/r/66269/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 66103: Introduce mesos disk collector

2018-03-25 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66103/#review199934
---


Ship it!




Ship It!

- Stephan Erb


On March 23, 2018, 6:39 p.m., Reza Motamedi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66103/
> ---
> 
> (Updated March 23, 2018, 6:39 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Franck Cuny, 
> Jordan Ly, Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> When disk isolation is enabled in a Mesos agent it calculates the disk usage 
> for each container. 
> Thermos Observer also monitors disk usage using `twitter.common.dirutil`, 
> essentially repeating the work already done by the agent. In practice, we see 
> that disk monitoring is one of the most expensive resource monitoring tasks. 
> For instance, when there are deeply nested directories, the CPU utilization 
> of the observer process can easily reach 1.5 CPUs. It would be ideal if we 
> delegate the disk monitoring task to the agent and do it only once. With this 
> approach, when disk collection has improved in the agent (for instance by 
> implementing XFS isolation), we can simply benefit from it without any code 
> change. Some more information about the problem is provided in AURORA-1918.
> 
> This patch that introduces `MesosDiskCollector` which queries the agent's API 
> endpoint to lookup disk_used_bytes. Note that there is also resource 
> monitoring in thermos executor. Currently, I left the disk collector there to 
> use the `du` implementation. That can be changed in a later patch.
> 
> I modified some vagrant config files including `aurora-executor.service` and 
> `etc_mesos-slave/isolation` for testing. They can be left as is. I included 
> them in this patch to show how this would work e2e.
> 
> 
> Diffs
> -
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
>   RELEASE-NOTES.md 51ab6c724694244bf616b29e9beace4a4a3f5252 
>   docs/reference/observer-configuration.md 
> 8a443c94f7f37f9454989781f722101a97c99f15 
>   examples/jobs/hello_world.aurora 5401bfebe753b5e53abd08baeac501144ced9b5a 
>   examples/vagrant/mesos_config/etc_mesos-slave/isolation 
> 1a7028ffc70116b104ef3ad22b7388f637707a0f 
>   examples/vagrant/systemd/thermos.service 
> 01925bcd2ae44f100df511f3c3951c3f5a1a72aa 
>   src/main/python/apache/aurora/tools/thermos_observer.py 
> dd9f0c46ceac9e939b1b763073314161de0ea614 
>   src/main/python/apache/thermos/monitoring/BUILD 
> 65ba7088f65e7baa5d30744736ba456b46a55e86 
>   src/main/python/apache/thermos/monitoring/disk.py 
> 986d33a5000f8d5db15cb639c81f8b1d756ffa05 
>   src/main/python/apache/thermos/monitoring/resource.py 
> adcdc751c03460dc801a18278faa96d6bd64722b 
>   src/main/python/apache/thermos/observer/task_observer.py 
> a6870d48bddf2a2ccede7bb68195f2baae1d0e47 
>   
> src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py
>  fe74bd1d3ecd89fca1b5b2251202cbbc0f24 
>   src/test/python/apache/thermos/monitoring/BUILD 
> 8f2b39336dce6c7b580e6ba0009f60afdcb89179 
>   src/test/python/apache/thermos/monitoring/test_disk.py 
> 362393bfd1facf3198e2d438d0596b16700b72b8 
>   src/test/python/apache/thermos/monitoring/test_resource.py 
> e577e552d4ee1807096a15401851bb9fd95fa426 
> 
> 
> Diff: https://reviews.apache.org/r/66103/diff/10/
> 
> 
> Testing
> ---
> 
> - I added unit tests.
> - Tested in vagrant and it works as intenced.
> - I also built and deployed in our test enviroment. In order to measure 
> imporoved performance I created jobs with nested folders and noticed 
> reduction in CPU utilization of the Observer process, by at least 60%. (1.5 
> CPU cores to 0.4 CPU cores)
> 
> Here is one specific test setup: On two hosts I created a two tasks. Each 
> task creates identical nested directory structures and files in them. The 
> overall size is 30GB. test_host_1 runs the current version of observer and 
> test_host_2 runs Observer with this patch and also has mesos_disk_collection 
> enabled. The results are as follows:
> 
> ```
> rezam[7]TEST_HOST_1 ~ $ while true; do echo `date`; curl localhost:1338/vars 
> -s | grep cpu; sleep 10; done
> Thu Mar 22 04:36:17 UTC 2018
> observer.observer_cpu 108.9
> Thu Mar 22 04:36:27 UTC 2018
> observer.observer_cpu 123.2
> Thu Mar 22 04:36:38 UTC 2018
> observer.observer_cpu 123.2
&g

Re: Review Request 66192: [WIP] Variable group size updates

2018-03-23 Thread Stephan Erb


> On March 21, 2018, 10:33 p.m., Jordan Ly wrote:
> > I am mostly concerned about the UX. Users will be able to specify both 
> > batch size and variable batch size and must know that variable batch sizing 
> > takes precedent over other strategies.
> > 
> > Is it worth it to make a larger investment into the Thrift interface and 
> > avoid ambiguity? Or refactor the current batching strategy to use the new 
> > variable codepath (a single batch size specified to the variable strategy 
> > should behave the same as the current implementation).
> 
> Santhosh Kumar Shanmugham wrote:
> +1 I think it should be an either-or. There should be logic in the API to 
> clearly message this case.
> 
> Renan DelValle wrote:
> I've thought about refactoring the batching strategy, and I'm willing to 
> travel down that path as the current batch stategy is a specific case (single 
> step, repeated over and over as Jordan pointed out) of the functionality that 
> I'm trying to achieve with this patch but I'll have to do something like wrap 
> around a single item list which might look funky. I'll give it a shot.
> 
> Changing the Thrift interface is always tricky because it almost always 
> results in some yak shaving but if the consensus is that this is the better, 
> more clear approach, then I'm game to implement it that way.
> 
> Appreciate all the feedback!

Just an idea: Would it help to cap `step` at the length of `maxActive`, so that 
the last batch size is used for all remaining instances? This would also give 
us a somewhat nice deprecation path for the existing fixed batch size.

In addition, I believe you will need to account for that case anyway. A user 
might accidentally have a list of batches whose sum is smaller than the 
instance count of the update. We need to make sure that either the update 
request gets reject at submission or that it runs through properly. The 
proposal above would allow the update to run through.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66192/#review199707
---


On March 21, 2018, 3:10 a.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66192/
> ---
> 
> (Updated March 21, 2018, 3:10 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adding support for variable group sizes when executing an update.
> 
> Design doc for this change is here: 
> https://docs.google.com/document/d/1xGk4ueH8YlmJCk6hQJh85u4to4M1VQD0l630IOchvgY/edit#heading=h.lg3hty82f5cz
> 
> I opted for the path of least resistance with regards to the Thrift changes 
> as I didn't see any benefit in making the larger changes required to make the 
> interfaces a bit more flexible.
> 
> Requesting feedback on these changes and the approach from the community 
> before I proceed.
> 
> Tests will be added after the community approves of the direciton and 
> approach.
> 
> Note to reviewers: Changes made in ActiveLimitedStrategy.java were made to 
> move towards getting rid of FluentIterable. I figured since I was touching 
> that code, it wouldn't hurt to test the Java 8 equivalent of it. I can get 
> rid of the change here and make it in a separate patch if desired.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
> b25f3831cecc58c90375c90b16142421f8f09e38 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemJobUpdateStore.java 
> 9e86b9e276ea90a249284a824705b5bbf19dcbce 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  f8be8058f3a80a18b999d2666e2adb33e1e55fef 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> 3992aa77fc30

Re: Review Request 66103: Introduce mesos disk collector

2018-03-23 Thread Stephan Erb


> On March 22, 2018, 11:31 p.m., Stephan Erb wrote:
> > src/test/python/apache/thermos/monitoring/BUILD
> > Lines 21 (patched)
> > <https://reviews.apache.org/r/66103/diff/7/?file=1985807#file1985807line21>
> >
> > Requests has a few dependencies. I believe you need to list those here 
> > as well in order to ensure pants pulls in the correct versions.
> 
> Reza Motamedi wrote:
> I have built observer with this patch a couple of times on Jenkins boxes 
> as well. 
> 
> 
> Are you referring to requests-kerberos and requests-mock? I will include 
> them here as well.

Please see this review https://reviews.apache.org/r/64382/. The `requests` 
package depends on `certifi`, `chardet`, `idna`, and `urllib3` and for proper 
pinning of transitive dependencies we must list those as well.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66103/#review199812
---


On March 23, 2018, 2:11 a.m., Reza Motamedi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66103/
> ---
> 
> (Updated March 23, 2018, 2:11 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Franck Cuny, 
> Jordan Ly, Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> When disk isolation is enabled in a Mesos agent it calculates the disk usage 
> for each container. 
> Thermos Observer also monitors disk usage using `twitter.common.dirutil`, 
> essentially repeating the work already done by the agent. In practice, we see 
> that disk monitoring is one of the most expensive resource monitoring tasks. 
> For instance, when there are deeply nested directories, the CPU utilization 
> of the observer process can easily reach 1.5 CPUs. It would be ideal if we 
> delegate the disk monitoring task to the agent and do it only once. With this 
> approach, when disk collection has improved in the agent (for instance by 
> implementing XFS isolation), we can simply benefit from it without any code 
> change. Some more information about the problem is provided in AURORA-1918.
> 
> This patch that introduces `MesosDiskCollector` which queries the agent's API 
> endpoint to lookup disk_used_bytes. Note that there is also resource 
> monitoring in thermos executor. Currently, I left the disk collector there to 
> use the `du` implementation. That can be changed in a later patch.
> 
> I modified some vagrant config files including `aurora-executor.service` and 
> `etc_mesos-slave/isolation` for testing. They can be left as is. I included 
> them in this patch to show how this would work e2e.
> 
> 
> Diffs
> -
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
>   RELEASE-NOTES.md 51ab6c724694244bf616b29e9beace4a4a3f5252 
>   docs/reference/observer-configuration.md 
> 8a443c94f7f37f9454989781f722101a97c99f15 
>   examples/jobs/hello_world.aurora 5401bfebe753b5e53abd08baeac501144ced9b5a 
>   examples/vagrant/mesos_config/etc_mesos-slave/isolation 
> 1a7028ffc70116b104ef3ad22b7388f637707a0f 
>   examples/vagrant/systemd/thermos.service 
> 01925bcd2ae44f100df511f3c3951c3f5a1a72aa 
>   src/main/python/apache/aurora/tools/thermos_observer.py 
> dd9f0c46ceac9e939b1b763073314161de0ea614 
>   src/main/python/apache/thermos/monitoring/BUILD 
> 65ba7088f65e7baa5d30744736ba456b46a55e86 
>   src/main/python/apache/thermos/monitoring/disk.py 
> 986d33a5000f8d5db15cb639c81f8b1d756ffa05 
>   src/main/python/apache/thermos/monitoring/resource.py 
> adcdc751c03460dc801a18278faa96d6bd64722b 
>   src/main/python/apache/thermos/observer/task_observer.py 
> a6870d48bddf2a2ccede7bb68195f2baae1d0e47 
>   
> src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py
>  fe74bd1d3ecd89fca1b5b2251202cbbc0f24 
>   src/test/python/apache/thermos/monitoring/BUILD 
> 8f2b39336dce6c7b580e6ba0009f60afdcb89179 
>   src/test/python/apache/thermos/monitoring/test_disk.py 
> 362393bfd1facf3198e2d438d0596b16700b72b8 
>   src/test/python/apache/thermos/monitoring/test_resource.py 
> e577e552d4ee1807096a15401851bb9fd95fa426 
> 
> 
> Diff: https://reviews.apache.org/r/66103/diff/9/
> 
> 
> Testing
> ---
> 
> - I added unit tests.
> - Tested in vagrant and it works as intenced.
> - I also built and deployed in our test enviroment. In order to measure 
> imporoved performance I created jobs 

Re: Review Request 66103: Introduce mesos disk collector

2018-03-22 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66103/#review199812
---



Querying information from Mesos rather than re-doing it in Thermos is a step 
into the right direction. Overall this looks good to me. Below are just a few 
nitpicks.


src/main/python/apache/thermos/monitoring/disk.py
Lines 99-105 (patched)
<https://reviews.apache.org/r/66103/#comment280280>

Please mark the attributes that are only used internally with `_` so that 
it is easier to understand which parts are part of the public interface.



src/main/python/apache/thermos/monitoring/disk.py
Lines 112-114 (patched)
<https://reviews.apache.org/r/66103/#comment280281>

The indentation here and in a few other places is slightly off. Aurora 
tends to follow this style guide here 
https://github.com/twitter/commons/blob/master/src/python/twitter/common/styleguide.md#best-practices



src/main/python/apache/thermos/monitoring/disk.py
Lines 174 (patched)
<https://reviews.apache.org/r/66103/#comment280284>

Should we do the jmespath compilation here? If there is an error it will 
throw early and only once.



src/test/python/apache/thermos/monitoring/BUILD
Lines 21 (patched)
<https://reviews.apache.org/r/66103/#comment280283>

Requests has a few dependencies. I believe you need to list those here as 
well in order to ensure pants pulls in the correct versions.


- Stephan Erb


On March 22, 2018, 10:52 p.m., Reza Motamedi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66103/
> ---
> 
> (Updated March 22, 2018, 10:52 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Franck Cuny, 
> Jordan Ly, Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> When disk isolation is enabled in a Mesos agent it calculates the disk usage 
> for each container. 
> Thermos Observer also monitors disk usage using `twitter.common.dirutil`, 
> essentially repeating the work already done by the agent. In practice, we see 
> that disk monitoring is one of the most expensive resource monitoring tasks. 
> For instance, when there are deeply nested directories, the CPU utilization 
> of the observer process can easily reach 1.5 CPUs. It would be ideal if we 
> delegate the disk monitoring task to the agent and do it only once. With this 
> approach, when disk collection has improved in the agent (for instance by 
> implementing XFS isolation), we can simply benefit from it without any code 
> change. Some more information about the problem is provided in AURORA-1918.
> 
> This patch that introduces `MesosDiskCollector` which queries the agent's API 
> endpoint to lookup disk_used_bytes. Note that there is also resource 
> monitoring in thermos executor. Currently, I left the disk collector there to 
> use the `du` implementation. That can be changed in a later patch.
> 
> I modified some vagrant config files including `aurora-executor.service` and 
> `etc_mesos-slave/isolation` for testing. They can be left as is. I included 
> them in this patch to show how this would work e2e.
> 
> 
> Diffs
> -
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
>   RELEASE-NOTES.md 51ab6c724694244bf616b29e9beace4a4a3f5252 
>   docs/reference/observer-configuration.md 
> 8a443c94f7f37f9454989781f722101a97c99f15 
>   examples/jobs/hello_world.aurora 5401bfebe753b5e53abd08baeac501144ced9b5a 
>   examples/vagrant/mesos_config/etc_mesos-slave/isolation 
> 1a7028ffc70116b104ef3ad22b7388f637707a0f 
>   examples/vagrant/systemd/thermos.service 
> 01925bcd2ae44f100df511f3c3951c3f5a1a72aa 
>   src/main/python/apache/aurora/tools/thermos_observer.py 
> dd9f0c46ceac9e939b1b763073314161de0ea614 
>   src/main/python/apache/thermos/monitoring/BUILD 
> 65ba7088f65e7baa5d30744736ba456b46a55e86 
>   src/main/python/apache/thermos/monitoring/disk.py 
> 986d33a5000f8d5db15cb639c81f8b1d756ffa05 
>   src/main/python/apache/thermos/monitoring/resource.py 
> adcdc751c03460dc801a18278faa96d6bd64722b 
>   src/main/python/apache/thermos/observer/task_observer.py 
> a6870d48bddf2a2ccede7bb68195f2baae1d0e47 
>   
> src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py
>  fe74bd1d3ecd89fca1b5b2251202cbbc0f24 
>   src/test/python/apache/thermos/monitoring/BUILD 
> 8f2b39336dce6c7b580e6ba0009f60afdcb89179 
>   src/test/python/apache/thermos/monitoring/test_disk.py 
> 362393bfd1facf3198e2d438

Re: Review Request 66190: Fix 'PreemptorSlotSearchBenchmark', remove 'isProduction' references in benchmark

2018-03-22 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66190/#review199774
---


Ship it!




Ship It!

- Stephan Erb


On March 21, 2018, 9:23 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66190/
> ---
> 
> (Updated March 21, 2018, 9:23 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This benchmark was using the deprecated `production` flag when building the 
> tasks for the cluster. `PendingTaskProcessor` depends on `tier` instead, so 
> this benchmark ended up not testing the correct codepath.
> 
> Removed references to `production` and added `tier` instead. Additionally, 
> removed some unused options.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/BenchmarkSettings.java 
> ddab2eccb22a93ecb67481f399707d2d82df5db2 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 1f9a5764b502f939f0345ff99fb0fc2830b4c2f0 
>   src/jmh/java/org/apache/aurora/benchmark/Tasks.java 
> 60c62bbf3061650a5dd8654045dc8189293d0190 
> 
> 
> Diff: https://reviews.apache.org/r/66190/diff/4/
> 
> 
> Testing
> ---
> 
> Old:
> ```
> # Run complete. Total time: 00:08:32
> 
> Benchmark 
>(numPendingTasks)   Mode  Cnt Score Error  
>  Units
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark
>1  thrpt   1057.670 ±  20.451  
>  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.alloc.rate 
>1  thrpt   10   595.374 ± 211.805  
> MB/sec
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.alloc.rate.norm
>1  thrpt   10  10830342.916 ± 380.919
> B/op
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.churn.PS_Eden_Space
>1  thrpt   10   593.530 ± 222.002  MB/sec
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.churn.PS_Eden_Space.norm
>   1  thrpt   10  10717947.102 ± 1280229.296B/op
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.churn.PS_Survivor_Space
>1  thrpt   10 0.305 ±   1.264  MB/sec
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.churn.PS_Survivor_Space.norm
>   1  thrpt   10 13552.434 ±   61403.918B/op
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.count  
>1  thrpt   1060.000
> counts
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.time   
>1  thrpt   10   202.000
> ms
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·stack 
>1  thrptNaN
>---
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark
>   10  thrpt   1052.161 ±   8.526  
>  ops/s
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.alloc.rate 
>   10  thrpt   10   550.771 ±  89.939  
> MB/sec
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.alloc.rate.norm
>   10  thrpt   10  11074211.352 ± 318.376
> B/op
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.churn.PS_Eden_Space
>   10  thrpt   10   550.125 ± 107.470  MB/sec
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.churn.PS_Eden_Space.norm
>  10  thrpt   10  11073792.311 ± 1621636.993B/op
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.churn.PS_Survivor_Space
>   10  thrpt   10 0.038 ±   0.049  MB/sec
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark:·gc.churn.PS_Survivor_Space.norm
>  10  thrpt   10   737.753 ± 919.460B/op
> SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runB

Re: Review Request 66199: Remove unused LOST_LOCK_MESSAGE variable in JobUpdateControllerImpl

2018-03-22 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66199/#review199771
---


Ship it!




Ship It!

- Stephan Erb


On March 21, 2018, 10:38 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66199/
> ---
> 
> (Updated March 21, 2018, 10:38 p.m.)
> 
> 
> Review request for Aurora, Renan DelValle and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We no longer use locks for updates (context: 
> https://reviews.apache.org/r/63130/). This was a legacy variable.
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  f8be8058f3a80a18b999d2666e2adb33e1e55fef 
> 
> 
> Diff: https://reviews.apache.org/r/66199/diff/1/
> 
> 
> Testing
> ---
> 
> `./gradlew test`
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 66186: Upgrade to psutil with optimized Process.children()

2018-03-20 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66186/#review199603
---




src/test/python/apache/thermos/monitoring/test_process_collector_psutil.py
Line 22 (original), 22 (patched)
<https://reviews.apache.org/r/66186/#comment279928>

`process_iter` used to be an implementation detail of `Process.children()` 
but got removed in https://github.com/giampaolo/psutil/pull/1185/files


- Stephan Erb


On March 21, 2018, 12:15 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66186/
> ---
> 
> (Updated March 21, 2018, 12:15 a.m.)
> 
> 
> Review request for Aurora, Reza Motamedi and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The changelog claims: `Process.children() is 2x faster on UNIX and 2.4x 
> faster on Linux.`
> 
> This is needed for all stats retrieved via `ProcessTreeCollector`. An update 
> therefore
> seems worthwhile. 
> 
> https://github.com/giampaolo/psutil/blob/master/HISTORY.rst
> 
> 
> Diffs
> -
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
>   src/test/python/apache/thermos/monitoring/test_process_collector_psutil.py 
> 93ff878be578fa7a63d25b65e7d915790dc9ccc6 
> 
> 
> Diff: https://reviews.apache.org/r/66186/diff/1/
> 
> 
> Testing
> ---
> 
> TODO: I still want to double check this on a real running observer
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Review Request 66186: Upgrade to psutil with optimized Process.children()

2018-03-20 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66186/
---

Review request for Aurora, Reza Motamedi and Santhosh Kumar Shanmugham.


Repository: aurora


Description
---

The changelog claims: `Process.children() is 2x faster on UNIX and 2.4x faster 
on Linux.`

This is needed for all stats retrieved via `ProcessTreeCollector`. An update 
therefore
seems worthwhile. 

https://github.com/giampaolo/psutil/blob/master/HISTORY.rst


Diffs
-

  3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
  src/test/python/apache/thermos/monitoring/test_process_collector_psutil.py 
93ff878be578fa7a63d25b65e7d915790dc9ccc6 


Diff: https://reviews.apache.org/r/66186/diff/1/


Testing
---

TODO: I still want to double check this on a real running observer


Thanks,

Stephan Erb



Re: Review Request 66139: Speedup regular Thermos observer checkpoint refresh

2018-03-20 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66139/
---

(Updated March 20, 2018, 11:41 p.m.)


Review request for Aurora, Jordan Ly, Renan DelValle, and Reza Motamedi.


Changes
---

Optimize path detection without changing or removing existing code of 
`ExecutorDetector`.


Repository: aurora


Description (updated)
---

Profiling indicates that a significant part of the refresh time os spend in 
`os.path.realpath`. 
This was introduced in https://reviews.apache.org/r/35580/ to properly handle 
the `latest`
symlink in the Mesos folder layout. 

This patch takes a slightly different approach to solve this problem based on 
`os.path.islink`.
The latter is faster as it just needs to look at a single folder rather than an 
entire path.


Diffs (updated)
-

  src/main/python/apache/aurora/executor/common/path_detector.py 
ed264d74ef5a5a7aa681a56b340f9b16504a88ad 
  src/test/python/apache/aurora/executor/common/test_path_detector.py 
7b5ef0cf552d22d4cfbf3357071de036551026dc 


Diff: https://reviews.apache.org/r/66139/diff/2/

Changes: https://reviews.apache.org/r/66139/diff/1-2/


Testing (updated)
---

I have tested this build on a node with 55 running tasks and 2004 finished ones.

Before this patch:

D0320 22:20:44.887248 25771 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.92s
D0320 22:20:50.746316 25771 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.93s
D0320 22:20:56.590157 25771 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.89s
 
With this patch:

D0320 22:18:53.545236 16250 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.48s
D0320 22:18:59.031919 16250 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.49s
D0320 22:19:04.512358 16250 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.48s


Thanks,

Stephan Erb



Re: Review Request 66139: Speedup regular Thermos observer checkpoint refresh

2018-03-20 Thread Stephan Erb


> On March 20, 2018, 9:20 p.m., Reza Motamedi wrote:
> > src/main/python/apache/aurora/executor/common/executor_detector.py
> > Lines 28-85 (original), 26-38 (patched)
> > <https://reviews.apache.org/r/66139/diff/1/?file=1982446#file1982446line29>
> >
> > We are using the functionalities here. Is the performance tunning 
> > related to this code delete?

Oh interesting. Do you use all removed functionality? 

The performance improvement is only partially related. There are cycles wasted 
in the translation from `path` to the parsing result of `ScanfParser` and back 
again, but the dominant factor is the removed `realpath` in `path_detector.py`.

I should be able to rewrite the patch so that `executor_detector.py` remains 
unchanged.


> On March 20, 2018, 9:20 p.m., Reza Motamedi wrote:
> > src/main/python/apache/aurora/executor/common/path_detector.py
> > Lines 31-33 (original), 31-33 (patched)
> > <https://reviews.apache.org/r/66139/diff/1/?file=1982447#file1982447line31>
> >
> > Curious if this generator pattern is really useful. Don't we just 
> > consume everything it generates?

We need the result of `os.path.join(path, self._sandbox_path)` as a variable, 
in order to both return it and check via `os.path.exists(path)`. An alternative 
would be to use an explicit for loop here.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66139/#review199564
-----------


On March 20, 2018, 12:22 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66139/
> ---
> 
> (Updated March 20, 2018, 12:22 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Renan DelValle, and Reza Motamedi.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Profiling indicated that roughly 70% of the refresh time was spend in 
> `os.path.realpath`. 
> This was introduced in https://reviews.apache.org/r/35580/ to properly handle 
> the `latest`
> symlink in the Mesos folder layout. 
> 
> This patch takes a slightly different approach to solve this problem based on 
> `os.path.islink`.
> The latter is faster as it just needs to look at a single folder rather than 
> an entire path.
> 
> To facilitate this optimization the patch removes unused functionality from 
> ExecutorDetector.
> Most probably this  was used by former GC executor that got removed a few 
> years ago.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/BUILD 
> 486230db34a22ea5dd0f68da911c0afb1afbcac0 
>   src/main/python/apache/aurora/executor/common/executor_detector.py 
> a07bfc34caa5f86153ace8184b061e253c39e92e 
>   src/main/python/apache/aurora/executor/common/path_detector.py 
> ed264d74ef5a5a7aa681a56b340f9b16504a88ad 
>   src/main/python/apache/thermos/monitoring/detector.py 
> 6e5a620e9f49e0960a814f4c1f701a21cc7678fd 
>   src/test/python/apache/aurora/executor/common/test_executor_detector.py 
> d2a948f2e95cbc1723b63462787c4256cc56731d 
>   src/test/python/apache/aurora/executor/common/test_path_detector.py 
> 7b5ef0cf552d22d4cfbf3357071de036551026dc 
> 
> 
> Diff: https://reviews.apache.org/r/66139/diff/1/
> 
> 
> Testing
> ---
> 
> I have tested this build on a node with 75 running tasks and 4500 finished 
> ones.
> 
> Before this patch:
> 
> D0319 09:05:07.219057 14514 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 2.26s
> D0319 09:05:14.412610 14514 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 2.39s
> D0319 09:05:21.600985 14514 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 2.29s
>  
> With this patch:
> 
> D0319 09:00:24.084657 5427 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.77s
> D0319 09:00:29.869699 5427 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.78s
> D0319 09:00:35.643407 5427 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.77s
> 
> The regular 2-3 second freezes when navigating the Thermos UI are now almost 
> gone for me.
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 66139: Speedup regular Thermos observer checkpoint refresh

2018-03-19 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66139/#review199490
---



@ReviewBot retry

- Stephan Erb


On March 19, 2018, 5:24 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66139/
> ---
> 
> (Updated March 19, 2018, 5:24 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Renan DelValle.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Profiling indicated that roughly 70% of the refresh time was spend in 
> `os.path.realpath`. 
> This was introduced in https://reviews.apache.org/r/35580/ to properly handle 
> the `latest`
> symlink in the Mesos folder layout. 
> 
> This patch takes a slightly different approach to solve this problem based on 
> `os.path.islink`.
> The latter is faster as it just needs to look at a single folder rather than 
> an entire path.
> 
> To facilitate this optimization the patch removes unused functionality from 
> ExecutorDetector.
> Most probably this  was used by former GC executor that got removed a few 
> years ago.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/BUILD 
> 486230db34a22ea5dd0f68da911c0afb1afbcac0 
>   src/main/python/apache/aurora/executor/common/executor_detector.py 
> a07bfc34caa5f86153ace8184b061e253c39e92e 
>   src/main/python/apache/aurora/executor/common/path_detector.py 
> ed264d74ef5a5a7aa681a56b340f9b16504a88ad 
>   src/main/python/apache/thermos/monitoring/detector.py 
> 6e5a620e9f49e0960a814f4c1f701a21cc7678fd 
>   src/test/python/apache/aurora/executor/common/test_executor_detector.py 
> d2a948f2e95cbc1723b63462787c4256cc56731d 
>   src/test/python/apache/aurora/executor/common/test_path_detector.py 
> 7b5ef0cf552d22d4cfbf3357071de036551026dc 
> 
> 
> Diff: https://reviews.apache.org/r/66139/diff/1/
> 
> 
> Testing
> ---
> 
> I have tested this build on a node with 75 running tasks and 4500 finished 
> ones.
> 
> Before this patch:
> 
> D0319 09:05:07.219057 14514 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 2.26s
> D0319 09:05:14.412610 14514 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 2.39s
> D0319 09:05:21.600985 14514 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 2.29s
>  
> With this patch:
> 
> D0319 09:00:24.084657 5427 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.77s
> D0319 09:00:29.869699 5427 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.78s
> D0319 09:00:35.643407 5427 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.77s
> 
> The regular 2-3 second freezes when navigating the Thermos UI are now almost 
> gone for me.
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 66136: Switch Thermos to lazy log formatting

2018-03-19 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66136/#review199421
---



@ReviewBot retry

- Stephan Erb


On March 19, 2018, 3:55 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66136/
> ---
> 
> (Updated March 19, 2018, 3:55 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Renan DelValle.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is the first part of a small series of Thermos observer performance 
> improvements.
> 
> As a first iteration, this switches all logging to use the logger-embedded 
> formatting
> rather than doing it eager up front. This has the advantage that we produce 
> less garbage
> if debug logging is disabled.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/thermos/common/ckpt.py 
> e79ec6ac42483bfab4ccd94390d0f2af95903a95 
>   src/main/python/apache/thermos/core/helper.py 
> 0811e846ce8e39a22d55732604876c37d2caa239 
>   src/main/python/apache/thermos/core/muxer.py 
> 47e77f7685891545b2526e71e016ffcd454f938c 
>   src/main/python/apache/thermos/core/process.py 
> 4a4678ff39c84cb87836aca19365c5b2aabc4fa4 
>   src/main/python/apache/thermos/core/runner.py 
> 1b63c083246f8ec1a43cb1686968c2b1bb4a14b5 
>   src/main/python/apache/thermos/monitoring/disk.py 
> 52c5d74fd70b5942ea3ef5101ba3f27bfc98fc21 
>   src/main/python/apache/thermos/monitoring/monitor.py 
> d77703e9027e6c23ffb67d936cf8beef0384b4a6 
>   src/main/python/apache/thermos/monitoring/process_collector_psutil.py 
> 3000e95e2930a01f57ac855960b5db7aabbfc0ca 
>   src/main/python/apache/thermos/monitoring/resource.py 
> f5e3849ca6682c6d4720698be869ca6b9f703b94 
>   src/main/python/apache/thermos/observer/http/file_browser.py 
> d0996651312a763ed68a68ae358524cfb4a94d29 
>   src/main/python/apache/thermos/observer/http/http_observer.py 
> 5bfc4f28bfbbcb654a059a3579844f5dbe082a9b 
>   src/main/python/apache/thermos/observer/http/static_assets.py 
> 83adeb3e699274483cb72910b0d25f7b30098f24 
>   src/main/python/apache/thermos/observer/observed_task.py 
> 08540e11523b0db990de57fe1ce7517047e2632b 
>   src/main/python/apache/thermos/observer/task_observer.py 
> 4bb5d239e81fe4659397f899760c0e8853e93786 
>   src/main/python/apache/thermos/runner/thermos_runner.py 
> fa4f0fb7452bf8b85f9f3fa27283f54d7c0fe4f2 
> 
> 
> Diff: https://reviews.apache.org/r/66136/diff/1/
> 
> 
> Testing
> ---
> 
> Manually verified that the observer debug logs still contain useful output.
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Review Request 66139: Speedup regular Thermos observer checkpoint refresh

2018-03-19 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66139/
---

Review request for Aurora, Jordan Ly and Renan DelValle.


Repository: aurora


Description
---

Profiling indicated that roughly 70% of the refresh time was spend in 
`os.path.realpath`. 
This was introduced in https://reviews.apache.org/r/35580/ to properly handle 
the `latest`
symlink in the Mesos folder layout. 

This patch takes a slightly different approach to solve this problem based on 
`os.path.islink`.
The latter is faster as it just needs to look at a single folder rather than an 
entire path.

To facilitate this optimization the patch removes unused functionality from 
ExecutorDetector.
Most probably this  was used by former GC executor that got removed a few years 
ago.


Diffs
-

  src/main/python/apache/aurora/executor/BUILD 
486230db34a22ea5dd0f68da911c0afb1afbcac0 
  src/main/python/apache/aurora/executor/common/executor_detector.py 
a07bfc34caa5f86153ace8184b061e253c39e92e 
  src/main/python/apache/aurora/executor/common/path_detector.py 
ed264d74ef5a5a7aa681a56b340f9b16504a88ad 
  src/main/python/apache/thermos/monitoring/detector.py 
6e5a620e9f49e0960a814f4c1f701a21cc7678fd 
  src/test/python/apache/aurora/executor/common/test_executor_detector.py 
d2a948f2e95cbc1723b63462787c4256cc56731d 
  src/test/python/apache/aurora/executor/common/test_path_detector.py 
7b5ef0cf552d22d4cfbf3357071de036551026dc 


Diff: https://reviews.apache.org/r/66139/diff/1/


Testing
---

I have tested this build on a node with 75 running tasks and 4500 finished ones.

Before this patch:

D0319 09:05:07.219057 14514 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 2.26s
D0319 09:05:14.412610 14514 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 2.39s
D0319 09:05:21.600985 14514 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 2.29s
 
With this patch:

D0319 09:00:24.084657 5427 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.77s
D0319 09:00:29.869699 5427 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.78s
D0319 09:00:35.643407 5427 task_observer.py:142] TaskObserver: finished 
checkpoint refresh in 0.77s

The regular 2-3 second freezes when navigating the Thermos UI are now almost 
gone for me.


Thanks,

Stephan Erb



Review Request 66136: Switch Thermos to lazy log formatting

2018-03-19 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66136/
---

Review request for Aurora, Jordan Ly and Renan DelValle.


Repository: aurora


Description
---

This is the first part of a small series of Thermos observer performance 
improvements.

As a first iteration, this switches all logging to use the logger-embedded 
formatting
rather than doing it eager up front. This has the advantage that we produce 
less garbage
if debug logging is disabled.


Diffs
-

  src/main/python/apache/thermos/common/ckpt.py 
e79ec6ac42483bfab4ccd94390d0f2af95903a95 
  src/main/python/apache/thermos/core/helper.py 
0811e846ce8e39a22d55732604876c37d2caa239 
  src/main/python/apache/thermos/core/muxer.py 
47e77f7685891545b2526e71e016ffcd454f938c 
  src/main/python/apache/thermos/core/process.py 
4a4678ff39c84cb87836aca19365c5b2aabc4fa4 
  src/main/python/apache/thermos/core/runner.py 
1b63c083246f8ec1a43cb1686968c2b1bb4a14b5 
  src/main/python/apache/thermos/monitoring/disk.py 
52c5d74fd70b5942ea3ef5101ba3f27bfc98fc21 
  src/main/python/apache/thermos/monitoring/monitor.py 
d77703e9027e6c23ffb67d936cf8beef0384b4a6 
  src/main/python/apache/thermos/monitoring/process_collector_psutil.py 
3000e95e2930a01f57ac855960b5db7aabbfc0ca 
  src/main/python/apache/thermos/monitoring/resource.py 
f5e3849ca6682c6d4720698be869ca6b9f703b94 
  src/main/python/apache/thermos/observer/http/file_browser.py 
d0996651312a763ed68a68ae358524cfb4a94d29 
  src/main/python/apache/thermos/observer/http/http_observer.py 
5bfc4f28bfbbcb654a059a3579844f5dbe082a9b 
  src/main/python/apache/thermos/observer/http/static_assets.py 
83adeb3e699274483cb72910b0d25f7b30098f24 
  src/main/python/apache/thermos/observer/observed_task.py 
08540e11523b0db990de57fe1ce7517047e2632b 
  src/main/python/apache/thermos/observer/task_observer.py 
4bb5d239e81fe4659397f899760c0e8853e93786 
  src/main/python/apache/thermos/runner/thermos_runner.py 
fa4f0fb7452bf8b85f9f3fa27283f54d7c0fe4f2 


Diff: https://reviews.apache.org/r/66136/diff/1/


Testing
---

Manually verified that the observer debug logs still contain useful output.


Thanks,

Stephan Erb



Re: Review Request 65769: Remove unused module in RecoveryTool, move TaskTestUtil to test folder

2018-03-18 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65769/#review199394
---


Ship it!




Ship It!

- Stephan Erb


On Feb. 23, 2018, 7:04 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65769/
> ---
> 
> (Updated Feb. 23, 2018, 7:04 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Renan DelValle, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Removing an unused `TierModule` from `RecoveryTool`.
> 
> Additionally, resolved an old TODO and moved `TaskTestUtil` to the test 
> folder. It seems that the old version of JMH could not see test sources but 
> https://github.com/melix/jmh-gradle-plugin/issues/31 and the upgrade to 0.4.4 
> seems to fix that.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 2b61c273860a33472890fc22e06b3c53d2086999 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/RecoveryTool.java
>  7cb4c52bc6e890f48a8bc9fe21c54cea38644808 
> 
> 
> Diff: https://reviews.apache.org/r/65769/diff/1/
> 
> 
> Testing
> ---
> 
> `./gradlew test`
> `./gradlew jmh`
> 
> end-to-end tests pass.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 65769: Remove unused module in RecoveryTool, move TaskTestUtil to test folder

2018-03-18 Thread Stephan Erb


> On March 16, 2018, 7:02 p.m., Stephan Erb wrote:
> > Is this RecoveryTool just a prototype or used for real? If it is the 
> > preferred backup way now, we have to update 
> > https://github.com/apache/aurora/blob/master/docs/operations/backup-restore.md.
> 
> Jordan Ly wrote:
> I would call it a prototype for now. It has been tested in ad-hoc 
> environments but not at scale or production yet. I've been meaning to 
> properly vet it but I have not gotten a chance yet.

Ok, thanks for the clarification.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65769/#review199333
---


On Feb. 23, 2018, 7:04 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65769/
> ---
> 
> (Updated Feb. 23, 2018, 7:04 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Renan DelValle, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Removing an unused `TierModule` from `RecoveryTool`.
> 
> Additionally, resolved an old TODO and moved `TaskTestUtil` to the test 
> folder. It seems that the old version of JMH could not see test sources but 
> https://github.com/melix/jmh-gradle-plugin/issues/31 and the upgrade to 0.4.4 
> seems to fix that.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 2b61c273860a33472890fc22e06b3c53d2086999 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/RecoveryTool.java
>  7cb4c52bc6e890f48a8bc9fe21c54cea38644808 
> 
> 
> Diff: https://reviews.apache.org/r/65769/diff/1/
> 
> 
> Testing
> ---
> 
> `./gradlew test`
> `./gradlew jmh`
> 
> end-to-end tests pass.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 66074: Refactor ClusterState to more appropriate package, move binding to StateModule

2018-03-16 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66074/#review199334
---


Ship it!




Ship It!

- Stephan Erb


On March 14, 2018, 11:07 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66074/
> ---
> 
> (Updated March 14, 2018, 11:07 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Renan DelValle, and Santhosh 
> Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Browsing through the code and I noticed that if preemption is turned off, the 
> `/state` endpoint will not work since `ClusterState` is not bound.
> 
> I moved `ClusterState` and `ClusterStateImpl` to a more suitable package, and 
> bind `ClusterState` in `StateModule` no matter what.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 54b6ed9d474065bab1da7512a42fc38264b892bc 
>   src/jmh/java/org/apache/aurora/benchmark/StatusUpdateBenchmark.java 
> 37374dcf3471d5eadf543d93e5a6a3e07a389dc1 
>   src/main/java/org/apache/aurora/scheduler/http/State.java 
> 6d1b400c286a63dffba85fd5c14cb76f6f3a45a4 
>   src/main/java/org/apache/aurora/scheduler/preemptor/ClusterState.java 
> ce3bc7e6da3f86625c690e26c28ccef67ed9021a 
>   src/main/java/org/apache/aurora/scheduler/preemptor/ClusterStateImpl.java 
> 5574e9ba916513e8f5dc3ef78ddeb084330c 
>   
> src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java 
> 056e4668490accab9b70e0634393580e40514f86 
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorModule.java 
> 4de5ef8be76ead1970b27d91eee90198858e53c8 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 76c327758f88ad23942ea3b388830cb4a31728dd 
>   src/test/java/org/apache/aurora/scheduler/http/StateTest.java 
> 0685d6ee6148d0a22c07e18a808f2816299b10d3 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/ClusterStateImplTest.java 
> 881bb20feaa83797831828a3cdd7a2f507621387 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessorTest.java
>  35e9348e6ab0c14e6ae20af4076358d7e12681c7 
> 
> 
> Diff: https://reviews.apache.org/r/66074/diff/1/
> 
> 
> Testing
> ---
> 
> Manually tested state endpoint returns jobs when preemptor is both on and off.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 65769: Remove unused module in RecoveryTool, move TaskTestUtil to test folder

2018-03-16 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65769/#review199333
---



Is this RecoveryTool just a prototype or used for real? If it is the preferred 
backup way now, we have to update 
https://github.com/apache/aurora/blob/master/docs/operations/backup-restore.md.

- Stephan Erb


On Feb. 23, 2018, 7:04 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65769/
> ---
> 
> (Updated Feb. 23, 2018, 7:04 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Renan DelValle, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Removing an unused `TierModule` from `RecoveryTool`.
> 
> Additionally, resolved an old TODO and moved `TaskTestUtil` to the test 
> folder. It seems that the old version of JMH could not see test sources but 
> https://github.com/melix/jmh-gradle-plugin/issues/31 and the upgrade to 0.4.4 
> seems to fix that.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 2b61c273860a33472890fc22e06b3c53d2086999 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/RecoveryTool.java
>  7cb4c52bc6e890f48a8bc9fe21c54cea38644808 
> 
> 
> Diff: https://reviews.apache.org/r/65769/diff/1/
> 
> 
> Testing
> ---
> 
> `./gradlew test`
> `./gradlew jmh`
> 
> end-to-end tests pass.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 63685: RFC: Use new scheduler UI as landing page

2018-03-16 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63685/#review199332
---



I won't be able to work in this in the near future. Closing.

- Stephan Erb


On Nov. 8, 2017, 11:32 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63685/
> ---
> 
> (Updated Nov. 8, 2017, 11:32 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Joshua Cohen.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Is this something the community would be interested in?
> 
> Known issues in this minimal viable prototype:
> 
> * I did not manage to convince Jetty to perform the redirecting. Using the 
> HTML feels hacky but works.
> * An operator will have difficulties to navigate to /admin of a non-leading 
> instance. In particular if there is no leader at all.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/http/JettyServerModule.java 
> 0f8528c3403b5f51f082aa54c16358f7568f439a 
>   src/main/resources/scheduler/assets/index.html 
> 533091395547a6b067dbf5f53e42a13a560971da 
>   ui/src/main/js/components/Navigation.js 
> 50881a9914cee2812807624cd28f24fa64207296 
>   ui/src/main/js/index.js 9f94d4bd6f649d74bdd9c3aa99783e01cae25d93 
>   ui/src/main/js/pages/Admin.js PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/63685/diff/1/
> 
> 
> Testing
> ---
> 
> 
> File Attachments
> 
> 
> Screen Shot 2017-11-08 at 23.18.06.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/08/80ae2c5f-e431-4f86-8811-13bfe6a8627b__Screen_Shot_2017-11-08_at_23.18.06.png
> Screen Shot 2017-11-08 at 23.18.20.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/08/9fb0f2e0-49c9-427f-8b84-d3d1fc057a59__Screen_Shot_2017-11-08_at_23.18.20.png
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 65941: Avoid scheduling on the same host the ancestor of a task recently failed on

2018-03-07 Thread Stephan Erb


> On March 7, 2018, 7:48 p.m., David McLaughlin wrote:
> > So what happens if there are two bad hosts? :)
> 
> Jordan Ly wrote:
> This does not scale past n=1
> 
> We can make this more generic by getting the list of hosts the task has 
> previously failed on and looking through offers for a host the task did not 
> fail on for some operator defined value (something like 
> `-failure_avoidance_factor`)
> 
> Santhosh Kumar Shanmugham wrote:
> Note making this more generic is still incumbent on the amount of task 
> history we have on the scheduler.
> 
> Jordan Ly wrote:
> Discussed offline:
> 
> Going to go a different route -- this method is very domain-specific and 
> does not allow for preemption to kick in since if there is only one host 
> matching and it is bad you can still be repeatedly scheduled on it. Instead, 
> going to go a more generic solution involving banning scheduling on a host 
> temporarily if the task fails on that host via `SchedulingFilter`. This would 
> be enabled through a operator-defined option.

Different idea: If the ancestor was LOST or FAILED, use a coin-flip to decide 
if we want to use a matching offer or not. This does not require additional 
state and gives sufficient chance for the task to come up in one of the future 
scheduling rounds. As it would be only used for re-scheduled tasks, it does not 
lead to a performance impact in the normal case.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65941/#review198803
---


On March 7, 2018, 6:50 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65941/
> ---
> 
> (Updated March 7, 2018, 6:50 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If a task fails on a host, we should try to avoid rescheduling the task on 
> the same host if possible. This is done in order to avoid a potentially bad 
> host. This issue generally comes up when you are bin-packing hosts (i.e. 
> using the `-offer_order` option).
> 
> If there are no other offers to schedule the task on, we will still use the 
> offer.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImpl.java 
> fcafecf63040f9c410458dedfd3d87b0d669d205 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImplTest.java
>  864538b6730d7318385494818276ba370124b8e9 
> 
> 
> Diff: https://reviews.apache.org/r/65941/diff/1/
> 
> 
> Testing
> ---
> 
> `./gradlew test`
> 
> Benchmarks and live-cluster testing coming soon.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 65873: Upgrade RBT to 0.7.11

2018-03-01 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65873/#review198508
---


Ship it!




Ship It!

- Stephan Erb


On März 2, 2018, 1:14 vorm., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65873/
> ---
> 
> (Updated März 2, 2018, 1:14 vorm.)
> 
> 
> Review request for Aurora, Jordan Ly and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Upgrade RBT to 0.7.11
> 
> 
> Diffs
> -
> 
>   rbt 5b94e53357e8934df040e603efc9ad2378ff6575 
> 
> 
> Diff: https://reviews.apache.org/r/65873/diff/1/
> 
> 
> Testing
> ---
> 
> Ran ./rbt and submitted this review request using it.
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 65598: Disable pytest-fast mode as a workaround for failing health checker tests

2018-02-15 Thread Stephan Erb


> On Feb. 14, 2018, 7:28 p.m., Renan DelValle wrote:
> > build-support/jenkins/build.sh
> > Lines 30 (patched)
> > <https://reviews.apache.org/r/65598/diff/1/?file=1956913#file1956913line30>
> >
> > Do we still need --junit-xml-dir="$PWD/dist/test-results" to make the 
> > logs go to the right place?

Pleae see my comment above (probably auto-collapsed by now): For unknown 
reasons, disabling the fast mode breaks the junit-xml generation and I 
therefore removed it. As far as I know we don't read the files anywhere, not 
even on Jenkins.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65598/#review197571
-------


On Feb. 12, 2018, 9:36 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65598/
> ---
> 
> (Updated Feb. 12, 2018, 9:36 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Renan DelValle.
> 
> 
> Bugs: AURORA-1972
> https://issues.apache.org/jira/browse/AURORA-1972
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Disable pytest-fast mode as a workaround for failing health checker tests
> 
> 
> Diffs
> -
> 
>   build-support/jenkins/build.sh 4c25a3fa315ab965df8908fe7c28be48ceb002bc 
> 
> 
> Diff: https://reviews.apache.org/r/65598/diff/1/
> 
> 
> Testing
> ---
> 
> repeated git clean -xfd && ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 65650: Add GPG key for jorda...@apache.org

2018-02-14 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65650/#review197546
---


Ship it!




Ship It!

- Stephan Erb


On Feb. 14, 2018, 9:36 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65650/
> ---
> 
> (Updated Feb. 14, 2018, 9:36 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Renan DelValle.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add GPG key for jorda...@apache.org
> 
> 
> Diffs
> -
> 
>   KEYS 843e955e8f69e4d36b61df66aa20a1c914cb77c9 
> 
> 
> Diff: https://reviews.apache.org/r/65650/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 65649: Adding support for a Thrift JSON request which defines UTF-8 as the charset for the Content-Type in the Request Headers

2018-02-14 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65649/#review197520
---


Ship it!




Ship It!

- Stephan Erb


On Feb. 14, 2018, 6:05 a.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65649/
> ---
> 
> (Updated Feb. 14, 2018, 6:05 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adding support in the scheduler to receive a JSON request which defines a 
> UTF-8 charset in the Content-Type inside of the HTTP Request Headers.
> 
> This fixes the current UI brakage as Thrift is incorrectly rejected by the 
> scheduler servlet as an unsupported media type.
> 
> Adding a test to prevent regressions.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/http/api/ApiModule.java 
> 91ff8d32be1dacd35c7ba74b666479a61aede296 
>   src/test/java/org/apache/aurora/scheduler/http/api/ApiIT.java 
> 10da43bdc113791d0644dcc480476323f86cc66e 
> 
> 
> Diff: https://reviews.apache.org/r/65649/diff/1/
> 
> 
> Testing
> ---
> 
> End to end testing
> Loading up UI in different browsers in order to make sure thrift.js was not 
> cached.
> vagrant destroy && vagrant up
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 65598: Disable pytest-fast mode as a workaround for failing health checker tests

2018-02-11 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65598/#review197238
---




build-support/jenkins/build.sh
Line 28 (original), 28-30 (patched)
<https://reviews.apache.org/r/65598/#comment277371>

Note to reviewers: This is not a real fix but mostly meant to stop the 
bleeding right now.

For unknown reasons, disabling the fast mode breaks the junit-xml 
generation and I therefore removed it. As far as I know we don't read the files 
anywhere, not even on Jenkins.


- Stephan Erb


On Feb. 11, 2018, 11:44 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65598/
> ---
> 
> (Updated Feb. 11, 2018, 11:44 p.m.)
> 
> 
> Review request for Aurora.
> 
> 
> Bugs: AURORA-1972
> https://issues.apache.org/jira/browse/AURORA-1972
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Disable pytest-fast mode as a workaround for failing health checker tests
> 
> 
> Diffs
> -
> 
>   build-support/jenkins/build.sh 4c25a3fa315ab965df8908fe7c28be48ceb002bc 
> 
> 
> Diff: https://reviews.apache.org/r/65598/diff/1/
> 
> 
> Testing
> ---
> 
> repeated git clean -xfd && ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Review Request 65598: Disable pytest-fast mode as a workaround for failing health checker tests

2018-02-11 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65598/
---

Review request for Aurora.


Bugs: AURORA-1972
https://issues.apache.org/jira/browse/AURORA-1972


Repository: aurora


Description
---

Disable pytest-fast mode as a workaround for failing health checker tests


Diffs
-

  build-support/jenkins/build.sh 4c25a3fa315ab965df8908fe7c28be48ceb002bc 


Diff: https://reviews.apache.org/r/65598/diff/1/


Testing
---

repeated git clean -xfd && ./build-support/jenkins/build.sh


Thanks,

Stephan Erb



Re: Review Request 65476: Add PartitionPolicy to config summary when defined

2018-02-02 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65476/#review196700
---



@ReviewBot retry

- Stephan Erb


On Feb. 2, 2018, 8:46 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65476/
> ---
> 
> (Updated Feb. 2, 2018, 8:46 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add PartitionPolicy to config summary when defined
> 
> 
> Diffs
> -
> 
>   ui/src/main/js/components/TaskConfigSummary.js 
> 01902cdd90192113b93cdf5872cd560c431f2bf0 
> 
> 
> Diff: https://reviews.apache.org/r/65476/diff/1/
> 
> 
> Testing
> ---
> 
> Testing in vagrant (see screenshot).
> 
> 
> File Attachments
> 
> 
> Screen Shot 2018-02-01 at 11.46.17 PM.png
>   
> https://reviews.apache.org/media/uploaded/files/2018/02/02/1353e8ee-bb7b-4546-adf6-9f1ececb36b4__Screen_Shot_2018-02-01_at_11.46.17_PM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 65477: Fix UI table layout issue on Config Summaries

2018-02-02 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65477/#review196701
---



@ReviewBot retry

- Stephan Erb


On Feb. 2, 2018, 8:52 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65477/
> ---
> 
> (Updated Feb. 2, 2018, 8:52 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Fix UI table layout issue when configuration metadata (or any other cell) is 
> longer than the max width of the table.
> 
> 
> Diffs
> -
> 
>   ui/src/main/sass/components/_job-page.scss 
> 2ae4e73c5d8d4e653cd1c550e699888b42554aec 
> 
> 
> Diff: https://reviews.apache.org/r/65477/diff/1/
> 
> 
> Testing
> ---
> 
> See screenshots.
> 
> 
> File Attachments
> 
> 
> before
>   
> https://reviews.apache.org/media/uploaded/files/2018/02/02/dcfa3542-10cb-42d2-9a3f-7fd8e5d54ddf__Screen_Shot_2018-02-01_at_11.50.04_PM.png
> after
>   
> https://reviews.apache.org/media/uploaded/files/2018/02/02/ff3cf950-e26f-4593-9169-aa8b489d7d49__Screen_Shot_2018-02-01_at_11.49.51_PM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 65476: Add PartitionPolicy to config summary when defined

2018-02-02 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65476/#review196699
---


Ship it!




Ship It!

- Stephan Erb


On Feb. 2, 2018, 8:46 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65476/
> ---
> 
> (Updated Feb. 2, 2018, 8:46 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add PartitionPolicy to config summary when defined
> 
> 
> Diffs
> -
> 
>   ui/src/main/js/components/TaskConfigSummary.js 
> 01902cdd90192113b93cdf5872cd560c431f2bf0 
> 
> 
> Diff: https://reviews.apache.org/r/65476/diff/1/
> 
> 
> Testing
> ---
> 
> Testing in vagrant (see screenshot).
> 
> 
> File Attachments
> 
> 
> Screen Shot 2018-02-01 at 11.46.17 PM.png
>   
> https://reviews.apache.org/media/uploaded/files/2018/02/02/1353e8ee-bb7b-4546-adf6-9f1ececb36b4__Screen_Shot_2018-02-01_at_11.46.17_PM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 65477: Fix UI table layout issue on Config Summaries

2018-02-02 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65477/#review196698
---


Ship it!




Ship It!

- Stephan Erb


On Feb. 2, 2018, 8:52 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65477/
> ---
> 
> (Updated Feb. 2, 2018, 8:52 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Fix UI table layout issue when configuration metadata (or any other cell) is 
> longer than the max width of the table.
> 
> 
> Diffs
> -
> 
>   ui/src/main/sass/components/_job-page.scss 
> 2ae4e73c5d8d4e653cd1c550e699888b42554aec 
> 
> 
> Diff: https://reviews.apache.org/r/65477/diff/1/
> 
> 
> Testing
> ---
> 
> See screenshots.
> 
> 
> File Attachments
> 
> 
> before
>   
> https://reviews.apache.org/media/uploaded/files/2018/02/02/dcfa3542-10cb-42d2-9a3f-7fd8e5d54ddf__Screen_Shot_2018-02-01_at_11.50.04_PM.png
> after
>   
> https://reviews.apache.org/media/uploaded/files/2018/02/02/ff3cf950-e26f-4593-9169-aa8b489d7d49__Screen_Shot_2018-02-01_at_11.49.51_PM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 65303: Improve performance of MemTaskStore queries

2018-01-31 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196583
---


Ship it!




Ship It!

- Stephan Erb


On Jan. 31, 2018, 7:12 nachm., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> ---
> 
> (Updated Jan. 31, 2018, 7:12 nachm.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative 
> style rather than functional.  I arrived at this result after running 
> benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details 
> [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
>  providing insight into the heap impact of changes.  I started this change 
> with a heap profiler as the primary motivation, and ended up using it to 
> guide this improvement.
> 
> 
> Diffs
> -
> 
>   build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 
> 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
> b5ca9a5185e240ad729fefc6638476a4aecc 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/2/
> 
> 
> Testing
> ---
> 
> Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here 
> is an abridged version.  It shows that task fetch throughput universally 
> improves by ~2x (mod error margins), and heap allocation reduces by at least 
> the same factor.  Overall GC time increases slightly as captured here, but 
> the stddev was anecdotally high across runs.  I chose to present this output 
> as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more 
> granular allocation data.  Please note that the `norm` stats are normalized 
> for the number of operations, which i find to be the most useful measure for 
> validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is 
> allocation/GC-bound (figure the allocation pressure "ceiling" for your 
> configuration!), and normalized counters to see the more precise benchmark 
> behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark(numTasks) Score 
> Error   Units
> FetchAll.run  1   481.529 ± 
> 184.751   ops/s
> FetchAll.run:·gc.alloc.rate.norm  1334970.771 ±   
> 33544.960B/op
> 
> FetchAll.run  578.652 ±  
> 20.869   ops/s
> FetchAll.run:·gc.alloc.rate.norm  5   3991107.524 ±  
> 701585.657B/op
> 
> FetchAll.run 1038.371 ±  
> 11.710   ops/s
> FetchAll.run:·gc.alloc.rate.norm 10  13487028.139 ± 
> 3369614.510B/op
> 
> IndexedFetchAndFilter.run 1   296.557 ± 
> 198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 1655319.005 ±   
> 98138.360B/op
> 
> IndexedFetchAndFilter.run 550.300 ±   
> 5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm 5   6671548.381 ±  
> 452020.849B/op
> 
> IndexedFetchAndFilter.run1017.637 ±   
> 3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm10  28100173.458 ± 
> 4486308.188B/op
> ```
> 
> With this patch:
> ```console
> Benchmark(numTasks) Score 
> Error   Units
> FetchAll.run  1  1653.572 ± 
> 799.123   ops/s
> FetchAll.run:·gc.alloc.rate.norm  1155426.052 ±   
> 10345.657B/op
> 
> FetchAll.run  5   210.454 ±  
> 54.340   ops/s
> FetchAll.run:·gc.alloc.rate.norm  5   1457560.505 ±  
> 228631.547B/op
> 
> FetchAll.run 1097.783 

Re: Review Request 65434: Ensure primary_port warning respects announcer portmap

2018-01-31 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65434/#review196563
---



@ReviewBot retry

- Stephan Erb


On Jan. 31, 2018, 11:57 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65434/
> ---
> 
> (Updated Jan. 31, 2018, 11:57 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Bugs: AURORA-1233
> https://issues.apache.org/jira/browse/AURORA-1233
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This eliminates false-positive warnings in the client: It used to complain 
> about unbound primary ports if those where bound via the portmap.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/client/config.py 
> adc8db0290d80a0ebfed983f6518d063f355d81f 
>   src/test/python/apache/aurora/client/test_config.py 
> 3d5289adcb2d53506644604380797ff64227fecd 
> 
> 
> Diff: https://reviews.apache.org/r/65434/diff/1/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 65434: Ensure primary_port warning respects announcer portmap

2018-01-31 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65434/#review196561
---




src/main/python/apache/aurora/client/config.py
Lines 47-49 (original), 49-50 (patched)
<https://reviews.apache.org/r/65434/#comment276262>

Note to reviewers: `config.raw().has_announce()` is already checked above 
so I could simplify the condition here. This is just a refactoring and 
independent of the behaviour change of this patch.


- Stephan Erb


On Jan. 31, 2018, 11:57 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65434/
> ---
> 
> (Updated Jan. 31, 2018, 11:57 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Bugs: AURORA-1233
> https://issues.apache.org/jira/browse/AURORA-1233
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This eliminates false-positive warnings in the client: It used to complain 
> about unbound primary ports if those where bound via the portmap.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/client/config.py 
> adc8db0290d80a0ebfed983f6518d063f355d81f 
>   src/test/python/apache/aurora/client/test_config.py 
> 3d5289adcb2d53506644604380797ff64227fecd 
> 
> 
> Diff: https://reviews.apache.org/r/65434/diff/1/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Review Request 65433: Update Javascript Thrift to 0.10

2018-01-31 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65433/
---

Review request for Aurora, David McLaughlin and Bill Farner.


Repository: aurora


Description
---

We missed this change when bumping the Thrift version used by Python and
Java. https://reviews.apache.org/r/64290/

List of changes: 
https://github.com/apache/thrift/commits/master/lib/js/src/thrift.js


Diffs
-

  3rdparty/javascript/scheduler/assets/js/thrift.js 
111518739a7d353433ad59b7e48c84c2dbb546db 


Diff: https://reviews.apache.org/r/65433/diff/1/


Testing
---

Manually verified that the UI is still working


Thanks,

Stephan Erb



Re: Review Request 65338: Fix error handling logic for launch failures

2018-01-25 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65338/#review196248
---




src/main/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImpl.java
Line 135 (original), 135 (patched)
<https://reviews.apache.org/r/65338/#comment275796>

To prevent future mistakes of the same kind, should we remove the 
`casState` state here and just transition to `LOST` unconditionally?

I cannot think of a scenario right now where launching would fail, but we 
would still like the task to live on.


- Stephan Erb


On Jan. 25, 2018, 9:43 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65338/
> ---
> 
> (Updated Jan. 25, 2018, 9:43 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Discovered while debugging https://issues.apache.org/jira/browse/AURORA-1966. 
> Before we attempt to launch a task, we  move the task to ASSIGNED state. 
> However, the code to deal with launch failures expects the task to be in 
> PENDING state. So the ASSIGNED -> LOST state change fails, and instead we 
> rely on the transient task timeout for correctness. This means errors that 
> can be recovered from in seconds instead take at least five minutes (by 
> default).
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImpl.java 
> 916908bbf635a261c01777cd3a357ca457dd9726 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskAssignerImplTest.java
>  533bb44953163e2148fa18c394a4338938dae205 
> 
> 
> Diff: https://reviews.apache.org/r/65338/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> Also tested in Vagrant.
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 65303: Improve performance of MemTaskStore queries

2018-01-24 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196107
---




src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
Line 234 (original), 235 (patched)
<https://reviews.apache.org/r/65303/#comment275620>

Have you considered passing in the predicate filter in here? For index 
scans this should help to eliminate a large amount of allocations.


- Stephan Erb


On Jan. 24, 2018, 1:32 a.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> ---
> 
> (Updated Jan. 24, 2018, 1:32 a.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative 
> style rather than functional.  I arrived at this result after running 
> benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details 
> [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
>  providing insight into the heap impact of changes.  I started this change 
> with a heap profiler as the primary motivation, and ended up using it to 
> guide this improvement.
> 
> 
> Diffs
> -
> 
>   build.gradle 64af7ae 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
> b5c 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/1/
> 
> 
> Testing
> ---
> 
> Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at 
> the bottom, but here is an abridged version.  It shows that task fetch 
> throughput universally improves by at least 2x, and heap allocation reduces 
> by at least the same factor.  Overall GC time increases slightly as captured 
> here, but the stddev was anecdotally high across runs.  I chose to present 
> this output as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more 
> granular allocation data.  Please note that the `norm` stats are normalized 
> for the number of operations, which i find to be the most useful measure for 
> validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is 
> allocation/GC-bound (figure the allocation pressure "ceiling" for your 
> configuration!), and normalized counters to see the more precise benchmark 
> behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark (numTasks)Score Error   Units
> 
>   1  1066.632 ± 266.924   ops/s
> ·gc.alloc.rate.norm   1289227.205 ±.051B/op
> ·gc.count 124.000counts
> ·gc.time  1   103.000ms
> 
>   584.444 ±  32.620   ops/s
> ·gc.alloc.rate.norm   5   3831210.967 ±  840844.713B/op
> ·gc.count 521.000counts
> ·gc.time  5  1407.000ms
> 
>  1038.645 ±  20.557   ops/s
> ·gc.alloc.rate.norm  10  13555430.931 ± 6787344.701B/op
> ·gc.count1052.000counts
> ·gc.time 10  3304.000ms
> ```
> 
> With this patch:
> ```console
> Benchmark   (numTasks)   Score Error   Units
> 
>  12851.288 ± 481.472   ops/s
> ·gc.alloc.rate.norm  1  145281.908 ±2223.621B/op
> ·gc.count1  39.000counts
> ·gc.time 1 130.000ms
> 
>  5 297.380 ±  35.681   ops/s
> ·gc.alloc.rate.norm  5 1183791.866 ±   77487.278B/op
> ·gc.count5  25.000counts
> ·gc.time 51821.000ms
> 
> 10 122.211 ±  81.618   ops/s  
>   
> ·gc.alloc.rate.norm 10 4364450.973 ± 2856586.882B/op
> ·gc.count   10  52.000

Re: Review Request 64288: Add a SQL persistence implementation

2018-01-20 Thread Stephan Erb


> On Dec. 12, 2017, 11:15 p.m., Stephan Erb wrote:
> > The code looks fine and reasonable to me. I would still recommend proper 
> > scale testing though.
> > 
> > At my company, we operate at a small scale and the Mesos replicated log 
> > works still well for us. Operating a HA MySQL or PostgreSQL cluster would 
> > come with a higher operational burden. If possible, I would therefore like 
> > to see that Aurora continues to support its simple replicated log 
> > deployment mode for now. I believe that other memobers of the community 
> > might feel similar. I can totally see though that this does not work for 
> > Twitter
> 
> Mohit Jaggi wrote:
> +1 to this. Our scale is high (not due to cluster size but due to a lot 
> of update/rollback/failure activity) and I like this option. But I don't want 
> to depend on external entities. The dependency on Zookeeper and Mesos is 
> already a problem.
> 
> Bill Farner wrote:
> Noted.  I had a weak inclination to replace the replicated log with a 
> `ZooKeeperPersistence` implementation for OSS use.  Would that be of interest 
> to either of you as an alternative?

I have not looked very deeply into ZK persistence yet. From what I have heared, 
Marathon has been struggerling with its ZK storage backend for years. On the 
upside, removing the replicated log from Aurora would be one further steps 
towards removing the libmesos dependency from the scheduler.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64288/#review193234
---


On Jan. 5, 2018, 1:30 a.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64288/
> ---
> 
> (Updated Jan. 5, 2018, 1:30 a.m.)
> 
> 
> Review request for Aurora, Daniel Knightly, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Introduces a `Persistence` implementation that uses a SQL database via JDBC.  
> I've opted to lean towards MySQL SQL dialect, as unfortunately there are 
> vendor differences for even the very simple SQL used here.
> 
> I chose [HikariCP](https://github.com/brettwooldridge/HikariCP) to serve as a 
> `DataSource` (connection pool) implementation.  We don't really need much out 
> of a connection pool aside from general connection lifecycle management (i.e. 
> not for concurrency).  I chose this library based on recent development 
> activity and several positive comparisons to other pools.
> 
> Note that the implementation is not yet wired into the scheduler application. 
>  That will come in a follow-up.
> 
> 
> Diffs
> -
> 
>   build.gradle af119910e84c48f75f2573ababcfa287c3b986fc 
>   config/spotbugs/excludeFilter.xml 51790cce8d9047e40741f05ee55af15dbdc3065e 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> dbbe1d1689ed3e455a95f529f914dc6823427d37 
>   src/dist/etc/h2-database.properties PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/app/SchedulerMain.java 
> 7ffcf4f471b97e32426c82972472115c5c5c4d02 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> b7f43e0d6efbddcac640c3d39c7bc56400e68e68 
>   
> src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java 
> a4984a95f938396c82244f91e4a3d592df1c1539 
>   src/main/java/org/apache/aurora/scheduler/storage/backup/BackupReader.java 
> PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  85b2113631586f43d854c4d2812f43b7b864d452 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/Persistence.java 
> 9eb862c01bf451252bfbcc7a2eac60d2c965c9f0 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogPersistence.java 
> e70e6051582ca90ae72014626b983bbf4b8d5b48 
>   
> src/main/java/org/apache/aurora/scheduler/storage/sql/DisabledDistributedSnapshotStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/sql/Mode.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/sql/SqlPersistence.java 
> PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/storage/sql/SqlPersistenceModule.java
>  PRE-CREATION 
>   src/main/resources/org/apache/aurora/scheduler/storage/sql/schema.sql 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 5cb5310ed096ca1fb47b980401e3712948271ac4 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durab

Re: Review Request 65233: Allow for injection of custom OfferSets, removed OfferOrder and OfferSelector

2018-01-20 Thread Stephan Erb


> On Jan. 19, 2018, 8:27 p.m., Stephan Erb wrote:
> > The patch itself looks fine! However, I have difficulties assesing it 
> > properly as I am missing a bit of context. What is the main customization 
> > that you aim to implement using the new interface?
> > 
> > I am also curious what is your take on the removed offer sorting logic. 
> > Does it still work for Twitter as intended originally? We are currently 
> > using it as well, so I would need to re-implement if it gets removed. (Main 
> > issue we have noticed so far was that the number of tasks per agent can 
> > shift drastically, sometimes even up to a point where the Thermos observer 
> > is completely unusable).
> 
> Jordan Ly wrote:
> Some context for the new interface:
> 
> For performance, it is helpful to control the data structure we are 
> holding offers in and how we can take advantage of it to narrow down the list 
> of candidates to schedule on. For example, in the case of CPU ordering, if we 
> call `getOrdered(TaskGroupKey, ResourceRequest)` -- since we know the size of 
> the ResourceRequest and since we have access to the `offers` NavigableSet, we 
> might use `subMap` to return only candidates that we know will fit thus 
> reducing the number of offers we have to evaluate right away.
> 
> Stephan Erb wrote:
> Yeah that makes sense, but would have also been possible without the set 
> of recent scheduling refactorings (either by using the `NavigableSet` as-is 
> or via something like 
> https://github.com/wfarner/aurora/commit/f77d79a2d01c3b5b34f11b812d5dcff2789e0766
>  ). On top, this could even be done by default. 
> 
> If I am reading correctly between the lines of this RB here and 
> https://reviews.apache.org/r/64954/ it sounds like you are moving towards a 
> custom, heavy-weight scheduling logic. My question essentially boils down to: 
> Do you go down the custom route because your usecase is so specific? Or is it 
> just more involved doing it in the open (lack of reviews, increases 
> turnaround times, ...)?
> 
> I don't intend to block your patch, as it looks good. I just want to 
> understand where this is heading and why.
> 
> David McLaughlin wrote:
> Twitter has a 85/15 split between production (preferred) and preemtible 
> tier tasks. The idea we're pursuing is to try and split the cluster logically 
> into hosts that have preferred tasks and hosts that have preemtible tasks. 
> Why is this useful? Well, the machines with preemptible tasks are effectively 
> "empty" with preemption enabled, in terms of available slots that other jobs 
> can grow into. It means if we commit 15% of our budget to the preemptible 
> tier, we can also guarantee that we can support that % of growth in the 
> cluster. This will enable larger instances, and temporary bursts, etc. 
> 
> Bin-packing alone doesn't solve this, because if you end up with every 
> host 85% prod and 15% non-prod, and you are low on capacity (forcing 
> preemption) only slots 15% the size of a host are available. This excludes 
> many jobs from growing, and it forces us to put unnecessary "core limits" on 
> jobs. 
> 
> I think this type of scheduling optimization is very specific to Twitter. 
> As is the scale we're working at, which meant that score-based scheduling 
> using just OfferSelector didn't work for us. But there are some neat 
> optimizations we can make for just the condition we want, but it requires 
> access to the underlying data structures (basically to support reverse 
> iteration in a performant way). So all of this motivates this patch.
> 
> Jordan Ly wrote:
> @StephanErb I now realize that you meant you were still utilizing the 
> `offer_order` option and not the custom `offer_order_modules` one.
> 
> I will re-add this in a follow-up patch since it can still co-exist with 
> the default `OfferSetImpl`.

@David Thanks a lot for sharing! That is indeed a problem I have not 
encountered or considered before.

@Jordan Thanks!


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65233/#review195844
---


On Jan. 20, 2018, 2:31 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65233/
> ---
> 
> (Updated Jan. 20, 2018, 2:31 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Stephan Erb, and Bill Farner.
> 
> 
> Repository: aurora
> 

Re: Review Request 65233: Allow for injection of custom OfferSets, removed OfferOrder and OfferSelector

2018-01-19 Thread Stephan Erb


> On Jan. 19, 2018, 8:27 p.m., Stephan Erb wrote:
> > The patch itself looks fine! However, I have difficulties assesing it 
> > properly as I am missing a bit of context. What is the main customization 
> > that you aim to implement using the new interface?
> > 
> > I am also curious what is your take on the removed offer sorting logic. 
> > Does it still work for Twitter as intended originally? We are currently 
> > using it as well, so I would need to re-implement if it gets removed. (Main 
> > issue we have noticed so far was that the number of tasks per agent can 
> > shift drastically, sometimes even up to a point where the Thermos observer 
> > is completely unusable).
> 
> Jordan Ly wrote:
> Some context for the new interface:
> 
> For performance, it is helpful to control the data structure we are 
> holding offers in and how we can take advantage of it to narrow down the list 
> of candidates to schedule on. For example, in the case of CPU ordering, if we 
> call `getOrdered(TaskGroupKey, ResourceRequest)` -- since we know the size of 
> the ResourceRequest and since we have access to the `offers` NavigableSet, we 
> might use `subMap` to return only candidates that we know will fit thus 
> reducing the number of offers we have to evaluate right away.

Yeah that makes sense, but would have also been possible without the set of 
recent scheduling refactorings (either by using the `NavigableSet` as-is or via 
something like 
https://github.com/wfarner/aurora/commit/f77d79a2d01c3b5b34f11b812d5dcff2789e0766
 ). On top, this could even be done by default. 

If I am reading correctly between the lines of this RB here and 
https://reviews.apache.org/r/64954/ it sounds like you are moving towards a 
custom, heavy-weight scheduling logic. My question essentially boils down to: 
Do you go down the custom route because your usecase is so specific? Or is it 
just more involved doing it in the open (lack of reviews, increases turnaround 
times, ...)?

I don't intend to block your patch, as it looks good. I just want to understand 
where this is heading and why.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65233/#review195844
---


On Jan. 19, 2018, 10:40 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65233/
> -----------
> 
> (Updated Jan. 19, 2018, 10:40 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Stephan Erb, and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The goal of this patch is to provide a more reasonable abstraction for custom 
> scheduling.
> 
> Currently, we have `OfferSelector`, `OfferOrder`, and recently I proposed 
> `FilterableOfferCollection` (https://reviews.apache.org/r/65225/). These were 
> all created in order to provide more customization within the scheduling 
> loop. However, they suffer from being leaky and too disparate. This patch 
> hopes to combine all of those components into a single `OfferSet` which can 
> be injected and used by HostOffers. This interface allows for custom 
> scheduling logic to be co-located with custom data structures for `offers` as 
> opposed to being passed around as different components.
> 
> The following options will be removed from the last 0.19 to now:
> ```
> -offer_order
> -offer_order_modules
> ```
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 3053e543793669601ec9132adae376bcdce64c32 
>   docs/reference/scheduler-configuration.md 
> f697b6f1f5332ac9a9feade48317ecf5818e0c92 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> f44cb61f5d1805aeb0f98135453dbc82f3bc9ad4 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> e4e535824adb86c98c6173a20f97a9a90b08483a 
>   src/main/java/org/apache/aurora/scheduler/offers/HostOffers.java 
> 2ea7a01085b87c8ed6765537a8005e2349784ab0 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 8f9e33d81be9087821784a8a08079a1736d9cb63 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManagerModule.java 
> de16c14c887d4225e0629b1580eb5e740798f1f7 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferOrder.java 
> 7c99c309e0e97519719905f725bb9f6a59d2a7f5 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferOrderBuilder.java 
> 1260ef19506acb8e8a937d4fd7b7152361bd3c40 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSet.java PRE-CREA

Re: Review Request 64825: Update packaging to latest Thrift, Mesos, and virtualenv

2018-01-10 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64825/
---

(Updated Jan. 10, 2018, 8:58 p.m.)


Review request for Aurora, Santhosh Kumar Shanmugham and Bill Farner.


Repository: aurora-packaging


Description (updated)
---

* Remove the hardwired Thrift build dependency from the RPM and DEB packages.
  With the update to Thrift 0.10 and pants 1.4.0.dev23, the pants build script 
will
  download an appropriate Thrift compiler automatically.
* Add missing system dependencies missing by the latest pants version.
* Use latest virtualenv script from the main Aurora repo


Diffs
-

  build-support/virtualenv 3ed5b8c26007dc8ac2db3fc3157ba323b33664f2 
  builder/deb/debian-jessie/Dockerfile 1983fb603c3216d2ace726ec133b5876343cc2df 
  builder/deb/ubuntu-trusty/Dockerfile 99c5481d367e9f1049bc2ecd331929bc73204442 
  builder/deb/ubuntu-xenial/Dockerfile dfdce150e38d081c02b991d5a24bec768c070c63 
  builder/rpm/centos-7/Dockerfile c12a61ffb0b835970e88d724ca3f70e7269f7638 
  specs/debian/control df863062e53034a6f04f457cd4a204daaa2cf80b 
  specs/rpm/aurora.spec e1eff9c1f6092273b95104f9f4dd106846639a8d 
  test/deb/debian-jessie/provision.sh 331a26c9c60a808f941fd54ee136db817bab2789 
  test/deb/ubuntu-trusty/provision.sh b9cbd438715049a0afdc85da17f79e59b870c113 
  test/deb/ubuntu-xenial/provision.sh 2adb879ec967bb0f07b3b535b0707972b4ac8d1e 
  test/rpm/centos-7/Vagrantfile 8342b091eeefae35a592c79cf30ab3db82c349ca 
  test/rpm/centos-7/provision.sh f41ffd17aaa3eaa314d746980c4f8e56f0373f11 


Diff: https://reviews.apache.org/r/64825/diff/1/


Testing
---

./build-artifact.sh builder/rpm/centos-7/ snapshot.tar.gz 0.20.0-snapshot
./test/test-artifact.sh test/rpm/centos-7 
/repo/artifacts/aurora-centos-7/dist/rpmbuild/RPMS/x86_64


Thanks,

Stephan Erb



Re: Review Request 64825: Update packaging to latest Thrift, Mesos, and virtualenv

2018-01-10 Thread Stephan Erb


> On Jan. 3, 2018, 10:43 p.m., Santhosh Kumar Shanmugham wrote:
> > builder/deb/ubuntu-trusty/Dockerfile
> > Lines 40-45 (original)
> > <https://reviews.apache.org/r/64825/diff/1/?file=1927256#file1927256line40>
> >
> > Explain the reason for removing these in the commit message.

done


> On Jan. 3, 2018, 10:43 p.m., Santhosh Kumar Shanmugham wrote:
> > builder/deb/ubuntu-xenial/Dockerfile
> > Line 40 (original)
> > <https://reviews.apache.org/r/64825/diff/1/?file=1927257#file1927257line41>
> >
> > Also here.

done


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64825/#review194721
-----------


On Dec. 23, 2017, 3:48 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64825/
> ---
> 
> (Updated Dec. 23, 2017, 3:48 p.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Bill Farner.
> 
> 
> Repository: aurora-packaging
> 
> 
> Description
> ---
> 
> * A few new system deps seem to be needed since the pants update.
> * Instead of installing Thrift out of band, we can now rely on the
>   bootstrapping via pants.
> * The virtualenv script is copied from the main Aurora repo.
> 
> 
> Diffs
> -
> 
>   build-support/virtualenv 3ed5b8c26007dc8ac2db3fc3157ba323b33664f2 
>   builder/deb/debian-jessie/Dockerfile 
> 1983fb603c3216d2ace726ec133b5876343cc2df 
>   builder/deb/ubuntu-trusty/Dockerfile 
> 99c5481d367e9f1049bc2ecd331929bc73204442 
>   builder/deb/ubuntu-xenial/Dockerfile 
> dfdce150e38d081c02b991d5a24bec768c070c63 
>   builder/rpm/centos-7/Dockerfile c12a61ffb0b835970e88d724ca3f70e7269f7638 
>   specs/debian/control df863062e53034a6f04f457cd4a204daaa2cf80b 
>   specs/rpm/aurora.spec e1eff9c1f6092273b95104f9f4dd106846639a8d 
>   test/deb/debian-jessie/provision.sh 
> 331a26c9c60a808f941fd54ee136db817bab2789 
>   test/deb/ubuntu-trusty/provision.sh 
> b9cbd438715049a0afdc85da17f79e59b870c113 
>   test/deb/ubuntu-xenial/provision.sh 
> 2adb879ec967bb0f07b3b535b0707972b4ac8d1e 
>   test/rpm/centos-7/Vagrantfile 8342b091eeefae35a592c79cf30ab3db82c349ca 
>   test/rpm/centos-7/provision.sh f41ffd17aaa3eaa314d746980c4f8e56f0373f11 
> 
> 
> Diff: https://reviews.apache.org/r/64825/diff/1/
> 
> 
> Testing
> ---
> 
> ./build-artifact.sh builder/rpm/centos-7/ snapshot.tar.gz 0.20.0-snapshot
> ./test/test-artifact.sh test/rpm/centos-7 
> /repo/artifacts/aurora-centos-7/dist/rpmbuild/RPMS/x86_64
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 64824: Packaging workaround for empty strings parsed as non-empty lists

2018-01-10 Thread Stephan Erb


> On Dec. 27, 2017, 3:19 a.m., Bill Farner wrote:
> > I have mixed feelings about this, but am leaning towards -1 on declaring 
> > this a fix.  Skimming the scheduler parameters, i can't say for sure 
> > whether all `List` parameters are invalid when empty.  To me, this suggests 
> > a code fix is the only answer.
> > 
> > We don't need to wait for my jcommander PR to fix this up; we can use a 
> > custom value for `@Parameter(splitter=CustomSplitter.class)` for  `List` 
> > parameters.  In this case, `CustomSplitter.class` can cargo cult 
> > `CommaParameterSplitter.java` from my jcommander PR.

My main motivation for this patch was that we don't have to go through another 
0.19.1 release cycle to get the packages out of the door.

The bug only applies if one actually adds an argument via the command line, but 
not if the default is used. That is also why most people that build their own 
packages did not run into the problem, yet.

I can follow your reasoning though. Closing.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64824/#review194522
-----------


On Dec. 23, 2017, 3:45 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64824/
> ---
> 
> (Updated Dec. 23, 2017, 3:45 p.m.)
> 
> 
> Review request for Aurora and Bill Farner.
> 
> 
> Bugs: AURORA-1962
> https://issues.apache.org/jira/browse/AURORA-1962
> 
> 
> Repository: aurora-packaging
> 
> 
> Description
> ---
> 
> Ensure we can package 0.19.0 without having to wait for 
> https://github.com/cbeust/jcommander/pull/422
> 
> 
> Diffs
> -
> 
>   specs/debian/aurora-scheduler.startup.sh 
> 3f97a73249cd6601465be8a0031067715290cb9c 
>   specs/debian/aurora-scheduler.upstart 
> 3c3206887070f72061095b342bfbf8b535c82cff 
> 
> 
> Diff: https://reviews.apache.org/r/64824/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 65068: update discovery info documentation, when using mesos-dns

2018-01-10 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65068/#review195135
---


Ship it!




Ship It!

- Stephan Erb


On Jan. 10, 2018, 11:44 a.m., Ruben D. Porras wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65068/
> ---
> 
> (Updated Jan. 10, 2018, 11:44 a.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> update discovery info documentation, when using mesos-dns
> 
> 
> Diffs
> -
> 
>   docs/features/service-discovery.md 36823c8ee8fc872a3e28fe8861d489a45ad9021b 
> 
> 
> Diff: https://reviews.apache.org/r/65068/diff/1/
> 
> 
> Testing
> ---
> 
> no tests needed. it's a documentation update.
> 
> 
> Thanks,
> 
> Ruben D. Porras
> 
>



Review Request 64825: Update packaging to latest Thrift, Mesos, and virtualenv

2017-12-23 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64825/
---

Review request for Aurora, Santhosh Kumar Shanmugham and Bill Farner.


Repository: aurora-packaging


Description
---

* A few new system deps seem to be needed since the pants update.
* Instead of installing Thrift out of band, we can now rely on the
  bootstrapping via pants.
* The virtualenv script is copied from the main Aurora repo.


Diffs
-

  build-support/virtualenv 3ed5b8c26007dc8ac2db3fc3157ba323b33664f2 
  builder/deb/debian-jessie/Dockerfile 1983fb603c3216d2ace726ec133b5876343cc2df 
  builder/deb/ubuntu-trusty/Dockerfile 99c5481d367e9f1049bc2ecd331929bc73204442 
  builder/deb/ubuntu-xenial/Dockerfile dfdce150e38d081c02b991d5a24bec768c070c63 
  builder/rpm/centos-7/Dockerfile c12a61ffb0b835970e88d724ca3f70e7269f7638 
  specs/debian/control df863062e53034a6f04f457cd4a204daaa2cf80b 
  specs/rpm/aurora.spec e1eff9c1f6092273b95104f9f4dd106846639a8d 
  test/deb/debian-jessie/provision.sh 331a26c9c60a808f941fd54ee136db817bab2789 
  test/deb/ubuntu-trusty/provision.sh b9cbd438715049a0afdc85da17f79e59b870c113 
  test/deb/ubuntu-xenial/provision.sh 2adb879ec967bb0f07b3b535b0707972b4ac8d1e 
  test/rpm/centos-7/Vagrantfile 8342b091eeefae35a592c79cf30ab3db82c349ca 
  test/rpm/centos-7/provision.sh f41ffd17aaa3eaa314d746980c4f8e56f0373f11 


Diff: https://reviews.apache.org/r/64825/diff/1/


Testing
---

./build-artifact.sh builder/rpm/centos-7/ snapshot.tar.gz 0.20.0-snapshot
./test/test-artifact.sh test/rpm/centos-7 
/repo/artifacts/aurora-centos-7/dist/rpmbuild/RPMS/x86_64


Thanks,

Stephan Erb



Re: Review Request 64824: Packaging workaround for empty strings parsed as non-empty lists

2017-12-23 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64824/
---

(Updated Dec. 23, 2017, 3:45 p.m.)


Review request for Aurora and Bill Farner.


Bugs: AURORA-1962
https://issues.apache.org/jira/browse/AURORA-1962


Repository: aurora-packaging


Description (updated)
---

Ensure we can package 0.19.0 without having to wait for 
https://github.com/cbeust/jcommander/pull/422


Diffs
-

  specs/debian/aurora-scheduler.startup.sh 
3f97a73249cd6601465be8a0031067715290cb9c 
  specs/debian/aurora-scheduler.upstart 
3c3206887070f72061095b342bfbf8b535c82cff 


Diff: https://reviews.apache.org/r/64824/diff/1/


Testing
---


Thanks,

Stephan Erb



Review Request 64824: Packaging workaround for empty strings parsed as non-empty lists

2017-12-23 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64824/
---

Review request for Aurora and Bill Farner.


Bugs: AURORA-1962
https://issues.apache.org/jira/browse/AURORA-1962


Repository: aurora-packaging


Description
---

Ensure we can package 0.19.0 as is, withou having to wait for 
https://github.com/cbeust/jcommander/pull/422


Diffs
-

  specs/debian/aurora-scheduler.startup.sh 
3f97a73249cd6601465be8a0031067715290cb9c 
  specs/debian/aurora-scheduler.upstart 
3c3206887070f72061095b342bfbf8b535c82cff 


Diff: https://reviews.apache.org/r/64824/diff/1/


Testing
---


Thanks,

Stephan Erb



Re: Review Request 64519: Add a test to detect incompatible storage changes

2017-12-14 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64519/#review193802
---



I am traveling for the next 5 days, so my review will be delayed. Feel free to 
ship without me though.

- Stephan Erb


On Dec. 12, 2017, 2:35 a.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64519/
> ---
> 
> (Updated Dec. 12, 2017, 2:35 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is intended as a safeguard against future compatibility regressions like 
> [AURORA-1959](https://issues.apache.org/jira/browse/AURORA-1959).
> 
> I approached this with a few goals:
> 
>   - golden files should be text-based and human-readable.  This allows for 
> non-opaque code reviews, and simpler remedy when it's necessary to update the 
> goldens (i.e. copy-pasteable)
>   - guidance for schema evolution should be included directly in test failures
>   - separate detection of 'what the scheduler _can_ read' and 'what the 
> scheduler writes'
>   - reasonably-complete schema coverage with minimal manual labor.  These 
> tests auto-generate structs to mitigate maintenance burden of test code as 
> schemas evolve.
> 
> This is not a replacement for vigilance with data compatibility, but it 
> should at least
> 
> 1. mitigate unintentional breakages in compatibility, especially for new 
> contributors
> 2. draw code reviewer attention to compatibility changes in a patch (signaled 
> by changes to golden files)
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/app/SchedulerMain.java 
> 2bf7e7ba414d36c99a49599fdc8cf3abdc945dc9 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> b7f43e0d6efbddcac640c3d39c7bc56400e68e68 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorageModule.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
> 75ec42aad0b822d6c3dcd5b1307a4fcb86caa5c0 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 4929ecdec90a1ccbcafa4857dea83cec1e2d7fd4 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
>  PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/storage/durability/Generator.java 
> PRE-CREATION 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/NonVolatileStorageTest.java
>  eb966d722dc01d1760566bc57358afac722d5fec 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/pruneJobUpdateHistory
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeJob
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeJobUpdate
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeLock
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeQuota
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeTasks
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveFrameworkId
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostAttributes
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobInstanceUpdateEvent
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdateEvent
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveLock
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveQuota
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
>  PRE-CREATION 
>   
> src/test/resources/org/apache/aurora/scheduler/storage

Re: Review Request 64341: Add metadata field to Job object in DSL

2017-12-14 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64341/#review193801
---



I am traveling for the next 5 days, so my review will be delayed. Feel free to 
ship without me though.

- Stephan Erb


On Dec. 11, 2017, 12:13 p.m., Jing Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64341/
> ---
> 
> (Updated Dec. 11, 2017, 12:13 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Stephan Erb, and Bill Farner.
> 
> 
> Bugs: AURORA-1898
> https://issues.apache.org/jira/browse/AURORA-1898
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add metadata field to Job object in DSL
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 54dcc75ef4f4fbf52c4dc60ec0dcbdd7ff3926a2 
>   docs/reference/configuration.md 67d9914f6b43f5eb73fe05b547981737f665489c 
>   src/main/python/apache/aurora/config/schema/base.py 
> a466e78f85d980dc11689ab252b8c70e9cfd3d57 
>   src/main/python/apache/aurora/config/thrift.py 
> eb0014422f99fdc8dcea9b0b75b86cdf78cf2dbb 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 
> ecefc1845350242fe5113e5c85c4e3db09937ec1 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 76d0ad681689e7aa03401ecea2d2b3123268d7da 
> 
> 
> Diff: https://reviews.apache.org/r/64341/diff/2/
> 
> 
> Testing
> ---
> 
> Test with `.aurora` file as:
> ```
> pkg_path = '/vagrant/hello.py'
> 
> install = Process(
> name='fetch package',
> cmdline='cp {} . && chmod u+x hello.py'.format(pkg_path))
> 
> runner = Process(
> name='hello world',
> cmdline='python -u hello.py')
> 
> hello_task = SequentialTask(
> processes=[install, runner],
> resources=Resources(cpu=1, ram=1*MB, disk=8*MB))
> 
> jobs = [
> Service(cluster='devcluster',
> role='www-data',
> environment='devel',
> name='metadata_test',
> task=hello_task,
> metadata=[Metadata(key='state', value='CA'), Metadata(key='stat', 
> value='NY')]
> )
> ]
> ```
> 
> Metadata information can be found in **job configuration** of UI as:
> ```
> METADATA  
> state: CA
> stat: NY  
> ```
> 
> 
> Thanks,
> 
> Jing Chen
> 
>



Re: Review Request 64286: Recover snapshots via the Op stream

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64286/#review193625
---


Ship it!




As the patch is pretty large and mostly a refactoring that still passes the 
tests, I tried to focus on the bigger picture. This looks OK to me.

- Stephan Erb


On Dec. 12, 2017, 6:46 p.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64286/
> ---
> 
> (Updated Dec. 12, 2017, 6:46 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This cleans up the various interfaces around persisting and recovering from 
> `Snapshot`s.  Most importantly, `LogPersistence` no longer bypasses the 
> `recover()` `Op` stream to apply snapshots.  As a result, it should be 
> straightforward to build a migration utility that clones `LogPersistence` 
> state into another `Persistence` implementation.
> 
> **Reviewers**
> Apologies for the large patch.  These components were circular, and i did not 
> find a clean way to break the important pieces of this change apart.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SnapshotBenchmarks.java 
> 755582da0e6ae9fcbfb191c195f86a003d00eb2a 
>   
> src/main/java/org/apache/aurora/scheduler/storage/DistributedSnapshotStore.java
>  0c6a955279f4cadac87c765d91602ac307162f7d 
>   src/main/java/org/apache/aurora/scheduler/storage/SnapshotStore.java 
> 6b5e5dd5b4d1e28f6d604bcc87c983453574adbf 
>   src/main/java/org/apache/aurora/scheduler/storage/Snapshotter.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/backup/BackupModule.java 
> 7eaae89d63bf45d6ddc3cffd8ee62d4549e0cc10 
>   src/main/java/org/apache/aurora/scheduler/storage/backup/Recovery.java 
> 3a62f02212971f1f73933018ac11702a7041b7d4 
>   src/main/java/org/apache/aurora/scheduler/storage/backup/StorageBackup.java 
> 2d61678c76d350286415265fef5056d5377d9788 
>   
> src/main/java/org/apache/aurora/scheduler/storage/backup/TemporaryStorage.java
>  18296b0d9674c05ce4b6b36dbcdc060bc2d17b0a 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  6a7c0add489ab1599fabe676e73582d575aa8c6e 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/Persistence.java 
> 9eb862c01bf451252bfbcc7a2eac60d2c965c9f0 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteAheadStorage.java
>  667db062e791a916853edb7fabd6235a2e9bc272 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogPersistence.java 
> e70e6051582ca90ae72014626b983bbf4b8d5b48 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
> 75ec42aad0b822d6c3dcd5b1307a4fcb86caa5c0 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImpl.java 
> 739fad75c4d569a7f2ec01875a4ae63893ed7ac8 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  159fb29cd575f1d7bdba2fe0c98ae5381097a451 
>   src/test/java/org/apache/aurora/scheduler/app/local/LocalSchedulerMain.java 
> aeb8685e4e2d65c70636965a3a615e426c7a5ec2 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 5cb5310ed096ca1fb47b980401e3712948271ac4 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> 09560f4929a252568ed487fbf27aac5a83e62c79 
>   
> src/test/java/org/apache/aurora/scheduler/storage/backup/StorageBackupTest.java
>  fff376fabba37ea450ade469e48c5e85d6b3f96a 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
>  3ad40adf58128ff41c5b88e3c5a7814b139e7d50 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/WriteAheadStorageTest.java
>  e8b564b0c7fb82d4057f6ee8dcf7b327f1c5e18c 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/LogPersistenceTest.java 
> PRE-CREATION 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/NonVolatileStorageTest.java
>  eb966d722dc01d1760566bc57358afac722d5fec 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotServiceTest.java
>  PRE-CREATION 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotStoreImplIT.java
>  5634f921cc3d9b1773af0a7ab5a5150756501df3 
&

Re: Review Request 64290: Update to Thrift 0.10.0

2017-12-12 Thread Stephan Erb


> On Dec. 13, 2017, 12:11 a.m., Aurora ReviewBot wrote:
> > Master (65c2288) is green with this patch.
> >   ./build-support/jenkins/build.sh
> > 
> > I will refresh this build result if you post a review containing 
> > "@ReviewBot retry"

Finally :)


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64290/#review193623
---


On Dec. 9, 2017, 5:32 p.m., John Sirois wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64290/
> ---
> 
> (Updated Dec. 9, 2017, 5:32 p.m.)
> 
> 
> Review request for Aurora, Stephan Erb and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Update to Thrift 0.10.0 (https://github.com/apache/thrift/blob/master/CHANGES)
> 
> Included changes:
> 
> * Update to the latest pants version. This was necesary to get ./pants gen 
> working.
> * The Java hashcode option has been removed as it is now the default.
> 
> 
> Diffs
> -
> 
>   .isort.cfg f53ccba7d1d45adcb4753cfd05c53bbec5c89dde 
>   3rdparty/python/requirements.txt 85a23d7770d82bfc74b1fc847170a8779dd7249f 
>   api/src/main/thrift/org/apache/aurora/gen/BUILD 
> 835b0cc6f51a592f7d5b6129dbe9519455eb6555 
>   api/src/main/thrift/org/apache/thermos/BUILD 
> b58338e76bdee3481071d8f27235a9dbb6bf07b5 
>   build-support/jenkins/build.sh f59fe278aea55fc3ffa1b5dff7b686cfcfae1ee7 
>   build-support/packer/build.sh 85444125abc0c7e600a09933411e57c0d74051ac 
>   build-support/python/checkstyle-check 
> 0c8800232e20347b2be9a538ffed1d47a7815241 
>   build-support/python/isort 37e6cd23d2437055a15abfec6924c1bdd051de20 
>   build-support/python/isort-check a1bb94c99250f54f8fc261fb311cd34ce8355c78 
>   build-support/python/isort-run 1b39d41212d6e870c55d420a9d844fd67dd40eae 
>   build-support/release/make-python-sdists 
> 51e11dbc1929609374031c9b23a96f035eba902b 
>   build-support/thrift/.gitignore 82c5bc5cd773ecf3ec194de41a340a3ef9b83aa5 
>   build-support/thrift/AURORA-1727.compiler.cpp.Makefile.in.patch 
> 77c966caa3d1f644241bcc2b1968bc9306c56689 
>   
> build-support/thrift/AURORA-1727.compiler.cpp.src.generate.t_java_generator.cc.patch
>  42300b43a8f72e45c96b975e5d3a6a7bd0283529 
>   build-support/thrift/AURORA-1727.lib.py.setup.py.patch 
> 11c7b13341e156f3686511cb40ab13c1256203a6 
>   build-support/thrift/BUILD ab19f1f68682d88f731a463c15591e45a317e760 
>   build-support/thrift/Makefile f440b610afe321af663e393a29eebda7af7bd7a8 
>   build-support/thrift/prepare_binary.sh 
> 4ad997bf039294f7940b93a76ebf014689f8f618 
>   build-support/thrift/thriftw c8debd07bc9da97fb58db795e67c9ac82cc30bc1 
>   build.gradle 76dcc040a822db2e209646e62b71ae45bf3f4793 
>   buildSrc/src/main/groovy/org/apache/aurora/build/ThriftPlugin.groovy 
> fc2bc9082dae2c63aa578c05dc89feb346260a67 
>   pants.ini 0671d9ab6381e5b9c324dc09a891a639cbfb2ccc 
>   src/main/python/apache/aurora/client/cli/BUILD 
> 80a6dec9acf5b64cff64143929916676b79be686 
>   src/main/python/apache/thermos/observer/BUILD 
> 95b8dcd7d123d3a2bcb22df7efc60374ef919bf7 
>   src/test/python/apache/aurora/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/admin/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/api/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/binding_helpers/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/cli/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/docker/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/common/health_check/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/executor/common/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/thermos/core/test_process.py 
> 6cb9176e14eccbe7ed10501199a34e5e67d6fe44 
> 
> 
> Diff: https://reviews.apache.org/r/64290/diff/4/
> 
> 
> Testing
> ---
> 
> ```
> $ ./build-support/jenkins/build.sh
> $ ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> ```
> 
> 
> Thanks,
> 
> John Sirois
> 
>



Re: Review Request 64341: Add metadata field to Job object in DSL

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64341/#review193618
---



@ReviewBot retry

- Stephan Erb


On Dec. 11, 2017, 12:13 p.m., Jing Chen wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64341/
> ---
> 
> (Updated Dec. 11, 2017, 12:13 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Stephan Erb, and Bill Farner.
> 
> 
> Bugs: AURORA-1898
> https://issues.apache.org/jira/browse/AURORA-1898
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add metadata field to Job object in DSL
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 54dcc75ef4f4fbf52c4dc60ec0dcbdd7ff3926a2 
>   docs/reference/configuration.md 67d9914f6b43f5eb73fe05b547981737f665489c 
>   src/main/python/apache/aurora/config/schema/base.py 
> a466e78f85d980dc11689ab252b8c70e9cfd3d57 
>   src/main/python/apache/aurora/config/thrift.py 
> eb0014422f99fdc8dcea9b0b75b86cdf78cf2dbb 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 
> ecefc1845350242fe5113e5c85c4e3db09937ec1 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 76d0ad681689e7aa03401ecea2d2b3123268d7da 
> 
> 
> Diff: https://reviews.apache.org/r/64341/diff/2/
> 
> 
> Testing
> ---
> 
> Test with `.aurora` file as:
> ```
> pkg_path = '/vagrant/hello.py'
> 
> install = Process(
> name='fetch package',
> cmdline='cp {} . && chmod u+x hello.py'.format(pkg_path))
> 
> runner = Process(
> name='hello world',
> cmdline='python -u hello.py')
> 
> hello_task = SequentialTask(
> processes=[install, runner],
> resources=Resources(cpu=1, ram=1*MB, disk=8*MB))
> 
> jobs = [
> Service(cluster='devcluster',
> role='www-data',
> environment='devel',
> name='metadata_test',
> task=hello_task,
> metadata=[Metadata(key='state', value='CA'), Metadata(key='stat', 
> value='NY')]
> )
> ]
> ```
> 
> Metadata information can be found in **job configuration** of UI as:
> ```
> METADATA  
> state: CA
> stat: NY  
> ```
> 
> 
> Thanks,
> 
> Jing Chen
> 
>



Re: Review Request 64290: Update to Thrift 0.10.0

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64290/#review193617
---



@ReviewBot retry

- Stephan Erb


On Dec. 9, 2017, 5:32 p.m., John Sirois wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64290/
> ---
> 
> (Updated Dec. 9, 2017, 5:32 p.m.)
> 
> 
> Review request for Aurora, Stephan Erb and Bill Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Update to Thrift 0.10.0 (https://github.com/apache/thrift/blob/master/CHANGES)
> 
> Included changes:
> 
> * Update to the latest pants version. This was necesary to get ./pants gen 
> working.
> * The Java hashcode option has been removed as it is now the default.
> 
> 
> Diffs
> -
> 
>   .isort.cfg f53ccba7d1d45adcb4753cfd05c53bbec5c89dde 
>   3rdparty/python/requirements.txt 85a23d7770d82bfc74b1fc847170a8779dd7249f 
>   api/src/main/thrift/org/apache/aurora/gen/BUILD 
> 835b0cc6f51a592f7d5b6129dbe9519455eb6555 
>   api/src/main/thrift/org/apache/thermos/BUILD 
> b58338e76bdee3481071d8f27235a9dbb6bf07b5 
>   build-support/jenkins/build.sh f59fe278aea55fc3ffa1b5dff7b686cfcfae1ee7 
>   build-support/packer/build.sh 85444125abc0c7e600a09933411e57c0d74051ac 
>   build-support/python/checkstyle-check 
> 0c8800232e20347b2be9a538ffed1d47a7815241 
>   build-support/python/isort 37e6cd23d2437055a15abfec6924c1bdd051de20 
>   build-support/python/isort-check a1bb94c99250f54f8fc261fb311cd34ce8355c78 
>   build-support/python/isort-run 1b39d41212d6e870c55d420a9d844fd67dd40eae 
>   build-support/release/make-python-sdists 
> 51e11dbc1929609374031c9b23a96f035eba902b 
>   build-support/thrift/.gitignore 82c5bc5cd773ecf3ec194de41a340a3ef9b83aa5 
>   build-support/thrift/AURORA-1727.compiler.cpp.Makefile.in.patch 
> 77c966caa3d1f644241bcc2b1968bc9306c56689 
>   
> build-support/thrift/AURORA-1727.compiler.cpp.src.generate.t_java_generator.cc.patch
>  42300b43a8f72e45c96b975e5d3a6a7bd0283529 
>   build-support/thrift/AURORA-1727.lib.py.setup.py.patch 
> 11c7b13341e156f3686511cb40ab13c1256203a6 
>   build-support/thrift/BUILD ab19f1f68682d88f731a463c15591e45a317e760 
>   build-support/thrift/Makefile f440b610afe321af663e393a29eebda7af7bd7a8 
>   build-support/thrift/prepare_binary.sh 
> 4ad997bf039294f7940b93a76ebf014689f8f618 
>   build-support/thrift/thriftw c8debd07bc9da97fb58db795e67c9ac82cc30bc1 
>   build.gradle 76dcc040a822db2e209646e62b71ae45bf3f4793 
>   buildSrc/src/main/groovy/org/apache/aurora/build/ThriftPlugin.groovy 
> fc2bc9082dae2c63aa578c05dc89feb346260a67 
>   pants.ini 0671d9ab6381e5b9c324dc09a891a639cbfb2ccc 
>   src/main/python/apache/aurora/client/cli/BUILD 
> 80a6dec9acf5b64cff64143929916676b79be686 
>   src/main/python/apache/thermos/observer/BUILD 
> 95b8dcd7d123d3a2bcb22df7efc60374ef919bf7 
>   src/test/python/apache/aurora/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/admin/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/api/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/binding_helpers/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/cli/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/docker/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/common/health_check/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/executor/common/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/thermos/core/test_process.py 
> 6cb9176e14eccbe7ed10501199a34e5e67d6fe44 
> 
> 
> Diff: https://reviews.apache.org/r/64290/diff/4/
> 
> 
> Testing
> ---
> 
> ```
> $ ./build-support/jenkins/build.sh
> $ ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> ```
> 
> 
> Thanks,
> 
> John Sirois
> 
>



Re: Review Request 64523: Attempt #2 to fix flaky Webhook test

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64523/#review193616
---


Ship it!




Ship It!

- Stephan Erb


On Dec. 12, 2017, 8:15 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64523/
> ---
> 
> (Updated Dec. 12, 2017, 8:15 p.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Bugs: AURORA-1961
> https://issues.apache.org/jira/browse/AURORA-1961
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Attempt #2 to fix the flaky Webhook test.
> 
> Along the same lines of Stephan's change 
> (https://reviews.apache.org/r/64482/), but waiting for `onThrowable` to 
> finish since it is called asyncronously to the future being completed.
> 
> Overall, this flakiness has seemed to increase in volume over the past month. 
> I've been running into it with a noticable fraction of internal/test builds.
> 
> 
> Diffs
> -
> 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> 1b5d2d02535345edfe6cf04d18d00434393f800b 
> 
> 
> Diff: https://reviews.apache.org/r/64523/diff/2/
> 
> 
> Testing
> ---
> 
> This change seems pretty hard to test considering the differences in 
> environment and the unknown cause of the actual errors. Maybe we run the 
> reviewbot on this repo repeatedly? Obviously not the most scientifically 
> sound solution.
> 
> EDIT 12/12: I tested this by putting a long sleep in `onThrowable` which 
> causes the issue in master and is fixed with this patch.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 64288: Add a SQL persistence implementation

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64288/#review193234
---


Ship it!




The code looks fine and reasonable to me. I would still recommend proper scale 
testing though.

At my company, we operate at a small scale and the Mesos replicated log works 
still well for us. Operating a HA MySQL or PostgreSQL cluster would come with a 
higher operational burden. If possible, I would therefore like to see that 
Aurora continues to support its simple replicated log deployment mode for now. 
I believe that other memobers of the community might feel similar. I can 
totally see though that this does not work for Twitter


build.gradle
Lines 385 (patched)
<https://reviews.apache.org/r/64288/#comment271792>

License is Apache 2.0 so all good.

https://github.com/brettwooldridge/HikariCP/blob/dev/LICENSE



src/main/resources/org/apache/aurora/scheduler/storage/sql/schema.sql
Lines 4-8 (patched)
<https://reviews.apache.org/r/64288/#comment272174>

Just to check we stay within the comfort zone of a DBMS: How many entries 
would you expect in this table?

(From a quick uninformed back-of-the-envelope calculation I would expect 
roughtly 5 million entries for a Twitter style cluster, which would be pretty 
reasonable)


- Stephan Erb


On Dec. 7, 2017, 7:27 p.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64288/
> ---
> 
> (Updated Dec. 7, 2017, 7:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Introduces a `Persistence` implementation that uses a SQL database via JDBC.  
> I've opted to lean towards MySQL SQL dialect, as unfortunately there are 
> vendor differences for even the very simple SQL used here.
> 
> I chose [HikariCP](https://github.com/brettwooldridge/HikariCP) to serve as a 
> `DataSource` (connection pool) implementation.  We don't really need much out 
> of a connection pool aside from general connection lifecycle management (i.e. 
> not for concurrency).  I chose this library based on recent development 
> activity and several positive comparisons to other pools.
> 
> Note that the implementation is not yet wired into the scheduler application. 
>  That will come in a follow-up.
> 
> 
> Diffs
> -
> 
>   build.gradle af119910e84c48f75f2573ababcfa287c3b986fc 
>   config/spotbugs/excludeFilter.xml 51790cce8d9047e40741f05ee55af15dbdc3065e 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> dbbe1d1689ed3e455a95f529f914dc6823427d37 
>   src/dist/etc/h2-database.properties PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/app/SchedulerMain.java 
> 7ffcf4f471b97e32426c82972472115c5c5c4d02 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> b7f43e0d6efbddcac640c3d39c7bc56400e68e68 
>   
> src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java 
> a4984a95f938396c82244f91e4a3d592df1c1539 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  85b2113631586f43d854c4d2812f43b7b864d452 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorageModule.java
>  PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/Persistence.java 
> 9eb862c01bf451252bfbcc7a2eac60d2c965c9f0 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogPersistence.java 
> e70e6051582ca90ae72014626b983bbf4b8d5b48 
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 
> 75ec42aad0b822d6c3dcd5b1307a4fcb86caa5c0 
>   
> src/main/java/org/apache/aurora/scheduler/storage/sql/DisabledDistributedSnapshotStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/sql/Mode.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/sql/SqlPersistence.java 
> PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/storage/sql/SqlPersistenceModule.java
>  PRE-CREATION 
>   src/main/resources/org/apache/aurora/scheduler/storage/sql/schema.sql 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 4929ecdec90a1ccbcafa4857dea83cec1e2d7fd4 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 5cb5310ed096ca1fb47b980401e3712948271ac4 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/NonVolatileStorageTest.java
>  eb966d722dc01d1760566bc57358afac722d5

Re: Review Request 64523: Attempt #2 to fix flaky Webhook test

2017-12-12 Thread Stephan Erb


> On Dec. 12, 2017, 10:46 p.m., Aurora ReviewBot wrote:
> > Master (301f066) is red with this patch.
> >   ./build-support/jenkins/build.sh
> > 
> >  # Create file stdout for capturing output. 
> > We can't use StringIO mock
> >  # because TestProcess is running fork.
> >  with open(os.path.join(td, 'sys_stdout'), 
> > 'w+') as stdout:
> >    with open(os.path.join(td, 
> > 'sys_stderr'), 'w+') as stderr:
> >  with mutable_sys():
> >    sys.stdout, sys.stderr = stdout, 
> > stderr
> >  
> >    p = TestProcess('process', 'echo 
> > hello world; echo >&2 hello stderr', 0,
> >    taskpath, sandbox, 
> > logger_destination=LoggerDestination.BOTH)
> >    p.start()
> >    rc = 
> > wait_for_rc(taskpath.getpath('process_checkpoint'))
> >  
> >    assert rc == 0
> >    # Check log files were created in 
> > std path with correct content
> >  > assert_log_content(taskpath, 
> > 'stdout', 'hello world\n')
> >  
> >  
> > src/test/python/apache/thermos/core/test_process.py:538: 
> >  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> >  
> >  taskpath =  > at 0x7f9138423590>
> >  log_name = 'stdout'
> >  expected_content = 'hello world\n'
> >  
> >  def assert_log_content(taskpath, log_name, 
> > expected_content):
> >    log = 
> > taskpath.with_filename(log_name).getpath('process_logdir')
> >    assert os.path.exists(log)
> >    with open(log, 'r') as fp:
> >  >   assert fp.read() == expected_content
> >  E   assert '' == 'hello world\n'
> >  E + hello world
> >  
> >  
> > src/test/python/apache/thermos/core/test_process.py:364: AssertionError
> >   generated xml file: 
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/aaf4d108c31293299a0839bdc404a91802f80937.xml
> >  
> >   1 failed, 794 passed, 6 skipped, 1 warnings 
> > in 461.64 seconds 
> >  
> > FAILURE
> > 
> > 
> > 21:46:32 08:16   [complete]
> >FAILURE
> > 
> > 
> > I will refresh this build result if you post a review containing 
> > "@ReviewBot retry"
> 
> Jordan Ly wrote:
> I've never seen this fail before :(

This is just not our week :)


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64523/#review193608
-------


On Dec. 12, 2017, 8:15 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64523/
> ---
> 
> (Updated Dec. 12, 2017, 8:15 p.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Bugs: AURORA-1961
> https://issues.apache.org/jira/browse/AURORA-1961
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Attempt #2 to fix the flaky Webhook test.
> 
> Along the same lines of Stephan's change 
> (https://reviews.apache.org/r/64482/), but waiting for `onThrowable` to 
> finish since it is called asyncronously to the future being completed.
> 
> Overall, this flakiness has seemed to increase in volume over the past month. 
> I've been running into it with a noticable fraction of internal/test builds.
> 
> 
> Diffs
> -
> 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> 1b5d2d02535345edfe6cf04d18d00434393f800b 
> 
> 
> Diff: https://reviews.apache.org/r/64523/diff/2/
> 
> 
> Testing
> ---
> 
> This change seems pretty hard to test considering the differences in 
> environment and the unknown cause of the actual errors. Maybe we run the 
> reviewbot on this repo repeatedly? Obviously not the most scientifically 
> sound solution.
> 
> EDIT 12/12: I tested this by putting a long sleep in `onThrowable` which 
> causes the issue in master and is fixed with this patch.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 64523: Attempt #2 to fix flaky Webhook test

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64523/#review193607
---



@ReviewBot retry

- Stephan Erb


On Dec. 12, 2017, 8:15 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64523/
> ---
> 
> (Updated Dec. 12, 2017, 8:15 p.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Bugs: AURORA-1961
> https://issues.apache.org/jira/browse/AURORA-1961
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Attempt #2 to fix the flaky Webhook test.
> 
> Along the same lines of Stephan's change 
> (https://reviews.apache.org/r/64482/), but waiting for `onThrowable` to 
> finish since it is called asyncronously to the future being completed.
> 
> Overall, this flakiness has seemed to increase in volume over the past month. 
> I've been running into it with a noticable fraction of internal/test builds.
> 
> 
> Diffs
> -
> 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> 1b5d2d02535345edfe6cf04d18d00434393f800b 
> 
> 
> Diff: https://reviews.apache.org/r/64523/diff/2/
> 
> 
> Testing
> ---
> 
> This change seems pretty hard to test considering the differences in 
> environment and the unknown cause of the actual errors. Maybe we run the 
> reviewbot on this repo repeatedly? Obviously not the most scientifically 
> sound solution.
> 
> EDIT 12/12: I tested this by putting a long sleep in `onThrowable` which 
> causes the issue in master and is fixed with this patch.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 64523: Attempt #2 to fix flaky Webhook test

2017-12-12 Thread Stephan Erb


> On Dec. 12, 2017, 8:56 p.m., Aurora ReviewBot wrote:
> > Master (301f066) is green with this patch.
> >   ./build-support/jenkins/build.sh
> > 
> > I will refresh this build result if you post a review containing 
> > "@ReviewBot retry"

let's give it one mor try :)


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64523/#review193603
---


On Dec. 12, 2017, 8:15 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64523/
> ---
> 
> (Updated Dec. 12, 2017, 8:15 p.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Bugs: AURORA-1961
> https://issues.apache.org/jira/browse/AURORA-1961
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Attempt #2 to fix the flaky Webhook test.
> 
> Along the same lines of Stephan's change 
> (https://reviews.apache.org/r/64482/), but waiting for `onThrowable` to 
> finish since it is called asyncronously to the future being completed.
> 
> Overall, this flakiness has seemed to increase in volume over the past month. 
> I've been running into it with a noticable fraction of internal/test builds.
> 
> 
> Diffs
> -
> 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> 1b5d2d02535345edfe6cf04d18d00434393f800b 
> 
> 
> Diff: https://reviews.apache.org/r/64523/diff/2/
> 
> 
> Testing
> ---
> 
> This change seems pretty hard to test considering the differences in 
> environment and the unknown cause of the actual errors. Maybe we run the 
> reviewbot on this repo repeatedly? Obviously not the most scientifically 
> sound solution.
> 
> EDIT 12/12: I tested this by putting a long sleep in `onThrowable` which 
> causes the issue in master and is fixed with this patch.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 64523: Attempt #2 to fix flaky Webhook test

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64523/#review193590
---



@ReviewBot retry

- Stephan Erb


On Dec. 12, 2017, 3:29 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64523/
> ---
> 
> (Updated Dec. 12, 2017, 3:29 a.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Bugs: AURORA-1961
> https://issues.apache.org/jira/browse/AURORA-1961
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Attempt #2 to fix the flaky Webhook test.
> 
> Along the same lines of Stephan's change 
> (https://reviews.apache.org/r/64482/), but opting to never start the Jetty 
> server and forcing a connect exception.
> 
> Overall, this flakiness has seemed to increase in volume over the past month. 
> I've been running into it with a noticable fraction of internal/test builds.
> 
> 
> Diffs
> -
> 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> 1b5d2d02535345edfe6cf04d18d00434393f800b 
> 
> 
> Diff: https://reviews.apache.org/r/64523/diff/1/
> 
> 
> Testing
> ---
> 
> This change seems pretty hard to test considering the differences in 
> environment and the unknown cause of the actual errors. Maybe we run the 
> reviewbot on this repo repeatedly? Obviously not the most scientifically 
> sound solution.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 64523: Attempt #2 to fix flaky Webhook test

2017-12-12 Thread Stephan Erb
eCalledField.compareAndSet(this, 0, 1)) {
> try {
> asyncHandler.onThrowable(t);
> } catch (Throwable te) {
> LOGGER.debug("asyncHandler.onThrowable", te);
> }
> }
> }
> ```
> 
> The `future.completeExceptionally(t)` call causes `future.get()` above to 
> throw `ExecutionException`, and our main test thread continues.  Netty's 
> thread separately proceeds to call `asyncHandler.onThrowable(t)`, which races 
> with the main thread.
> 
> I have not looked into the internals of `CompletableFuture#join()` to say 
> why it differs, but it does pass consistently for me.

Jordan, will you update your PR? Or do you plan to file one, Bill?


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64523/#review193491
---


On Dec. 12, 2017, 3:29 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64523/
> ---
> 
> (Updated Dec. 12, 2017, 3:29 a.m.)
> 
> 
> Review request for Aurora and Stephan Erb.
> 
> 
> Bugs: AURORA-1961
> https://issues.apache.org/jira/browse/AURORA-1961
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Attempt #2 to fix the flaky Webhook test.
> 
> Along the same lines of Stephan's change 
> (https://reviews.apache.org/r/64482/), but opting to never start the Jetty 
> server and forcing a connect exception.
> 
> Overall, this flakiness has seemed to increase in volume over the past month. 
> I've been running into it with a noticable fraction of internal/test builds.
> 
> 
> Diffs
> -
> 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> 1b5d2d02535345edfe6cf04d18d00434393f800b 
> 
> 
> Diff: https://reviews.apache.org/r/64523/diff/1/
> 
> 
> Testing
> ---
> 
> This change seems pretty hard to test considering the differences in 
> environment and the unknown cause of the actual errors. Maybe we run the 
> reviewbot on this repo repeatedly? Obviously not the most scientifically 
> sound solution.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 64534: Reproduce Kerberos Python Build Issues

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64534/#review193530
---



@ReviewBot retry

- Stephan Erb


On Dec. 12, 2017, 12:38 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64534/
> ---
> 
> (Updated Dec. 12, 2017, 12:38 p.m.)
> 
> 
> Review request for Aurora.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> * Johns Thrift patch 
> * remove flaky Java patch
> 
> 
> Diffs
> -
> 
>   .isort.cfg f53ccba7d1d45adcb4753cfd05c53bbec5c89dde 
>   3rdparty/python/requirements.txt 85a23d7770d82bfc74b1fc847170a8779dd7249f 
>   api/src/main/thrift/org/apache/aurora/gen/BUILD 
> 835b0cc6f51a592f7d5b6129dbe9519455eb6555 
>   api/src/main/thrift/org/apache/thermos/BUILD 
> b58338e76bdee3481071d8f27235a9dbb6bf07b5 
>   build-support/jenkins/build.sh f59fe278aea55fc3ffa1b5dff7b686cfcfae1ee7 
>   build-support/packer/build.sh 85444125abc0c7e600a09933411e57c0d74051ac 
>   build-support/python/checkstyle-check 
> 0c8800232e20347b2be9a538ffed1d47a7815241 
>   build-support/python/isort 37e6cd23d2437055a15abfec6924c1bdd051de20 
>   build-support/python/isort-check a1bb94c99250f54f8fc261fb311cd34ce8355c78 
>   build-support/python/isort-run 1b39d41212d6e870c55d420a9d844fd67dd40eae 
>   build-support/release/make-python-sdists 
> 51e11dbc1929609374031c9b23a96f035eba902b 
>   build-support/thrift/.gitignore 82c5bc5cd773ecf3ec194de41a340a3ef9b83aa5 
>   build-support/thrift/AURORA-1727.compiler.cpp.Makefile.in.patch 
> 77c966caa3d1f644241bcc2b1968bc9306c56689 
>   
> build-support/thrift/AURORA-1727.compiler.cpp.src.generate.t_java_generator.cc.patch
>  42300b43a8f72e45c96b975e5d3a6a7bd0283529 
>   build-support/thrift/AURORA-1727.lib.py.setup.py.patch 
> 11c7b13341e156f3686511cb40ab13c1256203a6 
>   build-support/thrift/BUILD ab19f1f68682d88f731a463c15591e45a317e760 
>   build-support/thrift/Makefile f440b610afe321af663e393a29eebda7af7bd7a8 
>   build-support/thrift/prepare_binary.sh 
> 4ad997bf039294f7940b93a76ebf014689f8f618 
>   build-support/thrift/thriftw c8debd07bc9da97fb58db795e67c9ac82cc30bc1 
>   build.gradle 76dcc040a822db2e209646e62b71ae45bf3f4793 
>   buildSrc/src/main/groovy/org/apache/aurora/build/ThriftPlugin.groovy 
> fc2bc9082dae2c63aa578c05dc89feb346260a67 
>   pants.ini 0671d9ab6381e5b9c324dc09a891a639cbfb2ccc 
>   src/main/python/apache/aurora/client/cli/BUILD 
> 80a6dec9acf5b64cff64143929916676b79be686 
>   src/main/python/apache/thermos/observer/BUILD 
> 95b8dcd7d123d3a2bcb22df7efc60374ef919bf7 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> 1b5d2d02535345edfe6cf04d18d00434393f800b 
>   src/test/python/apache/aurora/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/admin/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/api/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/binding_helpers/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/cli/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/docker/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/common/health_check/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/executor/common/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/thermos/core/test_process.py 
> 6cb9176e14eccbe7ed10501199a34e5e67d6fe44 
> 
> 
> Diff: https://reviews.apache.org/r/64534/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 64508: Convert carriage returns to newlines in reviews

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64508/#review193528
---



Unfortunately the patch I submitted broke review requests. Sorry for being 
sloppy here.

I have pushed 
https://github.com/apache/aurora/commit/301f066369a9ef7262c1702c77004d12bc8eac00
 unbreak the build.

- Stephan Erb


On Dec. 12, 2017, 10:05 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64508/
> ---
> 
> (Updated Dec. 12, 2017, 10:05 a.m.)
> 
> 
> Review request for Aurora and Bill Farner.
> 
> 
> Bugs: AURORA-1961
> https://issues.apache.org/jira/browse/AURORA-1961
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Spotbugs prints multiple lines with \r [1][2]. This looks like a single
> line for shell commands but will be converted to multiple lines once
> read by Python.
> 
> By performing the conversion before the tail command, we will get a
> consistent line count in Bash and Python.
> 
> [1] 
> https://github.com/spotbugs/spotbugs/blob/fe8a8d66e97d3ae0b830731461aab0f8b39791f6/spotbugs/src/main/java/edu/umd/cs/findbugs/TextUIProgressCallback.java#L103
> [2] https://github.com/spotbugs/spotbugs/issues/506
> 
> 
> Diffs
> -
> 
>   build-support/jenkins/review_feedback.py 
> a6eeceeb3815b0f49a66c9d6b459d9ce64f515a3 
> 
> 
> Diff: https://reviews.apache.org/r/64508/diff/2/
> 
> 
> Testing
> ---
> 
> ./gradlew --console=plain -Pq build 2>&1 | tr -u "\r" "\n" | tee build_outpu
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 64534: Reproduce Kerberos Python Build Issues

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64534/#review193527
---



@ReviewBot retry

- Stephan Erb


On Dec. 12, 2017, 12:38 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64534/
> ---
> 
> (Updated Dec. 12, 2017, 12:38 p.m.)
> 
> 
> Review request for Aurora.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> * Johns Thrift patch 
> * remove flaky Java patch
> 
> 
> Diffs
> -
> 
>   .isort.cfg f53ccba7d1d45adcb4753cfd05c53bbec5c89dde 
>   3rdparty/python/requirements.txt 85a23d7770d82bfc74b1fc847170a8779dd7249f 
>   api/src/main/thrift/org/apache/aurora/gen/BUILD 
> 835b0cc6f51a592f7d5b6129dbe9519455eb6555 
>   api/src/main/thrift/org/apache/thermos/BUILD 
> b58338e76bdee3481071d8f27235a9dbb6bf07b5 
>   build-support/jenkins/build.sh f59fe278aea55fc3ffa1b5dff7b686cfcfae1ee7 
>   build-support/packer/build.sh 85444125abc0c7e600a09933411e57c0d74051ac 
>   build-support/python/checkstyle-check 
> 0c8800232e20347b2be9a538ffed1d47a7815241 
>   build-support/python/isort 37e6cd23d2437055a15abfec6924c1bdd051de20 
>   build-support/python/isort-check a1bb94c99250f54f8fc261fb311cd34ce8355c78 
>   build-support/python/isort-run 1b39d41212d6e870c55d420a9d844fd67dd40eae 
>   build-support/release/make-python-sdists 
> 51e11dbc1929609374031c9b23a96f035eba902b 
>   build-support/thrift/.gitignore 82c5bc5cd773ecf3ec194de41a340a3ef9b83aa5 
>   build-support/thrift/AURORA-1727.compiler.cpp.Makefile.in.patch 
> 77c966caa3d1f644241bcc2b1968bc9306c56689 
>   
> build-support/thrift/AURORA-1727.compiler.cpp.src.generate.t_java_generator.cc.patch
>  42300b43a8f72e45c96b975e5d3a6a7bd0283529 
>   build-support/thrift/AURORA-1727.lib.py.setup.py.patch 
> 11c7b13341e156f3686511cb40ab13c1256203a6 
>   build-support/thrift/BUILD ab19f1f68682d88f731a463c15591e45a317e760 
>   build-support/thrift/Makefile f440b610afe321af663e393a29eebda7af7bd7a8 
>   build-support/thrift/prepare_binary.sh 
> 4ad997bf039294f7940b93a76ebf014689f8f618 
>   build-support/thrift/thriftw c8debd07bc9da97fb58db795e67c9ac82cc30bc1 
>   build.gradle 76dcc040a822db2e209646e62b71ae45bf3f4793 
>   buildSrc/src/main/groovy/org/apache/aurora/build/ThriftPlugin.groovy 
> fc2bc9082dae2c63aa578c05dc89feb346260a67 
>   pants.ini 0671d9ab6381e5b9c324dc09a891a639cbfb2ccc 
>   src/main/python/apache/aurora/client/cli/BUILD 
> 80a6dec9acf5b64cff64143929916676b79be686 
>   src/main/python/apache/thermos/observer/BUILD 
> 95b8dcd7d123d3a2bcb22df7efc60374ef919bf7 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> 1b5d2d02535345edfe6cf04d18d00434393f800b 
>   src/test/python/apache/aurora/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/admin/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/api/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/binding_helpers/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/cli/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/client/docker/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/common/health_check/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/aurora/executor/common/__init__.py 
> 0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
>   src/test/python/apache/thermos/core/test_process.py 
> 6cb9176e14eccbe7ed10501199a34e5e67d6fe44 
> 
> 
> Diff: https://reviews.apache.org/r/64534/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Review Request 64534: Reproduce Kerberos Python Build Issues

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64534/
---

Review request for Aurora.


Repository: aurora


Description
---

* Johns Thrift patch 
* remove flaky Java patch


Diffs
-

  .isort.cfg f53ccba7d1d45adcb4753cfd05c53bbec5c89dde 
  3rdparty/python/requirements.txt 85a23d7770d82bfc74b1fc847170a8779dd7249f 
  api/src/main/thrift/org/apache/aurora/gen/BUILD 
835b0cc6f51a592f7d5b6129dbe9519455eb6555 
  api/src/main/thrift/org/apache/thermos/BUILD 
b58338e76bdee3481071d8f27235a9dbb6bf07b5 
  build-support/jenkins/build.sh f59fe278aea55fc3ffa1b5dff7b686cfcfae1ee7 
  build-support/packer/build.sh 85444125abc0c7e600a09933411e57c0d74051ac 
  build-support/python/checkstyle-check 
0c8800232e20347b2be9a538ffed1d47a7815241 
  build-support/python/isort 37e6cd23d2437055a15abfec6924c1bdd051de20 
  build-support/python/isort-check a1bb94c99250f54f8fc261fb311cd34ce8355c78 
  build-support/python/isort-run 1b39d41212d6e870c55d420a9d844fd67dd40eae 
  build-support/release/make-python-sdists 
51e11dbc1929609374031c9b23a96f035eba902b 
  build-support/thrift/.gitignore 82c5bc5cd773ecf3ec194de41a340a3ef9b83aa5 
  build-support/thrift/AURORA-1727.compiler.cpp.Makefile.in.patch 
77c966caa3d1f644241bcc2b1968bc9306c56689 
  
build-support/thrift/AURORA-1727.compiler.cpp.src.generate.t_java_generator.cc.patch
 42300b43a8f72e45c96b975e5d3a6a7bd0283529 
  build-support/thrift/AURORA-1727.lib.py.setup.py.patch 
11c7b13341e156f3686511cb40ab13c1256203a6 
  build-support/thrift/BUILD ab19f1f68682d88f731a463c15591e45a317e760 
  build-support/thrift/Makefile f440b610afe321af663e393a29eebda7af7bd7a8 
  build-support/thrift/prepare_binary.sh 
4ad997bf039294f7940b93a76ebf014689f8f618 
  build-support/thrift/thriftw c8debd07bc9da97fb58db795e67c9ac82cc30bc1 
  build.gradle 76dcc040a822db2e209646e62b71ae45bf3f4793 
  buildSrc/src/main/groovy/org/apache/aurora/build/ThriftPlugin.groovy 
fc2bc9082dae2c63aa578c05dc89feb346260a67 
  pants.ini 0671d9ab6381e5b9c324dc09a891a639cbfb2ccc 
  src/main/python/apache/aurora/client/cli/BUILD 
80a6dec9acf5b64cff64143929916676b79be686 
  src/main/python/apache/thermos/observer/BUILD 
95b8dcd7d123d3a2bcb22df7efc60374ef919bf7 
  src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
1b5d2d02535345edfe6cf04d18d00434393f800b 
  src/test/python/apache/aurora/__init__.py 
0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
  src/test/python/apache/aurora/admin/__init__.py 
0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
  src/test/python/apache/aurora/client/__init__.py 
0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
  src/test/python/apache/aurora/client/api/__init__.py 
0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
  src/test/python/apache/aurora/client/binding_helpers/__init__.py 
0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
  src/test/python/apache/aurora/client/cli/__init__.py 
0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
  src/test/python/apache/aurora/client/docker/__init__.py 
0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
  src/test/python/apache/aurora/common/health_check/__init__.py 
0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
  src/test/python/apache/aurora/executor/common/__init__.py 
0663a9ad8ae194b63ce6d77dabfe65564e2d26dd 
  src/test/python/apache/thermos/core/test_process.py 
6cb9176e14eccbe7ed10501199a34e5e67d6fe44 


Diff: https://reviews.apache.org/r/64534/diff/1/


Testing
---


Thanks,

Stephan Erb



Re: Review Request 64508: Convert carriage returns to newlines in reviews

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64508/#review193522
---



@ReviewBot retry

- Stephan Erb


On Dec. 12, 2017, 10:05 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64508/
> ---
> 
> (Updated Dec. 12, 2017, 10:05 a.m.)
> 
> 
> Review request for Aurora and Bill Farner.
> 
> 
> Bugs: AURORA-1961
> https://issues.apache.org/jira/browse/AURORA-1961
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Spotbugs prints multiple lines with \r [1][2]. This looks like a single
> line for shell commands but will be converted to multiple lines once
> read by Python.
> 
> By performing the conversion before the tail command, we will get a
> consistent line count in Bash and Python.
> 
> [1] 
> https://github.com/spotbugs/spotbugs/blob/fe8a8d66e97d3ae0b830731461aab0f8b39791f6/spotbugs/src/main/java/edu/umd/cs/findbugs/TextUIProgressCallback.java#L103
> [2] https://github.com/spotbugs/spotbugs/issues/506
> 
> 
> Diffs
> -
> 
>   build-support/jenkins/review_feedback.py 
> a6eeceeb3815b0f49a66c9d6b459d9ce64f515a3 
> 
> 
> Diff: https://reviews.apache.org/r/64508/diff/2/
> 
> 
> Testing
> ---
> 
> ./gradlew --console=plain -Pq build 2>&1 | tr -u "\r" "\n" | tee build_outpu
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 64508: Convert carriage returns to newlines in reviews

2017-12-12 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64508/
---

(Updated Dec. 12, 2017, 10:05 a.m.)


Review request for Aurora and Bill Farner.


Bugs: AURORA-1961
https://issues.apache.org/jira/browse/AURORA-1961


Repository: aurora


Description
---

Spotbugs prints multiple lines with \r [1][2]. This looks like a single
line for shell commands but will be converted to multiple lines once
read by Python.

By performing the conversion before the tail command, we will get a
consistent line count in Bash and Python.

[1] 
https://github.com/spotbugs/spotbugs/blob/fe8a8d66e97d3ae0b830731461aab0f8b39791f6/spotbugs/src/main/java/edu/umd/cs/findbugs/TextUIProgressCallback.java#L103
[2] https://github.com/spotbugs/spotbugs/issues/506


Diffs (updated)
-

  build-support/jenkins/review_feedback.py 
a6eeceeb3815b0f49a66c9d6b459d9ce64f515a3 


Diff: https://reviews.apache.org/r/64508/diff/2/

Changes: https://reviews.apache.org/r/64508/diff/1-2/


Testing
---

./gradlew --console=plain -Pq build 2>&1 | tr -u "\r" "\n" | tee build_outpu


Thanks,

Stephan Erb



  1   2   3   4   5   6   7   8   9   >