Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191666
---


Ship it!




LGTM. Great new feature!


RELEASE-NOTES.md
Lines 12-13 (patched)


Should we prevent that users submit a non-default reschedule policy if the 
scheduler feature toggle is disabled? 
(I am fine either way but just want to point it out)


- Stephan Erb


On Nov. 21, 2017, 7:42 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 21, 2017, 7:42 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   docs/reference/configuration.md f647bcf86e84c3b08848e5f54f6a5ad95d55fc8a 
>   docs/reference/scheduler-configuration.md 
> d17541f9458650240983276b69f749a854057aa8 
>   docs/reference/task-lifecycle.md 8ec0077f7c5caef9c012db02ffd40c704fc2b347 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
>   src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> f0819fb7ac758ad1229a76fd9794b393400e9f63 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/10/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread David McLaughlin


> On Nov. 21, 2017, 5:54 p.m., David McLaughlin wrote:
> > The outstanding feedback is:
> > 
> > * Discussion on using a mock with capture (my preference) vs using a fake 
> > implementation. 
> > * Concern over how to handle a partition during a transient state. My 
> > preference is to move immediately to LOST (existing behavior).
> > * Confusion over what Disable meant (now removed).
> > 
> > I would appreciate other thoughts on these, as I'm eager to close this work 
> > out and move on to other things.
> 
> Stephan Erb wrote:
> 
> https://github.com/apache/aurora/blob/master/docs/reference/task-lifecycle.md 
> will also need an update. I am OK if you ignore the image as it is already 
> outdated, but the text should reflect reality.
> 
> Jordan Ly wrote:
> My thoughts:
> 
> * I think that the current tests are alright -- they test the behavior of 
> PartitionManager in isolation and are fairly easy to read. I imagine that you 
> mean 'integration test' by 'fake implementation', but please correct me if I 
> am mistaken. On a side note: do we ever explicitly test that something is 
> never called (i.e. `expect(...).times(0)`)? I know EasyMock's default 
> behavior is that if something is called that isn't supposed to be called, it 
> will fail. However, when reading tests, I feel like it the explicit 
> declaration flows smoother in my mind.
> * +1 for the existing behavior.
> * +1 to removing Disable
> 
> Bill Farner wrote:
> I don't feel strongly about any of the outstanding issues.

Okay, great. In that case I will go ahead and ship this.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191603
---


On Nov. 21, 2017, 6:42 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 21, 2017, 6:42 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   docs/reference/configuration.md f647bcf86e84c3b08848e5f54f6a5ad95d55fc8a 
>   docs/reference/scheduler-configuration.md 
> d17541f9458650240983276b69f749a854057aa8 
>   docs/reference/task-lifecycle.md 8ec0077f7c5caef9c012db02ffd40c704fc2b347 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread Bill Farner


> On Nov. 21, 2017, 9:54 a.m., David McLaughlin wrote:
> > The outstanding feedback is:
> > 
> > * Discussion on using a mock with capture (my preference) vs using a fake 
> > implementation. 
> > * Concern over how to handle a partition during a transient state. My 
> > preference is to move immediately to LOST (existing behavior).
> > * Confusion over what Disable meant (now removed).
> > 
> > I would appreciate other thoughts on these, as I'm eager to close this work 
> > out and move on to other things.
> 
> Stephan Erb wrote:
> 
> https://github.com/apache/aurora/blob/master/docs/reference/task-lifecycle.md 
> will also need an update. I am OK if you ignore the image as it is already 
> outdated, but the text should reflect reality.
> 
> Jordan Ly wrote:
> My thoughts:
> 
> * I think that the current tests are alright -- they test the behavior of 
> PartitionManager in isolation and are fairly easy to read. I imagine that you 
> mean 'integration test' by 'fake implementation', but please correct me if I 
> am mistaken. On a side note: do we ever explicitly test that something is 
> never called (i.e. `expect(...).times(0)`)? I know EasyMock's default 
> behavior is that if something is called that isn't supposed to be called, it 
> will fail. However, when reading tests, I feel like it the explicit 
> declaration flows smoother in my mind.
> * +1 for the existing behavior.
> * +1 to removing Disable

I don't feel strongly about any of the outstanding issues.


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191603
---


On Nov. 21, 2017, 10:42 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 21, 2017, 10:42 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   docs/reference/configuration.md f647bcf86e84c3b08848e5f54f6a5ad95d55fc8a 
>   docs/reference/scheduler-configuration.md 
> d17541f9458650240983276b69f749a854057aa8 
>   docs/reference/task-lifecycle.md 8ec0077f7c5caef9c012db02ffd40c704fc2b347 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191626
---


Ship it!




Master (46b1112) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Nov. 21, 2017, 6:42 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 21, 2017, 6:42 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   docs/reference/configuration.md f647bcf86e84c3b08848e5f54f6a5ad95d55fc8a 
>   docs/reference/scheduler-configuration.md 
> d17541f9458650240983276b69f749a854057aa8 
>   docs/reference/task-lifecycle.md 8ec0077f7c5caef9c012db02ffd40c704fc2b347 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
>   src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> f0819fb7ac758ad1229a76fd9794b393400e9f63 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/10/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread Jordan Ly


> On Nov. 21, 2017, 5:54 p.m., David McLaughlin wrote:
> > The outstanding feedback is:
> > 
> > * Discussion on using a mock with capture (my preference) vs using a fake 
> > implementation. 
> > * Concern over how to handle a partition during a transient state. My 
> > preference is to move immediately to LOST (existing behavior).
> > * Confusion over what Disable meant (now removed).
> > 
> > I would appreciate other thoughts on these, as I'm eager to close this work 
> > out and move on to other things.
> 
> Stephan Erb wrote:
> 
> https://github.com/apache/aurora/blob/master/docs/reference/task-lifecycle.md 
> will also need an update. I am OK if you ignore the image as it is already 
> outdated, but the text should reflect reality.

My thoughts:

* I think that the current tests are alright -- they test the behavior of 
PartitionManager in isolation and are fairly easy to read. I imagine that you 
mean 'integration test' by 'fake implementation', but please correct me if I am 
mistaken. On a side note: do we ever explicitly test that something is never 
called (i.e. `expect(...).times(0)`)? I know EasyMock's default behavior is 
that if something is called that isn't supposed to be called, it will fail. 
However, when reading tests, I feel like it the explicit declaration flows 
smoother in my mind.
* +1 for the existing behavior.
* +1 to removing Disable


- Jordan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191603
---


On Nov. 21, 2017, 6:42 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 21, 2017, 6:42 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   docs/reference/configuration.md f647bcf86e84c3b08848e5f54f6a5ad95d55fc8a 
>   docs/reference/scheduler-configuration.md 
> d17541f9458650240983276b69f749a854057aa8 
>   docs/reference/task-lifecycle.md 8ec0077f7c5caef9c012db02ffd40c704fc2b347 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/
---

(Updated Nov. 21, 2017, 6:42 p.m.)


Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
Farner.


Changes
---

Update docs.


Repository: aurora


Description
---

This is my prototype code for adding partition-awareness to Aurora. There is a 
proposal document to accompany this here: 
https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#

I'd like feedback on the high-level approach before adding unit tests, metrics, 
logging, etc.


Diffs (updated)
-

  RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
1d369263d779b549b9304018437f535ddc855966 
  docs/reference/configuration.md f647bcf86e84c3b08848e5f54f6a5ad95d55fc8a 
  docs/reference/scheduler-configuration.md 
d17541f9458650240983276b69f749a854057aa8 
  docs/reference/task-lifecycle.md 8ec0077f7c5caef9c012db02ffd40c704fc2b347 
  examples/vagrant/upstart/aurora-scheduler.conf 
5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
  src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
33cc012a2cad929b0dd1ce236597b870cfc5aba0 
  src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
  src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
7c223eae69503fe1d5bf34c430438637abcbcb9b 
  
src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
 5e83b2acdde7198d16427d4031e9772f78612554 
  src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
b91a0852e968b4aa9d74801601671cb61af3648a 
  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
9989ed441cd9bc442e6472768880ce7924c3bdd9 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
c03fff11ea3a4086f9daaa8b07315006c1b481e4 
  src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
  src/main/python/apache/aurora/client/cli/jobs.py 
b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
  src/main/python/apache/aurora/config/schema/base.py 
18ce826363009e1cc0beac5cce4edf42610d487e 
  src/main/python/apache/aurora/config/thrift.py 
bedf8cd6641e1b1a930602791b758d584af4891c 
  src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
13f656f241a8a9a3d339f4053f165070c2669ef3 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
  
src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
 7b0429109e9a7795e559db264e7737fc55ff0169 
  src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
  src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
  src/test/python/apache/aurora/client/cli/test_task.py 
186cb2737ba8e169819b7d54f86a7344a669b6cb 
  src/test/python/apache/aurora/config/test_thrift.py 
7a1567a9b67917072bb0ba3eea5857e968375f4d 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
f0819fb7ac758ad1229a76fd9794b393400e9f63 


Diff: https://reviews.apache.org/r/63536/diff/10/

Changes: https://reviews.apache.org/r/63536/diff/9-10/


Testing
---

Manual testing in Vagrant by stopping and starting the Mesos agent. With three 
jobs:

1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
directly to LOST)
2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
while before moving to LOST)
3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
indefinitely)

I also verified tasks are able to transition to various states (back to RUNNING 
or moving to FAILED, etc.) when you turn the agent back on.


File Attachments


Task in PARTITIONED state
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
Task back into running when partition resolved
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
Compaction of PARTITIONED cycles (note timestamps)
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png


Thanks,

David McLaughlin



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191616
---


Ship it!




Master (46b1112) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Nov. 21, 2017, 6:18 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 21, 2017, 6:18 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
>   src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> f0819fb7ac758ad1229a76fd9794b393400e9f63 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/9/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/
---

(Updated Nov. 21, 2017, 6:18 p.m.)


Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
Farner.


Changes
---

Green build.


Repository: aurora


Description
---

This is my prototype code for adding partition-awareness to Aurora. There is a 
proposal document to accompany this here: 
https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#

I'd like feedback on the high-level approach before adding unit tests, metrics, 
logging, etc.


Diffs (updated)
-

  RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
1d369263d779b549b9304018437f535ddc855966 
  examples/vagrant/upstart/aurora-scheduler.conf 
5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
  src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
33cc012a2cad929b0dd1ce236597b870cfc5aba0 
  src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
  src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
7c223eae69503fe1d5bf34c430438637abcbcb9b 
  
src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
 5e83b2acdde7198d16427d4031e9772f78612554 
  src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
b91a0852e968b4aa9d74801601671cb61af3648a 
  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
9989ed441cd9bc442e6472768880ce7924c3bdd9 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
c03fff11ea3a4086f9daaa8b07315006c1b481e4 
  src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
  src/main/python/apache/aurora/client/cli/jobs.py 
b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
  src/main/python/apache/aurora/config/schema/base.py 
18ce826363009e1cc0beac5cce4edf42610d487e 
  src/main/python/apache/aurora/config/thrift.py 
bedf8cd6641e1b1a930602791b758d584af4891c 
  src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
13f656f241a8a9a3d339f4053f165070c2669ef3 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
  
src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
 7b0429109e9a7795e559db264e7737fc55ff0169 
  src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
  src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
  src/test/python/apache/aurora/client/cli/test_task.py 
186cb2737ba8e169819b7d54f86a7344a669b6cb 
  src/test/python/apache/aurora/config/test_thrift.py 
7a1567a9b67917072bb0ba3eea5857e968375f4d 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
f0819fb7ac758ad1229a76fd9794b393400e9f63 


Diff: https://reviews.apache.org/r/63536/diff/9/

Changes: https://reviews.apache.org/r/63536/diff/8-9/


Testing
---

Manual testing in Vagrant by stopping and starting the Mesos agent. With three 
jobs:

1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
directly to LOST)
2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
while before moving to LOST)
3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
indefinitely)

I also verified tasks are able to transition to various states (back to RUNNING 
or moving to FAILED, etc.) when you turn the agent back on.


File Attachments


Task in PARTITIONED state
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
Task back into running when partition resolved
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
Compaction of PARTITIONED cycles (note timestamps)
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png


Thanks,

David McLaughlin



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191608
---



Master (46b1112) is red with this patch.
  ./build-support/jenkins/build.sh

  Running setup.py bdist_wheel for twitter.common.log: finished with status 
'done'
  Stored in directory: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/.home/.cache/pip/wheels/80/e3/3a/297d8950fcbd822ab5a6a62175818fab38b668cc5a86dbd0b0
  Running setup.py bdist_wheel for pycparser: started
  Running setup.py bdist_wheel for pycparser: finished with status 'done'
  Stored in directory: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/.home/.cache/pip/wheels/95/14/9a/5e7b9024459d2a6600aaa64e0ba485325aff7a9ac7489db1b6
  Running setup.py bdist_wheel for twitter.common.options: started
  Running setup.py bdist_wheel for twitter.common.options: finished with status 
'done'
  Stored in directory: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/.home/.cache/pip/wheels/17/41/80/c4811d8c1c7ca7007e520c3399872fc340f45c3a26a6a23e6a
Successfully built pantsbuild.pants twitter.common.collections setproctitle 
ansicolors pathspec scandir twitter.common.dirutil psutil pystache docutils 
Markdown Pygments twitter.common.confluence coverage lmdb pywatchman 
twitter.common.lang twitter.common.log pycparser twitter.common.options
Installing collected packages: twitter.common.lang, twitter.common.collections, 
setproctitle, setuptools, six, ansicolors, pyparsing, packaging, pathspec, 
scandir, twitter.common.dirutil, psutil, requests, pystache, pex, docutils, 
Markdown, Pygments, twitter.common.options, twitter.common.log, 
twitter.common.confluence, monotonic, fasteners, coverage, pycparser, cffi, 
lmdb, pywatchman, futures, pantsbuild.pants
  Found existing installation: setuptools 21.2.1
Uninstalling setuptools-21.2.1:
  Successfully uninstalled setuptools-21.2.1
Successfully installed Markdown-2.1.1 Pygments-1.4 ansicolors-1.0.2 cffi-1.7.0 
coverage-3.7.1 docutils-0.12 fasteners-0.14.1 futures-3.0.5 lmdb-0.89 
monotonic-1.4 packaging-16.7 pantsbuild.pants-1.3.0.dev3 pathspec-0.3.4 
pex-1.1.16 psutil-4.3.0 pycparser-2.18 pyparsing-2.2.0 pystache-0.5.3 
pywatchman-1.3.0 requests-2.5.3 scandir-1.2 setproctitle-1.1.10 
setuptools-30.0.0 six-1.11.0 twitter.common.collections-0.3.9 
twitter.common.confluence-0.3.9 twitter.common.dirutil-0.3.9 
twitter.common.lang-0.3.9 twitter.common.log-0.3.9 twitter.common.options-0.3.9
You are using pip version 8.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

18:00:44 00:00 [main]
   (To run a reporting server: ./pants server)
18:00:44 00:00   [setup]
18:00:44 00:00 [parse]
   Executing tasks in goals: compile
18:00:44 00:00   [compile]
18:00:44 00:00 [compile-prep-command]
18:00:44 00:00   [prep_command]
18:00:48 00:04 [compile]
18:00:48 00:04 [python-eval]
18:00:48 00:04 [pythonstyle]
18:00:49 00:05   [cache]  
   No cached artifacts for 42 targets.
   Invalidated 42 targets.
F401:ERROR   src/main/python/apache/aurora/client/cli/jobs.py:066 
'PystachioPartitionPolicy' imported but unused
 |from apache.aurora.config.schema.base import PartitionPolicy as 
PystachioPartitionPolicy


FAILURE: 1 Python Style issues found. For import order related issues, please 
try `./pants fmt.isort `


18:01:05 00:21   [complete]
   FAILURE


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Nov. 21, 2017, 5:50 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 21, 2017, 5:50 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread Stephan Erb


> On Nov. 21, 2017, 6:54 p.m., David McLaughlin wrote:
> > The outstanding feedback is:
> > 
> > * Discussion on using a mock with capture (my preference) vs using a fake 
> > implementation. 
> > * Concern over how to handle a partition during a transient state. My 
> > preference is to move immediately to LOST (existing behavior).
> > * Confusion over what Disable meant (now removed).
> > 
> > I would appreciate other thoughts on these, as I'm eager to close this work 
> > out and move on to other things.

https://github.com/apache/aurora/blob/master/docs/reference/task-lifecycle.md 
will also need an update. I am OK if you ignore the image as it is already 
outdated, but the text should reflect reality.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191603
---


On Nov. 21, 2017, 6:50 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 21, 2017, 6:50 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
>   src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> f0819fb7ac758ad1229a76fd9794b393400e9f63 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/8/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191603
---



The outstanding feedback is:

* Discussion on using a mock with capture (my preference) vs using a fake 
implementation. 
* Concern over how to handle a partition during a transient state. My 
preference is to move immediately to LOST (existing behavior).
* Confusion over what Disable meant (now removed).

I would appreciate other thoughts on these, as I'm eager to close this work out 
and move on to other things.

- David McLaughlin


On Nov. 21, 2017, 5:50 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 21, 2017, 5:50 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
>   src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> f0819fb7ac758ad1229a76fd9794b393400e9f63 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/8/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread David McLaughlin


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
> > Lines 185 (patched)
> > 
> >
> > Nit: `isPartitionAware` to minutely improve readability.

Done.


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 91 (patched)
> > 
> >
> > A `LOG.info` on either side of the branch on task configuration would 
> > be nice to simplify production debugging.
> > 
> > `Partitioned task will be rescheduled in %s seconds`
> > 
> > `Partitioned task will not be rescheduled`

Done.


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java
> > Line 266 (original), 307 (patched)
> > 
> >
> > Nit: order the states consistently where possible.  Makes it easier to 
> > scan for where states are handled differently.  The prevailing convention 
> > in this patch seems to be `PARTITIONED` last, which sounds fine to me.

Done.


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
> > Lines 57 (patched)
> > 
> >
> > nit: rm line

Done.


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
> > Lines 60 (patched)
> > 
> >
> > nit: static; ditto elsewhere

Done.


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
> > Lines 74-75 (patched)
> > 
> >
> > can you swap old/new param order?  i was confused by some callers until 
> > i came up here :-)

Done.


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java
> > Lines 431 (patched)
> > 
> >
> > Please phrase this check as:
> > 
> > ```java
> > assertEquals(
> >   ImmutableList.of(PENDING, ASSIGNED, ...),
> >   updatedTask.getTaskEvents().stream()
> >   .map(e -> e.getStatus())
> >   .collect(Collectors.toList()));
> > ```
> > 
> > This will give greater confidence that the compaction worked as 
> > intended.

Much better, thanks. Done.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191229
---


On Nov. 21, 2017, 5:50 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 21, 2017, 5:50 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-21 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/
---

(Updated Nov. 21, 2017, 5:50 p.m.)


Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
Farner.


Changes
---

Feedback, and remove Disable and just revert to explicit PartitionPolicy in 
pystachio.


Repository: aurora


Description
---

This is my prototype code for adding partition-awareness to Aurora. There is a 
proposal document to accompany this here: 
https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#

I'd like feedback on the high-level approach before adding unit tests, metrics, 
logging, etc.


Diffs (updated)
-

  RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
1d369263d779b549b9304018437f535ddc855966 
  examples/vagrant/upstart/aurora-scheduler.conf 
5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
  src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
33cc012a2cad929b0dd1ce236597b870cfc5aba0 
  src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
  src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
7c223eae69503fe1d5bf34c430438637abcbcb9b 
  
src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
 5e83b2acdde7198d16427d4031e9772f78612554 
  src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
b91a0852e968b4aa9d74801601671cb61af3648a 
  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
9989ed441cd9bc442e6472768880ce7924c3bdd9 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
c03fff11ea3a4086f9daaa8b07315006c1b481e4 
  src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
  src/main/python/apache/aurora/client/cli/jobs.py 
b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
  src/main/python/apache/aurora/config/schema/base.py 
18ce826363009e1cc0beac5cce4edf42610d487e 
  src/main/python/apache/aurora/config/thrift.py 
bedf8cd6641e1b1a930602791b758d584af4891c 
  src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
13f656f241a8a9a3d339f4053f165070c2669ef3 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
  
src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
 7b0429109e9a7795e559db264e7737fc55ff0169 
  src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
  src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
  src/test/python/apache/aurora/client/cli/test_task.py 
186cb2737ba8e169819b7d54f86a7344a669b6cb 
  src/test/python/apache/aurora/config/test_thrift.py 
7a1567a9b67917072bb0ba3eea5857e968375f4d 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
f0819fb7ac758ad1229a76fd9794b393400e9f63 


Diff: https://reviews.apache.org/r/63536/diff/8/

Changes: https://reviews.apache.org/r/63536/diff/7-8/


Testing
---

Manual testing in Vagrant by stopping and starting the Mesos agent. With three 
jobs:

1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
directly to LOST)
2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
while before moving to LOST)
3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
indefinitely)

I also verified tasks are able to transition to various states (back to RUNNING 
or moving to FAILED, etc.) when you turn the agent back on.


File Attachments


Task in PARTITIONED state
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
Task back into running when partition resolved
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
Compaction of PARTITIONED cycles (note timestamps)
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png


Thanks,

David McLaughlin



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-16 Thread David McLaughlin


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java
> > Lines 209 (patched)
> > 
> >
> > I suggest keeping this as a valid transition and allowing the 
> > PartitionManager to deal with it.

See the comment I added to this transition. The TL;DR is it could indefinitely 
block restarts, maintenance and preemption when reschedule=False. I don't think 
this is desirable. Today any such indefinite delays are handled by transient 
task timeouts, but obviously we can't apply those timeouts to PARTITIONED. 

My conclusion after much deliberation (I mention some of this about duplicate 
instances in the design doc) was that it's fine just to move to LOST (which is 
the behavior today). The reasoning I ended up with is that PartitionPolicy is 
intended for keeping tasks *running*, and that partitions encountered when 
tearing tasks down are far less important (or not important at all). 

At the same time, it's a discussion worth fleshing out. The other alternative I 
had considered was to simply ignore PARTITIONED when restarting and wait for 
the transient task timeout to hit. But because we'd only want to do that on 
reschedule=False, this logic would have to live in PartitionManager. But this 
would not solve the problem of actually waiting indefinitely when 
reschedule=False.


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java
> > Lines 281 (patched)
> > 
> >
> > If `FAILED` is in this list, i would expect `FINISHED` as well.

Not sure how I missed that. Thanks!


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/main/python/apache/aurora/config/schema/base.py
> > Lines 157 (patched)
> > 
> >
> > I would prefer the name `RescheduleImmediately` (or similar) to make 
> > this self-documenting.

See my reply to Jordan. Disable does not mean this, it's an alias for 
reschedule=False. I lifted the name Disabled straight from the comment in the 
proposal doc, but looks like it's added confusion. I'm thinking of just 
reverting that suggestion and going back to explicit PartitionPolicy.


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
> > Lines 55 (patched)
> > 
> >
> > (disclaimer: see my longer comment about an integration test before 
> > trying to satisfy other comments in this file)
> > 
> > Have a look at `FakeScheduledExecutor` and see if you find it helpful.  
> > The API is a bit weird, but it aims to simplify this type of test where you 
> > need a `Clock` and a `ScheduledExecutor`.  At the very least, it eliminates 
> > the noise of using `Capture`s.
> 
> David McLaughlin wrote:
> I actually found the capture much easier to read and reason about than 
> the FakeScheduledExecutor, and I thought it led to some pretty clean tests. 
> Curious how others feel.

I had seen the FakeScheduledExecutor and opted not to use it. I actually find 
the capture much easier to read and reason about. Curious how others feel.


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
> > Lines 55 (patched)
> > 
> >
> > (disclaimer: see my longer comment about an integration test before 
> > trying to satisfy other comments in this file)
> > 
> > Have a look at `FakeScheduledExecutor` and see if you find it helpful.  
> > The API is a bit weird, but it aims to simplify this type of test where you 
> > need a `Clock` and a `ScheduledExecutor`.  At the very least, it eliminates 
> > the noise of using `Capture`s.
> 
> David McLaughlin wrote:
> I had seen the FakeScheduledExecutor and opted not to use it. I actually 
> find the capture much easier to read and reason about. Curious how others 
> feel.

I actually found the capture much easier to read and reason about than the 
FakeScheduledExecutor, and I thought it led to some pretty clean tests. Curious 
how others feel.


> On Nov. 16, 2017, 10:01 p.m., Bill Farner wrote:
> > src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
> > Lines 109-115 (patched)
> > 
> >
> > Test cases like this lead me to believe that an integration test is 
> > more appropriate.  This test case reads as "nothing happens when a task 
> > with reschedule=false transitions from RUNNING to PARTITIONED".  What we 
> > really mean, though, is that "a task with reschedule=false 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-16 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191261
---


Ship it!




Master (46b1112) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Nov. 16, 2017, 10:30 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 16, 2017, 10:30 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
>   src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> f0819fb7ac758ad1229a76fd9794b393400e9f63 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/7/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-16 Thread David McLaughlin


> On Nov. 16, 2017, 10:22 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 90-91 (patched)
> > 
> >
> > Can this just be:
> > ```
> > if (stateChange.getNewState().equals(ScheduleStatus.PARTITIONED))
> > ```

partitionPolicy is optional. I'll explain this check with a comment.


> On Nov. 16, 2017, 10:22 p.m., Jordan Ly wrote:
> > src/main/python/apache/aurora/config/schema/base.py
> > Lines 157 (patched)
> > 
> >
> > Can you explain what `Disable` does? If it is `RescheduleImmediately`, 
> > then the default 0 on `PartitionPolicy` is sufficient.

Hmm, Disable might not be a good name for this actually. It's more like 
NeverReschedule. But that is also not a good name because we would give up on 
the partition in certain conditions (user or operator initiated actions).


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191256
---


On Nov. 16, 2017, 10:30 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 16, 2017, 10:30 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
>   src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> f0819fb7ac758ad1229a76fd9794b393400e9f63 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/7/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-16 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/
---

(Updated Nov. 16, 2017, 10:30 p.m.)


Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
Farner.


Changes
---

Add comments to explain the rationale behind why we ignore the PartitionPolicy 
and move straight to LOST when tasks are being 
killed/preempted/restarted/drained.


Repository: aurora


Description
---

This is my prototype code for adding partition-awareness to Aurora. There is a 
proposal document to accompany this here: 
https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#

I'd like feedback on the high-level approach before adding unit tests, metrics, 
logging, etc.


Diffs (updated)
-

  RELEASE-NOTES.md d653b797533735fc936a1cff2e005813757e1290 
  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
1d369263d779b549b9304018437f535ddc855966 
  examples/vagrant/upstart/aurora-scheduler.conf 
5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
  src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
33cc012a2cad929b0dd1ce236597b870cfc5aba0 
  src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
  src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
7c223eae69503fe1d5bf34c430438637abcbcb9b 
  
src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
 5e83b2acdde7198d16427d4031e9772f78612554 
  src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
b91a0852e968b4aa9d74801601671cb61af3648a 
  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
9989ed441cd9bc442e6472768880ce7924c3bdd9 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
c03fff11ea3a4086f9daaa8b07315006c1b481e4 
  src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
  src/main/python/apache/aurora/client/cli/jobs.py 
b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
  src/main/python/apache/aurora/config/schema/base.py 
18ce826363009e1cc0beac5cce4edf42610d487e 
  src/main/python/apache/aurora/config/thrift.py 
bedf8cd6641e1b1a930602791b758d584af4891c 
  src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
13f656f241a8a9a3d339f4053f165070c2669ef3 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
  
src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
 7b0429109e9a7795e559db264e7737fc55ff0169 
  src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
  src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
  src/test/python/apache/aurora/client/cli/test_task.py 
186cb2737ba8e169819b7d54f86a7344a669b6cb 
  src/test/python/apache/aurora/config/test_thrift.py 
7a1567a9b67917072bb0ba3eea5857e968375f4d 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
f0819fb7ac758ad1229a76fd9794b393400e9f63 


Diff: https://reviews.apache.org/r/63536/diff/7/

Changes: https://reviews.apache.org/r/63536/diff/6-7/


Testing
---

Manual testing in Vagrant by stopping and starting the Mesos agent. With three 
jobs:

1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
directly to LOST)
2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
while before moving to LOST)
3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
indefinitely)

I also verified tasks are able to transition to various states (back to RUNNING 
or moving to FAILED, etc.) when you turn the agent back on.


File Attachments


Task in PARTITIONED state
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
Task back into running when partition resolved
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
Compaction of PARTITIONED cycles (note timestamps)
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png


Thanks,

David McLaughlin



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-16 Thread Jordan Ly

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191256
---


Fix it, then Ship it!




Overall LGTM! You may want to add some entry to the release notes explaining 
how Mesos should fix some bugs on their end before enabling this.


src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
Lines 90-91 (patched)


Can this just be:
```
if (stateChange.getNewState().equals(ScheduleStatus.PARTITIONED))
```



src/main/python/apache/aurora/config/schema/base.py
Lines 157 (patched)


Can you explain what `Disable` does? If it is `RescheduleImmediately`, then 
the default 0 on `PartitionPolicy` is sufficient.


- Jordan Ly


On Nov. 16, 2017, 1:54 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 16, 2017, 1:54 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
>   src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> f0819fb7ac758ad1229a76fd9794b393400e9f63 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/6/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-16 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191229
---


Fix it, then Ship it!




Looks great overall!  The e2e test is very easy to follow.


src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
Lines 185 (patched)


Nit: `isPartitionAware` to minutely improve readability.



src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
Lines 91 (patched)


A `LOG.info` on either side of the branch on task configuration would be 
nice to simplify production debugging.

`Partitioned task will be rescheduled in %s seconds`

`Partitioned task will not be rescheduled`



src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java
Lines 209 (patched)


I suggest keeping this as a valid transition and allowing the 
PartitionManager to deal with it.



src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java
Lines 281 (patched)


If `FAILED` is in this list, i would expect `FINISHED` as well.



src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java
Line 266 (original), 307 (patched)


Nit: order the states consistently where possible.  Makes it easier to scan 
for where states are handled differently.  The prevailing convention in this 
patch seems to be `PARTITIONED` last, which sounds fine to me.



src/main/python/apache/aurora/config/schema/base.py
Lines 157 (patched)


I would prefer the name `RescheduleImmediately` (or similar) to make this 
self-documenting.



src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
Lines 55 (patched)


(disclaimer: see my longer comment about an integration test before trying 
to satisfy other comments in this file)

Have a look at `FakeScheduledExecutor` and see if you find it helpful.  The 
API is a bit weird, but it aims to simplify this type of test where you need a 
`Clock` and a `ScheduledExecutor`.  At the very least, it eliminates the noise 
of using `Capture`s.



src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
Lines 57 (patched)


nit: rm line



src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
Lines 60 (patched)


nit: static; ditto elsewhere



src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
Lines 74-75 (patched)


can you swap old/new param order?  i was confused by some callers until i 
came up here :-)



src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
Lines 87 (patched)


nit: newline before



src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java
Lines 109-115 (patched)


Test cases like this lead me to believe that an integration test is more 
appropriate.  This test case reads as "nothing happens when a task with 
reschedule=false transitions from RUNNING to PARTITIONED".  What we really 
mean, though, is that "a task with reschedule=false immediately moves to LOST 
upon transition attempt from RUNNING to PARTITIONED".

With this in mind, i suggest you eliminate this test and move the coverage 
into `StateManagerImplTest`.  I think this is rather natural, since they work 
in unison to manage state transitions.

I don't feel strongly about this, since it is not the convention in the 
codebase, and i don't know what hurdles you may encounter if you try to bundle 
this behavior in `StateManagerImplTest`.



src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java
Lines 431 (patched)


Please phrase this check as:

```java
assertEquals(
  ImmutableList.of(PENDING, ASSIGNED, ...),
  updatedTask.getTaskEvents().stream()
  .map(e -> e.getStatus())
  .collect(Collectors.toList()));
```

This will give greater confidence that the compaction worked as intended.


- Bill Farner


On Nov. 15, 2017, 5:54 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-15 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191144
---


Ship it!




Master (46b1112) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Nov. 16, 2017, 9:54 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 16, 2017, 9:54 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
>   src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> f0819fb7ac758ad1229a76fd9794b393400e9f63 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/6/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-15 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/
---

(Updated Nov. 16, 2017, 1:54 a.m.)


Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
Farner.


Changes
---

Add e2e test.


Repository: aurora


Description
---

This is my prototype code for adding partition-awareness to Aurora. There is a 
proposal document to accompany this here: 
https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#

I'd like feedback on the high-level approach before adding unit tests, metrics, 
logging, etc.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
1d369263d779b549b9304018437f535ddc855966 
  examples/vagrant/upstart/aurora-scheduler.conf 
5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
  src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
33cc012a2cad929b0dd1ce236597b870cfc5aba0 
  src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
  src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
7c223eae69503fe1d5bf34c430438637abcbcb9b 
  
src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
 5e83b2acdde7198d16427d4031e9772f78612554 
  src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
b91a0852e968b4aa9d74801601671cb61af3648a 
  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
9989ed441cd9bc442e6472768880ce7924c3bdd9 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
c03fff11ea3a4086f9daaa8b07315006c1b481e4 
  src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
  src/main/python/apache/aurora/client/cli/jobs.py 
b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
  src/main/python/apache/aurora/config/schema/base.py 
18ce826363009e1cc0beac5cce4edf42610d487e 
  src/main/python/apache/aurora/config/thrift.py 
bedf8cd6641e1b1a930602791b758d584af4891c 
  src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
13f656f241a8a9a3d339f4053f165070c2669ef3 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
  
src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
 7b0429109e9a7795e559db264e7737fc55ff0169 
  src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
  src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
  src/test/python/apache/aurora/client/cli/test_task.py 
186cb2737ba8e169819b7d54f86a7344a669b6cb 
  src/test/python/apache/aurora/config/test_thrift.py 
7a1567a9b67917072bb0ba3eea5857e968375f4d 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
f0819fb7ac758ad1229a76fd9794b393400e9f63 


Diff: https://reviews.apache.org/r/63536/diff/6/

Changes: https://reviews.apache.org/r/63536/diff/5-6/


Testing
---

Manual testing in Vagrant by stopping and starting the Mesos agent. With three 
jobs:

1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
directly to LOST)
2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
while before moving to LOST)
3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
indefinitely)

I also verified tasks are able to transition to various states (back to RUNNING 
or moving to FAILED, etc.) when you turn the agent back on.


File Attachments


Task in PARTITIONED state
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
Task back into running when partition resolved
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
Compaction of PARTITIONED cycles (note timestamps)
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png


Thanks,

David McLaughlin



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-14 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191025
---



I think this change is sufficiently complex that it warrants an e2e test. Will 
add that next.

- David McLaughlin


On Nov. 15, 2017, 12:05 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 15, 2017, 12:05 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/5/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-14 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191021
---


Ship it!




Master (4fecf1f) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Nov. 15, 2017, 12:05 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 15, 2017, 12:05 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/5/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-14 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/
---

(Updated Nov. 15, 2017, 12:05 a.m.)


Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
Farner.


Changes
---

Support:

Job(
 ...
  partition_policy=PartitionPolicy(delay_secs=30)
  ...
)

 and 
 
Job(
 ...
  partition_policy=Disable()
  ...
)


Repository: aurora


Description
---

This is my prototype code for adding partition-awareness to Aurora. There is a 
proposal document to accompany this here: 
https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#

I'd like feedback on the high-level approach before adding unit tests, metrics, 
logging, etc.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
1d369263d779b549b9304018437f535ddc855966 
  examples/vagrant/upstart/aurora-scheduler.conf 
5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
  src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
33cc012a2cad929b0dd1ce236597b870cfc5aba0 
  src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
  src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
7c223eae69503fe1d5bf34c430438637abcbcb9b 
  
src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
 5e83b2acdde7198d16427d4031e9772f78612554 
  src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
b91a0852e968b4aa9d74801601671cb61af3648a 
  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
9989ed441cd9bc442e6472768880ce7924c3bdd9 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
c03fff11ea3a4086f9daaa8b07315006c1b481e4 
  src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
  src/main/python/apache/aurora/client/cli/jobs.py 
b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
  src/main/python/apache/aurora/config/schema/base.py 
18ce826363009e1cc0beac5cce4edf42610d487e 
  src/main/python/apache/aurora/config/thrift.py 
bedf8cd6641e1b1a930602791b758d584af4891c 
  src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
13f656f241a8a9a3d339f4053f165070c2669ef3 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
  
src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
 7b0429109e9a7795e559db264e7737fc55ff0169 
  src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
  src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
  src/test/python/apache/aurora/client/cli/test_task.py 
186cb2737ba8e169819b7d54f86a7344a669b6cb 
  src/test/python/apache/aurora/config/test_thrift.py 
7a1567a9b67917072bb0ba3eea5857e968375f4d 


Diff: https://reviews.apache.org/r/63536/diff/5/

Changes: https://reviews.apache.org/r/63536/diff/4-5/


Testing
---

Manual testing in Vagrant by stopping and starting the Mesos agent. With three 
jobs:

1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
directly to LOST)
2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
while before moving to LOST)
3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
indefinitely)

I also verified tasks are able to transition to various states (back to RUNNING 
or moving to FAILED, etc.) when you turn the agent back on.


File Attachments


Task in PARTITIONED state
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
Task back into running when partition resolved
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
Compaction of PARTITIONED cycles (note timestamps)
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png


Thanks,

David McLaughlin



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-14 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191013
---


Ship it!




Master (4fecf1f) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Nov. 15, 2017, 7:41 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 15, 2017, 7:41 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/4/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-14 Thread David McLaughlin


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift
> > Lines 526 (patched)
> > 
> >
> > It seems logical that this accounting would live alongside 
> > `ScheduledTask.failureCount`.
> > 
> > How would you feel about naming this `timesPartitioned` instead?  I 
> > feel that more clearly disambiguates from overloaded meanings of 
> > `partition`, e.g. https://en.wikipedia.org/wiki/Partition_(database)

Thanks for this suggestion, I agree it's much clearer.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/base/Conversions.java
> > Lines 71-72 (original), 71-72 (patched)
> > 
> >
> > Comment is now stale.

Removed.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
> > Lines 100 (patched)
> > 
> >
> > How about `-partition_aware` instead?  I find the `enable` qualifier 
> > unnecessary.

Done.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 31 (patched)
> > 
> >
> > How about s/futureMap/delayedTransitions/

Map has been removed.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 40 (patched)
> > 
> >
> > Please use `AsyncUtil.singleThreadLoggingScheduledExecutor()` instead.  
> > In addition to automatically logging and tracking exceptions, it will 
> > create a daemon thread.

Done.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 48 (patched)
> > 
> >
> > Consider using `Tasks.getLatestEvent(task).getTimestamp()`
> > 
> > to hide the need for `Iterables.getLast()`

Done.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 53-54 (patched)
> > 
> >
> > `String taskId = Tasks.id(stateChange.getTask());`

Done.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 76 (patched)
> > 
> >
> > Thinking out loud - the `Optional.of(PARTITIONED)` parameter is already 
> > helping you perform a CAS.  If we had a _true_ CAS operation, you could 
> > very safely avoid the need to cancel futures.  You can't do that with the 
> > status only since the task can now cycle with `PARTITIONED`.
> > 
> > What do you think about using the last transition timestamp to do this?
> > 
> > ```java
> > long casLastTransitionMs = Tasks.getLatestEvent(task).getTimestamp();
> > executor.schedule(() -> {
> >   storage.write(... {
> > if (casLastTransition == Tasks.getLatestEvent(task).getTimestamp()) 
> > {
> >   // proceed
> > } else {
> >   // cas failed, the task has since transitioned
> > }
> >   }
> > 
> > }, delay, TimeUnit.SECONDS);
> > ```
> 
> David McLaughlin wrote:
> The other motivation (other than correctness, which I'm pretty sure we 
> now have) for removing and cancelling futures was to get the tasks off the 
> heap. Of course, that's not actually happening here because I'm not using a 
> ScheduledThreadPoolExecutor (and setting removeOnCancelPolicy to true). With 
> the AsyncUtil suggestion above, I'll be sure to set that to make it clear 
> what I'm doing. 
> 
> If we still think it's necessary to include this timestamp check, I'm 
> happy to do that (although we'd have to refetch the task from storage inside 
> the storage.write of course).
> 
> Bill Farner wrote:
> > get the tasks off the heap
> 
> Smells like premature optimization.  Either, you can certainly factor 
> this so you only maintain a reference to the task ID and the timestamp.
> 
> David McLaughlin wrote:
> Isn't premature optimization usually brandished in response to complexity 
> though? I don't see the complexity here.
> 
> Bill Farner wrote:
> I'm simply pointing out that we shouldn't maintain the map if the primary 
> motivation is to reduce heap consumption.
> 
> I do think the code would be easier to reason about if future 
> cancellation and maintaining the additional map were not 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-14 Thread David McLaughlin


> On Nov. 9, 2017, 1:10 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 57 (patched)
> > 
> >
> > I don't think you need to `synchronize` here since it is a 
> > `ConcurrentHashMap`. You can probably just call `remove`, and if the value 
> > it returns is not `null` you can cancel it.
> > 
> > This way, you can remove the double-checked locking.

Removed the map.


> On Nov. 9, 2017, 1:10 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/SideEffect.java
> > Lines 73 (patched)
> > 
> >
> > I would rename this to avoid confusion. Maybe make it a verb like 
> > `TRANSITION_TO_LOST`

Done.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190521
---


On Nov. 14, 2017, 10:41 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 14, 2017, 10:41 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/4/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-14 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/
---

(Updated Nov. 14, 2017, 10:41 p.m.)


Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
Farner.


Changes
---

Rebase + green build. Server side is code-complete, but I'll follow up with a 
change to the Python DSL to match the recommendation in the proposal document.


Repository: aurora


Description
---

This is my prototype code for adding partition-awareness to Aurora. There is a 
proposal document to accompany this here: 
https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#

I'd like feedback on the high-level approach before adding unit tests, metrics, 
logging, etc.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
1d369263d779b549b9304018437f535ddc855966 
  examples/vagrant/upstart/aurora-scheduler.conf 
5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
  src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
33cc012a2cad929b0dd1ce236597b870cfc5aba0 
  src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
  src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
7c223eae69503fe1d5bf34c430438637abcbcb9b 
  
src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
 5e83b2acdde7198d16427d4031e9772f78612554 
  src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
b91a0852e968b4aa9d74801601671cb61af3648a 
  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
9989ed441cd9bc442e6472768880ce7924c3bdd9 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
c03fff11ea3a4086f9daaa8b07315006c1b481e4 
  src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
  src/main/python/apache/aurora/client/cli/jobs.py 
b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
  src/main/python/apache/aurora/config/schema/base.py 
18ce826363009e1cc0beac5cce4edf42610d487e 
  src/main/python/apache/aurora/config/thrift.py 
bedf8cd6641e1b1a930602791b758d584af4891c 
  src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
13f656f241a8a9a3d339f4053f165070c2669ef3 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
  
src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
 7b0429109e9a7795e559db264e7737fc55ff0169 
  src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
  src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
  src/test/python/apache/aurora/client/cli/test_task.py 
186cb2737ba8e169819b7d54f86a7344a669b6cb 


Diff: https://reviews.apache.org/r/63536/diff/4/

Changes: https://reviews.apache.org/r/63536/diff/3-4/


Testing
---

Manual testing in Vagrant by stopping and starting the Mesos agent. With three 
jobs:

1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
directly to LOST)
2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
while before moving to LOST)
3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
indefinitely)

I also verified tasks are able to transition to various states (back to RUNNING 
or moving to FAILED, etc.) when you turn the agent back on.


File Attachments


Task in PARTITIONED state
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
Task back into running when partition resolved
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
Compaction of PARTITIONED cycles (note timestamps)
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png


Thanks,

David McLaughlin



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-10 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190758
---



Master (9b9b2ee) is red with this patch.
  ./build-support/jenkins/build.sh

java.lang.IllegalArgumentException

org.apache.aurora.scheduler.storage.db.TaskStoreTest > 
testSaveWithMesosFetcherUris FAILED
java.lang.AssertionError

org.apache.aurora.scheduler.storage.db.JobUpdateStoreTest > 
testMultipleJobDetails FAILED
java.lang.IllegalArgumentException

org.apache.aurora.scheduler.storage.db.TaskStoreTest > testNullVsEmptyRelations 
FAILED
com.google.common.util.concurrent.UncheckedExecutionException
Caused by: java.lang.IllegalArgumentException

org.apache.aurora.scheduler.storage.db.TaskStoreTest > 
testQueryMultipleInstances FAILED
java.lang.AssertionError

org.apache.aurora.scheduler.storage.db.TaskStoreTest > testAddSlaveHost FAILED
java.lang.Exception
Caused by: java.lang.AssertionError
I 00:05:32.878 [ShutdownHook, SchedulerMain] Stopping scheduler services. 
I 00:05:33.001 [SessionTracker, SessionTrackerImpl] SessionTrackerImpl 
exited loop! 

1195 tests completed, 54 failed, 2 skipped
:test FAILED
:jacocoTestReport
Coverage report generated: 
file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/jacoco/test/html/index.html
:jacocoTestCoverageVerification

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':test'.
> There were failing tests. See the report at: 
> file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/tests/test/index.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

* Get more help at https://help.gradle.org

BUILD FAILED in 14m 49s
44 actionable tasks: 35 executed, 9 up-to-date


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Nov. 10, 2017, 11:45 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 10, 2017, 11:45 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-10 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/
---

(Updated Nov. 10, 2017, 11:45 p.m.)


Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
Farner.


Changes
---

I've added most of the Java tests. Build is broken because the logic to 
backport this to the MyBatis stores is missing. I don't think that is worth the 
effort, and will probably wait until we get rid of the db stores (which I think 
is happening as soon as we put 0.19 out?).


Repository: aurora


Description
---

This is my prototype code for adding partition-awareness to Aurora. There is a 
proposal document to accompany this here: 
https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#

I'd like feedback on the high-level approach before adding unit tests, metrics, 
logging, etc.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
c869493c06499340d73e1b219e17a0d7d8b5ead9 
  examples/vagrant/upstart/aurora-scheduler.conf 
5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
  src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
33cc012a2cad929b0dd1ce236597b870cfc5aba0 
  src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
  src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
7c223eae69503fe1d5bf34c430438637abcbcb9b 
  
src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
 5e83b2acdde7198d16427d4031e9772f78612554 
  src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
b91a0852e968b4aa9d74801601671cb61af3648a 
  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
9989ed441cd9bc442e6472768880ce7924c3bdd9 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
d72f055749801ee9d6f31f60857cc795d0ed7ab1 
  src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
  src/main/python/apache/aurora/client/cli/jobs.py 
b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
  src/main/python/apache/aurora/config/schema/base.py 
18ce826363009e1cc0beac5cce4edf42610d487e 
  src/main/python/apache/aurora/config/thrift.py 
bedf8cd6641e1b1a930602791b758d584af4891c 
  src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
13f656f241a8a9a3d339f4053f165070c2669ef3 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
  
src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
 7b0429109e9a7795e559db264e7737fc55ff0169 
  src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
  src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
8d6c3fff0af2df39bb929f760b862a2edf5d6fca 


Diff: https://reviews.apache.org/r/63536/diff/3/

Changes: https://reviews.apache.org/r/63536/diff/2-3/


Testing
---

Manual testing in Vagrant by stopping and starting the Mesos agent. With three 
jobs:

1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
directly to LOST)
2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
while before moving to LOST)
3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
indefinitely)

I also verified tasks are able to transition to various states (back to RUNNING 
or moving to FAILED, etc.) when you turn the agent back on.


File Attachments


Task in PARTITIONED state
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
Task back into running when partition resolved
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
Compaction of PARTITIONED cycles (note timestamps)
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png


Thanks,

David McLaughlin



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-09 Thread Bill Farner


> On Nov. 9, 2017, 9:23 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 76 (patched)
> > 
> >
> > Thinking out loud - the `Optional.of(PARTITIONED)` parameter is already 
> > helping you perform a CAS.  If we had a _true_ CAS operation, you could 
> > very safely avoid the need to cancel futures.  You can't do that with the 
> > status only since the task can now cycle with `PARTITIONED`.
> > 
> > What do you think about using the last transition timestamp to do this?
> > 
> > ```java
> > long casLastTransitionMs = Tasks.getLatestEvent(task).getTimestamp();
> > executor.schedule(() -> {
> >   storage.write(... {
> > if (casLastTransition == Tasks.getLatestEvent(task).getTimestamp()) 
> > {
> >   // proceed
> > } else {
> >   // cas failed, the task has since transitioned
> > }
> >   }
> > 
> > }, delay, TimeUnit.SECONDS);
> > ```
> 
> David McLaughlin wrote:
> The other motivation (other than correctness, which I'm pretty sure we 
> now have) for removing and cancelling futures was to get the tasks off the 
> heap. Of course, that's not actually happening here because I'm not using a 
> ScheduledThreadPoolExecutor (and setting removeOnCancelPolicy to true). With 
> the AsyncUtil suggestion above, I'll be sure to set that to make it clear 
> what I'm doing. 
> 
> If we still think it's necessary to include this timestamp check, I'm 
> happy to do that (although we'd have to refetch the task from storage inside 
> the storage.write of course).
> 
> Bill Farner wrote:
> > get the tasks off the heap
> 
> Smells like premature optimization.  Either, you can certainly factor 
> this so you only maintain a reference to the task ID and the timestamp.
> 
> David McLaughlin wrote:
> Isn't premature optimization usually brandished in response to complexity 
> though? I don't see the complexity here.

I'm simply pointing out that we shouldn't maintain the map if the primary 
motivation is to reduce heap consumption.

I do think the code would be easier to reason about if future cancellation and 
maintaining the additional map were not necessary.  I don't feel strongly about 
this, however.


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190607
---


On Nov. 8, 2017, 4:48 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 8, 2017, 4:48 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-09 Thread David McLaughlin


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 76 (patched)
> > 
> >
> > Thinking out loud - the `Optional.of(PARTITIONED)` parameter is already 
> > helping you perform a CAS.  If we had a _true_ CAS operation, you could 
> > very safely avoid the need to cancel futures.  You can't do that with the 
> > status only since the task can now cycle with `PARTITIONED`.
> > 
> > What do you think about using the last transition timestamp to do this?
> > 
> > ```java
> > long casLastTransitionMs = Tasks.getLatestEvent(task).getTimestamp();
> > executor.schedule(() -> {
> >   storage.write(... {
> > if (casLastTransition == Tasks.getLatestEvent(task).getTimestamp()) 
> > {
> >   // proceed
> > } else {
> >   // cas failed, the task has since transitioned
> > }
> >   }
> > 
> > }, delay, TimeUnit.SECONDS);
> > ```
> 
> David McLaughlin wrote:
> The other motivation (other than correctness, which I'm pretty sure we 
> now have) for removing and cancelling futures was to get the tasks off the 
> heap. Of course, that's not actually happening here because I'm not using a 
> ScheduledThreadPoolExecutor (and setting removeOnCancelPolicy to true). With 
> the AsyncUtil suggestion above, I'll be sure to set that to make it clear 
> what I'm doing. 
> 
> If we still think it's necessary to include this timestamp check, I'm 
> happy to do that (although we'd have to refetch the task from storage inside 
> the storage.write of course).
> 
> Bill Farner wrote:
> > get the tasks off the heap
> 
> Smells like premature optimization.  Either, you can certainly factor 
> this so you only maintain a reference to the task ID and the timestamp.

Isn't premature optimization usually brandished in response to complexity 
though? I don't see the complexity here.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190607
---


On Nov. 9, 2017, 12:48 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 9, 2017, 12:48 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-09 Thread Bill Farner


> On Nov. 9, 2017, 9:23 a.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 76 (patched)
> > 
> >
> > Thinking out loud - the `Optional.of(PARTITIONED)` parameter is already 
> > helping you perform a CAS.  If we had a _true_ CAS operation, you could 
> > very safely avoid the need to cancel futures.  You can't do that with the 
> > status only since the task can now cycle with `PARTITIONED`.
> > 
> > What do you think about using the last transition timestamp to do this?
> > 
> > ```java
> > long casLastTransitionMs = Tasks.getLatestEvent(task).getTimestamp();
> > executor.schedule(() -> {
> >   storage.write(... {
> > if (casLastTransition == Tasks.getLatestEvent(task).getTimestamp()) 
> > {
> >   // proceed
> > } else {
> >   // cas failed, the task has since transitioned
> > }
> >   }
> > 
> > }, delay, TimeUnit.SECONDS);
> > ```
> 
> David McLaughlin wrote:
> The other motivation (other than correctness, which I'm pretty sure we 
> now have) for removing and cancelling futures was to get the tasks off the 
> heap. Of course, that's not actually happening here because I'm not using a 
> ScheduledThreadPoolExecutor (and setting removeOnCancelPolicy to true). With 
> the AsyncUtil suggestion above, I'll be sure to set that to make it clear 
> what I'm doing. 
> 
> If we still think it's necessary to include this timestamp check, I'm 
> happy to do that (although we'd have to refetch the task from storage inside 
> the storage.write of course).

> get the tasks off the heap

Smells like premature optimization.  Either, you can certainly factor this so 
you only maintain a reference to the task ID and the timestamp.


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190607
---


On Nov. 8, 2017, 4:48 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 8, 2017, 4:48 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-09 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190632
---



Thanks for all the feedback! I'll move forward towards a production quality 
solution now.

- David McLaughlin


On Nov. 9, 2017, 12:48 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 9, 2017, 12:48 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-09 Thread David McLaughlin


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 76 (patched)
> > 
> >
> > Thinking out loud - the `Optional.of(PARTITIONED)` parameter is already 
> > helping you perform a CAS.  If we had a _true_ CAS operation, you could 
> > very safely avoid the need to cancel futures.  You can't do that with the 
> > status only since the task can now cycle with `PARTITIONED`.
> > 
> > What do you think about using the last transition timestamp to do this?
> > 
> > ```java
> > long casLastTransitionMs = Tasks.getLatestEvent(task).getTimestamp();
> > executor.schedule(() -> {
> >   storage.write(... {
> > if (casLastTransition == Tasks.getLatestEvent(task).getTimestamp()) 
> > {
> >   // proceed
> > } else {
> >   // cas failed, the task has since transitioned
> > }
> >   }
> > 
> > }, delay, TimeUnit.SECONDS);
> > ```

The other motivation (other than correctness, which I'm pretty sure we now 
have) for removing and cancelling futures was to get the tasks off the heap. Of 
course, that's not actually happening here because I'm not using a 
ScheduledThreadPoolExecutor (and setting removeOnCancelPolicy to true). With 
the AsyncUtil suggestion above, I'll be sure to set that to make it clear what 
I'm doing. 

If we still think it's necessary to include this timestamp check, I'm happy to 
do that (although we'd have to refetch the task from storage inside the 
storage.write of course).


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190607
---


On Nov. 9, 2017, 12:48 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 9, 2017, 12:48 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-09 Thread Jordan Ly


> On Nov. 9, 2017, 1:10 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Lines 402 (patched)
> > 
> >
> > I wondering if having repeated `PARTITIONED` events would be helpful in 
> > debugging issues, and if it is worth having to deal with the possibility of 
> > an unbounded TaskEvent list.
> > 
> > I don't really have a strong opinion yet, but I would be curious as to 
> > what others think.
> 
> Bill Farner wrote:
> I agree to an extent, but the pathological case is acutely painful.  It 
> could mean an irrecoverable cluster, and tracking down the reason would be 
> challenging.
> 
> > debugging
> 
> Do you think preserving the count helps here?  Between the count as an 
> indicator of a problem and logs to dig deeper, i can't see any loss in signal.

You are definitely correct! I was originially thinking we would want to 
preserve timestamps, but I forgot about the logs and having the count is a 
strong enough signal to go into the logs which has all the information we need.


- Jordan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190521
---


On Nov. 9, 2017, 12:48 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 9, 2017, 12:48 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-09 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190607
---



Overall approach appears sound.  LGTM to proceed to tests.


api/src/main/thrift/org/apache/aurora/gen/api.thrift
Lines 526 (patched)


It seems logical that this accounting would live alongside 
`ScheduledTask.failureCount`.

How would you feel about naming this `timesPartitioned` instead?  I feel 
that more clearly disambiguates from overloaded meanings of `partition`, e.g. 
https://en.wikipedia.org/wiki/Partition_(database)



src/main/java/org/apache/aurora/scheduler/base/Conversions.java
Lines 71-72 (original), 71-72 (patched)


Comment is now stale.



src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
Lines 100 (patched)


How about `-partition_aware` instead?  I find the `enable` qualifier 
unnecessary.



src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
Lines 31 (patched)


How about s/futureMap/delayedTransitions/



src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
Lines 40 (patched)


Please use `AsyncUtil.singleThreadLoggingScheduledExecutor()` instead.  In 
addition to automatically logging and tracking exceptions, it will create a 
daemon thread.



src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
Lines 48 (patched)


Consider using `Tasks.getLatestEvent(task).getTimestamp()`

to hide the need for `Iterables.getLast()`



src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
Lines 53-54 (patched)


`String taskId = Tasks.id(stateChange.getTask());`



src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
Lines 69 (patched)


Readability nit:
```java
if (!stateChange.isTransition()) {
  // This is a storage recovery event, so the calculated time
  // may be in the past.
  delay = Math.max(...);
}
```



src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
Lines 76 (patched)


Thinking out loud - the `Optional.of(PARTITIONED)` parameter is already 
helping you perform a CAS.  If we had a _true_ CAS operation, you could very 
safely avoid the need to cancel futures.  You can't do that with the status 
only since the task can now cycle with `PARTITIONED`.

What do you think about using the last transition timestamp to do this?

```java
long casLastTransitionMs = Tasks.getLatestEvent(task).getTimestamp();
executor.schedule(() -> {
  storage.write(... {
if (casLastTransition == Tasks.getLatestEvent(task).getTimestamp()) {
  // proceed
} else {
  // cas failed, the task has since transitioned
}
  }

}, delay, TimeUnit.SECONDS);
```



src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
Lines 294 (patched)


`==` instead of `.equals()`



src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
Lines 297 (patched)


This is unnecessary.



src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
Lines 390 (patched)


s/The goal here is to c/C/



src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
Lines 400 (patched)


It's worth mentioning that this is done to bound the number of events 
stored for a task.



src/main/java/org/apache/aurora/scheduler/state/StateModule.java
Lines 70 (patched)


PubsubEventModule.bindSubscriber(binder, PartitionManager.class);


- Bill Farner


On Nov. 8, 2017, 4:48 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 8, 2017, 4:48 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-09 Thread Bill Farner


> On Nov. 8, 2017, 5:10 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Lines 402 (patched)
> > 
> >
> > I wondering if having repeated `PARTITIONED` events would be helpful in 
> > debugging issues, and if it is worth having to deal with the possibility of 
> > an unbounded TaskEvent list.
> > 
> > I don't really have a strong opinion yet, but I would be curious as to 
> > what others think.

I agree to an extent, but the pathological case is acutely painful.  It could 
mean an irrecoverable cluster, and tracking down the reason would be 
challenging.

> debugging

Do you think preserving the count helps here?  Between the count as an 
indicator of a problem and logs to dig deeper, i can't see any loss in signal.


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190521
---


On Nov. 8, 2017, 4:48 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 8, 2017, 4:48 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-08 Thread Jordan Ly

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190521
---



Overall the logic seems sound to me!


src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
Lines 57 (patched)


I don't think you need to `synchronize` here since it is a 
`ConcurrentHashMap`. You can probably just call `remove`, and if the value it 
returns is not `null` you can cancel it.

This way, you can remove the double-checked locking.



src/main/java/org/apache/aurora/scheduler/state/SideEffect.java
Lines 73 (patched)


I would rename this to avoid confusion. Maybe make it a verb like 
`TRANSITION_TO_LOST`



src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
Lines 402 (patched)


I wondering if having repeated `PARTITIONED` events would be helpful in 
debugging issues, and if it is worth having to deal with the possibility of an 
unbounded TaskEvent list.

I don't really have a strong opinion yet, but I would be curious as to what 
others think.


- Jordan Ly


On Nov. 9, 2017, 12:48 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 9, 2017, 12:48 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/2/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-08 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190519
---



Master (773d2d6) is red with this patch.
  ./build-support/jenkins/build.sh

   ^
  required: 
String,String,Optional,Amount,boolean,boolean,boolean,Optional
  found: 
String,String,Optional,Amount,boolean,boolean,Optional
  reason: actual and formal argument lists differ in length
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java:109:
 error: method buildFrameworkInfo in class CommandLineDriverSettingsModule 
cannot be applied to given types;
Protos.FrameworkInfo info = 
CommandLineDriverSettingsModule.buildFrameworkInfo(
   ^
  required: 
String,String,Optional,Amount,boolean,boolean,boolean,Optional
  found: 
String,String,Optional,Amount,boolean,boolean,Optional
  reason: actual and formal argument lists differ in length
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java:125:
 error: method buildFrameworkInfo in class CommandLineDriverSettingsModule 
cannot be applied to given types;
Protos.FrameworkInfo info = 
CommandLineDriverSettingsModule.buildFrameworkInfo(
   ^
  required: 
String,String,Optional,Amount,boolean,boolean,boolean,Optional
  found: 
String,String,Optional,Amount,boolean,boolean,Optional
  reason: actual and formal argument lists differ in length
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java:140:
 error: method buildFrameworkInfo in class CommandLineDriverSettingsModule 
cannot be applied to given types;
Protos.FrameworkInfo info = 
CommandLineDriverSettingsModule.buildFrameworkInfo(
   ^
  required: 
String,String,Optional,Amount,boolean,boolean,boolean,Optional
  found: 
String,String,Optional,Amount,boolean,boolean,Optional
  reason: actual and formal argument lists differ in length
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
5 errors
 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileTestJava'.
> Compilation failed; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

* Get more help at https://help.gradle.org

BUILD FAILED in 2m 47s
27 actionable tasks: 21 executed, 6 up-to-date


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Nov. 9, 2017, 12:48 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 9, 2017, 12:48 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-08 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/
---

(Updated Nov. 9, 2017, 12:48 a.m.)


Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
Farner.


Changes
---

Added logic to handle failovers / dealing with timer race conditions.


Repository: aurora


Description
---

This is my prototype code for adding partition-awareness to Aurora. There is a 
proposal document to accompany this here: 
https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#

I'd like feedback on the high-level approach before adding unit tests, metrics, 
logging, etc.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
c869493c06499340d73e1b219e17a0d7d8b5ead9 
  examples/vagrant/upstart/aurora-scheduler.conf 
5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
  src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
33cc012a2cad929b0dd1ce236597b870cfc5aba0 
  
src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
 5e83b2acdde7198d16427d4031e9772f78612554 
  src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
b91a0852e968b4aa9d74801601671cb61af3648a 
  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
9989ed441cd9bc442e6472768880ce7924c3bdd9 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
d72f055749801ee9d6f31f60857cc795d0ed7ab1 
  src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
  src/main/python/apache/aurora/client/cli/jobs.py 
b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
  src/main/python/apache/aurora/config/schema/base.py 
18ce826363009e1cc0beac5cce4edf42610d487e 
  src/main/python/apache/aurora/config/thrift.py 
bedf8cd6641e1b1a930602791b758d584af4891c 


Diff: https://reviews.apache.org/r/63536/diff/2/

Changes: https://reviews.apache.org/r/63536/diff/1-2/


Testing
---

Manual testing in Vagrant by stopping and starting the Mesos agent. With three 
jobs:

1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
directly to LOST)
2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
while before moving to LOST)
3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
indefinitely)

I also verified tasks are able to transition to various states (back to RUNNING 
or moving to FAILED, etc.) when you turn the agent back on.


File Attachments


Task in PARTITIONED state
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
Task back into running when partition resolved
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
Compaction of PARTITIONED cycles (note timestamps)
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png


Thanks,

David McLaughlin



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-08 Thread David McLaughlin


> On Nov. 7, 2017, 11:36 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 38-42 (patched)
> > 
> >
> > On the other hand, this would be consistent from the user perspective. 
> > A user only sees the compacted task events with a single RUNNING to 
> > PARTITIONED transition. 
> > 
> > If we cancel the future, he might wonder why the task does not end up 
> > as LOST.
> 
> David McLaughlin wrote:
> Maybe it's not clear without tests - but this timer is responsible for 
> moving from PARTITIONED -> LOST after a specified delay. This comment is 
> concerned about the user saying "give up on a partition after 15 minutes" and 
> then a task moving from RUNNING -> PARTITIONED -> RUNNING -> PARTITIONED 
> within a 15 minute window. When the task transitions back to RUNNING the 
> first time, the user would expect the timer to be reset.
> 
> David McLaughlin wrote:
> It also just occured to me this timer won't survive a restart. So if I 
> want to use this event-driven model, then I'll need to plug into the storage 
> restore operation too.

Jordan pointed out that it will survive a restart as the storage layer sends 
all the events on recovery (which I should have remembered from WebHooks!). So 
I just need to check each state change for isTransition and then change delay 
to depend on the latest task event time.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190383
---


On Nov. 7, 2017, 7:28 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 7, 2017, 7:28 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/1/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-07 Thread David McLaughlin


> On Nov. 7, 2017, 11:36 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 38-42 (patched)
> > 
> >
> > On the other hand, this would be consistent from the user perspective. 
> > A user only sees the compacted task events with a single RUNNING to 
> > PARTITIONED transition. 
> > 
> > If we cancel the future, he might wonder why the task does not end up 
> > as LOST.
> 
> David McLaughlin wrote:
> Maybe it's not clear without tests - but this timer is responsible for 
> moving from PARTITIONED -> LOST after a specified delay. This comment is 
> concerned about the user saying "give up on a partition after 15 minutes" and 
> then a task moving from RUNNING -> PARTITIONED -> RUNNING -> PARTITIONED 
> within a 15 minute window. When the task transitions back to RUNNING the 
> first time, the user would expect the timer to be reset.

It also just occured to me this timer won't survive a restart. So if I want to 
use this event-driven model, then I'll need to plug into the storage restore 
operation too.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190383
---


On Nov. 7, 2017, 7:28 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 7, 2017, 7:28 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/1/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-07 Thread David McLaughlin


> On Nov. 7, 2017, 11:36 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 38-42 (patched)
> > 
> >
> > On the other hand, this would be consistent from the user perspective. 
> > A user only sees the compacted task events with a single RUNNING to 
> > PARTITIONED transition. 
> > 
> > If we cancel the future, he might wonder why the task does not end up 
> > as LOST.

Maybe it's not clear without tests - but this timer is responsible for moving 
from PARTITIONED -> LOST after a specified delay. This comment is concerned 
about the user saying "give up on a partition after 15 minutes" and then a task 
moving from RUNNING -> PARTITIONED -> RUNNING -> PARTITIONED within a 15 minute 
window. When the task transitions back to RUNNING the first time, the user 
would expect the timer to be reset.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190383
---


On Nov. 7, 2017, 7:28 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 7, 2017, 7:28 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/1/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-07 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190383
---




src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
Lines 38-42 (patched)


On the other hand, this would be consistent from the user perspective. A 
user only sees the compacted task events with a single RUNNING to PARTITIONED 
transition. 

If we cancel the future, he might wonder why the task does not end up as 
LOST.


- Stephan Erb


On Nov. 7, 2017, 8:28 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 7, 2017, 8:28 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/1/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-07 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190359
---



Master (ad86177) is red with this patch.
  ./build-support/jenkins/build.sh

   ^
  required: 
String,String,Optional,Amount,boolean,boolean,boolean,Optional
  found: 
String,String,Optional,Amount,boolean,boolean,Optional
  reason: actual and formal argument lists differ in length
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java:109:
 error: method buildFrameworkInfo in class CommandLineDriverSettingsModule 
cannot be applied to given types;
Protos.FrameworkInfo info = 
CommandLineDriverSettingsModule.buildFrameworkInfo(
   ^
  required: 
String,String,Optional,Amount,boolean,boolean,boolean,Optional
  found: 
String,String,Optional,Amount,boolean,boolean,Optional
  reason: actual and formal argument lists differ in length
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java:125:
 error: method buildFrameworkInfo in class CommandLineDriverSettingsModule 
cannot be applied to given types;
Protos.FrameworkInfo info = 
CommandLineDriverSettingsModule.buildFrameworkInfo(
   ^
  required: 
String,String,Optional,Amount,boolean,boolean,boolean,Optional
  found: 
String,String,Optional,Amount,boolean,boolean,Optional
  reason: actual and formal argument lists differ in length
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java:140:
 error: method buildFrameworkInfo in class CommandLineDriverSettingsModule 
cannot be applied to given types;
Protos.FrameworkInfo info = 
CommandLineDriverSettingsModule.buildFrameworkInfo(
   ^
  required: 
String,String,Optional,Amount,boolean,boolean,boolean,Optional
  found: 
String,String,Optional,Amount,boolean,boolean,Optional
  reason: actual and formal argument lists differ in length
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
5 errors
 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileTestJava'.
> Compilation failed; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

* Get more help at https://help.gradle.org

BUILD FAILED in 3m 44s
27 actionable tasks: 21 executed, 6 up-to-date


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Nov. 7, 2017, 12:28 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 7, 2017, 12:28 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
>