Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-15 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review191144
---


Ship it!




Master (46b1112) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Nov. 16, 2017, 9:54 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> ---
> 
> (Updated Nov. 16, 2017, 9:54 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 7a1567a9b67917072bb0ba3eea5857e968375f4d 
>   src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
> f0819fb7ac758ad1229a76fd9794b393400e9f63 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/6/
> 
> 
> Testing
> ---
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> 
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> 

Re: Review Request 63536: Give jobs the ability to determine how to handle partitions by integrating with new Mesos Partition-Aware APIs

2017-11-15 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/
---

(Updated Nov. 16, 2017, 1:54 a.m.)


Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
Farner.


Changes
---

Add e2e test.


Repository: aurora


Description
---

This is my prototype code for adding partition-awareness to Aurora. There is a 
proposal document to accompany this here: 
https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#

I'd like feedback on the high-level approach before adding unit tests, metrics, 
logging, etc.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
1d369263d779b549b9304018437f535ddc855966 
  examples/vagrant/upstart/aurora-scheduler.conf 
5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
  src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
33cc012a2cad929b0dd1ce236597b870cfc5aba0 
  src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
  src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
7c223eae69503fe1d5bf34c430438637abcbcb9b 
  
src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
 5e83b2acdde7198d16427d4031e9772f78612554 
  src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
b91a0852e968b4aa9d74801601671cb61af3648a 
  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
9989ed441cd9bc442e6472768880ce7924c3bdd9 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
c03fff11ea3a4086f9daaa8b07315006c1b481e4 
  src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
  src/main/python/apache/aurora/client/cli/jobs.py 
b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
  src/main/python/apache/aurora/config/schema/base.py 
18ce826363009e1cc0beac5cce4edf42610d487e 
  src/main/python/apache/aurora/config/thrift.py 
bedf8cd6641e1b1a930602791b758d584af4891c 
  src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
13f656f241a8a9a3d339f4053f165070c2669ef3 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
  
src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
 7b0429109e9a7795e559db264e7737fc55ff0169 
  src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
  src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
  src/test/python/apache/aurora/client/cli/test_task.py 
186cb2737ba8e169819b7d54f86a7344a669b6cb 
  src/test/python/apache/aurora/config/test_thrift.py 
7a1567a9b67917072bb0ba3eea5857e968375f4d 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
f0819fb7ac758ad1229a76fd9794b393400e9f63 


Diff: https://reviews.apache.org/r/63536/diff/6/

Changes: https://reviews.apache.org/r/63536/diff/5-6/


Testing
---

Manual testing in Vagrant by stopping and starting the Mesos agent. With three 
jobs:

1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
directly to LOST)
2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
while before moving to LOST)
3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
indefinitely)

I also verified tasks are able to transition to various states (back to RUNNING 
or moving to FAILED, etc.) when you turn the agent back on.


File Attachments


Task in PARTITIONED state
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
Task back into running when partition resolved
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
Compaction of PARTITIONED cycles (note timestamps)
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png


Thanks,

David McLaughlin