-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/
-----------------------------------------------------------

Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
Farner.


Repository: aurora


Description
-------

This is my prototype code for adding partition-awareness to Aurora. There is a 
proposal document to accompany this here: 
https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#

I'd like feedback on the high-level approach before adding unit tests, metrics, 
logging, etc.


Diffs
-----

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
c869493c06499340d73e1b219e17a0d7d8b5ead9 
  examples/vagrant/upstart/aurora-scheduler.conf 
5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
  src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
33cc012a2cad929b0dd1ce236597b870cfc5aba0 
  
src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
 5e83b2acdde7198d16427d4031e9772f78612554 
  src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
b91a0852e968b4aa9d74801601671cb61af3648a 
  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
9989ed441cd9bc442e6472768880ce7924c3bdd9 
  src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
d72f055749801ee9d6f31f60857cc795d0ed7ab1 
  src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
  src/main/python/apache/aurora/client/cli/jobs.py 
b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
  src/main/python/apache/aurora/config/schema/base.py 
18ce826363009e1cc0beac5cce4edf42610d487e 
  src/main/python/apache/aurora/config/thrift.py 
bedf8cd6641e1b1a930602791b758d584af4891c 


Diff: https://reviews.apache.org/r/63536/diff/1/


Testing
-------

Manual testing in Vagrant by stopping and starting the Mesos agent. With three 
jobs:

1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
directly to LOST)
2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
while before moving to LOST)
3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
indefinitely)

I also verified tasks are able to transition to various states (back to RUNNING 
or moving to FAILED, etc.) when you turn the agent back on.


File Attachments
----------------

Task in PARTITIONED state
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
Task back into running when partition resolved
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
Compaction of PARTITIONED cycles (note timestamps)
  
https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png


Thanks,

David McLaughlin

Reply via email to