> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift
> > Lines 526 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885203#file1885203line526>
> >
> >     It seems logical that this accounting would live alongside 
> > `ScheduledTask.failureCount`.
> >     
> >     How would you feel about naming this `timesPartitioned` instead?  I 
> > feel that more clearly disambiguates from overloaded meanings of 
> > `partition`, e.g. https://en.wikipedia.org/wiki/Partition_(database)

Thanks for this suggestion, I agree it's much clearer.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/base/Conversions.java
> > Lines 71-72 (original), 71-72 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885205#file1885205line71>
> >
> >     Comment is now stale.

Removed.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
> > Lines 100 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885206#file1885206line100>
> >
> >     How about `-partition_aware` instead?  I find the `enable` qualifier 
> > unnecessary.

Done.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 31 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885207#file1885207line31>
> >
> >     How about s/futureMap/delayedTransitions/

Map has been removed.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 40 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885207#file1885207line40>
> >
> >     Please use `AsyncUtil.singleThreadLoggingScheduledExecutor()` instead.  
> > In addition to automatically logging and tracking exceptions, it will 
> > create a daemon thread.

Done.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 48 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885207#file1885207line48>
> >
> >     Consider using `Tasks.getLatestEvent(task).getTimestamp()`
> >     
> >     to hide the need for `Iterables.getLast()`

Done.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 53-54 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885207#file1885207line53>
> >
> >     `String taskId = Tasks.id(stateChange.getTask());`

Done.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 76 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885207#file1885207line76>
> >
> >     Thinking out loud - the `Optional.of(PARTITIONED)` parameter is already 
> > helping you perform a CAS.  If we had a _true_ CAS operation, you could 
> > very safely avoid the need to cancel futures.  You can't do that with the 
> > status only since the task can now cycle with `PARTITIONED`.
> >     
> >     What do you think about using the last transition timestamp to do this?
> >     
> >     ```java
> >     long casLastTransitionMs = Tasks.getLatestEvent(task).getTimestamp();
> >     executor.schedule(() -> {
> >       storage.write(... {
> >         if (casLastTransition == Tasks.getLatestEvent(task).getTimestamp()) 
> > {
> >           // proceed
> >         } else {
> >           // cas failed, the task has since transitioned
> >         }
> >       }
> >     
> >     }, delay, TimeUnit.SECONDS);
> >     ```
> 
> David McLaughlin wrote:
>     The other motivation (other than correctness, which I'm pretty sure we 
> now have) for removing and cancelling futures was to get the tasks off the 
> heap. Of course, that's not actually happening here because I'm not using a 
> ScheduledThreadPoolExecutor (and setting removeOnCancelPolicy to true). With 
> the AsyncUtil suggestion above, I'll be sure to set that to make it clear 
> what I'm doing. 
>     
>     If we still think it's necessary to include this timestamp check, I'm 
> happy to do that (although we'd have to refetch the task from storage inside 
> the storage.write of course).
> 
> Bill Farner wrote:
>     > get the tasks off the heap
>     
>     Smells like premature optimization.  Either, you can certainly factor 
> this so you only maintain a reference to the task ID and the timestamp.
> 
> David McLaughlin wrote:
>     Isn't premature optimization usually brandished in response to complexity 
> though? I don't see the complexity here.
> 
> Bill Farner wrote:
>     I'm simply pointing out that we shouldn't maintain the map if the primary 
> motivation is to reduce heap consumption.
>     
>     I do think the code would be easier to reason about if future 
> cancellation and maintaining the additional map were not necessary.  I don't 
> feel strongly about this, however.

I took this suggestion and removed the map.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Lines 294 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885209#file1885209line294>
> >
> >     `==` instead of `.equals()`

Done.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Lines 297 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885209#file1885209line297>
> >
> >     This is unnecessary.

Removed.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Lines 390 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885209#file1885209line390>
> >
> >     s/The goal here is to c/C/

Done.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Lines 400 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885209#file1885209line400>
> >
> >     It's worth mentioning that this is done to bound the number of events 
> > stored for a task.

Added.


> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateModule.java
> > Lines 70 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885210#file1885210line70>
> >
> >     PubsubEventModule.bindSubscriber(binder, PartitionManager.class);

Done.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190607
-----------------------------------------------------------


On Nov. 14, 2017, 10:41 p.m., David McLaughlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> -----------------------------------------------------------
> 
> (Updated Nov. 14, 2017, 10:41 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 1d369263d779b549b9304018437f535ddc855966 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   src/main/java/org/apache/aurora/scheduler/base/Jobs.java 
> 8d5f4e57c6b4f847cb74471f246fd0b7dd0cbc36 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 7c223eae69503fe1d5bf34c430438637abcbcb9b 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> c03fff11ea3a4086f9daaa8b07315006c1b481e4 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
>   src/test/java/org/apache/aurora/scheduler/base/JobsTest.java 
> 13f656f241a8a9a3d339f4053f165070c2669ef3 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> c2d875bb5c393dd95d75251fe86dc649ceba7bd9 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModuleTest.java
>  7b0429109e9a7795e559db264e7737fc55ff0169 
>   src/test/java/org/apache/aurora/scheduler/state/PartitionManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java 
> 0366cd6e9ddba0c3b9c88ffb50738767a352a17a 
>   src/test/java/org/apache/aurora/scheduler/state/TaskStateMachineTest.java 
> 8d6c3fff0af2df39bb929f760b862a2edf5d6fca 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> 186cb2737ba8e169819b7d54f86a7344a669b6cb 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/4/
> 
> 
> Testing
> -------
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> ----------------
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>

Reply via email to