> On Nov. 9, 2017, 5:23 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java
> > Lines 76 (patched)
> > <https://reviews.apache.org/r/63536/diff/2/?file=1885207#file1885207line76>
> >
> >     Thinking out loud - the `Optional.of(PARTITIONED)` parameter is already 
> > helping you perform a CAS.  If we had a _true_ CAS operation, you could 
> > very safely avoid the need to cancel futures.  You can't do that with the 
> > status only since the task can now cycle with `PARTITIONED`.
> >     
> >     What do you think about using the last transition timestamp to do this?
> >     
> >     ```java
> >     long casLastTransitionMs = Tasks.getLatestEvent(task).getTimestamp();
> >     executor.schedule(() -> {
> >       storage.write(... {
> >         if (casLastTransition == Tasks.getLatestEvent(task).getTimestamp()) 
> > {
> >           // proceed
> >         } else {
> >           // cas failed, the task has since transitioned
> >         }
> >       }
> >     
> >     }, delay, TimeUnit.SECONDS);
> >     ```

The other motivation (other than correctness, which I'm pretty sure we now 
have) for removing and cancelling futures was to get the tasks off the heap. Of 
course, that's not actually happening here because I'm not using a 
ScheduledThreadPoolExecutor (and setting removeOnCancelPolicy to true). With 
the AsyncUtil suggestion above, I'll be sure to set that to make it clear what 
I'm doing. 

If we still think it's necessary to include this timestamp check, I'm happy to 
do that (although we'd have to refetch the task from storage inside the 
storage.write of course).


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63536/#review190607
-----------------------------------------------------------


On Nov. 9, 2017, 12:48 a.m., David McLaughlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/63536/
> -----------------------------------------------------------
> 
> (Updated Nov. 9, 2017, 12:48 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Bill 
> Farner.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This is my prototype code for adding partition-awareness to Aurora. There is 
> a proposal document to accompany this here: 
> https://docs.google.com/document/d/1E3GlsVTJLEMAkDWk2_PTxzkRZcapb8nF_5q5AADQI7g/edit#
> 
> I'd like feedback on the high-level approach before adding unit tests, 
> metrics, logging, etc.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> c869493c06499340d73e1b219e17a0d7d8b5ead9 
>   examples/vagrant/upstart/aurora-scheduler.conf 
> 5ca3caef03b6632cd4dbf47711b1ef183f6a6449 
>   src/main/java/org/apache/aurora/scheduler/base/Conversions.java 
> 33cc012a2cad929b0dd1ce236597b870cfc5aba0 
>   
> src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java
>  5e83b2acdde7198d16427d4031e9772f78612554 
>   src/main/java/org/apache/aurora/scheduler/state/PartitionManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/SideEffect.java 
> b91a0852e968b4aa9d74801601671cb61af3648a 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 9989ed441cd9bc442e6472768880ce7924c3bdd9 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> d72f055749801ee9d6f31f60857cc795d0ed7ab1 
>   src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java 
> eb4fe7d78ad1e6ec430c428df527bd0cf3a053c1 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> b79ae56bee0e5692cacf1e66f4a4126b06aaffdc 
>   src/main/python/apache/aurora/config/schema/base.py 
> 18ce826363009e1cc0beac5cce4edf42610d487e 
>   src/main/python/apache/aurora/config/thrift.py 
> bedf8cd6641e1b1a930602791b758d584af4891c 
> 
> 
> Diff: https://reviews.apache.org/r/63536/diff/2/
> 
> 
> Testing
> -------
> 
> Manual testing in Vagrant by stopping and starting the Mesos agent. With 
> three jobs:
> 
> 1) No PartitionPolicy (verified existing behavior of moving from PARTITIONED 
> directly to LOST)
> 2) PartitionPolicy with custom delay_secs (verified sat in PARTITIONED for a 
> while before moving to LOST)
> 3) PartitionPolicy with reschedule=False (verified sat in PARTITIONED 
> indefinitely)
> 
> I also verified tasks are able to transition to various states (back to 
> RUNNING or moving to FAILED, etc.) when you turn the agent back on.
> 
> 
> File Attachments
> ----------------
> 
> Task in PARTITIONED state
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/02c7fc72-b11d-4ef9-a86b-914e748cad99__Screen_Shot_2017-11-07_at_11.23.41_AM.png
> Task back into running when partition resolved
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/a0413f54-1572-4410-a386-0a22e78fab13__Screen_Shot_2017-11-07_at_11.26.02_AM.png
> Compaction of PARTITIONED cycles (note timestamps)
>   
> https://reviews.apache.org/media/uploaded/files/2017/11/07/edec32e5-b3ec-4fdc-b93f-5449519805ae__Screen_Shot_2017-11-07_at_11.27.47_AM.png
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>

Reply via email to