> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
> >  line 273
> > <https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273>
> >
> >     I think avoid acquisition of a write lock here is a good goal to aim 
> > for.  If, for example, we are already holding the write lock for a large 
> > snapshot, we could cause it to appear as though we have not received a 
> > pulse for an extended duration.
> >     
> >     Also, given that there is likely to be an automated system on the other 
> > end, we should be somewhat defensive to a high call frequency.  This 
> > suggests to me that potentially-expensive read operations should be avoided 
> > if possible.
> >     
> >     For these reasons, i think it would be wise to decouple recording of 
> > pulses from reacting to them.  It's also apparent that there is currently 
> > no asynchronous transition to the 'blocked' states, they're triggered 
> > either by a tardy pulse or another external action.  I think this would be 
> > surprising behavior, and the scheduler should automatically transition to 
> > these states without any external input.

"I think avoid acquisition of a write lock here is a good goal to aim for.  If, 
for example, we are already holding the write lock for a large snapshot, we 
could cause it to appear as though we have not received a pulse for an extended 
duration." - makes sense, I can try to split the recording part from the status 
update via an async action. 

"Also, given that there is likely to be an automated system on the other end, 
we should be somewhat defensive to a high call frequency.  This suggests to me 
that potentially-expensive read operations should be avoided if possible." - I 
don't see how this would be possible without breaking the RPC contract as I 
still have to pull JobUpdateDetails in order to respond to the pulse with 
proper status message (FINISHED|PAUSED|OK).


"It's also apparent that there is currently no asynchronous transition to the 
'blocked' states, they're triggered either by a tardy pulse or another external 
action.  I think this would be surprising behavior, and the scheduler should 
automatically transition to these states without any external input." - the 
transition to BLOCKED state happens during the evaluation loop similar to other 
internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and 
etc.). Transitions to these states do not happen instantenously and are 
triggered by instance activities (i.e. task state change events). I don't see 
why BLOCKED state should be any different. The update is not technically alive 
until the evaluation event arrives. Changing this behavior just for heartbeats 
is not worth the extra complexity and will be inconsistent with the rest of the 
feature.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71063
-----------------------------------------------------------


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 5:24 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated 
> updates get blocked until a pulse arrives. An update then becomes active and 
> proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a 
> terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the 
> alternative is a state machine change, which is much hairier and will require 
> more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
>   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 
> 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
> b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
>   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java 
> d479d20259f284528b2291e2e486b36d8e47fe5e 
>   
> src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java
>  60f535988a20638fb16515d5027bfa65f1afb73c 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java 
> d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 
> 76460f95e058181b711fb6869f5a34c1d5bdfe31 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
> 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   
> src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
>  f9c9ceddc559b43b4a5c45c745d54ff47484edde 
>   
> src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java
>  ca7c0c2675477cc727ca006697665f997972dfde 
>   
> src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java
>  89765ac3d34a827d3748fb96a275d78e9d1b8b72 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 41d422939209d0808235128e4242c11e8ef25969 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 
> 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>

Reply via email to