> On Jan. 28, 2015, 7:41 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
> >  lines 259-263
> > <https://reviews.apache.org/r/30225/diff/1/?file=832014#file832014line259>
> >
> >     I am unsure why this is being called inside pulse. Once pulse is 
> > activated, only the absence of a pulse can modify the update, right? We 
> > don't resume a paused update by receiving a pulse. 
> >     
> >     So surely the last pulse time would be checked externally to the method 
> > that performs the pulse? 
> >     
> >     If we can remove this, you can get rid of the write lock completely 
> > here, because all you need are strongly consistent reads (which we have) to 
> > accurately update the cooridinatedUpdateStates map correctly.
> 
> Maxim Khutornenko wrote:
>     An update blocked (not PAUSED) due to a missed pulse can be unblocked by 
> a new pulse. This covers a few important design desisions:
>     - An update can be created blocked by default awaiting for the first 
> pulse to start its progress;
>     - An occasional network partition/delay will not require an explicit 
> external service operation to resume;
>     - A scheduler restart is treated the same as initial update creation - an 
> update is rehydrated and waits for a pulse to resume;
>     
>     More details and scenarios here: 
> https://github.com/maxim111333/incubator-aurora/blob/hb_doc/docs/update-heartbeat.md
> 
> David McLaughlin wrote:
>     How do we show to the user (via client output or UI) that the update is 
> currently blocked?
> 
> Maxim Khutornenko wrote:
>     One possible way is described here: 
> https://issues.apache.org/jira/browse/AURORA-1049

I don't think this is sufficient. We'd need auditing to explain to users why an 
update was paused (blocked) for a given time, not just the current status.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review70058
-----------------------------------------------------------


On Jan. 23, 2015, 8:37 p.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
> 
> (Updated Jan. 23, 2015, 8:37 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
> 
> 
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Added pulsing support into the JobUpdateController. The qualified coordinated 
> updates get blocked until a pulse arrives. An update then becomes active and 
> proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a 
> terminal state (whichever comes first).
> 
> Not particularly happy with plumbing through OneWayJobUpdater but the 
> alternative is a state machine change, which is much hairier and will require 
> more changes in the JobUpdaterController. Going with the minimal diff here.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java 
> d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  a992938d4e12b20f81608be6bbdc24c0a211c3fd 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
> 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 4c827b183a87b4d97774edbfaa960bd1c3de72a5 
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 
> 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
> 
> Diff: https://reviews.apache.org/r/30225/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>

Reply via email to