> On Jan. 28, 2015, 7:41 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, lines 259-263
> > <https://reviews.apache.org/r/30225/diff/1/?file=832014#file832014line259>
> >
> >     I am unsure why this is being called inside pulse. Once pulse is
> >     activated, only the absence of a pulse can modify the update, right? We
> >     don't resume a paused update by receiving a pulse.
> >
> >     So surely the last pulse time would be checked externally to the method
> >     that performs the pulse?
> >
> >     If we can remove this, you can get rid of the write lock completely
> >     here, because all you need are strongly consistent reads (which we have)
> >     to accurately update the cooridinatedUpdateStates map.
>
> Maxim Khutornenko wrote:
>     An update blocked (not PAUSED) due to a missed pulse can be unblocked by
>     a new pulse. This covers a few important design decisions:
>     - An update can be created blocked by default, awaiting the first
>       pulse to start its progress;
>     - An occasional network partition/delay will not require an explicit
>       external service operation to resume;
>     - A scheduler restart is treated the same as initial update creation: an
>       update is rehydrated and waits for a pulse to resume.
>
>     More details and scenarios here:
>     https://github.com/maxim111333/incubator-aurora/blob/hb_doc/docs/update-heartbeat.md
>
> David McLaughlin wrote:
>     How do we show to the user (via client output or UI) that the update is
>     currently blocked?
>
> Maxim Khutornenko wrote:
>     One possible way is described here:
>     https://issues.apache.org/jira/browse/AURORA-1049
>
> David McLaughlin wrote:
>     I don't think this is sufficient. We'd need auditing to explain to users
>     why an update was paused (blocked) for a given time, not just the current
>     status.

That would require persistence and changing the actual status of the job. I'd rather not introduce a new state that would only be applicable to specific update configurations. What matters more here is visibility into the internal "transient state", so that a coordinated job that is unable to make progress can be troubleshot.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review70058
-----------------------------------------------------------


On Jan. 23, 2015, 8:37 p.m., Maxim Khutornenko wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30225/
> -----------------------------------------------------------
>
> (Updated Jan. 23, 2015, 8:37 p.m.)
>
>
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
>
>
> Bugs: AURORA-1010
>     https://issues.apache.org/jira/browse/AURORA-1010
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Added pulsing support to the JobUpdateController. Qualifying coordinated
> updates are blocked until a pulse arrives. An update then becomes active and
> proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a
> terminal state (whichever comes first).
>
> Not particularly happy with plumbing this through OneWayJobUpdater, but the
> alternative is a state machine change, which is much hairier and would require
> more changes in the JobUpdateController. Going with the minimal diff here.
>
>
> Diffs
> -----
>
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 4c827b183a87b4d97774edbfaa960bd1c3de72a5
>   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987
>
> Diff: https://reviews.apache.org/r/30225/diff/
>
>
> Testing
> -------
>
> ./gradlew -Pq build
>
>
> Thanks,
>
> Maxim Khutornenko
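
For readers following the thread, the blocking behavior discussed above (an update starts blocked, a pulse unblocks it, and it blocks again once `blockIfNoPulsesAfterMs` elapses without a pulse) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the diff: `PulseSketch`, `PulseState`, `isBlocked`, and `pulse` are hypothetical names, and treating an elapsed time equal to the timeout as "expired" is an assumption as well.

```java
public class PulseSketch {

  /** Hypothetical per-update pulse tracker; NOT Aurora's actual implementation. */
  public static final class PulseState {
    private static final long NO_PULSE = -1L;

    private final long blockIfNoPulsesAfterMs;
    private long lastPulseMs = NO_PULSE;

    public PulseState(long blockIfNoPulsesAfterMs) {
      if (blockIfNoPulsesAfterMs <= 0) {
        throw new IllegalArgumentException("timeout must be positive");
      }
      this.blockIfNoPulsesAfterMs = blockIfNoPulsesAfterMs;
    }

    /**
     * An update is blocked until its first pulse, and blocked again once the
     * timeout has elapsed since the most recent pulse (assumed inclusive).
     */
    public synchronized boolean isBlocked(long nowMs) {
      return lastPulseMs == NO_PULSE
          || nowMs - lastPulseMs >= blockIfNoPulsesAfterMs;
    }

    /**
     * Records a pulse. Returns true if the update was blocked just before this
     * pulse, i.e. the caller should now let the update make progress again.
     */
    public synchronized boolean pulse(long nowMs) {
      boolean wasBlocked = isBlocked(nowMs);
      lastPulseMs = nowMs;
      return wasBlocked;
    }
  }

  public static void main(String[] args) {
    PulseState state = new PulseState(1000L);
    System.out.println("blocked before first pulse: " + state.isBlocked(0L));
    System.out.println("first pulse unblocks: " + state.pulse(0L));
    System.out.println("blocked at 500 ms: " + state.isBlocked(500L));
    System.out.println("blocked at 1000 ms: " + state.isBlocked(1000L));
    System.out.println("late pulse unblocks: " + state.pulse(1200L));
  }
}
```

Note that in this sketch the blocked flag is derived from the last pulse time and is never persisted, which matches the "transient state" point made in the reply: a freshly constructed tracker (for example after a scheduler restart) is blocked until the next pulse arrives.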