> On Jan. 27, 2015, 5:49 p.m., Joshua Cohen wrote: > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, > > line 227 > > <https://reviews.apache.org/r/30225/diff/1/?file=832014#file832014line227> > > > > Given how sensitive we are to storage lock contention, is there any > > work in this block that can be done outside of the lock? From a casual > > glance the only potentially time-consuming (non-write) operation is the > > fetch from the job update store, but I'm not sure how intensive an > > operation that will be (or the impact of performing it outside of the > > context of a lock). > > Maxim Khutornenko wrote: > Moving a fetch call out of the write transaction would create a potential > for a race that may compromise the feature and create hard to reason/trace > corner cases. This is not worth the arguable gain in contention improvement > in my opinion. > > David McLaughlin wrote: > AFAICT the worst race condition is sending a heartbeat for a job that is > already complete. Are there other race conditions we are protecting against? > > Also, I don't see the value in persisting the heartbeats. What is the > rationale for this?
"AFAICT the worst race condition is sending a heartbeat for a job that is already complete. Are there other race conditions we are protecting against?" - anything inside the evaluation logic is sensitive to the data provided in the details pull. This ranges from an invalid "OK" request sent back for a completed update to an internal updater failure due to outdated expectations. This pattern of querying inside of a write transaction is common in the updater (e.g. look at changeJobUpdateStatus() called any time an update is evaluated) and I'd rather not second-guess the correctness of the split. "Also, I don't see the value in persisting the heartbeats. What is the rationale for this?" - what makes you think they are persisted? The heartbeats are stored in-memory only (look for coordinatedUpdateStates map). - Maxim ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/#review69834 ----------------------------------------------------------- On Jan. 23, 2015, 8:37 p.m., Maxim Khutornenko wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/30225/ > ----------------------------------------------------------- > > (Updated Jan. 23, 2015, 8:37 p.m.) > > > Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. > > > Bugs: AURORA-1010 > https://issues.apache.org/jira/browse/AURORA-1010 > > > Repository: aurora > > > Description > ------- > > Added pulsing support into the JobUpdateController. The qualified coordinated > updates get blocked until a pulse arrives. An update then becomes active and > proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a > terminal state (whichever comes first). > > Not particularly happy with plumbing through OneWayJobUpdater but the > alternative is a state machine change, which is much hairier and will require > more changes in the JobUpdaterController. Going with the minimal diff here. > > > Diffs > ----- > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java > d3b30d48b76d8d7c64cda006a34f7ed3296526f2 > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java > a992938d4e12b20f81608be6bbdc24c0a211c3fd > src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java > 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 > src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java > 4c827b183a87b4d97774edbfaa960bd1c3de72a5 > src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java > 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 > > Diff: https://reviews.apache.org/r/30225/diff/ > > > Testing > ------- > > ./gradlew -Pq build > > > Thanks, > > Maxim Khutornenko > >