> On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote: > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, > > line 273 > > <https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273> > > > > I think avoid acquisition of a write lock here is a good goal to aim > > for. If, for example, we are already holding the write lock for a large > > snapshot, we could cause it to appear as though we have not received a > > pulse for an extended duration. > > > > Also, given that there is likely to be an automated system on the other > > end, we should be somewhat defensive to a high call frequency. This > > suggests to me that potentially-expensive read operations should be avoided > > if possible. > > > > For these reasons, i think it would be wise to decouple recording of > > pulses from reacting to them. It's also apparent that there is currently > > no asynchronous transition to the 'blocked' states, they're triggered > > either by a tardy pulse or another external action. I think this would be > > surprising behavior, and the scheduler should automatically transition to > > these states without any external input.
"I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration." - makes sense, I can try to split the recording part from the status update via an async action. "Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible." - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK). "It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input." - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature. - Maxim ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/#review71063 ----------------------------------------------------------- On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/30225/ > ----------------------------------------------------------- > > (Updated Feb. 4, 2015, 5:24 p.m.) > > > Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. > > > Bugs: AURORA-1010 > https://issues.apache.org/jira/browse/AURORA-1010 > > > Repository: aurora > > > Description > ------- > > Added pulsing support into the JobUpdateController. The qualified coordinated > updates get blocked until a pulse arrives. An update then becomes active and > proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a > terminal state (whichever comes first). > > Not particularly happy with plumbing through OneWayJobUpdater but the > alternative is a state machine change, which is much hairier and will require > more changes in the JobUpdaterController. Going with the minimal diff here. > > > Diffs > ----- > > api/src/main/thrift/org/apache/aurora/gen/api.thrift > 4f603f9e7ed004e53937a45ea8edf7241e15f5cf > src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java > 7f2ec71d744d5fafac84291cc79feba3398d0e1e > src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java > b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 > src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java > d479d20259f284528b2291e2e486b36d8e47fe5e > > src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java > 60f535988a20638fb16515d5027bfa65f1afb73c > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java > d3b30d48b76d8d7c64cda006a34f7ed3296526f2 > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java > ec9d1e07abca1bf30262e1c0f741a34741e96f5d > > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java > 76460f95e058181b711fb6869f5a34c1d5bdfe31 > src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java > 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 > > src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml > f9c9ceddc559b43b4a5c45c745d54ff47484edde > > src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java > ca7c0c2675477cc727ca006697665f997972dfde > > src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java > 89765ac3d34a827d3748fb96a275d78e9d1b8b72 > src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java > 41d422939209d0808235128e4242c11e8ef25969 > src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java > 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 > > Diff: https://reviews.apache.org/r/30225/diff/ > > > Testing > ------- > > ./gradlew -Pq build > > > Thanks, > > Maxim Khutornenko > >
