> On Oct. 8, 2014, 9:19 a.m., Bill Farner wrote:
> > While our minds are on deadlock risks, it's a good idea to assess other
> > potential vulnerabilities.
> >
> > A quick filter to find other potential sources deserving a glance:
> > $ grep -Rl synchronized src/main/java | xargs grep -l Storage
> > src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
> > src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
> > src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> > src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> > src/main/java/org/apache/aurora/scheduler/TaskVars.java
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>
> Kevin Sweeney wrote:
>     My proposal is to add runtime deadlock detection for these cases via
>     CycleDetectingLockFactory. I have runtime evidence that this deadlock exists
>     and would like to keep this change small in scope. Happy to add this as a
>     followup item to AURORA-800.
>
> Bill Farner wrote:
>     That effort shouldn't cause us to skip due diligence of a skim for other
>     places we're vulnerable.

A cursory look through doesn't reveal any immediate concerns. Preemptor does
acquire the storage lock in a synchronized method; however, the only caller of
Preemptor always holds the storage write lock. The others just use
synchronization to ensure consistent internal state. Note that I searched for
'synchronized ' (with a trailing space) to avoid matching synchronizedMap.

% grep -Rl 'synchronized ' src/main/java | xargs grep -lE '(.write|.consistentRead|.consistentFetchTasks)'
src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
src/main/java/org/apache/aurora/scheduler/TaskVars.java
src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLog.java

Of course, this doesn't reveal cases where a call to a dependency might cause
the storage lock to be acquired, nor does it protect against accidental
introduction of new deadlocks, so AURORA-800 is still relevant.

- Kevin

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
-----------------------------------------------------------

On Oct. 8, 2014, 10:27 a.m., Kevin Sweeney wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26422/
> -----------------------------------------------------------
>
> (Updated Oct. 8, 2014, 10:27 a.m.)
>
>
> Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
>
>
> Bugs: AURORA-801
>     https://issues.apache.org/jira/browse/AURORA-801
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Drop synchronized from JobUpdateEventSubscriber
>
> This fixes a startup deadlock.
>
>
> Diffs
> -----
>
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateEventSubscriber.java 49d8b7a6c4adc4c58049c439bd09019c9e6885b1
>
> Diff: https://reviews.apache.org/r/26422/diff/
>
>
> Testing
> -------
>
> ./gradlew -Pq build
>
> Manually verified that all delegated calls to the JobUpdateController are
> already protected by the storage write-lock.
>
> Rather than add a potentially-flaky regression test (like the one added in
> https://reviews.apache.org/r/25556/) I'd prefer to prioritize adding runtime
> deadlock detection (https://issues.apache.org/jira/browse/AURORA-800).
>
>
> Thanks,
>
> Kevin Sweeney
>
>
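
For readers following the thread, here is a minimal sketch of the idea behind the change: once the subscriber no longer has its own monitor, the storage write lock is the only lock in play, so no lock-ordering cycle can form between the two. This is not the Aurora code. StorageLockSketch, STORAGE_LOCK, Subscriber, and deliverUnderWriteLock are hypothetical names, and a plain JDK ReentrantReadWriteLock stands in for the scheduler's storage lock. The fail-fast check is an illustrative way to turn the "manually verified" assumption into something a test run would catch, not something the patch does.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class StorageLockSketch {
  // Hypothetical stand-in for the scheduler's storage lock (an assumption).
  static final ReentrantReadWriteLock STORAGE_LOCK = new ReentrantReadWriteLock();

  // Hypothetical subscriber with no 'synchronized' methods: it relies on the
  // caller already holding the storage write lock for mutual exclusion.
  static final class Subscriber {
    private int handled = 0;

    void onEvent() {
      // Fail fast if a caller reaches us without the storage write lock.
      if (!STORAGE_LOCK.isWriteLockedByCurrentThread()) {
        throw new IllegalStateException("storage write lock must be held");
      }
      handled++;
    }

    int handled() {
      return handled;
    }
  }

  // Delivers 'events' events while holding the write lock and returns the
  // subscriber's running total of handled events.
  static int deliverUnderWriteLock(Subscriber subscriber, int events) {
    STORAGE_LOCK.writeLock().lock();
    try {
      for (int i = 0; i < events; i++) {
        subscriber.onEvent();
      }
      return subscriber.handled();
    } finally {
      STORAGE_LOCK.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    Subscriber subscriber = new Subscriber();
    System.out.println(deliverUnderWriteLock(subscriber, 3));
  }
}
```

A check like this only covers callers that are actually exercised at runtime, which is why the thread still treats general deadlock detection (AURORA-800) as the longer-term answer.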