> On Oct. 8, 2014, 9:19 a.m., Bill Farner wrote:
> > While our minds are on deadlock risks, it's a good idea to assess other 
> > potential vulnerabilities.
> > 
> > A quick filter to find other potential sources deserving a glance:
> >     $ grep -Rl synchronized src/main/java | xargs grep -l Storage
> >     src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
> >     src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
> >     src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
> >     src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> >     src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> >     src/main/java/org/apache/aurora/scheduler/TaskVars.java
> >     src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
> 
> Kevin Sweeney wrote:
>     My proposal is to add runtime deadlock detection for these cases via 
> CycleDetectingLockFactory. I have runtime evidence that this deadlock exists 
> and would like to keep this change small in scope. Happy to add this as a 
> followup item to AURORA-800.
> 
> Bill Farner wrote:
>     That effort shouldn't cause us to skip due diligence of a skim for other 
> places we're vulnerable.

A cursory look through these doesn't reveal any immediate concerns. Preemptor does 
acquire the storage lock inside a synchronized method. However, its only caller 
already holds the storage write lock, so the storage lock is always taken before 
the monitor and no lock-order inversion can occur. The others use synchronization 
only to keep their own internal state consistent.
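To illustrate why the Preemptor case is safe, here is a minimal sketch. The class and method names are hypothetical, and the storage lock is assumed to be a java.util.concurrent ReentrantReadWriteLock rather than Aurora's actual storage implementation. Every path takes the storage write lock before the monitor, so the synchronized method's own acquire is a reentrant re-entry, and no thread ever holds the monitor while waiting on the storage lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch of the Preemptor pattern: a synchronized method that
 * acquires the storage write lock, called only by code that already holds it.
 */
public class LockOrderSketch {
  // Stand-in for the scheduler's storage lock (assumption: a reentrant RW lock).
  static final ReentrantReadWriteLock STORAGE_LOCK = new ReentrantReadWriteLock();

  static class Component {
    private int invocations = 0;

    // Mirrors "acquires the storage lock in a synchronized method".
    synchronized int doWork() {
      // Reentrant re-acquire: the only caller already holds the write lock,
      // so this returns immediately instead of blocking.
      STORAGE_LOCK.writeLock().lock();
      try {
        return ++invocations;
      } finally {
        STORAGE_LOCK.writeLock().unlock();
      }
    }
  }

  // The only caller: always takes the storage write lock first, then the monitor.
  static int callWithStorageLock(Component c) {
    STORAGE_LOCK.writeLock().lock();
    try {
      // Order is always storage lock -> monitor, never the reverse.
      return c.doWork();
    } finally {
      STORAGE_LOCK.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    Component c = new Component();
    System.out.println(callWithStorageLock(c));
    System.out.println(STORAGE_LOCK.isWriteLockedByCurrentThread());
  }
}
```

The safety argument rests entirely on the "only caller" assumption: a second caller that entered doWork() without first holding the storage lock would reintroduce the monitor-then-storage ordering that causes the deadlock.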

Note: I searched for 'synchronized ' (with a trailing space) to exclude matches on synchronizedMap.
% grep -Rl 'synchronized ' src/main/java | xargs grep -lE '(.write|.consistentRead|.consistentFetchTasks)'
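For anyone who wants to repeat this filter without a shell, a rough Java equivalent of the grep pipeline is sketched below. The class and method names are made up for illustration, and unlike grep -R it only scans files ending in .java. The regex is copied verbatim, so the unescaped '.' still matches any character (for example, "overwrite" would also match).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical Java port of the grep filter for synchronized + storage calls. */
public class LockFilter {
  // Same pattern as the grep -E expression; intentionally loose, like the original.
  static final Pattern STORAGE_CALL =
      Pattern.compile("(.write|.consistentRead|.consistentFetchTasks)");

  /** True if the source uses 'synchronized ' and appears to call into storage. */
  static boolean isCandidate(String source) {
    // The trailing space excludes identifiers such as Collections.synchronizedMap.
    return source.contains("synchronized ") && STORAGE_CALL.matcher(source).find();
  }

  /** Walks root and returns every .java file that passes isCandidate. */
  static List<Path> candidates(Path root) throws IOException {
    try (Stream<Path> files = Files.walk(root)) {
      return files
          .filter(p -> p.toString().endsWith(".java"))
          .filter(p -> {
            try {
              return isCandidate(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
            } catch (IOException e) {
              throw new UncheckedIOException(e);
            }
          })
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args.length > 0 ? args[0] : "src/main/java");
    if (!Files.isDirectory(root)) {
      System.err.println("not a directory: " + root);
      return;
    }
    for (Path p : candidates(root)) {
      System.out.println(p);
    }
  }
}
```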
src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
src/main/java/org/apache/aurora/scheduler/TaskVars.java
src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLog.java

Of course, this doesn't reveal cases where a call into a dependency might 
acquire the storage lock indirectly, nor does it protect against accidentally 
introducing new deadlocks, so AURORA-800 is still relevant.
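For AURORA-800, the proposal upthread is Guava's CycleDetectingLockFactory. To show the idea without pulling in Guava, here is a deliberately simplified, hypothetical stand-in (the names are illustrative, not Guava's API). Each named lock records which locks the current thread already held when it was acquired, and an acquisition that would close a cycle in that ordering graph throws immediately, even if the threads never actually interleave into a deadlock. Unlike the grep, this would also catch storage acquisitions that happen transitively inside a dependency, as long as the locks involved are created through the detector.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical lock-order cycle detector, a simplified stand-in for Guava's. */
public class CycleDetectingSketch {
  // Global ordering graph: edge a -> b means b was acquired while holding a.
  private static final Map<String, Set<String>> ORDER = new HashMap<>();
  // Names of locks currently held by each thread, in acquisition order.
  private static final ThreadLocal<List<String>> HELD =
      ThreadLocal.withInitial(ArrayList::new);

  static final class OrderedLock {
    final String name;
    private final ReentrantLock delegate = new ReentrantLock();

    OrderedLock(String name) { this.name = name; }

    void lock() {
      List<String> held = HELD.get();
      if (!held.contains(name)) { // a reentrant re-acquire adds no new edges
        synchronized (ORDER) {
          for (String prior : held) {
            // Adding prior -> name closes a cycle if name already reaches prior.
            if (reachable(name, prior)) {
              throw new IllegalStateException(
                  "Potential deadlock: " + prior + " -> " + name + " inverts an earlier order");
            }
          }
          for (String prior : held) {
            ORDER.computeIfAbsent(prior, k -> new HashSet<>()).add(name);
          }
        }
      }
      delegate.lock();
      held.add(name);
    }

    void unlock() {
      delegate.unlock(); // throws IllegalMonitorStateException if not held
      List<String> held = HELD.get();
      held.remove(held.lastIndexOf(name));
    }
  }

  // Depth-first search over the ordering graph; caller must hold the ORDER monitor.
  private static boolean reachable(String from, String to) {
    Deque<String> stack = new ArrayDeque<>();
    Set<String> seen = new HashSet<>();
    stack.push(from);
    while (!stack.isEmpty()) {
      String n = stack.pop();
      if (n.equals(to)) return true;
      if (seen.add(n)) {
        for (String next : ORDER.getOrDefault(n, Collections.<String>emptySet())) {
          stack.push(next);
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    OrderedLock storage = new OrderedLock("storage");
    OrderedLock monitor = new OrderedLock("subscriber");

    // Established order: storage, then subscriber.
    storage.lock();
    monitor.lock();
    monitor.unlock();
    storage.unlock();

    // Inverted order is flagged on a single thread, no real deadlock required.
    monitor.lock();
    try {
      storage.lock();
      storage.unlock();
      System.out.println("no cycle detected");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    } finally {
      monitor.unlock();
    }
  }
}
```

Running main prints the "Potential deadlock: subscriber -> storage" message. Guava's real factory also offers WARN and DISABLED policies and read-write locks, which a production version would want.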


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
-----------------------------------------------------------


On Oct. 8, 2014, 10:27 a.m., Kevin Sweeney wrote:
> 
> (Updated Oct. 8, 2014, 10:27 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
> 
> 
> Bugs: AURORA-801
>     https://issues.apache.org/jira/browse/AURORA-801
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Drop synchronized from JobUpdateEventSubscriber
> 
> This fixes a startup deadlock.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateEventSubscriber.java 49d8b7a6c4adc4c58049c439bd09019c9e6885b1
> 
> Diff: https://reviews.apache.org/r/26422/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> 
> Manually verified that all delegated calls to the JobUpdateController are 
> already protected by the storage write-lock.
> 
> Rather than add a potentially-flaky regression test (like the one added in 
> https://reviews.apache.org/r/25556/) I'd prefer to prioritize adding runtime 
> deadlock detection (https://issues.apache.org/jira/browse/AURORA-800).
> 
> 
> Thanks,
> 
> Kevin Sweeney
> 
>
