Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

2014-10-08 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55854
---

Ship it!


Thanks for the extra look!

- Bill Farner


On Oct. 8, 2014, 5:27 p.m., Kevin Sweeney wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26422/
> ---
> 
> (Updated Oct. 8, 2014, 5:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.
> 
> 
> Bugs: AURORA-801
> https://issues.apache.org/jira/browse/AURORA-801
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Drop synchronized from JobUpdateEventSubscriber
> 
> This fixes a startup deadlock.
> 
> 
> Diffs
> ---
> 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateEventSubscriber.java 49d8b7a6c4adc4c58049c439bd09019c9e6885b1
> 
> Diff: https://reviews.apache.org/r/26422/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> Manually verified that all delegated calls to the JobUpdateController are 
> already protected by the storage write-lock.
> 
> Rather than add a potentially flaky regression test (like the one added in 
> https://reviews.apache.org/r/25556/) I'd prefer to prioritize adding runtime 
> deadlock detection (https://issues.apache.org/jira/browse/AURORA-800).
> 
> 
> Thanks,
> 
> Kevin Sweeney
> 
>



Re: Review Request 26422: Drop syncrhonized from JobUpdateEventSubscriber

2014-10-08 Thread Kevin Sweeney


> On Oct. 8, 2014, 9:19 a.m., Bill Farner wrote:
> > While our minds are on deadlock risks, it's a good idea to assess other 
> > potential vulnerabilities.
> > 
> > A quick filter to find other potential sources deserving a glance:
> > $ grep -Rl synchronized src/main/java | xargs grep -l Storage
> > src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
> > src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
> > src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> > src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> > src/main/java/org/apache/aurora/scheduler/TaskVars.java
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
> 
> Kevin Sweeney wrote:
> My proposal is to add runtime deadlock detection for these cases via 
> CycleDetectingLockFactory. I have runtime evidence that this deadlock exists 
> and would like to keep this change small in scope. Happy to add this as a 
> followup item to AURORA-800.
> 
> Bill Farner wrote:
> That effort shouldn't cause us to skip the due diligence of skimming for other 
> places where we're vulnerable.

A cursory look doesn't reveal any immediate concerns. Preemptor does acquire 
the storage lock in a synchronized method; however, the only caller of 
Preemptor always holds the storage write lock already. The others use 
synchronization only to keep their internal state consistent.
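To illustrate why the Preemptor case is considered safe, here is a minimal sketch assuming a reentrant storage write lock (modelled with `ReentrantReadWriteLock`); `PreemptorPatternSketch` and its methods are hypothetical, not Aurora code. Because the only caller takes the write lock before entering the monitor, the order is always write lock then monitor, and the re-acquisition inside the synchronized method is a reentrant hold rather than a wait. A new caller that entered the synchronized method without first holding the write lock would reintroduce the inverted order.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the pattern described for Preemptor; names are illustrative only.
final class PreemptorPatternSketch {
  private final ReentrantReadWriteLock storageLock = new ReentrantReadWriteLock();

  // Re-acquires the storage write lock inside the monitor. For a caller that already
  // holds the write lock this only increments the hold count; it does not wait.
  synchronized int preemptUnderMonitor() {
    storageLock.writeLock().lock();
    try {
      return storageLock.getWriteHoldCount();
    } finally {
      storageLock.writeLock().unlock();
    }
  }

  // The sole caller: takes the write lock first, then enters the monitor,
  // so every path acquires the two locks in the same order.
  int callFromStorageWriter() {
    storageLock.writeLock().lock();
    try {
      return preemptUnderMonitor();
    } finally {
      storageLock.writeLock().unlock();
    }
  }
}
```

Here `callFromStorageWriter()` returns 2, the write-lock hold count observed inside the monitor.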

Note that I searched for 'synchronized ' (with a trailing space) to avoid matching synchronizedMap.
% grep -Rl 'synchronized ' src/main/java | xargs grep -lE '(.write|.consistentRead|.consistentFetchTasks)'
src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
src/main/java/org/apache/aurora/scheduler/TaskVars.java
src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLog.java
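The two-stage grep above can be expressed as a small Java sketch, which makes the heuristic explicit: a file is a candidate if it contains the literal `synchronized ` and also matches the storage-call pattern. `LockAuditSketch` is a hypothetical name; like the original command, it matches text rather than parsing Java, so it shares the same blind spots (for example, lock acquisition through a dependency).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical Java port of the two-stage grep filter (names are illustrative).
final class LockAuditSketch {
  // Same pattern as the `grep -lE` stage; as in grep, the unescaped '.' matches any character.
  private static final Pattern STORAGE_CALL =
      Pattern.compile("(.write|.consistentRead|.consistentFetchTasks)");

  // Returns files under `root` that contain "synchronized " (the trailing space skips
  // synchronizedMap) and also contain a match for STORAGE_CALL.
  static List<Path> candidates(Path root) {
    try (Stream<Path> files = Files.walk(root)) {
      return files
          .filter(Files::isRegularFile)
          .filter(p -> {
            String src = read(p);
            return src.contains("synchronized ") && STORAGE_CALL.matcher(src).find();
          })
          .sorted()
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static String read(Path p) {
    try {
      return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```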

Of course, this doesn't reveal cases where a call to a dependency might cause 
the storage lock to be acquired, nor does it protect against the accidental 
introduction of new deadlocks, so AURORA-800 is still relevant.


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
---





Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

2014-10-08 Thread Bill Farner


> On Oct. 8, 2014, 4:19 p.m., Bill Farner wrote:
> > While our minds are on deadlock risks, it's a good idea to assess other 
> > potential vulnerabilities.
> > 
> > A quick filter to find other potential sources deserving a glance:
> > $ grep -Rl synchronized src/main/java | xargs grep -l Storage
> > src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
> > src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
> > src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> > src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> > src/main/java/org/apache/aurora/scheduler/TaskVars.java
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
> 
> Kevin Sweeney wrote:
> My proposal is to add runtime deadlock detection for these cases via 
> CycleDetectingLockFactory. I have runtime evidence that this deadlock exists 
> and would like to keep this change small in scope. Happy to add this as a 
> followup item to AURORA-800.

That effort shouldn't cause us to skip the due diligence of skimming for other 
places where we're vulnerable.


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
---





Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

2014-10-08 Thread Kevin Sweeney


> On Oct. 8, 2014, 9:19 a.m., Bill Farner wrote:
> > While our minds are on deadlock risks, it's a good idea to assess other 
> > potential vulnerabilities.
> > 
> > A quick filter to find other potential sources deserving a glance:
> > $ grep -Rl synchronized src/main/java | xargs grep -l Storage
> > src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
> > src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
> > src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
> > src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
> > src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> > src/main/java/org/apache/aurora/scheduler/TaskVars.java
> > src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java

My proposal is to add runtime deadlock detection for these cases via 
CycleDetectingLockFactory. I have runtime evidence that this deadlock exists 
and would like to keep this change small in scope. Happy to add this as a 
followup item to AURORA-800.
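CycleDetectingLockFactory (from Guava) detects potential deadlocks by tracking the order in which locks are acquired and flagging an acquisition that would form a cycle, even if no thread is stuck yet. The hand-rolled sketch below illustrates that idea under simplifying assumptions (named locks, one global order graph, no read/write distinction); `OrderCheckedLock` is hypothetical and deliberately does not reproduce Guava's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock-order checker: records "held X while acquiring Y" edges and rejects
// any acquisition that would close a cycle in that graph.
final class OrderCheckedLock {
  // Global ordering edges keyed by lock name (names must be unique). Guarded by ORDER.
  private static final Map<String, Set<String>> ORDER = new HashMap<>();
  // Locks held by the current thread, in acquisition order.
  private static final ThreadLocal<List<OrderCheckedLock>> HELD =
      ThreadLocal.withInitial(ArrayList::new);

  private final String name;
  private final ReentrantLock delegate = new ReentrantLock();

  OrderCheckedLock(String name) {
    this.name = name;
  }

  void lock() {
    List<OrderCheckedLock> held = HELD.get();
    if (!held.contains(this)) { // reentrant acquisitions add no new ordering
      synchronized (ORDER) {
        for (OrderCheckedLock h : held) {
          // If `h` is reachable from this lock, taking this lock while holding `h` closes a cycle.
          if (reachable(name, h.name)) {
            throw new IllegalStateException("Potential deadlock: acquiring " + name
                + " while holding " + h.name + " inverts a previously observed order");
          }
          ORDER.computeIfAbsent(h.name, k -> new HashSet<>()).add(name);
        }
      }
    }
    delegate.lock(); // the check runs before blocking, so an inversion fails fast
    held.add(this);
  }

  void unlock() {
    delegate.unlock();
    HELD.get().remove(this); // drops one hold of this lock
  }

  // Iterative depth-first search over recorded edges. Caller holds the ORDER monitor.
  private static boolean reachable(String from, String to) {
    Deque<String> stack = new ArrayDeque<>();
    Set<String> seen = new HashSet<>();
    stack.push(from);
    while (!stack.isEmpty()) {
      String cur = stack.pop();
      if (cur.equals(to)) {
        return true;
      }
      if (seen.add(cur)) {
        for (String next : ORDER.getOrDefault(cur, Collections.emptySet())) {
          stack.push(next);
        }
      }
    }
    return false;
  }
}
```

With the subscriber monitor and the storage write lock both wrapped this way, the inversion would be flagged as soon as both orders had been exercised, even on different threads at different times, without the threads ever actually colliding.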


- Kevin


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
---





Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

2014-10-08 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55827
---

Ship it!


Ship It!

- David McLaughlin





Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

2014-10-08 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/
---

(Updated Oct. 8, 2014, 10:27 a.m.)


Review request for Aurora, David McLaughlin, Bill Farner, and Zameer Manji.


Bugs: AURORA-801
https://issues.apache.org/jira/browse/AURORA-801


Repository: aurora


Description
---

Drop synchronized from JobUpdateEventSubscriber

This fixes a startup deadlock.
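A minimal sketch of the hazard and the fix, assuming (as the Testing notes state) that every call into the controller is already made under the storage write lock. `StorageSketch`, `SubscriberSketch`, `onEvent` and `runBothOrders` are hypothetical stand-ins, not Aurora classes. With `synchronized` on the subscriber, one thread can hold the subscriber's monitor while waiting for the storage write lock while another holds the write lock and waits for the monitor. Without the monitor there is only one lock, so that cycle cannot form.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical stand-in for the storage layer: every mutation runs under one reentrant write lock.
final class StorageSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  <T> T write(Supplier<T> work) {
    lock.writeLock().lock();
    try {
      return work.get();
    } finally {
      lock.writeLock().unlock();
    }
  }
}

// Hypothetical subscriber after the change: no `synchronized`, so it owns no lock
// of its own and relies solely on the storage write lock for mutual exclusion.
final class SubscriberSketch {
  private final StorageSketch storage;
  private int handled = 0; // only read or written while the storage write lock is held

  SubscriberSketch(StorageSketch storage) {
    this.storage = storage;
  }

  // Works whether or not the caller already holds the write lock (it is reentrant),
  // and there is no second lock another thread could take in the opposite order.
  int onEvent() {
    return storage.write(() -> ++handled);
  }

  int handledCount() {
    return storage.write(() -> handled);
  }

  // Demo: one thread enters through the storage lock, another calls the subscriber directly.
  // Returns the final count, or -1 if either thread failed to finish in time.
  static int runBothOrders(int iterations) {
    StorageSketch storage = new StorageSketch();
    SubscriberSketch subscriber = new SubscriberSketch(storage);
    Thread viaStorage = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        storage.write(subscriber::onEvent);
      }
    });
    Thread direct = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        subscriber.onEvent();
      }
    });
    viaStorage.start();
    direct.start();
    try {
      viaStorage.join(10_000);
      direct.join(10_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return -1;
    }
    if (viaStorage.isAlive() || direct.isAlive()) {
      return -1;
    }
    return subscriber.handledCount();
  }
}
```

`runBothOrders(1000)` should return 2000. If `onEvent` were declared `synchronized`, the `viaStorage` thread would take the write lock and then the monitor while the `direct` thread took the monitor and then the write lock, which is the inverted order that can deadlock.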


Diffs
---

  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateEventSubscriber.java 49d8b7a6c4adc4c58049c439bd09019c9e6885b1

Diff: https://reviews.apache.org/r/26422/diff/


Testing
---

./gradlew -Pq build

Manually verified that all delegated calls to the JobUpdateController are 
already protected by the storage write-lock.

Rather than add a potentially flaky regression test (like the one added in 
https://reviews.apache.org/r/25556/) I'd prefer to prioritize adding runtime 
deadlock detection (https://issues.apache.org/jira/browse/AURORA-800).
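As a minimal sketch of one JDK-only option for runtime detection (a swapped-in technique, not necessarily what AURORA-800 will adopt), `ThreadMXBean.findDeadlockedThreads()` reports threads that are already deadlocked on object monitors or on `java.util.concurrent` ownable synchronizers such as `ReentrantReadWriteLock`. Unlike lock-order cycle detection, it only fires after a hang has actually happened. `DeadlockWatchdogSketch` is a hypothetical name; how often to poll and what to do on detection are left as assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical watchdog helper built on the JDK's management API.
final class DeadlockWatchdogSketch {
  private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

  // Returns the names of threads currently deadlocked on monitors or ownable
  // synchronizers (e.g. ReentrantLock), or an empty list if none are found.
  static List<String> deadlockedThreadNames() {
    long[] ids = THREADS.findDeadlockedThreads(); // null when no deadlock is found
    List<String> names = new ArrayList<>();
    if (ids == null) {
      return names;
    }
    for (ThreadInfo info : THREADS.getThreadInfo(ids)) {
      if (info != null) { // a thread may have exited since detection
        names.add(info.getThreadName());
      }
    }
    return names;
  }
}
```

A caller might invoke `deadlockedThreadNames()` periodically from a `ScheduledExecutorService` and log the result when the list is non-empty.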


Thanks,

Kevin Sweeney



Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

2014-10-08 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55803
---


While our minds are on deadlock risks, it's a good idea to assess other 
potential vulnerabilities.

A quick filter to find other potential sources deserving a glance:
$ grep -Rl synchronized src/main/java | xargs grep -l Storage
src/main/java/org/apache/aurora/scheduler/async/GcExecutorLauncher.java
src/main/java/org/apache/aurora/scheduler/async/Preemptor.java
src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java
src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java
src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
src/main/java/org/apache/aurora/scheduler/TaskVars.java
src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java

- Bill Farner


On Oct. 7, 2014, 7:28 p.m., Kevin Sweeney wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26422/
> ---
> 
> (Updated Oct. 7, 2014, 7:28 p.m.)
> 
> 
> Review request for Aurora, Bill Farner and Zameer Manji.
> 
> 
> Bugs: AURORA-801
> https://issues.apache.org/jira/browse/AURORA-801
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Drop synchronized from JobUpdateEventSubscriber
> 
> This fixes a startup deadlock.
> 
> 
> Diffs
> ---
> 
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateEventSubscriber.java 49d8b7a6c4adc4c58049c439bd09019c9e6885b1
> 
> Diff: https://reviews.apache.org/r/26422/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew -Pq build
> 
> Manually verified that all delegated calls to the JobUpdateController are 
> already protected by the storage write-lock.
> 
> Rather than add a potentially flaky regression test (like the one added in 
> https://reviews.apache.org/r/25556/) I'd prefer to prioritize adding runtime 
> deadlock detection (https://issues.apache.org/jira/browse/AURORA-800).
> 
> 
> Thanks,
> 
> Kevin Sweeney
> 
>



Re: Review Request 26422: Drop synchronized from JobUpdateEventSubscriber

2014-10-07 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26422/#review55700
---

Ship it!


Ship It!

- Zameer Manji

