Re: Review Request 42332: Trigger shutdown on task pruning failure.

2016-01-20 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/
---

(Updated Jan. 20, 2016, 2:36 p.m.)


Review request for Aurora, John Sirois and Maxim Khutornenko.


Changes
---

Review Feedback.


Bugs: AURORA-1582
https://issues.apache.org/jira/browse/AURORA-1582


Repository: aurora


Description
---

Task pruning is critical to the operation of a large cluster. Failure to prune 
tasks can lead to storage growing in unexpected ways leading to scheduling 
slowdown. This patch adds shutdown on task pruning failure to prevent the 
scheduler from entering an undefined state.


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/pruning/PruningModule.java 
735199ac1ab343c24471890aa330d6635c26 
  src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
2064089937f5178b1413d386a312f4173a0e35fb 
  src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 
295960f13693c6ba0d7075a8ef7f9680a91ae69d 

Diff: https://reviews.apache.org/r/42332/diff/


Testing
---

./gradlew build -Pq


Thanks,

Zameer Manji



Re: Review Request 42332: Trigger shutdown on task pruning failure.

2016-01-15 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review114830
---



src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java (line 
94)


Ultra drive-by review.  But have you explored making this class a service 
and letting general service failure handle the ripple to application teardown?  
IMHO that's a nice abstraction if it work out.


- Bill Farner


On Jan. 14, 2016, 4:53 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> ---
> 
> (Updated Jan. 14, 2016, 4:53 p.m.)
> 
> 
> Review request for Aurora, John Sirois and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1582
> https://issues.apache.org/jira/browse/AURORA-1582
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Task pruning is critical to the operation of a large cluster. Failure to 
> prune tasks can lead to storage growing in unexpected ways leading to 
> scheduling slowdown. This patch adds shutdown on task pruning failure to 
> prevent the scheduler from entering an undefined state.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 2064089937f5178b1413d386a312f4173a0e35fb 
>   
> src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 
> 295960f13693c6ba0d7075a8ef7f9680a91ae69d 
> 
> Diff: https://reviews.apache.org/r/42332/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 42332: Trigger shutdown on task pruning failure.

2016-01-15 Thread John Sirois


> On Jan. 15, 2016, 4:33 p.m., John Sirois wrote:
> > src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java, 
> > line 152
> > 
> >
> > You could centralize this handling in deleteTasks - it has all the ids 
> > to give as useful a message as you have here, and that localizes the 
> > otherwise overly broad looking RTE catch.  It'll clean up the temp you had 
> > to introduce below as well.  Further, it'll give you 1 nice spot to add a 
> > comment explaining why termination on 1 failed prune is sane; ie: you can 
> > point out the inevitable history growth issue and that rolling back is 
> > better than waiting for the failure and then rolling back.
> 
> Zameer Manji wrote:
> The objective is to catch any unexpected exceptions in the Runnable given 
> to the executor. For example the exception that triggered this investigation 
> did not come from delete tasks but from sorting the set of tasks to delete:
> 
> FluentIterable
> .from(Tasks.LATEST_ACTIVITY.sortedCopy(inactiveTasks))
> .filter(safeToDelete)
> .limit(tasksToPrune)
> .transform(Tasks::id)
> .toSet();
> 
> 
> I agree that this approach is not ideal. I'm not sure on how to change 
> this without changing the executor to have some sort of ListenableFuture that 
> does something `onFailure`.

Right, and that sounds like a great idea to me, it would be nice to have a more 
centralized way to deal with this and a ListenableFutures provide it:
  
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/ListeningExecutorService.html
  
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/MoreExecutors.html#listeningDecorator(java.util.concurrent.ExecutorService)
  
Not sure if you want to pursue that idea and drop this or do this change, add 
the larger change then flip over to it later.


- John


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review114795
---


On Jan. 14, 2016, 5:53 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> ---
> 
> (Updated Jan. 14, 2016, 5:53 p.m.)
> 
> 
> Review request for Aurora, John Sirois and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1582
> https://issues.apache.org/jira/browse/AURORA-1582
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Task pruning is critical to the operation of a large cluster. Failure to 
> prune tasks can lead to storage growing in unexpected ways leading to 
> scheduling slowdown. This patch adds shutdown on task pruning failure to 
> prevent the scheduler from entering an undefined state.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 2064089937f5178b1413d386a312f4173a0e35fb 
>   
> src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 
> 295960f13693c6ba0d7075a8ef7f9680a91ae69d 
> 
> Diff: https://reviews.apache.org/r/42332/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 42332: Trigger shutdown on task pruning failure.

2016-01-15 Thread Zameer Manji


> On Jan. 15, 2016, 3:33 p.m., John Sirois wrote:
> > src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java, 
> > line 152
> > 
> >
> > You could centralize this handling in deleteTasks - it has all the ids 
> > to give as useful a message as you have here, and that localizes the 
> > otherwise overly broad looking RTE catch.  It'll clean up the temp you had 
> > to introduce below as well.  Further, it'll give you 1 nice spot to add a 
> > comment explaining why termination on 1 failed prune is sane; ie: you can 
> > point out the inevitable history growth issue and that rolling back is 
> > better than waiting for the failure and then rolling back.

The objective is to catch any unexpected exceptions in the Runnable given to 
the executor. For example the exception that triggered this investigation did 
not come from delete tasks but from sorting the set of tasks to delete:

FluentIterable
.from(Tasks.LATEST_ACTIVITY.sortedCopy(inactiveTasks))
.filter(safeToDelete)
.limit(tasksToPrune)
.transform(Tasks::id)
.toSet();


I agree that this approach is not ideal. I'm not sure on how to change this 
without changing the executor to have some sort of ListenableFuture that does 
something `onFailure`.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review114795
---


On Jan. 14, 2016, 4:53 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> ---
> 
> (Updated Jan. 14, 2016, 4:53 p.m.)
> 
> 
> Review request for Aurora, John Sirois and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1582
> https://issues.apache.org/jira/browse/AURORA-1582
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Task pruning is critical to the operation of a large cluster. Failure to 
> prune tasks can lead to storage growing in unexpected ways leading to 
> scheduling slowdown. This patch adds shutdown on task pruning failure to 
> prevent the scheduler from entering an undefined state.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 2064089937f5178b1413d386a312f4173a0e35fb 
>   
> src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 
> 295960f13693c6ba0d7075a8ef7f9680a91ae69d 
> 
> Diff: https://reviews.apache.org/r/42332/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 42332: Trigger shutdown on task pruning failure.

2016-01-15 Thread John Sirois

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review114795
---



src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java (line 
152)


You could centralize this handling in deleteTasks - it has all the ids to 
give as useful a message as you have here, and that localizes the otherwise 
overly broad looking RTE catch.  It'll clean up the temp you had to introduce 
below as well.  Further, it'll give you 1 nice spot to add a comment explaining 
why termination on 1 failed prune is sane; ie: you can point out the inevitable 
history growth issue and that rolling back is better than waiting for the 
failure and then rolling back.


- John Sirois


On Jan. 14, 2016, 5:53 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> ---
> 
> (Updated Jan. 14, 2016, 5:53 p.m.)
> 
> 
> Review request for Aurora, John Sirois and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1582
> https://issues.apache.org/jira/browse/AURORA-1582
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Task pruning is critical to the operation of a large cluster. Failure to 
> prune tasks can lead to storage growing in unexpected ways leading to 
> scheduling slowdown. This patch adds shutdown on task pruning failure to 
> prevent the scheduler from entering an undefined state.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 2064089937f5178b1413d386a312f4173a0e35fb 
>   
> src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 
> 295960f13693c6ba0d7075a8ef7f9680a91ae69d 
> 
> Diff: https://reviews.apache.org/r/42332/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 42332: Trigger shutdown on task pruning failure.

2016-01-15 Thread Zameer Manji


> On Jan. 15, 2016, 9:57 a.m., Maxim Khutornenko wrote:
> > History prunner is regsistered as a service. Shouldn't its failure triger a 
> > shutdown according to this: https://reviews.apache.org/r/39631?

`TaskHistoryPruner` is not a service, only `JobUpdateHistoryPruner` is. Even if 
this was a service this error would not shut it down because the exception 
occurs on a separate thread. In this case the exception occurs on the executor 
threads provided by `AsyncModule`. Previously, a failure there would just be 
logged (and I recently added support for a metric). I talked with John about 
this and we agreed that clients of this executor should be responsible for 
determining if an exception in the Runnable is fatal or not.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review114732
---


On Jan. 14, 2016, 4:53 p.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> ---
> 
> (Updated Jan. 14, 2016, 4:53 p.m.)
> 
> 
> Review request for Aurora, John Sirois and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1582
> https://issues.apache.org/jira/browse/AURORA-1582
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Task pruning is critical to the operation of a large cluster. Failure to 
> prune tasks can lead to storage growing in unexpected ways leading to 
> scheduling slowdown. This patch adds shutdown on task pruning failure to 
> prevent the scheduler from entering an undefined state.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 2064089937f5178b1413d386a312f4173a0e35fb 
>   
> src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 
> 295960f13693c6ba0d7075a8ef7f9680a91ae69d 
> 
> Diff: https://reviews.apache.org/r/42332/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 42332: Trigger shutdown on task pruning failure.

2016-01-15 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review114732
---


History prunner is regsistered as a service. Shouldn't its failure triger a 
shutdown according to this: https://reviews.apache.org/r/39631?

- Maxim Khutornenko


On Jan. 15, 2016, 12:53 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> ---
> 
> (Updated Jan. 15, 2016, 12:53 a.m.)
> 
> 
> Review request for Aurora, John Sirois and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1582
> https://issues.apache.org/jira/browse/AURORA-1582
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Task pruning is critical to the operation of a large cluster. Failure to 
> prune tasks can lead to storage growing in unexpected ways leading to 
> scheduling slowdown. This patch adds shutdown on task pruning failure to 
> prevent the scheduler from entering an undefined state.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 2064089937f5178b1413d386a312f4173a0e35fb 
>   
> src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 
> 295960f13693c6ba0d7075a8ef7f9680a91ae69d 
> 
> Diff: https://reviews.apache.org/r/42332/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>



Re: Review Request 42332: Trigger shutdown on task pruning failure.

2016-01-14 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review114623
---

Ship it!


Master (4dff5da) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Jan. 15, 2016, 12:53 a.m., Zameer Manji wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> ---
> 
> (Updated Jan. 15, 2016, 12:53 a.m.)
> 
> 
> Review request for Aurora, John Sirois and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1582
> https://issues.apache.org/jira/browse/AURORA-1582
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Task pruning is critical to the operation of a large cluster. Failure to 
> prune tasks can lead to storage growing in unexpected ways leading to 
> scheduling slowdown. This patch adds shutdown on task pruning failure to 
> prevent the scheduler from entering an undefined state.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 2064089937f5178b1413d386a312f4173a0e35fb 
>   
> src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 
> 295960f13693c6ba0d7075a8ef7f9680a91ae69d 
> 
> Diff: https://reviews.apache.org/r/42332/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew build -Pq
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>