-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review115663
-----------------------------------------------------------



src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java (lines 152 - 161)
<https://reviews.apache.org/r/42332/#comment176658>

    Am I reading this correctly as a busy loop that consumes 100% of a thread's CPU while waiting for something that may never happen? I don't think this is an acceptable solution.
    
    Perhaps it's time to refactor the task pruner into an AbstractScheduledService? I have always felt that the task pruner's approach of holding on to task IDs for 2 days just to act on them once isn't very efficient. What if, instead of acting on every individual task ID, we had a periodic (say every 30 seconds) run loop that prunes by job key?
    
    Implementation-wise, it could be a Set of unique job keys that we populate on every TaskStateChange event. A runOneIteration() would poll that set and apply both the latency and max_per_job thresholds to all related terminal tasks within the same iteration.
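A minimal sketch of the job-key approach, assuming job keys are plain strings and terminal tasks live in an in-memory map rather than in Aurora's storage. `JobKeyPruner`, `onTaskTerminated` and the two threshold parameters are hypothetical names. The periodic loop uses the JDK's `ScheduledExecutorService` instead of Guava's `AbstractScheduledService` so the example stays self-contained; `runOneIteration()` plays the same role as that class's hook.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical job-key based pruner; names and thresholds are illustrative only. */
class JobKeyPruner {
  /** A terminal task: its ID and when it reached a terminal state. */
  static final class TerminalTask {
    final String taskId;
    final long finishedAtMillis;

    TerminalTask(String taskId, long finishedAtMillis) {
      this.taskId = taskId;
      this.finishedAtMillis = finishedAtMillis;
    }
  }

  private final long pruneThresholdMillis; // the "latency" threshold
  private final int maxTasksPerJob;        // the "max_per_job" threshold
  // Job keys touched by a TaskStateChange; the Set deduplicates repeat events.
  private final Set<String> pendingJobKeys = new HashSet<>();
  // Stand-in for storage (assumption): terminal tasks grouped by job key.
  private final Map<String, List<TerminalTask>> terminalTasks = new HashMap<>();

  JobKeyPruner(long pruneThresholdMillis, int maxTasksPerJob) {
    this.pruneThresholdMillis = pruneThresholdMillis;
    this.maxTasksPerJob = maxTasksPerJob;
  }

  /** Event hook: cheap bookkeeping, no per-task timer and no waiting thread. */
  synchronized void onTaskTerminated(String jobKey, String taskId, long finishedAtMillis) {
    terminalTasks.computeIfAbsent(jobKey, k -> new ArrayList<>())
        .add(new TerminalTask(taskId, finishedAtMillis));
    pendingJobKeys.add(jobKey);
  }

  /** One periodic pass: applies both thresholds to every pending job key. */
  synchronized List<String> runOneIteration(long nowMillis) {
    List<String> pruned = new ArrayList<>();
    Iterator<String> keys = pendingJobKeys.iterator();
    while (keys.hasNext()) {
      String jobKey = keys.next();
      List<TerminalTask> tasks = terminalTasks.get(jobKey);
      // Newest first, so an index >= maxTasksPerJob means "over the per-job cap".
      tasks.sort(Comparator.comparingLong((TerminalTask t) -> t.finishedAtMillis).reversed());
      List<TerminalTask> kept = new ArrayList<>();
      for (int i = 0; i < tasks.size(); i++) {
        TerminalTask t = tasks.get(i);
        boolean expired = nowMillis - t.finishedAtMillis >= pruneThresholdMillis;
        if (expired || i >= maxTasksPerJob) {
          pruned.add(t.taskId);
        } else {
          kept.add(t);
        }
      }
      if (kept.isEmpty()) {
        // Nothing left to age out: forget the job until its next state change.
        terminalTasks.remove(jobKey);
        keys.remove();
      } else {
        // Keep the key so the remaining tasks are re-checked on a later pass.
        terminalTasks.put(jobKey, kept);
      }
    }
    return pruned;
  }

  /** Runs the pass with a fixed delay between runs, e.g. every 30 seconds. */
  ScheduledExecutorService start(long periodSeconds) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(
        () -> runOneIteration(System.currentTimeMillis()), 0, periodSeconds, TimeUnit.SECONDS);
    return executor;
  }
}
```

One design choice worth noting: a job key stays in the set while the job still has terminal tasks, because a task that is under the latency threshold on one pass has to be re-checked on a later pass; the key is dropped only once no terminal tasks remain for that job.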
    
    The only downside of the above is a somewhat higher history count between cleanup runs, but given that our current thresholds are chosen mostly arbitrarily, I think that should be acceptable.


- Maxim Khutornenko


On Jan. 20, 2016, 10:39 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> -----------------------------------------------------------
> 
> (Updated Jan. 20, 2016, 10:39 p.m.)
> 
> 
> Review request for Aurora, John Sirois and Maxim Khutornenko.
> 
> 
> Bugs: AURORA-1582
>     https://issues.apache.org/jira/browse/AURORA-1582
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Task pruning is key to operating a large cluster, and a failure to prune should trigger shutdown to prevent unbounded growth of storage. This patch turns `TaskHistoryPruner` into a service that propagates failures from pruning attempts to the `ServiceManager`. It also completes a TODO by removing a test for behaviour that is very awkward to test.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/pruning/PruningModule.java 735199ac1ccccab343c24471890aa330d6635c26
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 2064089937f5178b1413d386a312f4173a0e35fb
>   src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 295960f13693c6ba0d7075a8ef7f9680a91ae69d
> 
> Diff: https://reviews.apache.org/r/42332/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew build -Pq
> e2e tests
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>
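The description's premise, that a failed pruning attempt should surface (and ultimately trigger shutdown) rather than be silently swallowed, can be sketched with the JDK alone. `FailFastPruning` and its `onFailure` callback are hypothetical names, not part of the patch. In Guava terms, an exception thrown from `AbstractScheduledService.runOneIteration()` moves the service to the `FAILED` state, which a `ServiceManager.Listener` observes through `failure(Service)`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: periodic pruning that fails fast instead of failing silently. */
final class FailFastPruning {
  private FailFastPruning() {}

  /**
   * Runs pruneOnce every periodMillis. The first exception is handed to onFailure
   * (assumed to be a hook that begins scheduler shutdown) and stops all further runs.
   */
  static ScheduledExecutorService schedule(
      Runnable pruneOnce, long periodMillis, Consumer<Throwable> onFailure) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    Runnable guarded = () -> {
      try {
        pruneOnce.run();
      } catch (RuntimeException e) {
        onFailure.accept(e);
        // Rethrowing makes the executor suppress every later execution of this task.
        throw e;
      }
    };
    executor.scheduleWithFixedDelay(guarded, 0, periodMillis, TimeUnit.MILLISECONDS);
    return executor;
  }
}
```

Without the callback, a plain `ScheduledExecutorService` only records the exception in the task's `ScheduledFuture`, so the pruning loop would stop with nothing else noticing; that silent stop is the failure mode the description wants to avoid.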
