-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42332/#review115663
-----------------------------------------------------------

src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java (lines 152 - 161)
<https://reviews.apache.org/r/42332/#comment176658>

Am I reading this correctly as a busy loop that consumes 100% of a thread's CPU while waiting for something that may never happen? I don't think this is an acceptable solution.

Perhaps it's time to refactor the task pruner into an AbstractScheduledService? I have always felt that the task pruner's approach of holding on to task IDs for 2 days, just to act on each of them once, isn't very efficient. What if, instead of acting on every individual task ID, we had a periodic run loop (say, every 30 seconds) that prunes by job key?

Implementation-wise, it could be a Set of unique job keys that we populate on every TaskStateChange event. A runOneIteration() would poll that set and apply both the latency and max_per_job thresholds to all related terminal tasks within the same iteration.

The only downside of the above is a somewhat increased history count between cleanup runs, but given that our current thresholds are chosen mostly arbitrarily, I think that should be acceptable.

- Maxim Khutornenko


On Jan. 20, 2016, 10:39 p.m., Zameer Manji wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42332/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2016, 10:39 p.m.)
>
>
> Review request for Aurora, John Sirois and Maxim Khutornenko.
>
>
> Bugs: AURORA-1582
>     https://issues.apache.org/jira/browse/AURORA-1582
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Task pruning is key to operating a large cluster and failure to prune should
> trigger shutdown to prevent unbounded growth of storage. This patch turns
> `TaskHistoryPruner` into a service which propagates failure from failed
> pruning attempts towards the `ServiceManager`. Also completing a TODO which
> removes a test for behaviour that is very awkward to test for.
>
>
> Diffs
> -----
>
>   src/main/java/org/apache/aurora/scheduler/pruning/PruningModule.java 735199ac1ccccab343c24471890aa330d6635c26
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 2064089937f5178b1413d386a312f4173a0e35fb
>   src/test/java/org/apache/aurora/scheduler/pruning/TaskHistoryPrunerTest.java 295960f13693c6ba0d7075a8ef7f9680a91ae69d
>
> Diff: https://reviews.apache.org/r/42332/diff/
>
>
> Testing
> -------
>
> ./gradlew build -Pq
> e2e tests
>
>
> Thanks,
>
> Zameer Manji
>
>
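
To make the job-key suggestion in the review comment above more concrete, here is a minimal sketch of what one pruning pass might look like. It is not taken from the Aurora code base: the class `JobKeyPrunerSketch`, the `TerminalTask` type, the in-memory `store` map, and the parameters `retentionMs` and `maxPerJob` are all hypothetical stand-ins (for the real task store and for the latency and max_per_job thresholds). It also assumes a job key stays in the pending set while the job still has retained terminal tasks, so that the age threshold can apply on a later pass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Aurora code base. */
final class JobKeyPrunerSketch {
  /** Stand-in for a terminal task as the task store might return it. */
  static final class TerminalTask {
    final String taskId;
    final long finishedAtMs;

    TerminalTask(String taskId, long finishedAtMs) {
      this.taskId = taskId;
      this.finishedAtMs = finishedAtMs;
    }
  }

  private final long retentionMs; // assumed equivalent of the latency threshold
  private final int maxPerJob;    // assumed equivalent of max_per_job
  // Job keys that saw a terminal state change and may still need pruning.
  private final Set<String> pendingJobKeys = new LinkedHashSet<>();
  // In-memory stand-in for the task store, keyed by job key.
  private final Map<String, List<TerminalTask>> store = new HashMap<>();

  JobKeyPrunerSketch(long retentionMs, int maxPerJob) {
    this.retentionMs = retentionMs;
    this.maxPerJob = maxPerJob;
  }

  /** Would be called by the TaskStateChange subscriber for terminal states. */
  synchronized void onTerminalStateChange(String jobKey, String taskId, long finishedAtMs) {
    store.computeIfAbsent(jobKey, k -> new ArrayList<>())
        .add(new TerminalTask(taskId, finishedAtMs));
    pendingJobKeys.add(jobKey);
  }

  /** One periodic pass over the pending job keys; returns the pruned task IDs. */
  synchronized List<String> runOneIteration(long nowMs) {
    List<String> pruned = new ArrayList<>();
    for (String jobKey : new ArrayList<>(pendingJobKeys)) {
      List<TerminalTask> tasks = store.getOrDefault(jobKey, new ArrayList<>());
      // Oldest first, so the newest maxPerJob tasks sit at the end of the list.
      tasks.sort(Comparator.comparingLong((TerminalTask t) -> t.finishedAtMs));
      List<TerminalTask> kept = new ArrayList<>();
      for (int i = 0; i < tasks.size(); i++) {
        TerminalTask task = tasks.get(i);
        boolean tooOld = nowMs - task.finishedAtMs > retentionMs;
        boolean overLimit = tasks.size() - i > maxPerJob;
        if (tooOld || overLimit) {
          pruned.add(task.taskId);
        } else {
          kept.add(task);
        }
      }
      if (kept.isEmpty()) {
        // Nothing left to watch for this job until its next terminal task.
        store.remove(jobKey);
        pendingJobKeys.remove(jobKey);
      } else {
        store.put(jobKey, kept);
      }
    }
    return pruned;
  }
}
```

In the scheduler, a pass like this could presumably be driven by Guava's `AbstractScheduledService`, with `scheduler()` returning a fixed-delay schedule of roughly 30 seconds. An exception escaping `runOneIteration()` then fails the service, which seems to match the goal of propagating pruning failures to the `ServiceManager`. The cost is the one already noted in the comment: terminal tasks can outlive their thresholds by up to one scheduling interval.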