> On June 1, 2017, 12:53 a.m., David McLaughlin wrote: > > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java > > Line 388 (original), 411 (patched) > > <https://reviews.apache.org/r/59699/diff/1/?file=1736080#file1736080line411> > > > > I'm not sure this is the right approach. I don't think we want to skip > > writing transactions to the replicated log. > > > > All of the APIs down to the storage layer already support batching so > > seems like we just want to move the delete command outside of the code that > > generates pubsub events. E.g. call taskStore.deleteTasks(taskIds) here when > > we receive the batch from the prune tasks endpoint: > > > > > > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java#L371 > > > > And then remove the taskStore.deleteTasks from the PubsubEvent factory > > method (have no idea why it's in there). > > Kai Huang wrote: > Moving the taskStore.deleteTasks out will break some other operations: > e.g. > When we kill a pending task, we will also delete the task (generates a > PubsubEvent and calls taskStore.deleteTasks). > https://github.com/apache/aurora/blob/master/src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java#L215 > > > We need to add taskStore.deleteTasks for all other use cases that > indirectly calls taskStore.deleteTasks. > > Kai Huang wrote: > Instead of rewriting the code, I think I just need to refactor those > tests mentioned above. Currently they were all written in a way that assumes > the taskStore.deleteTasks was included in the PubSubEvent factory. > > David McLaughlin wrote: > I see. Also looking at the code it seems here we do actually still write > a transaction, so the boolean flag you've introduced is more like > "skipDelete" rather than "skipTransaction". Having a skipDelete flag in a > method called deleteTasks would be a pretty big code smell :) > > deleteTasks is only called by the TaskHistoryPruner and the admin > endpoint, and is always a batch endpoint - so I think the flaw in this code > was pushing it into the per-task state machine code. All we want in our > deleteTasks code is: > > @Override > public void deleteTasks(MutableStoreProvider storeProvider, final > Set<String> taskIds) { > TaskStore.Mutable taskStore = storeProvider.getUnsafeTaskStore(); > taskStore.deleteTasks(taskIds); > Set<PubsubEvent> events = taskIds.stream().map(id -> > deleteTasks(taskStore, id)).collect(...); > eventSink.post(events); > } > > This would require moving the taskStore.deletedTasks out of the event > creation code (which I would consider renaming from deleteTasks to > taskDeletedEvent or something) and into the side-effect handler like so: > > > case DELETED: > taskStore.deleteTasks(...); > events.add(deleteTasks(taskStore, ...)); > > > Does that make sense?
Yes, that makes much more sense. I'll extract the taskStore.deleteTasks and rewrite deleteTasks code for the task prune endpoint. - Kai ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/59699/#review176567 ----------------------------------------------------------- On June 1, 2017, 12:18 a.m., Kai Huang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/59699/ > ----------------------------------------------------------- > > (Updated June 1, 2017, 12:18 a.m.) > > > Review request for Aurora, David McLaughlin and Santhosh Kumar. > > > Bugs: AURORA-1929 > https://issues.apache.org/jira/browse/AURORA-1929 > > > Repository: aurora > > > Description > ------- > > Improve task history pruning by batch deleting tasks. > > The `'aurora_admin prune_tasks'` endpoint seems to be very slow when the > cluster has a large number of inactive tasks. > > This CR batches all removeTasks operations and execute them all at once to > avoid additional cost of coalescing. The fix will also benefit implicit task > history pruning since it has similar underlying implementation. See > https://issues.apache.org/jira/browse/AURORA-1929 for more information and > details. > > > Diffs > ----- > > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java > 73878210f9028901fda3b08e66c6a63c24260d35 > > > Diff: https://reviews.apache.org/r/59699/diff/1/ > > > Testing > ------- > > ./build-support/jenkins/build.sh > > > Thanks, > > Kai Huang > >
