> On June 1, 2017, 12:53 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Line 388 (original), 411 (patched)
> > <https://reviews.apache.org/r/59699/diff/1/?file=1736080#file1736080line411>
> >
> > I'm not sure this is the right approach. I don't think we want to skip
> > writing transactions to the replicated log.
> >
> > All of the APIs down to the storage layer already support batching so
> > seems like we just want to move the delete command outside of the code that
> > generates pubsub events. E.g. call taskStore.deleteTasks(taskIds) here when
> > we receive the batch from the prune tasks endpoint:
> >
> >
> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java#L371
> >
> > And then remove the taskStore.deleteTasks from the PubsubEvent factory
> > method (have no idea why it's in there).
>
> Kai Huang wrote:
> Moving the taskStore.deleteTasks out will break some other operations:
> e.g.
> When we kill a pending task, we will also delete the task (generates a
> PubsubEvent and calls taskStore.deleteTasks).
> https://github.com/apache/aurora/blob/master/src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java#L215
>
>
> We need to add taskStore.deleteTasks for all other use cases that
> indirectly calls taskStore.deleteTasks.
>
> Kai Huang wrote:
> Instead of rewriting the code, I think I just need to refactor those
> tests mentioned above. Currently they were all written in a way that assumes
> the taskStore.deleteTasks was included in the PubSubEvent factory.
I see. Also looking at the code it seems here we do actually still write a
transaction, so the boolean flag you've introduced is more like "skipDelete"
rather than "skipTransaction". Having a skipDelete flag in a method called
deleteTasks would be a pretty big code smell :)
deleteTasks is only called by the TaskHistoryPruner and the admin endpoint, and
is always a batch endpoint - so I think the flaw in this code was pushing it
into the per-task state machine code. All we want in our deleteTasks code is:
@Override
public void deleteTasks(MutableStoreProvider storeProvider, final
Set<String> taskIds) {
TaskStore.Mutable taskStore = storeProvider.getUnsafeTaskStore();
taskStore.deleteTasks(taskIds);
Set<PubsubEvent> events = taskIds.stream().map(id ->
deleteTasks(taskStore, id)).collect(...);
eventSink.post(events);
}
This would require moving the taskStore.deletedTasks out of the event creation
code (which I would consider renaming from deleteTasks to taskDeletedEvent or
something) and into the side-effect handler like so:
case DELETED:
taskStore.deleteTasks(...);
events.add(deleteTasks(taskStore, ...));
Does that make sense?
- David
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176567
-----------------------------------------------------------
On June 1, 2017, 12:18 a.m., Kai Huang wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> -----------------------------------------------------------
>
> (Updated June 1, 2017, 12:18 a.m.)
>
>
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
>
>
> Bugs: AURORA-1929
> https://issues.apache.org/jira/browse/AURORA-1929
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Improve task history pruning by batch deleting tasks.
>
> The `'aurora_admin prune_tasks'` endpoint seems to be very slow when the
> cluster has a large number of inactive tasks.
>
> This CR batches all removeTasks operations and execute them all at once to
> avoid additional cost of coalescing. The fix will also benefit implicit task
> history pruning since it has similar underlying implementation. See
> https://issues.apache.org/jira/browse/AURORA-1929 for more information and
> details.
>
>
> Diffs
> -----
>
> src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> 73878210f9028901fda3b08e66c6a63c24260d35
>
>
> Diff: https://reviews.apache.org/r/59699/diff/1/
>
>
> Testing
> -------
>
> ./build-support/jenkins/build.sh
>
>
> Thanks,
>
> Kai Huang
>
>