> On June 1, 2017, 12:53 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Line 388 (original), 411 (patched)
> > <https://reviews.apache.org/r/59699/diff/1/?file=1736080#file1736080line411>
> >
> >     I'm not sure this is the right approach. I don't think we want to skip 
> > writing transactions to the replicated log. 
> >     
> >     All of the APIs down to the storage layer already support batching so 
> > seems like we just want to move the delete command outside of the code that 
> > generates pubsub events. E.g. call taskStore.deleteTasks(taskIds) here when 
> > we receive the batch from the prune tasks endpoint:
> >     
> >     
> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java#L371
> >     
> >     And then remove the taskStore.deleteTasks from the PubsubEvent factory 
> > method (have no idea why it's in there).
> 
> Kai Huang wrote:
>     Moving the taskStore.deleteTasks out will break some other operations:
>     e.g.
>     When we kill a pending task, we will also delete the task (generates a 
> PubsubEvent and calls taskStore.deleteTasks). 
> https://github.com/apache/aurora/blob/master/src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java#L215
>     
>     
>     We need to add taskStore.deleteTasks for all other use cases that 
> indirectly calls taskStore.deleteTasks.
> 
> Kai Huang wrote:
>     Instead of rewriting the code, I think I just need to refactor those 
> tests mentioned above. Currently they were all written in a way that assumes 
> the taskStore.deleteTasks was included in the PubSubEvent factory.
> 
> David McLaughlin wrote:
>     I see. Also looking at the code it seems here we do actually still write 
> a transaction, so the boolean flag you've introduced is more like 
> "skipDelete" rather than "skipTransaction". Having a skipDelete flag in a 
> method called deleteTasks would be a pretty big code smell :)
>     
>     deleteTasks is only called by the TaskHistoryPruner and the admin 
> endpoint, and is always a batch endpoint - so I think the flaw in this code 
> was pushing it into the per-task state machine code. All we want in our 
> deleteTasks code is:
>     
>         @Override
>         public void deleteTasks(MutableStoreProvider storeProvider, final 
> Set<String> taskIds) {
>           TaskStore.Mutable taskStore = storeProvider.getUnsafeTaskStore();
>           taskStore.deleteTasks(taskIds);
>           Set<PubsubEvent> events = taskIds.stream().map(id -> 
> deleteTasks(taskStore, id)).collect(...);
>           eventSink.post(events);
>         }
>         
>     This would require moving the taskStore.deletedTasks out of the event 
> creation code (which I would consider renaming from deleteTasks to 
> taskDeletedEvent or something) and into the side-effect handler like so:
>     
>     
>         case DELETED:
>           taskStore.deleteTasks(...);
>           events.add(deleteTasks(taskStore, ...));
>       
>       
>     Does that make sense?

Yes, that makes much more sense. I'll extract the taskStore.deleteTasks and 
rewrite deleteTasks code for the task prune endpoint.


- Kai


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176567
-----------------------------------------------------------


On June 1, 2017, 12:18 a.m., Kai Huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> -----------------------------------------------------------
> 
> (Updated June 1, 2017, 12:18 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
> 
> 
> Bugs: AURORA-1929
>     https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `'aurora_admin prune_tasks'` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and execute them all at once to 
> avoid additional cost of coalescing. The fix will also benefit implicit task 
> history pruning since it has similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more information and 
> details.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/1/
> 
> 
> Testing
> -------
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>

Reply via email to