> On June 1, 2017, 12:53 a.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Line 388 (original), 411 (patched)
> > <https://reviews.apache.org/r/59699/diff/1/?file=1736080#file1736080line411>
> >
> >     I'm not sure this is the right approach. I don't think we want to skip 
> > writing transactions to the replicated log. 
> >     
> >     All of the APIs down to the storage layer already support batching so 
> > seems like we just want to move the delete command outside of the code that 
> > generates pubsub events. E.g. call taskStore.deleteTasks(taskIds) here when 
> > we receive the batch from the prune tasks endpoint:
> >     
> >     
> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java#L371
> >     
> >     And then remove the taskStore.deleteTasks from the PubsubEvent factory 
> > method (have no idea why it's in there).
> 
> Kai Huang wrote:
>     Moving the taskStore.deleteTasks out will break some other operations:
>     e.g.
>     When we kill a pending task, we will also delete the task (generates a 
> PubsubEvent and calls taskStore.deleteTasks). 
> https://github.com/apache/aurora/blob/master/src/test/java/org/apache/aurora/scheduler/state/StateManagerImplTest.java#L215
>     
>     
>     We need to add taskStore.deleteTasks for all other use cases that 
> indirectly calls taskStore.deleteTasks.
> 
> Kai Huang wrote:
>     Instead of rewriting the code, I think I just need to refactor those 
> tests mentioned above. Currently they were all written in a way that assumes 
> the taskStore.deleteTasks was included in the PubSubEvent factory.

I see. Also looking at the code it seems here we do actually still write a 
transaction, so the boolean flag you've introduced is more like "skipDelete" 
rather than "skipTransaction". Having a skipDelete flag in a method called 
deleteTasks would be a pretty big code smell :)

deleteTasks is only called by the TaskHistoryPruner and the admin endpoint, and 
is always a batch endpoint - so I think the flaw in this code was pushing it 
into the per-task state machine code. All we want in our deleteTasks code is:

    @Override
    public void deleteTasks(MutableStoreProvider storeProvider, final 
Set<String> taskIds) {
      TaskStore.Mutable taskStore = storeProvider.getUnsafeTaskStore();
      taskStore.deleteTasks(taskIds);
      Set<PubsubEvent> events = taskIds.stream().map(id -> 
deleteTasks(taskStore, id)).collect(...);
      eventSink.post(events);
    }
    
This would require moving the taskStore.deletedTasks out of the event creation 
code (which I would consider renaming from deleteTasks to taskDeletedEvent or 
something) and into the side-effect handler like so:


    case DELETED:
      taskStore.deleteTasks(...);
      events.add(deleteTasks(taskStore, ...));
  
  
Does that make sense?


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176567
-----------------------------------------------------------


On June 1, 2017, 12:18 a.m., Kai Huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> -----------------------------------------------------------
> 
> (Updated June 1, 2017, 12:18 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
> 
> 
> Bugs: AURORA-1929
>     https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `'aurora_admin prune_tasks'` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and execute them all at once to 
> avoid additional cost of coalescing. The fix will also benefit implicit task 
> history pruning since it has similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more information and 
> details.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/1/
> 
> 
> Testing
> -------
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>

Reply via email to