Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/228#issuecomment-39004325
Can we just add the accumulator update to TaskSetManager, in the
handleSuccessfulTask() method? That seems much simpler, because the
TaskSetManager already has all of the state about which tasks are running,
which ones have been resubmitted or speculated, and so on.
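To sketch the idea (this is a simplified, hypothetical model, not the
actual Spark classes or signatures): because the TaskSetManager already
records which task indices have succeeded, applying accumulator updates in
handleSuccessfulTask() lets duplicate completions from speculative or
resubmitted copies be ignored with no extra bookkeeping elsewhere.

```scala
// Hypothetical, simplified sketch -- not the real Spark API.
// A TaskSetManager-like class that applies an accumulator update only
// on the FIRST successful completion of each task index, so speculative
// duplicates don't double-count.
class TaskSetManagerSketch(numTasks: Int) {
  private val successful = Array.fill(numTasks)(false)
  private var accumulated = 0L // stand-in for real accumulator state

  // Returns true if this was the first completion for the task index
  // (and therefore its accumulator update was applied).
  def handleSuccessfulTask(index: Int, accumUpdate: Long): Boolean = {
    if (successful(index)) {
      false // a speculative or resubmitted copy already finished
    } else {
      successful(index) = true
      accumulated += accumUpdate
      true
    }
  }

  def accumulatorValue: Long = accumulated
}
```

The point is that the dedup check is one array lookup against state the
TaskSetManager must maintain anyway; doing it in the DAGScheduler would
mean mirroring that state a second time.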
Over time, a lot of functionality has leaked into the DAGScheduler, so a
lot of state is now kept in multiple places: in the DAGScheduler and in the
TaskSetManager or the TaskSchedulerImpl. The intended abstraction is that
the DAGScheduler handles the high-level semantics of scheduling stages and
dealing with inter-stage dependencies, while the TaskSetManager handles the
low-level details of the tasks for each stage. Some parts of this
abstraction are currently broken (the DAGScheduler knows too much about
task-level details), and refactoring this is on my todo list, but in the
meantime I think we should try not to make the problem any worse, because
it makes the code much more complicated, more difficult to understand, and
more bug-prone.