Github user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/892#issuecomment-45378774
  
    @pwendell pendingTasksWithNoPrefs handles executor failures, tasks 
which have no preferences, etc. Or maybe I misunderstood the question?
    
    As I mentioned before, the motivation for the PR is orthogonal to the fix 
proposed.
    If nodes are not available for locality during the initial (few) stages, 
the better approach would be to wait for adequate nodes to appear.
    
    The more general problem of how to handle this when the churn rate of 
executors is high is a different one, imo, and is probably what the PR is 
attempting to address (minus the wait time, etc).
    
    To tackle that, we would need to redefine what noPrefs does (which is what 
@pwendell is alluding to, I think?) and probably do something along the lines 
of what Matei suggested: add the task to the rack/host lists even if the 
host/rack is not available right now. The only problem with that is that the 
number of lists we end up maintaining can be high, though not too high given 
the current upper bound of 4k or so nodes for a hadoop cluster.
    Of course, this would be per active stage, which gives us some notion of 
how expensive it could be in the worst case.
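
    A rough sketch of that bookkeeping, to make the cost concrete. This is 
not the actual TaskSetManager code: the PendingTasks class, its method names 
and the rackOf lookup are all hypothetical, made up for illustration. The only 
point is that a task is registered under each preferred host and rack whether 
or not an executor is alive there, so one stage holds at most one list per 
distinct host, one per distinct rack, and one for noPrefs.

```scala
import scala.collection.mutable

// Hypothetical, simplified per-stage bookkeeping; NOT Spark's TaskSetManager.
final class PendingTasks {
  private val forHost = mutable.HashMap.empty[String, mutable.ArrayBuffer[Int]]
  private val forRack = mutable.HashMap.empty[String, mutable.ArrayBuffer[Int]]
  private val noPrefs = mutable.ArrayBuffer.empty[Int]

  // Register a task under every preferred host (and that host's rack, if
  // known), without checking whether an executor is currently alive there.
  // rackOf is an assumed lookup supplied by the caller.
  def add(taskIndex: Int,
          preferredHosts: Seq[String],
          rackOf: String => Option[String]): Unit = {
    if (preferredHosts.isEmpty) {
      // Only tasks that truly have no preference land in noPrefs.
      noPrefs += taskIndex
    } else {
      for (host <- preferredHosts) {
        forHost.getOrElseUpdate(host, mutable.ArrayBuffer.empty[Int]) += taskIndex
        for (rack <- rackOf(host)) {
          val list = forRack.getOrElseUpdate(rack, mutable.ArrayBuffer.empty[Int])
          // Several preferred hosts can share a rack; avoid duplicate entries.
          if (!list.contains(taskIndex)) list += taskIndex
        }
      }
    }
  }

  def hostTasks(host: String): Seq[Int] =
    forHost.get(host).map(_.toList).getOrElse(Nil)

  def rackTasks(rack: String): Seq[Int] =
    forRack.get(rack).map(_.toList).getOrElse(Nil)

  def noPrefTasks: Seq[Int] = noPrefs.toList

  // Number of lists held for this stage: hosts + racks + the noPrefs list.
  def listCount: Int = forHost.size + forRack.size + 1
}
```

    With something like this, a task preferring host1 and host2 (both in the 
same rack) creates two host lists and one rack list even if neither host has 
registered an executor yet. The number of lists is bounded by the number of 
distinct hosts and racks named in the stage's preferences, not by the number 
of tasks.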


