Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/15218
Btw, taking a step back, I am not sure this will work as you expect it to.
Other than a few TaskSets - those without locality information - the
scheduling is going to be heavily biased towards the locality information
supplied. This typically means PROCESS_LOCAL (almost always) and then
NODE_LOCAL - that is, an exact match on the executor or host (irrespective
of the order in which we traverse the task list).
The shuffle of offers we do serves a specific set of purposes: spreading
load when there is no locality information (not very common imo), or
spreading tasks across the cluster when the locality information is of
lower quality - e.g. from an InputFormat, or for shuffles where we rely on
heuristics which might not be optimal.
But since I have not looked at this in a while, I will CC Kay: +CC
@kayousterhout, please do take a look in case I am missing something.
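To make the point concrete, here is a toy sketch (in Python, with made-up names - Spark's actual matching lives in `TaskSetManager` and is considerably more involved): when tasks carry PROCESS_LOCAL-style preferences for specific executors, a greedy matcher produces the same assignment no matter how the offers are ordered, so shuffling the offers only affects tasks without preferences.

```python
import random

def assign(tasks, offers):
    """Greedily match tasks to resource offers.

    tasks  : list of (task_id, preferred_executor_or_None)
    offers : list of executor ids, in traversal order

    A task with a preference is only placed on its preferred executor
    (the PROCESS_LOCAL-style exact match); a task with no preference
    takes the first remaining offer, so only those tasks are sensitive
    to offer order.
    """
    assignments = {}
    remaining = list(offers)
    for task, preferred in tasks:
        for offer in list(remaining):
            if preferred is None or offer == preferred:
                assignments[task] = offer
                remaining.remove(offer)
                break
    return assignments

offers = ["exec-1", "exec-2", "exec-3"]
tasks = [("t1", "exec-2"), ("t2", "exec-1")]

# With strong locality preferences, shuffling the offers does not
# change the outcome: each task still lands on its preferred executor.
for _ in range(5):
    shuffled = offers[:]
    random.shuffle(shuffled)
    assert assign(tasks, shuffled) == {"t1": "exec-2", "t2": "exec-1"}
```

This is only a model of the bias being described, not Spark's scheduler; it ignores delay scheduling, locality-level fallback, and per-offer core counts.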