Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/15218
Btw, taking a step back, I am not sure this will work as you expect it to.
Other than a few TaskSets - those without locality information - the
scheduling is going to be heavily biased towards the locality information
supplied. This typically means PROCESS_LOCAL (almost always) and then
NODE_LOCAL - that is, an exact match on the executor or host (irrespective
of the order in which we traverse the task list).
The shuffle of offers we do serves a specific set of purposes: spreading
load when there is no locality information (not very common imo), or
spreading tasks across the cluster when the locality information is of
lower quality - e.g. from an InputFormat, or for shuffles where we rely on
heuristics which might not be optimal.
But since I have not looked at this in a while, I will CC Kay: +CC
@kayousterhout, please do take a look in case I am missing something.
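To make the point concrete, here is a toy sketch (in Python, with made-up names - Spark's actual matching lives in `TaskSetManager` and is considerably more involved): when tasks carry PROCESS_LOCAL-style preferences for specific executors, a greedy matcher produces the same assignment no matter how the offers are ordered, so shuffling the offers only affects tasks without preferences.

```python
import random

def assign(tasks, offers):
    """Greedily match tasks to resource offers.

    tasks  : list of (task_id, preferred_executor_or_None)
    offers : list of executor ids, in traversal order

    A task with a preference is only placed on its preferred executor
    (the PROCESS_LOCAL-style exact match); a task with no preference
    takes the first remaining offer, so only those tasks are sensitive
    to offer order.
    """
    assignments = {}
    remaining = list(offers)
    for task, preferred in tasks:
        for offer in list(remaining):
            if preferred is None or offer == preferred:
                assignments[task] = offer
                remaining.remove(offer)
                break
    return assignments

offers = ["exec-1", "exec-2", "exec-3"]
tasks = [("t1", "exec-2"), ("t2", "exec-1")]

# With strong locality preferences, shuffling the offers does not
# change the outcome: each task still lands on its preferred executor.
for _ in range(5):
    shuffled = offers[:]
    random.shuffle(shuffled)
    assert assign(tasks, shuffled) == {"t1": "exec-2", "t2": "exec-1"}
```

This is only a model of the bias being described, not Spark's scheduler; it ignores delay scheduling, locality-level fallback, and per-offer core counts.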