cloud-fan edited a comment on issue #26633: [SPARK-29994][CORE] Add WILDCARD task location
URL: https://github.com/apache/spark/pull/26633#issuecomment-558671538

> I generally recommend people to set the locality delay to 0 on the node side because you can get very weird results where tasks wait way too long to be scheduled.

Well, if we look at resource utilization, it may be better to wait for locality and save resources for other jobs/tasks. This is a really hard problem, and the default 3-second locality wait may not be optimal either. We can only know the optimal solution if we know what jobs/tasks will be submitted in the future.

For `LocalShuffledRowRDD`, we don't need an optimal solution. We only need to avoid regressions. Compared to the normal shuffle reader, which fetches shuffle blocks from different hosts, this new WILDCARD location won't make things worse. It tries to satisfy the locality and maximizes resource utilization, like the normal shuffle reader.

Is it possible to make this thing internal, e.g. not document it publicly? This is not a perfect solution, but I'm afraid there is no perfect solution. This solution at least gives us an option: if locality is not that important for certain tasks, you can use WILDCARD to let Spark schedule your tasks on other hosts.
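
To make the trade-off concrete, here is a toy model of the scheduling decision being discussed. It is only a sketch and not Spark's scheduler code: `Task`, its `wildcard` flag, and `can_launch` are hypothetical names invented for illustration, and the exact semantics of the WILDCARD location in the PR may differ. The assumption is that a task without the wildcard waits for the locality delay before accepting a non-preferred host, while a wildcard task accepts any offered host right away.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Mirrors the 3-second default locality wait mentioned in the comment.
LOCALITY_WAIT_SECONDS = 3.0


@dataclass(frozen=True)
class Task:
    """Hypothetical task description used only for this illustration."""
    preferred_hosts: FrozenSet[str]
    wildcard: bool = False  # assumed stand-in for a WILDCARD task location


def can_launch(task: Task, offered_host: str, waited_seconds: float,
               locality_wait: float = LOCALITY_WAIT_SECONDS) -> bool:
    """Decide whether the task may run on the offered host (toy model)."""
    # A preferred host always satisfies locality.
    if offered_host in task.preferred_hosts:
        return True
    # Assumed WILDCARD behaviour: any host is acceptable without waiting.
    if task.wildcard:
        return True
    # Otherwise fall back to a non-local host only once the wait has expired.
    return waited_seconds >= locality_wait


strict = Task(preferred_hosts=frozenset({"host-a"}))
relaxed = Task(preferred_hosts=frozenset({"host-a"}), wildcard=True)
```

In this model, `can_launch(strict, "host-b", 0.0)` is false until three seconds have passed, whereas `can_launch(relaxed, "host-b", 0.0)` is true immediately. Both tasks still accept `host-a` at once, which matches the idea of preferring locality without leaving free executors idle.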
