jiangxb1987 commented on issue #26633: [SPARK-29994][CORE] Add WILDCARD task location
URL: https://github.com/apache/spark/pull/26633#issuecomment-558867250

Please correct me if I'm wrong, but it seems to me that `LocalShuffleRowRDD` is a special case of `ShuffleRowRDD`. The only difference is that `ShuffleRowRDD` normally doesn't have preferred locations, while `LocalShuffleRowRDD` has a preferred locations list. At execution time, if the preferred locations list is short, it's highly likely that the tasks from `LocalShuffleRowRDD` will wait for their preferred locations (executors/hosts) because of delay scheduling, which sometimes makes the wait time even longer than the task duration.

Setting the locality wait time to 0 would be one answer to this case (and possibly to many other use cases, too). On the other hand, it would cause regressions in other jobs/stages where task locality is critical (those `exception`s Thomas mentioned). We just can't ignore those regressions, and I can imagine how much effort it would take to fix them in the `exception` cases.

How about we accept the current PR as a temporary solution to work around the delay scheduling issue, so that RDDs that don't want to wait for perfect locality can just add `WILDCARD` to their `preferredLocations`? To me that's better than setting the locality wait time to 0 directly, as it won't affect other workloads.
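
To make the trade-off concrete, here is a toy model of delay scheduling for a single task. It is only a sketch, not Spark's scheduler: the function `launch_time`, the `"*"` value used for `WILDCARD`, and the executor names are all made up for illustration, and the way the PR actually encodes the wildcard may differ.

```python
# Toy model of delay scheduling for ONE task. This is NOT Spark's code.
# WILDCARD = "*" is a hypothetical marker; the real constant in the PR may differ.
WILDCARD = "*"


def launch_time(preferred, offers, locality_wait):
    """Return (time, executor) at which the toy scheduler launches the task.

    preferred:     list of executor ids, possibly containing WILDCARD
    offers:        list of (time_in_seconds, executor) resource offers, sorted by time,
                   with time measured from task submission
    locality_wait: seconds the task holds out for a preferred executor
    """
    # An empty list or a wildcard entry means "any executor is fine".
    no_preference = (not preferred) or (WILDCARD in preferred)
    for t, executor in offers:
        if no_preference or executor in preferred:
            return (t, executor)  # acceptable locality, launch right away
        if t >= locality_wait:
            return (t, executor)  # waited long enough, give up on locality
    return None  # no usable offer arrived


# Assumed offers: the only preferred executor, exec-1, is busy until t = 5s.
offers = [(0.0, "exec-2"), (1.0, "exec-3"), (3.0, "exec-2"), (5.0, "exec-1")]

strict = launch_time(["exec-1"], offers, locality_wait=3.0)
zero_wait = launch_time(["exec-1"], offers, locality_wait=0.0)
wildcard = launch_time(["exec-1", WILDCARD], offers, locality_wait=3.0)
print(strict, zero_wait, wildcard)
```

In this model the strict task sits idle until the wait expires, while the zero-wait and wildcard tasks both start on the first offer. The difference is scope: a zero locality wait is a global setting that applies to every task, whereas the wildcard only changes the behavior of the RDDs that opt in.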
