jiangxb1987 commented on issue #26633: [SPARK-29994][CORE] Add WILDCARD task 
location
URL: https://github.com/apache/spark/pull/26633#issuecomment-558867250
 
 
   Please correct me if I'm wrong, but it seems to me that `LocalShuffleRowRDD` 
is a special case of `ShuffleRowRDD`, with the only difference being that 
`ShuffleRowRDD` normally doesn't have preferred locations, while 
`LocalShuffleRowRDD` has a preferred locations list. At execution time, if the 
preferred locations list is short, it's highly likely that the tasks from 
`LocalShuffleRowRDD` will wait for their preferred locations (executors/hosts) 
due to delay scheduling, which sometimes makes the wait time even longer than 
the task duration.
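
   To make the effect concrete, here is a toy Python model of delay scheduling 
(not Spark code). The function name `launch_time`, the host names, and the 
offer times are all made up for illustration. It also ignores details of the 
real scheduler such as multiple locality levels and the wait timer being reset 
when a task launches. The 3-second wait matches the documented default of 
`spark.locality.wait`.

```python
def launch_time(preferred, offers, locality_wait):
    """Return (time, host) at which a pending task is launched, or None.

    preferred:     set of host names the task would like to run on
    offers:        list of (time, host) free-slot offers, sorted by time
    locality_wait: seconds the task holds out for a preferred host,
                   counted from time 0 when the task became pending
    """
    for t, host in offers:
        # Accept if the task has no preference, the host is preferred,
        # or the locality wait has already expired.
        if not preferred or host in preferred or t >= locality_wait:
            return t, host
    return None


# Hypothetical offers: hostA (the only preferred host) is busy until t=5.
offers = [(0.0, "hostB"), (1.0, "hostC"), (3.0, "hostB"), (5.0, "hostA")]

print(launch_time({"hostA"}, offers, locality_wait=3.0))  # waits for the timeout
print(launch_time({"hostA"}, offers, locality_wait=0.0))  # takes the first slot
print(launch_time(set(), offers, locality_wait=3.0))      # no preference, no wait
```

   In this model the task with a one-host preference list is launched at t=3 
on a non-preferred host anyway, so a task that itself runs for about a second 
spends more time waiting than executing.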
   
   Setting the locality wait time to 0 should be an answer to this case (and 
possibly many other use cases, too). But on the other hand, it would cause 
regressions in other jobs/stages where task locality is critical (those 
`exception`s, as Thomas mentioned). We just can't ignore those regressions, and 
I can imagine how much effort it would take to fix the regressions in the 
`exception` cases.
   
   How about we accept the current PR as a temporary solution to work around the 
delay scheduling issue, so that those RDDs that don't want to wait for perfect 
locality can just add `WILDCARD` to their preferredLocations? To me it's better 
than setting the locality wait time to 0 directly, as it won't affect other 
workloads.
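
   A rough sketch of why the wildcard is more targeted than a global wait of 0, 
again as a toy Python model rather than Spark code. The `WILDCARD` value `"*"`, 
the function `pick_offer`, and the acceptance rule are my assumptions about how 
such a marker could behave; the PR's actual semantics may differ.

```python
WILDCARD = "*"  # hypothetical marker meaning "any host is acceptable"


def pick_offer(preferred, offers, locality_wait):
    """Return the first (time, host) offer the task accepts, or None.

    A task whose preferred set contains WILDCARD accepts any offer at once.
    Other tasks still hold out for a preferred host until locality_wait
    seconds have passed since time 0.
    """
    hosts = preferred - {WILDCARD}
    accepts_any = WILDCARD in preferred or not hosts
    for t, host in offers:
        if host in hosts or accepts_any or t >= locality_wait:
            return t, host
    return None


# Hypothetical offers: a slot on hostB is free now, hostA frees up at t=2.
offers = [(0.0, "hostB"), (2.0, "hostA")]

# Task that opted in to the wildcard: launches immediately on hostB.
print(pick_offer({"hostA", WILDCARD}, offers, locality_wait=3.0))

# Locality-sensitive task in another stage: still waits for hostA.
print(pick_offer({"hostA"}, offers, locality_wait=3.0))
```

   Only the task that opted in skips the wait; the locality-sensitive task 
keeps the same behavior it has today, which a global locality wait of 0 would 
not preserve.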

