cloud-fan commented on issue #26633: [SPARK-29994][CORE] Add WILDCARD task 
location
URL: https://github.com/apache/spark/pull/26633#issuecomment-559141731
 
 
   This is the same problem as https://issues.apache.org/jira/browse/SPARK-18886.
It would be great if we could solve that problem first, but it seems there is
no conclusion yet.
   
   There is one difference in `LocalShuffledRowRDD`: it has a baseline. It's 
converted from the normal shuffle reader, and we shouldn't be slower than that 
reader. The normal shuffle reader mostly has no locality (a reducer needs to 
read blocks from many hosts), so the WILDCARD location is a good solution. It 
effectively turns off the locality wait for `LocalShuffledRowRDD`, so that it 
is not slower than the normal shuffle reader.
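
   As a rough illustration of the idea, here is a toy model of delay 
scheduling. It is only a sketch and not Spark's scheduler code: `WILDCARD`, 
`Task`, and `can_launch` are hypothetical names, and the 3-second wait just 
mirrors the default of `spark.locality.wait`.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical marker meaning "any host is acceptable right away".
WILDCARD = "*"


@dataclass
class Task:
    name: str
    preferred_hosts: Sequence[str]  # an empty sequence means no preference


def can_launch(task: Task, offered_host: str, waited_s: float,
               locality_wait_s: float = 3.0) -> bool:
    """Toy delay-scheduling rule (assumption, not Spark's implementation).

    A task accepts a non-preferred host only after the locality wait has
    expired, unless it has no preference or carries the wildcard location.
    """
    if not task.preferred_hosts or WILDCARD in task.preferred_hosts:
        return True  # nothing to wait for
    if offered_host in task.preferred_hosts:
        return True  # locality is satisfied
    return waited_s >= locality_wait_s


# A task that prefers host-a but also carries the wildcard never waits.
local_read = Task("local-shuffle-read", ["host-a", WILDCARD])
plain_read = Task("plain-read", ["host-a"])
print(can_launch(local_read, "host-b", waited_s=0.0))
print(can_launch(plain_read, "host-b", waited_s=0.0))
```

   In this model the wildcard task still runs on its preferred host when that 
host is offered; it just never holds out for it, which is the per-task 
behavior described above.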
   
   Do we have an ETA for resolving 
https://issues.apache.org/jira/browse/SPARK-18886 ? We can't remove the 
locality wait, as nowadays we usually run many jobs on a Spark cluster. It's 
unclear to me what the best solution is.
   
   BTW, this feature won't be documented, and to me it's not really public. 
Users can only learn about it by reading the discussion here. We can still 
remove it later if https://issues.apache.org/jira/browse/SPARK-18886 is 
resolved. To me this is just a workaround to turn off delay scheduling for 
certain tasks instead of globally, which does have value.

