cloud-fan commented on issue #26633: [SPARK-29994][CORE] Add WILDCARD task 
location
URL: https://github.com/apache/spark/pull/26633#issuecomment-558671538
 
 
   > I generally recommend people to set the locality delay to 0 on the node 
side because you can get very weird results where tasks wait way too long to be 
scheduled.
   
   Well, if we look at resource utilization, it may be better to wait for 
locality and save resources for other jobs/tasks.
   
   This is a really hard problem, and the default 3-second locality wait may 
not be optimal either. We could only know the optimal solution if we knew what 
jobs/tasks will be submitted in the future.
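   
   As a rough illustration of the trade-off, here is a minimal sketch of a 
delay-scheduling rule. It is a simplified model, not Spark's actual scheduler 
code: `accept_offer` and its parameters are hypothetical names, and the real 
scheduler tracks several locality levels rather than a single wait. The 3.0 
default mirrors the 3-second `spark.locality.wait` default.

```python
def accept_offer(preferred_hosts, offered_host, waited_s, locality_wait_s=3.0):
    """Simplified delay-scheduling decision (hypothetical helper, not Spark code)."""
    # No preference, or the offer is on a preferred host: launch right away.
    if not preferred_hosts or offered_host in preferred_hosts:
        return True
    # Non-local offer: accept only once the locality wait has elapsed.
    return waited_s >= locality_wait_s


# With the default wait, a non-local offer is declined until 3 seconds pass.
print(accept_offer({"host-a"}, "host-b", waited_s=1.0))  # False
print(accept_offer({"host-a"}, "host-b", waited_s=3.0))  # True
# With the wait set to 0, a non-local offer is accepted immediately.
print(accept_offer({"host-a"}, "host-b", waited_s=0.0, locality_wait_s=0.0))  # True
```

   A wait of 0 never leaves a free slot idle but gives up locality, while a 
longer wait does the opposite. Which is better depends on what else is queued.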
   
   For `LocalShuffledRowRDD`, we don't need an optimal solution. We only need 
to avoid regressions. Compared to the normal shuffle reader, which fetches 
shuffle blocks from different hosts, this new WILDCARD location won't make 
things worse. It tries to satisfy locality, and it maximizes resource 
utilization like the normal shuffle reader.
   
   Is it possible to make this thing internal, e.g. by not documenting it 
publicly? This is not a perfect solution, but I'm afraid there is no perfect 
solution.
