[GitHub] [spark] mridulm commented on pull request #41676: [SPARK-44109][CORE] Remove duplicate preferred locations of each RDD partition

via GitHub Fri, 30 Jun 2023 00:21:21 -0700


mridulm commented on PR #41676:
URL: https://github.com/apache/spark/pull/41676#issuecomment-1614239102


   > Hi, @mridulm Thanks for your reminder. Can you describe the non-shuffle 
scenario more specifically?
   
   It is not very common for shuffle stages to have preferred locality, unless
the shuffle is push-based (where there is always exactly 1 preferred node) or a
nontrivial fraction of the reducer input was generated in a single executor.
   It is much more common for non-shuffle stages to surface locality.
   Depending on the underlying implementation, those locations may or may not
contain duplicates: HDFS, for example, does not produce duplicates, while a
replicated RDD could, and so on.
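   To make the replicated-RDD case concrete, here is a minimal, language-neutral
sketch. It is not Spark code: it assumes, hypothetically, that each replica of a
cached partition is described as a (host, executor) pair and that locality is
reported per host, so two replicas on executors sharing a host yield the same
host twice. The helpers `preferred_hosts` and `dedup_preserving_order` are
illustrative names only, not Spark APIs.

```python
from typing import List, Tuple

# Hypothetical model (not Spark's API): each replica is (host, executor_id).
def preferred_hosts(replicas: List[Tuple[str, str]]) -> List[str]:
    """Host-level preferred locations, one entry per replica (may repeat)."""
    return [host for host, _executor in replicas]

def dedup_preserving_order(locations: List[str]) -> List[str]:
    """Drop repeated locations while keeping the first-seen order."""
    # dict keys keep insertion order (Python 3.7+), so the first occurrence wins.
    return list(dict.fromkeys(locations))

# Two replicas land on executors that share "host-a".
replicas = [("host-a", "exec-1"), ("host-a", "exec-2"), ("host-b", "exec-3")]
raw = preferred_hosts(replicas)
print(raw)                          # ['host-a', 'host-a', 'host-b']
print(dedup_preserving_order(raw))  # ['host-a', 'host-b']
```

   The sketch keeps first-seen order rather than using a set, on the assumption
that the order of preferred locations may carry meaning for whoever consumes it.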


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.