mridulm commented on PR #41676: URL: https://github.com/apache/spark/pull/41676#issuecomment-1614239102
> Hi, @mridulm Thanks for your reminder. Can you describe the non-shuffle scenario more specifically?

It is not very common for shuffle stages to have preferred locality (unless push-based, where node is always 1, or a nontrivial fraction of reducer input is generated in an executor). It is much more common for non-shuffle stages to surface locality. Depending on the underlying implementation, this may or may not have duplication: HDFS, for example, does not have duplication, while a replicated RDD could, and so on.
