[GitHub] [spark] rmcyang commented on pull request #34500: [WIP][SPARK-33574][CORE] Improve locality for push-based shuffle especially for join-like operations

GitBox Tue, 24 May 2022 01:08:30 -0700


rmcyang commented on PR #34500:
URL: https://github.com/apache/spark/pull/34500#issuecomment-1135547465


   > Is this still WIP, @rmcyang? Also, can you please add tests for the SQL
AQE codepath? Essentially, will this optimization help SQL joins when AQE is
enabled?
   
   I did some tests in our internal branch. Merger locations are reused as
expected with AQE disabled; however, with AQE enabled, each sibling stage may
unfortunately use a different set of merger locations. When
`findCoPartitionedSiblingMapStages` is called with AQE enabled, the shuffle map
stage cannot identify its sibling map stages. I believe this is because each
shuffle map stage is now submitted as its own job, so the DAG is sliced into
many separate parts. As a result, this improvement currently only takes effect
when AQE is disabled.
   @mridulm, any thoughts on whether we should design a better mechanism to
track sibling-stage information, so that this improvement can also take effect
when AQE is enabled?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.