rmcyang commented on PR #34500: URL: https://github.com/apache/spark/pull/34500#issuecomment-1135547465
> Is this still WIP @rmcyang ? Also, can you please add tests for the sql aqe codepath ? Essentially will this optimization help sql join when AQE is enabled.

I ran some tests on our internal branch. With AQE disabled, merger locations are reused as expected. With AQE enabled, however, every sibling stage may end up using a different set of merger locations. When `findCoPartitionedSiblingMapStages` is called with AQE enabled, a shuffle map stage cannot identify its sibling map stages. As far as I can tell, this is because every shuffle map stage now becomes its own job, so the DAG is sliced into many parts. As a result, this improvement currently only takes effect when AQE is disabled.

@mridulm, any thoughts on whether we should figure out a better mechanism for tracking sibling stage info, so that this improvement also applies when AQE is enabled?
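
To make the problem concrete, here is a toy sketch of why sibling lookup comes back empty once the DAG is sliced per job. It is not the PR's implementation of `findCoPartitionedSiblingMapStages`: the `Job` class, `find_siblings`, and the stage names are hypothetical, and it assumes siblings are found through a shared child stage that is visible within the same job.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Job:
    """Hypothetical stand-in for one scheduler job: child stage -> parent map stages."""
    parents_of: Dict[str, List[str]] = field(default_factory=dict)


def find_siblings(job: Job, stage: str) -> Set[str]:
    """Return the other map stages that feed the same child stage within this job."""
    siblings: Set[str] = set()
    for parents in job.parents_of.values():
        if stage in parents:
            siblings.update(p for p in parents if p != stage)
    return siblings


# AQE disabled (assumed shape): one job contains the whole join DAG.
whole_dag = Job({"join": ["map_a", "map_b"]})

# AQE enabled (assumed shape): each shuffle map stage is its own job,
# so neither job knows about the join stage or the other map stage.
sliced = {"map_a": Job(), "map_b": Job()}

print(find_siblings(whole_dag, "map_a"))        # {'map_b'}
print(find_siblings(sliced["map_a"], "map_a"))  # set()
```

In this model the second lookup has nothing to search, which matches what I observed: without some sibling information that outlives a single job, each stage would pick its merger locations independently.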
