andyvanyperenAM commented on pull request #29655: URL: https://github.com/apache/spark/pull/29655#issuecomment-749039290
> Thanks @imback82 for working on this, but I think #19054 seems to be a better approach for me (i.e. add `leftDistributionKeys` and `rightDistributionKeys` in `SortMergeJoinExec`/`ShuffledHashJoinExec`, and avoid shuffle by adding logic in `EnsureRequirements.reorderJoinPredicates`). @tejasapatil and I are in the same team so just bringing more context on this: we added #19054 in our internal fork and don't see much OOM issues. If #19054 is better approach in other people's opinions as well, I can redo that PR to latest master for review.
>
> Adding the rule after `ensureRequirements` seems to add more burden on future development. We need to think about it every time during development as there's a new rule after `ensureRequirements` can remove shuffle.

Hi @c21,

I was wondering what the status of this issue is. It still shows as closed on both the GitHub page and the JIRA page. What is the best way to get it resolved? Is there anything I can do as a non-expert to help the process along?

You mentioned that you added #19054 to your internal fork without OOM issues. Did any other problems come up, or is it good to go? Is there a way to get it into the main branch so that ordinary end users like me can use it?

Thanks for the update, and sorry if this is the wrong place to bump this.

Kind regards,
Andy
