[GitHub] [spark] sunchao commented on pull request #39633: [SPARK-42038][SQL] SPJ: Support partially clustered distribution

via GitHub Fri, 03 Feb 2023 09:44:24 -0800


sunchao commented on PR #39633:
URL: https://github.com/apache/spark/pull/39633#issuecomment-1416197756


   Yes, converted.
   
   I found it quite difficult to move the logic out of `EnsureRequirements` 
because, as mentioned above, the optimization also depends on 
`reorderJoinPredicates`, which needs to be executed in lock-step with 
`ensureDistributionAndOrdering` as we go up the query plan.
   
   In addition, even if I extract `reorderJoinPredicates` into a separate 
util method or something, it's still difficult to make the optimization a 
separate rule because it also depends on certain parts of the logic in 
`ensureDistributionAndOrdering`. For instance, as we go up the query plan tree, 
we'd expect join branches that have incompatible `KeyGroupedPartitioning`s to 
be handled by `ensureDistributionAndOrdering` and converted to use hash 
partitioning, so that we do not try to apply the optimization again on 
those branches.
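   
   To illustrate the coupling, here is a minimal toy model. It is not Spark's 
actual `EnsureRequirements` code. The `Plan` class, the `scan`/`join` helpers, 
`SHUFFLE_PARTITIONS`, and the "same keys and same partition count" 
compatibility test are all assumptions made up for this sketch. It only shows 
why key reordering and the distribution fix-up are done together per node in 
one bottom-up pass: a branch that falls back to hash partitioning reports hash 
partitioning to its parent, so the parent does not attempt the key-grouped 
path again.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

SHUFFLE_PARTITIONS = 200  # assumed default, for the toy model only


@dataclass(frozen=True)
class Plan:
    """Hypothetical plan node: a scan (no children) or a two-way join."""
    name: str
    kind: str                        # "key_grouped", "hash", or "unknown"
    keys: Tuple[str, ...]            # partitioning keys, in order
    num_partitions: int
    left: Optional["Plan"] = None
    right: Optional["Plan"] = None
    join_keys: Tuple[str, ...] = ()  # non-empty only for joins


def scan(name, keys, num_partitions):
    return Plan(name, "key_grouped", keys, num_partitions)


def join(name, left, right, join_keys):
    return Plan(name, "unknown", (), 0, left, right, join_keys)


def reorder_join_keys(join_keys, left, right):
    """Follow a child's partitioning key order when the key sets match."""
    for child in (left, right):
        if sorted(child.keys) == sorted(join_keys):
            return child.keys
    return join_keys


def shuffle(child, keys):
    """Wrap a child in a hash exchange unless it is already hash-partitioned on keys."""
    if (child.kind == "hash" and child.keys == keys
            and child.num_partitions == SHUFFLE_PARTITIONS):
        return child
    return Plan(f"Shuffle({child.name})", "hash", keys, SHUFFLE_PARTITIONS, left=child)


def ensure_requirements(plan):
    """One bottom-up pass: children first, then reorder keys, then fix distribution."""
    if plan.left is None:  # leaf scan
        return plan
    left = ensure_requirements(plan.left)
    right = ensure_requirements(plan.right)
    keys = reorder_join_keys(plan.join_keys, left, right)
    compatible = (
        left.kind == right.kind == "key_grouped"
        and left.keys == right.keys == keys
        and left.num_partitions == right.num_partitions
    )
    if compatible:
        kind, parts = "key_grouped", left.num_partitions
    else:
        # Incompatible branches fall back to hash partitioning on both sides.
        left, right = shuffle(left, keys), shuffle(right, keys)
        kind, parts = "hash", SHUFFLE_PARTITIONS
    return replace(plan, kind=kind, keys=keys, num_partitions=parts,
                   left=left, right=right, join_keys=keys)


a, b = scan("a", ("x", "y"), 4), scan("b", ("x", "y"), 4)
c, d = scan("c", ("x", "y"), 8), scan("d", ("x", "y"), 4)
j1 = join("j1", a, b, ("y", "x"))   # compatible after key reordering
j2 = join("j2", j1, c, ("y", "x"))  # 4 vs 8 partitions: falls back to hash
j3 = join("j3", j2, d, ("y", "x"))  # sees a hash-partitioned child
result = ensure_requirements(j3)
```

   In this toy run, `j1` stays key-grouped only because its join keys were 
reordered before the compatibility check. `j2` is converted to hash 
partitioning, and `j3` then sees a hash-partitioned left child, so it never 
takes the key-grouped path. A separate rule would not see these intermediate 
decisions unless it duplicated both steps.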
   
   As a compromise, I've extracted all the special logic related to 
`KeyGroupedPartitioning` into a separate method, `checkKeyGroupCompatible`, so 
the main body of `ensureDistributionAndOrdering` now looks much simpler. If 
necessary, I can also move this method (and related code) into a separate 
class file to make them more isolated.


