chirag-s-db commented on code in PR #53098:
URL: https://github.com/apache/spark/pull/53098#discussion_r2538735177
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala:
##########
@@ -140,6 +140,13 @@ case class EnsureRequirements(
// Choose all the specs that can be used to shuffle other children
val candidateSpecs = specs
.filter(_._2.canCreatePartitioning)
+ .filter {
Review Comment:
`checkKeyGroupCompatible` applies to the case where two scans that both report
KeyGroupedPartitioning are joined against each other. For example, something
like:
```
SortMergeJoinExec ...
+- BatchScanExec tbl1 ... -> reporting KeyGroupedPartitioning
+- BatchScanExec tbl2 ... -> reporting KeyGroupedPartitioning
```
If only one child reports KeyGroupedPartitioning, we can (in general) still
avoid the shuffle for that child by shuffling the other child to match it:
```
SortMergeJoinExec ...
+- BatchScanExec tbl1 ... -> reporting KeyGroupedPartitioning
+- ShuffleExchangeExec KeyGroupedPartitioning
+- BatchScanExec tbl2 ... -> reporting UnknownPartitioning
```
However, if the child reporting the KeyGroupedPartitioning is not a
BatchScanExec, then we can't safely push down the JOIN keys, which makes it
unsafe to shuffle the other child to match it. This may arise if we call
`.checkpoint()` on a DataFrame backed by a `BatchScanExec`:
```
SortMergeJoinExec ...
+- RDDScanExec ... -> reporting KeyGroupedPartitioning (coming from ckpt of tbl1 scan)
+- ShuffleExchangeExec KeyGroupedPartitioning
+- BatchScanExec tbl2 ... -> reporting UnknownPartitioning
```
This extra check is for this last case (the checkpointed scan): we want to
make sure that we're not using a KeyGroupedPartitioning to shuffle another
child of a JOIN without being able to push down JOIN keys. The test
"SPARK-53322: checkpointed scans can't shuffle other children on SPJ" covers
this case, and fails without this change.
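To make the intent of the extra filter concrete, here is a rough toy model of
the candidate selection. It is only a sketch and does not use Spark's classes:
`CandidateSpecSketch`, `Plan`, `BatchScan`, `RddScan`, `canPushDownJoinKeys`
and `candidateIndices` are all hypothetical names made up for illustration, and
the rule "only a KeyGrouped spec needs a pushdown-capable child" is my
simplified reading of the check, not the actual code in this PR.

```scala
// Toy model only: hypothetical stand-ins, not Spark's real classes or API.
object CandidateSpecSketch {
  sealed trait Partitioning
  case object KeyGrouped extends Partitioning
  case object Hash extends Partitioning
  case object Unknown extends Partitioning

  sealed trait Plan { def partitioning: Partitioning }
  // Stand-in for a DSv2 scan, where JOIN keys can be pushed down.
  final case class BatchScan(table: String, partitioning: Partitioning) extends Plan
  // Stand-in for a scan over a checkpointed RDD: it still reports the
  // partitioning of the original scan, but nothing can be pushed into it.
  final case class RddScan(source: String, partitioning: Partitioning) extends Plan

  // Assumption: Unknown partitioning cannot be used to shuffle another child.
  def canCreatePartitioning(p: Partitioning): Boolean = p != Unknown

  def canPushDownJoinKeys(plan: Plan): Boolean = plan match {
    case _: BatchScan => true
    case _            => false
  }

  // Indices of children whose partitioning may be used to shuffle the others.
  def candidateIndices(children: Seq[Plan]): Seq[Int] =
    children.zipWithIndex.collect {
      case (child, idx)
          if canCreatePartitioning(child.partitioning) &&
            // The extra check: a KeyGrouped child is only a candidate when
            // JOIN keys can be pushed down into it.
            (child.partitioning != KeyGrouped || canPushDownJoinKeys(child)) =>
        idx
    }
}
```

In this model, a `BatchScan` reporting `KeyGrouped` joined against a
`BatchScan` reporting `Unknown` leaves the first child as the only candidate,
while replacing that first child with an `RddScan` reporting `KeyGrouped`
leaves no candidate, so the checkpointed scan's partitioning is not reused to
shuffle the other side.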
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]