rekbun commented on code in PR #45267:
URL: https://github.com/apache/spark/pull/45267#discussion_r1966642543
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala:
##########
@@ -164,6 +165,18 @@ case class BatchScanExec(
(groupedParts, expressions)
}
+ // Also re-group the partitions if we are reducing compatible partition expressions
+ val finalGroupedPartitions = spjParams.reducers match {
Review Comment:
I believe this could produce incorrect results when joining presorted
bucketed tables with compatible bucket counts.
Specifically, suppose we have two tables that are:
1. Bucketed and sorted on the same join keys
2. Created with different bucket counts, where one table's bucket count is a
multiple of the other's

When performing a bucketed join in Spark, the sort order within each partition
is expected to be preserved, which is what lets the planner elide the sort
before a sort-merge join. However, it appears that the re-grouping here
concatenates several sorted splits into a single partition without re-sorting
them, breaking that guarantee and potentially producing incorrect join results
(see the sketch below).
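
To make the concern concrete, here is a minimal self-contained sketch (plain
Scala, not the actual `BatchScanExec` code) of why concatenating sorted splits
during the reducer-based re-grouping would break the per-partition ordering a
sort-merge join relies on. The bucket counts 4 and 8 and the modulo bucketing
are illustrative assumptions, not taken from the PR:

```scala
object SortOrderSketch extends App {
  // Table A has 4 buckets; table B has 8 (a multiple of 4). Both are
  // presorted on the join key within each bucket.
  val bBucket0 = Seq(0L, 8L, 16L)  // B's bucket 0: keys with id % 8 == 0
  val bBucket4 = Seq(4L, 12L, 20L) // B's bucket 4: keys with id % 8 == 4

  // Reducing B's 8 buckets to 4 maps both of these buckets to A's
  // bucket 0 (id % 4 == 0). If the two sorted splits are simply
  // concatenated, the merged partition is no longer sorted on the key.
  val merged = bBucket0 ++ bBucket4
  println(merged)                  // List(0, 8, 16, 4, 12, 20)
  assert(merged != merged.sorted)  // the ordering guarantee is broken
}
```

A sort-merge join that skips its own sort because the scan still reports a
sorted output ordering would then mis-pair or drop rows; reproducing this
end-to-end would need a V2 source that reports both the bucket transform and
the sort order.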