rekbun commented on code in PR #45267:
URL: https://github.com/apache/spark/pull/45267#discussion_r1966642543
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala:
##########
@@ -164,6 +165,18 @@ case class BatchScanExec(
(groupedParts, expressions)
}
+ // Also re-group the partitions if we are reducing compatible partition expressions
+ val finalGroupedPartitions = spjParams.reducers match {
Review Comment:
I believe this could produce incorrect results when joining presorted
bucketed tables with compatible bucket counts.
Specifically, if we have two tables:
1. Bucketed and sorted on the same join keys
2. With different bucket counts, where one table's bucket count is a
multiple of the other
When performing a bucketed join in Spark, the sort order within each
partition is expected to be preserved. However, the re-grouping introduced
here merges multiple partitions of the finer-bucketed table into one, which
can break that sorting guarantee and lead to incorrect results.
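To illustrate, here is a minimal, self-contained sketch (not code from this
PR) of why re-grouping can break per-partition sort order. It assumes the
8-bucket side is reduced to match a 4-bucket side, so buckets `i` and
`i + 4` land in the same group:

```scala
// Illustrative sketch only: models join keys in two individually sorted
// buckets of an 8-bucket table that get reduced into one 4-bucket group.
object ReducerSortHazard {
  def main(args: Array[String]): Unit = {
    val bucket0 = Seq(0, 8, 16)  // keys with key % 8 == 0, already sorted
    val bucket4 = Seq(4, 12, 20) // keys with key % 8 == 4, already sorted

    // Reducing 8 buckets to 4 puts both buckets in the group key % 4 == 0.
    // Plain concatenation of the grouped buckets is no longer sorted,
    // even though each input bucket was.
    val concatenated = bucket0 ++ bucket4
    println(concatenated)                        // List(0, 8, 16, 4, 12, 20)
    println(concatenated == concatenated.sorted) // false: sort order broken

    // Preserving the guarantee would require merge-sorting the grouped
    // partitions instead of concatenating them.
    println((bucket0 ++ bucket4).sorted)         // List(0, 4, 8, 12, 16, 20)
  }
}
```

If a downstream sort-merge join assumes each grouped partition is still
sorted on the join keys, the concatenated ordering above would violate that
assumption.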