peter-toth commented on code in PR #54330:
URL: https://github.com/apache/spark/pull/54330#discussion_r2880116822


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala:
##########
@@ -549,23 +521,19 @@ case class EnsureRequirements(
           // whether partially clustered distribution can be applied. For instance, the
           // optimization cannot be applied to a left outer join, where the left hand
           // side is chosen as the side to replicate partitions according to stats.
-          // Similarly, the partially clustered distribution cannot be applied if the
-          // partially clustered side must use the scan's key-grouped partitioning to
-          // satisfy some unrelated required distribution in its plan (for example, for an aggregate
-          // or window function), as this will give incorrect results (for example, duplicate
-          // row_number() values).
           // Otherwise, query result could be incorrect.
-          val canReplicateLeft = canReplicateLeftSide(joinType) &&
-            canApplyPartialClusteredDistribution(right)
-          val canReplicateRight = canReplicateRightSide(joinType) &&
-            canApplyPartialClusteredDistribution(left)
+          val canReplicateLeft = canReplicateLeftSide(joinType)
+          val canReplicateRight = canReplicateRightSide(joinType)

Review Comment:
   Before this PR, the partition grouping logic was applied in the leaf scan 
node, which meant there could be conflicting requirements: a join could 
require the scan to provide partially clustered data, while a node below the 
join could require the scan to provide grouped (clustered) data. We needed the 
guard so that the join would not override the requirements of nodes below it.
   After this PR, we can place `GroupPartitionsExec` anywhere in the plan where 
partitions need grouping or regrouping, and keep unclustered partitions as 
they are elsewhere.
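
   To make the conflict concrete, here is a toy model in plain Scala. It is 
not Spark code: `rowNumberPerPartition`, `regroupByKey` and the sample data 
are hypothetical names invented for illustration, and `regroupByKey` is only a 
loose stand-in for the kind of regrouping step described above. It shows why a 
per-key `row_number()` yields duplicates when a key is split across 
partitions, and why regrouping below the consumer avoids that.

```scala
// Toy model only; none of these names exist in Spark.
object PartialClusteringSketch {
  type Row = (String, Int)      // (partition key, value)
  type Partition = Seq[Row]

  // Stand-in for an operator that assumes clustered input: it numbers the
  // rows of each key independently inside every partition.
  def rowNumberPerPartition(parts: Seq[Partition]): Seq[(String, Int, Int)] =
    parts.flatMap { part =>
      part.groupBy(_._1).toSeq.flatMap { case (_, rows) =>
        rows.zipWithIndex.map { case ((k, v), i) => (k, v, i + 1) }
      }
    }

  // Stand-in for a regrouping step: all rows of a key end up in one partition.
  def regroupByKey(parts: Seq[Partition]): Seq[Partition] =
    parts.flatten.groupBy(_._1).values.toSeq

  // True if some (key, row number) pair occurs more than once.
  def hasDuplicateRowNumbers(numbered: Seq[(String, Int, Int)]): Boolean =
    numbered.map { case (k, _, n) => (k, n) }.distinct.size != numbered.size

  // Partially clustered input: key "a" is split across two partitions.
  val partiallyClustered: Seq[Partition] =
    Seq(Seq(("a", 1), ("a", 2)), Seq(("a", 3)), Seq(("b", 4)))

  def main(args: Array[String]): Unit = {
    val unsafe = rowNumberPerPartition(partiallyClustered)
    val safe = rowNumberPerPartition(regroupByKey(partiallyClustered))
    println(s"duplicates without regrouping: ${hasDuplicateRowNumbers(unsafe)}")
    println(s"duplicates after regrouping:   ${hasDuplicateRowNumbers(safe)}")
  }
}
```

   In this model the split key "a" gets row number 1 in two different 
partitions, while the regrouped input numbers each key exactly once.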



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

