[GitHub] [spark] sunchao commented on a change in pull request #35657: [SPARK-37377][SQL] Initial implementation of Storage-Partitioned Join

GitBox Thu, 31 Mar 2022 12:18:35 -0700


sunchao commented on a change in pull request #35657:
URL: https://github.com/apache/spark/pull/35657#discussion_r839942784




##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala
##########
@@ -256,6 +285,16 @@ case class EnsureRequirements(
         reorder(leftKeys.toIndexedSeq, rightKeys.toIndexedSeq, 
rightExpressions, rightKeys)
           .orElse(reorderJoinKeysRecursively(
             leftKeys, rightKeys, leftPartitioning, None))
+      case (Some(KeyGroupedPartitioning(clustering, _, _)), _) =>

Review comment:
       Yes, but when `spark.sql.requireAllClusterKeysForCoPartition` is true we 
still check whether partition keys fully match the cluster keys in 
`EnsureRequirements.checkKeyGroupedSpec` via a `zip` + `semanticEquals`:
   ```scala
         if 
(SQLConf.get.getConf(SQLConf.REQUIRE_ALL_CLUSTER_KEYS_FOR_CO_PARTITION)) {
           attributes.length == clustering.length && 
attributes.zip(clustering).forall {
             case (l, r) => l.semanticEquals(r)
           }
         } else {
           true // already validated in `KeyGroupedPartitioning.satisfies`
         }
   ```
   
   which is very similar to `HashPartitioning`. 
   
   Given the similarity between reordering and the key position check in 
`ShuffleSpec`, we might be able to remove the reordering logic all together, 
but it could worth a separate PR IMO.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] sunchao commented on a change in pull request #35657: [SPARK-37377][SQL] Initial implementation of Storage-Partitioned Join

Reply via email to