HeartSaVioR commented on a change in pull request #35574:
URL: https://github.com/apache/spark/pull/35574#discussion_r810699517



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
##########
@@ -271,6 +279,17 @@ case class HashPartitioning(expressions: Seq[Expression], numPartitions: Int)
   override def createShuffleSpec(distribution: ClusteredDistribution): ShuffleSpec =
     HashShuffleSpec(this, distribution)
 
+  /**
+   * Checks if [[HashPartitioning]] is partitioned on exactly same full `clustering` keys of
+   * [[ClusteredDistribution]].
+   */
+  def isPartitionedOnFullKeys(distribution: ClusteredDistribution): Boolean = {
+    expressions.length == distribution.clustering.length &&

Review comment:
      Now I'm also in favor of having the more restricted condition. With the more
restricted condition, end users can change the order of keys to tune their query
further as a last resort if simply turning the config on isn't performant enough.
We expect that changing the order of the hash keys would change the partition
IDs, right?
   
   The scenario in which end users would turn on this config is a major point.
They wouldn't turn it on before trying to run the query. (This config is marked
as internal, and by default it's disabled.) They would turn on the config after
running the query and seeing Spark perform badly. One could argue that they can
add a repartition manually in their code/SQL statement, which makes sense in
general, but we have counter-arguments: 1) they don't have only a few queries,
and 2) the queries could be machine/tool-generated.
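
   For reference, the manual workaround would look roughly like this (a minimal sketch; the table and column names are made up), either through the DataFrame API or the REPARTITION hint in SQL:

```scala
import org.apache.spark.sql.SparkSession

object ManualRepartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("repartition-demo")
      .getOrCreate()
    import spark.implicits._

    // Made-up data; key1/key2 stand in for whatever the join/aggregation keys are.
    val df = Seq((1, "a"), (2, "b")).toDF("key1", "key2")

    // DataFrame API: explicitly shuffle on the chosen keys.
    val repartitioned = df.repartition($"key1", $"key2")

    // SQL: the equivalent hint form.
    df.createOrReplaceTempView("t")
    val hinted = spark.sql("SELECT /*+ REPARTITION(key1, key2) */ * FROM t")

    repartitioned.explain()
    hinted.explain()
    spark.stop()
  }
}
```

   This works per query, but it has to be repeated for every affected query, which is exactly why it doesn't scale when the queries are numerous or machine/tool-generated.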




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


