HeartSaVioR commented on a change in pull request #35574:
URL: https://github.com/apache/spark/pull/35574#discussion_r810699517
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
##########
@@ -271,6 +279,17 @@ case class HashPartitioning(expressions: Seq[Expression], numPartitions: Int)
   override def createShuffleSpec(distribution: ClusteredDistribution): ShuffleSpec =
     HashShuffleSpec(this, distribution)
+  /**
+   * Checks if [[HashPartitioning]] is partitioned on exactly same full `clustering` keys of
+   * [[ClusteredDistribution]].
+   */
+  def isPartitionedOnFullKeys(distribution: ClusteredDistribution): Boolean = {
+    expressions.length == distribution.clustering.length &&
Review comment:
I'm now also in favor of a more restrictive condition. With a stricter
condition, end users can change the order of the keys to tune their query
further as a last resort, if simply turning the config on doesn't perform well
enough. We expect that changing the order of the hash keys changes the
partition ID, right?
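
To make the key-order point concrete, here is a minimal sketch in plain Scala.
It assumes an order-sensitive hash over the key values followed by a
non-negative modulo. `PartitionOrderSketch`, `partitionId` and `movedRows` are
hypothetical names, and `MurmurHash3.orderedHash` from the Scala standard
library with seed 42 is only a stand-in, not the hash Spark actually applies in
`HashPartitioning`.

```scala
import scala.util.hashing.MurmurHash3

object PartitionOrderSketch {
  // Stand-in for "hash the key values in the given order, then take a
  // non-negative modulo". This is an assumption for illustration only and
  // is NOT Spark's actual partition ID computation.
  def partitionId(keys: Seq[Any], numPartitions: Int): Int =
    java.lang.Math.floorMod(MurmurHash3.orderedHash(keys, 42), numPartitions)

  // Counts rows whose partition ID differs between key order (a, b) and (b, a).
  def movedRows(rows: Seq[(Int, Int)], numPartitions: Int): Int =
    rows.count { case (a, b) =>
      partitionId(Seq(a, b), numPartitions) != partitionId(Seq(b, a), numPartitions)
    }

  def main(args: Array[String]): Unit = {
    // Rows with two distinct key values each, so swapping the order matters.
    val rows = (0 until 1000).map(i => (i, i + 1))
    println(s"rows that change partition: ${movedRows(rows, 200)} of ${rows.size}")
  }
}
```

If the real hash behaves similarly, reordering the keys redistributes most rows
across partitions, which is what would let end users use key order as a tuning
knob.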
The scenario in which end users turn on this config is the major point. They
wouldn't turn it on before trying to run the query. (The config is marked as
internal and is disabled by default.) They would turn it on after running the
query and seeing Spark perform badly. One can argue that they could add a
repartition manually in their code/SQL statement, which makes sense in general,
but we have counter-arguments: 1) they may have far more than a few queries,
and 2) the queries could be machine/tool-generated.