c21 commented on a change in pull request #35574:
URL: https://github.com/apache/spark/pull/35574#discussion_r814478424
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
##########
@@ -72,9 +72,14 @@ case object AllTuples extends Distribution {
/**
* Represents data where tuples that share the same values for the `clustering`
* [[Expression Expressions]] will be co-located in the same partition.
+ *
+ * @param requireAllClusterKeys When true, `Partitioning` which satisfies this
+ *                              distribution, must match all `clustering`
+ *                              expressions in the same ordering.
*/
case class ClusteredDistribution(
clustering: Seq[Expression],
+ requireAllClusterKeys: Boolean = SQLConf.get.getConf(
Review comment:
@cloud-fan - I agree with the point about keeping the caller-side code
unchanged. It just feels more coherent for others reading the code when
`clustering` and `requireAllClusterKeys` are placed next to each other. This
was also raised by @HeartSaVioR in
https://github.com/apache/spark/pull/35574#discussion_r813499279. I am curious:
would adding the field in the middle here break external libraries that depend
on Spark? If not, reviewers have already paid the cost of reviewing this PR, so
I am not sure how important it is to change the caller-side code back. Just
want to understand more here.
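To illustrate the compatibility concern being asked about (a simplified,
hypothetical sketch, not the actual Spark class): inserting a field in the
middle of a case class changes the positional constructor and the synthesized
`unapply`, so external code that constructs the class positionally or pattern
matches on it can break, even when the new field has a default value. The
field names below mirror the diff; the simplified types are assumptions.

```scala
// Hypothetical simplified stand-in for the real class, for illustration only.
// Before this PR the class was (conceptually):
//   case class ClusteredDistribution(
//       clustering: Seq[String],
//       requiredNumPartitions: Option[Int] = None)
// After inserting a field in the middle:
case class ClusteredDistribution(
    clustering: Seq[String],
    requireAllClusterKeys: Boolean = false, // new field inserted here
    requiredNumPartitions: Option[Int] = None)

object Demo extends App {
  // Named arguments keep working across the change:
  val ok = ClusteredDistribution(Seq("a"), requiredNumPartitions = Some(4))

  // But positional callers written against the OLD shape, e.g.
  //   ClusteredDistribution(Seq("a"), Some(4))
  // no longer compile (Some(4) is not a Boolean), and old pattern matches
  //   case ClusteredDistribution(keys, numParts) => ...
  // break too, because `unapply` now yields three fields instead of two.
  println(ok)
}
```

Appending the new field at the end instead would keep old positional
constructor calls source-compatible, though pattern matches on the full
field list would still be affected.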
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]