c21 commented on a change in pull request #35574:
URL: https://github.com/apache/spark/pull/35574#discussion_r814478424



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
##########
@@ -72,9 +72,14 @@ case object AllTuples extends Distribution {
 /**
  * Represents data where tuples that share the same values for the `clustering`
  * [[Expression Expressions]] will be co-located in the same partition.
+ *
+ * @param requireAllClusterKeys When true, `Partitioning` which satisfies this distribution,
+ *                              must match all `clustering` expressions in the same ordering.
  */
 case class ClusteredDistribution(
     clustering: Seq[Expression],
+    requireAllClusterKeys: Boolean = SQLConf.get.getConf(

Review comment:
@cloud-fan - I agree with the point about keeping the caller-side code unchanged. It just feels more coherent for readers to have `clustering` and `requireAllClusterKeys` next to each other; @HeartSaVioR raised the same point in https://github.com/apache/spark/pull/35574#discussion_r813499279. I am curious whether adding the field in the middle here would break external libraries that depend on Spark. If not, reviewers have already spent the time to review this PR, so I am not sure how important it is to change the caller-side code back. I just want to understand this better, and I am open to changing it back.
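
To make the compatibility question concrete, here is a minimal sketch using hypothetical stand-in types (not the actual Spark classes) of why inserting a parameter in the middle of a case class can break external callers, whereas appending it with a default value keeps old positional call sites compiling:

```scala
// Hypothetical stand-ins; the real classes live in
// org.apache.spark.sql.catalyst.plans.physical and use SQLConf for the default.
final case class Expr(name: String)

case class ClusteredDistribution(
    clustering: Seq[Expr],
    requireAllClusterKeys: Boolean = false,   // new flag inserted in the middle
    requiredNumPartitions: Option[Int] = None)

object Demo extends App {
  // Named arguments keep compiling no matter where the new parameter sits:
  val byName = ClusteredDistribution(Seq(Expr("a")), requiredNumPartitions = Some(10))

  // Positional call sites written against the old two-parameter shape, e.g.
  //   ClusteredDistribution(Seq(Expr("a")), Some(10))
  // stop compiling because Some(10) is not a Boolean, and old extractors like
  //   case ClusteredDistribution(keys, numPartitions) => ...
  // no longer match the arity. Binary compatibility changes either way, since
  // the apply/copy/unapply signatures change whenever a parameter is added.
  println(byName)
}
```

Appending the new flag after `requiredNumPartitions` instead would avoid the source breakage for positional callers, at the cost of the less readable grouping mentioned above.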




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


