rdblue commented on a change in pull request #31355:
URL: https://github.com/apache/spark/pull/31355#discussion_r576340131



##########
File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/ClusteredDistribution.java
##########
@@ -32,4 +32,13 @@
    * Returns clustering expressions.
    */
   Expression[] clustering();
+
+  /**
+   * Returns the number of partitions required by this write.
+   * <p>
+   * Implementations may want to override this if it requires the specific number of partitions.
+   *
+   * @return the required number of partitions, non-positive values mean no requirement.
+   */
+  default int requiredNumPartitions() { return 0; }

Review comment:
       @HeartSaVioR, I think this is part of the use case. If this interface 
can require some number of incoming partitions, then it should do that in all 
cases. The interface's behavior is harder to understand if the requirement is 
not enforced for certain values returned by the other methods.
   
   In addition, this interface is less useful if it can't be used to control 
the incoming parallelism without also changing the data. I think a coalesce 
makes a lot more sense than ignoring the requirement.
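
   To make the suggestion concrete, here is a rough sketch of the decision being argued for. It is not Spark's planner code: `ClusteredDistributionSketch`, `PartitionPlanSketch`, `Action`, `plan` and `of` are hypothetical names, and clustering expressions are reduced to plain strings. The only point is that a positive `requiredNumPartitions()` is honored even when `clustering()` is empty, by coalescing instead of ignoring the value.

```java
// Hypothetical stand-in for the interface in the diff above (not the Spark type).
interface ClusteredDistributionSketch {
  // Clustering expressions, simplified to column names for this sketch.
  String[] clustering();

  // Same contract as in the diff: non-positive values mean no requirement.
  default int requiredNumPartitions() { return 0; }
}

final class PartitionPlanSketch {
  // What a planner could do with the incoming data before the write.
  enum Action { NONE, COALESCE, REPARTITION_BY_CLUSTERING }

  // Assumed helper that builds a distribution from a count and column names.
  static ClusteredDistributionSketch of(int numPartitions, String... columns) {
    return new ClusteredDistributionSketch() {
      @Override public String[] clustering() { return columns; }
      @Override public int requiredNumPartitions() { return numPartitions; }
    };
  }

  // Decide how to satisfy the distribution. The partition count is enforced
  // in every case where it is positive, including the no-clustering case.
  static Action plan(ClusteredDistributionSketch distribution) {
    boolean hasClustering = distribution.clustering().length > 0;
    boolean hasCount = distribution.requiredNumPartitions() > 0;
    if (hasClustering) {
      // Data must move anyway; the count (if any) is applied to the shuffle.
      return Action.REPARTITION_BY_CLUSTERING;
    }
    if (hasCount) {
      // No clustering requested: adjust parallelism without regrouping rows.
      return Action.COALESCE;
    }
    return Action.NONE;
  }

  public static void main(String[] args) {
    System.out.println(plan(of(8)));         // count only
    System.out.println(plan(of(8, "date"))); // count and clustering
    System.out.println(plan(of(0)));         // no requirement at all
  }
}
```

   With this shape, a source that only wants to limit incoming parallelism returns an empty `clustering()` and a positive count, and the write still sees that count.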



