rdblue commented on a change in pull request #31355:
URL: https://github.com/apache/spark/pull/31355#discussion_r576340131
##########
File path:
sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/ClusteredDistribution.java
##########
@@ -32,4 +32,13 @@
* Returns clustering expressions.
*/
Expression[] clustering();
+
+ /**
+ * Returns the number of partitions required by this write.
+ * <p>
+ * Implementations may want to override this if it requires the specific number of partitions.
+ *
+ * @return the required number of partitions, non-positive values mean no requirement.
+ */
+ default int requiredNumPartitions() { return 0; }
Review comment:
@HeartSaVioR, I think this is part of the use case. If this interface
can require some number of incoming partitions, then it should enforce that
in all cases. The interface's behavior is harder to understand if the
requirement is not enforced for certain values returned by the other methods.

In addition, this interface is less useful if it can't be used to control
the incoming parallelism without also changing the data. I think a coalesce
makes a lot more sense than ignoring the requirement.
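
To make the suggestion concrete, here is a minimal sketch of the decision being argued for. It assumes that the case under discussion is an empty `clustering()` combined with a positive `requiredNumPartitions()`. The class `RequiredPartitionsSketch`, the `Action` enum, and the `choose` method are hypothetical names used only for illustration; this is not Spark's planner code, and clustering expressions are modeled as plain strings.

```java
import java.util.List;

// Hypothetical stand-in for the planner's decision; not Spark's actual code.
final class RequiredPartitionsSketch {

    /** Illustrative ways to prepare incoming data for a write. */
    enum Action { LEAVE_AS_IS, COALESCE, REPARTITION_BY_CLUSTERING }

    /**
     * Picks an action so that a positive requiredNumPartitions is honored
     * whether or not clustering expressions are present.
     */
    static Action choose(List<String> clustering, int requiredNumPartitions) {
        if (!clustering.isEmpty()) {
            // Clustering was requested, so the data has to be redistributed.
            return Action.REPARTITION_BY_CLUSTERING;
        }
        if (requiredNumPartitions > 0) {
            // No clustering, but a partition count is required:
            // coalesce instead of silently ignoring the requirement.
            return Action.COALESCE;
        }
        // Non-positive values mean no requirement.
        return Action.LEAVE_AS_IS;
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of(), 8));        // COALESCE
        System.out.println(choose(List.of("region"), 8)); // REPARTITION_BY_CLUSTERING
        System.out.println(choose(List.of(), 0));        // LEAVE_AS_IS
    }
}
```

With this shape, the value returned by `requiredNumPartitions()` affects the outcome for every value of `clustering()`, which is the consistency the comment asks for.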