HeartSaVioR commented on a change in pull request #31355:
URL: https://github.com/apache/spark/pull/31355#discussion_r576402899
##########
File path:
sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RequiresDistributionAndOrdering.java
##########
@@ -42,6 +42,19 @@
*/
Distribution requiredDistribution();
+ /**
+ * Returns the number of partitions required by this write if specific distribution is required.
+ * <p>
+ * Implementations may want to override this if it requires the specific number of partitions
+ * on distribution.
+ * <p>
+ * {@link UnspecifiedDistribution} is not affected by this method, as it doesn't require the
+ * specific distribution.
+ *
+ * @return the required number of partitions, non-positive values mean no requirement.
+ */
+ default int requiredNumPartitionsOnDistribution() { return 0; }
Review comment:
I'm actually more familiar with the word "parallelism", but that word seems
to be used less in Spark; "partition" is used almost everywhere. I'm OK with
calling it "parallelism", but let's hear more voices on this.
The name reflects the fact that the number only takes effect when a
distribution is specified. The longer name is meant to avoid the
misunderstanding that it also applies to the sort ordering request, which it
does not. Perhaps we could discuss the impact first and revisit the naming
afterward.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.