[GitHub] [spark] sunchao commented on a change in pull request #35657: [SPARK-37377][SQL] Initial implementation of Storage-Partitioned Join

GitBox Mon, 28 Feb 2022 20:36:45 -0800


sunchao commented on a change in pull request #35657:
URL: https://github.com/apache/spark/pull/35657#discussion_r816442688




##########
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/partitioning/Partitioning.java
##########
@@ -18,33 +18,26 @@
 package org.apache.spark.sql.connector.read.partitioning;
 
 import org.apache.spark.annotation.Evolving;
-import org.apache.spark.sql.connector.read.InputPartition;
+import org.apache.spark.sql.connector.distributions.Distribution;
+import org.apache.spark.sql.connector.expressions.SortOrder;
 import org.apache.spark.sql.connector.read.SupportsReportPartitioning;
 
 /**
 * An interface to represent the output data partitioning for a data source, which is returned by
- * {@link SupportsReportPartitioning#outputPartitioning()}. Note that this should work
- * like a snapshot. Once created, it should be deterministic and always report the same number of
- * partitions and the same "satisfy" result for a certain distribution.
+ * {@link SupportsReportPartitioning#outputPartitioning()}.
  *
  * @since 3.0.0
  */
 @Evolving
 public interface Partitioning {
 
   /**
-   * Returns the number of partitions(i.e., {@link InputPartition}s) the data source outputs.
+   * Returns the distribution guarantee that the data source provides.
    */
-  int numPartitions();
+  Distribution distribution();

Review comment:
       I'm fine with that, although these two methods will no longer be called after this PR, so this could take users by surprise.
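To illustrate the concern, here is a minimal Java sketch built on hypothetical, simplified stand-ins (`Distribution`, `ClusteredDistribution`, `Partitioning`, `LegacyPartitioning`, `PartitioningSketch.canAvoidShuffle`). These are not Spark's actual connector classes. The sketch assumes a planner that consults only `distribution()`. A source written against the old contract still compiles when it keeps its `numPartitions()` method, but the planner never invokes that method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the connector types touched by the diff.
// These are NOT Spark's real classes; they only model the shape of the change.
interface Distribution {}

/** Hypothetical distribution: rows are clustered by the given column names. */
final class ClusteredDistribution implements Distribution {
    final List<String> clustering;

    ClusteredDistribution(List<String> clustering) {
        this.clustering = List.copyOf(clustering);
    }
}

/** Shape of the interface after the change: it reports a distribution guarantee. */
interface Partitioning {
    Distribution distribution();
}

/** A source written against the old contract that still defines numPartitions(). */
final class LegacyPartitioning implements Partitioning {
    // Records which of this object's methods were invoked, in order.
    final List<String> calls = new ArrayList<>();

    // Old-style method: still compiles, but nothing below ever calls it.
    public int numPartitions() {
        calls.add("numPartitions");
        return 8;
    }

    @Override
    public Distribution distribution() {
        calls.add("distribution");
        return new ClusteredDistribution(List.of("id"));
    }
}

public class PartitioningSketch {
    /** Hypothetical planner check that consults only distribution(). */
    static boolean canAvoidShuffle(Partitioning p, List<String> joinKeys) {
        Distribution d = p.distribution();
        return d instanceof ClusteredDistribution
                && ((ClusteredDistribution) d).clustering.equals(joinKeys);
    }

    public static void main(String[] args) {
        LegacyPartitioning p = new LegacyPartitioning();
        System.out.println(canAvoidShuffle(p, List.of("id"))); // true
        System.out.println(p.calls);                           // [distribution]
    }
}
```

Running `main` prints `true` and then `[distribution]`. The planning decision never touches `numPartitions()`. This is the kind of silent change the comment warns about: an existing implementation keeps compiling, but part of it stops having any effect.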



