GitHub user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20726#discussion_r175222350
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsReportPartitioning.java ---
@@ -23,6 +23,11 @@
/**
* A mix in interface for {@link DataSourceReader}. Data source readers can implement this
* interface to report data partitioning and try to avoid shuffle at Spark side.
+ *
+ * Note that Spark will always infer a
+ * {@link org.apache.spark.sql.catalyst.plans.physical.SinglePartition} partitioning when the
--- End diff --
We should not expose internal classes. How about:
```
Note that sometimes Spark can avoid shuffle if the reader creates exactly 1
{@link DataReaderFactory}, even if the reader does not implement this interface.
```
---