GitHub user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/22009#discussion_r208344510
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ScanConfig.java ---
@@ -18,22 +18,16 @@
 package org.apache.spark.sql.sources.v2.reader;
 
 import org.apache.spark.annotation.InterfaceStability;
-import org.apache.spark.sql.Row;
-import org.apache.spark.sql.catalyst.InternalRow;
-
-import java.util.List;
 
 /**
- * A mix-in interface for {@link DataSourceReader}. Data source readers can implement this
- * interface to output {@link Row} instead of {@link InternalRow}.
- * This is an experimental and unstable interface.
+ * An interface that carries query specific information for the data scan. Currently it's used to
+ * hold operator pushdown result and streaming offsets. This is defined as an empty interface, and
+ * data sources should define their own {@link ScanConfig} classes.
+ *
+ * For APIs that take a {@link ScanConfig} as input, like
+ * {@link ReadSupport#planInputPartitions(ScanConfig)} and
+ * {@link ReadSupport#createReaderFactory(ScanConfig)}, implementations mostly need to cast the
+ * input {@link ScanConfig} to the concrete {@link ScanConfig} class of the data source.
  */
-@InterfaceStability.Evolving
-public interface SupportsDeprecatedScanRow extends DataSourceReader {
-  default List<InputPartition<InternalRow>> planInputPartitions() {
-    throw new IllegalStateException(
-      "planInputPartitions not supported by default within SupportsDeprecatedScanRow");
-  }
-
-  List<InputPartition<Row>> planRowInputPartitions();
-}
+@InterfaceStability.Evolving
+public interface ScanConfig {}
--- End diff ---
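
To make the casting pattern described in the new Javadoc concrete, here is a minimal sketch of a data source's side of the contract. The names MyScanConfig and MyReadSupport, the pushdown fields, and the InputPartition[] return type are illustrative assumptions, not part of this PR:

    import org.apache.spark.sql.sources.v2.reader.InputPartition;
    import org.apache.spark.sql.sources.v2.reader.ScanConfig;
    import org.apache.spark.sql.types.StructType;

    // A data source defines its own concrete ScanConfig; the fields here are
    // hypothetical examples of the pushdown state a source might carry.
    class MyScanConfig implements ScanConfig {
      final StructType readSchema;   // columns remaining after pruning
      final String pushedFilterSql;  // filter pushed into the source, if any

      MyScanConfig(StructType readSchema, String pushedFilterSql) {
        this.readSchema = readSchema;
        this.pushedFilterSql = pushedFilterSql;
      }
    }

    class MyReadSupport {  // would implement ReadSupport in the real API
      // Mirrors ReadSupport#planInputPartitions(ScanConfig) from the Javadoc
      // above; the return type is assumed, since the snippet doesn't show it.
      InputPartition[] planInputPartitions(ScanConfig config) {
        // Cast the opaque ScanConfig back to this source's concrete class.
        MyScanConfig myConfig = (MyScanConfig) config;
        // Partition planning (omitted) would use myConfig.readSchema and
        // myConfig.pushedFilterSql to decide what each partition reads.
        return new InputPartition[0];
      }
    }

The point of the pattern is that Spark only ever sees the opaque ScanConfig, while the source keeps its pushdown state together in one object.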
I think ScanConfig should return the scan's output schema. Otherwise, the only way to get it is during pushdown.
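
For illustration only, that suggestion might look like the sketch below; the readSchema() method name and the StructType return type are assumptions here, not something settled in this thread:

    import org.apache.spark.annotation.InterfaceStability;
    import org.apache.spark.sql.types.StructType;

    // Hypothetical variant of ScanConfig that exposes the scan's output schema
    // directly, instead of leaving pushdown as the only place to observe it.
    @InterfaceStability.Evolving
    public interface ScanConfig {
      StructType readSchema();
    }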