GitHub user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/22009#discussion_r208344510
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/ScanConfig.java ---
@@ -18,22 +18,16 @@
 package org.apache.spark.sql.sources.v2.reader;
 
 import org.apache.spark.annotation.InterfaceStability;
-import org.apache.spark.sql.Row;
-import org.apache.spark.sql.catalyst.InternalRow;
-
-import java.util.List;
 
 /**
- * A mix-in interface for {@link DataSourceReader}. Data source readers can implement this
- * interface to output {@link Row} instead of {@link InternalRow}.
- * This is an experimental and unstable interface.
+ * An interface that carries query specific information for the data scan. Currently it's used to
+ * hold operator pushdown result and streaming offsets. This is defined as an empty interface, and
+ * data sources should define their own {@link ScanConfig} classes.
+ *
+ * For APIs that take a {@link ScanConfig} as input, like
+ * {@link ReadSupport#planInputPartitions(ScanConfig)} and
+ * {@link ReadSupport#createReaderFactory(ScanConfig)}, implementations mostly need to cast the
+ * input {@link ScanConfig} to the concrete {@link ScanConfig} class of the data source.
  */
-@InterfaceStability.Evolving
-public interface SupportsDeprecatedScanRow extends DataSourceReader {
-  default List<InputPartition<InternalRow>> planInputPartitions() {
-    throw new IllegalStateException(
-      "planInputPartitions not supported by default within SupportsDeprecatedScanRow");
-  }
-
-  List<InputPartition<Row>> planRowInputPartitions();
-}
+@InterfaceStability.Evolving
+public interface ScanConfig {}
--- End diff ---
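
To make the casting pattern described in the new Javadoc concrete, here is a minimal sketch of a data source's side of the contract. The names MyScanConfig and MyReadSupport, the pushdown fields, and the InputPartition[] return type are illustrative assumptions, not part of this PR:

    import org.apache.spark.sql.sources.v2.reader.InputPartition;
    import org.apache.spark.sql.sources.v2.reader.ScanConfig;
    import org.apache.spark.sql.types.StructType;

    // A data source defines its own concrete ScanConfig; the fields here are
    // hypothetical examples of the pushdown state a source might carry.
    class MyScanConfig implements ScanConfig {
      final StructType readSchema;   // columns remaining after pruning
      final String pushedFilterSql;  // filter pushed into the source, if any

      MyScanConfig(StructType readSchema, String pushedFilterSql) {
        this.readSchema = readSchema;
        this.pushedFilterSql = pushedFilterSql;
      }
    }

    class MyReadSupport {  // would implement ReadSupport in the real API
      // Mirrors ReadSupport#planInputPartitions(ScanConfig) from the Javadoc
      // above; the return type is assumed, since the snippet doesn't show it.
      InputPartition[] planInputPartitions(ScanConfig config) {
        // Cast the opaque ScanConfig back to this source's concrete class.
        MyScanConfig myConfig = (MyScanConfig) config;
        // Partition planning (omitted) would use myConfig.readSchema and
        // myConfig.pushedFilterSql to decide what each partition reads.
        return new InputPartition[0];
      }
    }

The point of the pattern is that Spark only ever sees the opaque ScanConfig, while the source keeps its pushdown state together in one object.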
I think ScanConfig should return the scan's output schema. Otherwise, the only way to get it is during pushdown.
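
For illustration only, that suggestion might look like the sketch below; the readSchema() method name and the StructType return type are assumptions here, not something settled in this thread:

    import org.apache.spark.annotation.InterfaceStability;
    import org.apache.spark.sql.types.StructType;

    // Hypothetical variant of ScanConfig that exposes the scan's output schema
    // directly, instead of leaving pushdown as the only place to observe it.
    @InterfaceStability.Evolving
    public interface ScanConfig {
      StructType readSchema();
    }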