[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

rdblue Tue, 07 Aug 2018 11:14:26 -0700

Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22009#discussion_r208333129
  
    --- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchWriteSupportProvider.java
 ---
    @@ -21,32 +21,32 @@
     
     import org.apache.spark.annotation.InterfaceStability;
     import org.apache.spark.sql.SaveMode;
    -import org.apache.spark.sql.sources.v2.writer.DataSourceWriter;
    +import org.apache.spark.sql.sources.v2.writer.BatchWriteSupport;
     import org.apache.spark.sql.types.StructType;
     
     /**
      * A mix-in interface for {@link DataSourceV2}. Data sources can implement 
this interface to
    - * provide data writing ability and save the data to the data source.
    + * provide data writing ability for batch processing.
      */
     @InterfaceStability.Evolving
    -public interface WriteSupport extends DataSourceV2 {
    +public interface BatchWriteSupportProvider extends DataSourceV2 {
     
       /**
    -   * Creates an optional {@link DataSourceWriter} to save the data to this 
data source. Data
    +   * Creates an optional {@link BatchWriteSupport} to save the data to 
this data source. Data
        * sources can return None if there is no writing needed to be done 
according to the save mode.
        *
        * If this method fails (by throwing an exception), the action will fail 
and no Spark job will be
        * submitted.
        *
        * @param jobId A unique string for the writing job. It's possible that 
there are many writing
    -   *              jobs running at the same time, and the returned {@link 
DataSourceWriter} can
    +   *              jobs running at the same time, and the returned {@link 
BatchWriteSupport} can
        *              use this job id to distinguish itself from other jobs.
        * @param schema the schema of the data to be written.
        * @param mode the save mode which determines what to do when the data 
are already in this data
        *             source, please refer to {@link SaveMode} for more details.
        * @param options the options for the returned data source writer, which 
is an immutable
        *                case-insensitive string-to-string map.
        */
    -  Optional<DataSourceWriter> createWriter(
    +  Optional<BatchWriteSupport> createBatchWriteSupport(
           String jobId, StructType schema, SaveMode mode, DataSourceOptions 
options);
    --- End diff --
    
    The proposal removes the `jobId` string and the `SaveMode`. Shouldn't that 
be done in this PR?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

Reply via email to