Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22009#discussion_r208437780
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider.java ---
@@ -29,24 +28,24 @@
  * provide data writing ability for structured streaming.
  */
 @InterfaceStability.Evolving
-public interface StreamWriteSupport extends DataSourceV2, BaseStreamingSink {
+public interface StreamingWriteSupportProvider extends DataSourceV2, BaseStreamingSink {
-  /**
-   * Creates an optional {@link StreamWriter} to save the data to this data source. Data
-   * sources can return None if there is no writing needed to be done.
-   *
-   * @param queryId A unique string for the writing query. It's possible that there are many
-   *                writing queries running at the same time, and the returned
-   *                {@link DataSourceWriter} can use this id to distinguish itself from others.
-   * @param schema the schema of the data to be written.
-   * @param mode the output mode which determines what successive epoch output means to this
-   *             sink, please refer to {@link OutputMode} for more details.
-   * @param options the options for the returned data source writer, which is an immutable
-   *                case-insensitive string-to-string map.
-   */
-  StreamWriter createStreamWriter(
-      String queryId,
-      StructType schema,
-      OutputMode mode,
-      DataSourceOptions options);
+  /**
+   * Creates an optional {@link StreamingWriteSupport} to save the data to this data source. Data
+   * sources can return None if there is no writing needed to be done.
+   *
+   * @param queryId A unique string for the writing query. It's possible that there are many
+   *                writing queries running at the same time, and the returned
+   *                {@link StreamingWriteSupport} can use this id to distinguish itself from others.
+   * @param schema the schema of the data to be written.
+   * @param mode the output mode which determines what successive epoch output means to this
+   *             sink, please refer to {@link OutputMode} for more details.
+   * @param options the options for the returned data source writer, which is an immutable
+   *                case-insensitive string-to-string map.
+   */
+  StreamingWriteSupport createStreamingWritSupport(
+      String queryId,
--- End diff --
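To make the rename concrete, here is a minimal sketch of what an implementor would write against the new interface. The method name and parameters are taken verbatim from the diff above; `ConsoleLikeProvider` and `MyStreamingWriteSupport` are hypothetical names, not part of this PR, and the `StreamingWriteSupport` import path is assumed from the Spark 2.4-era package layout.

```java
import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.StreamingWriteSupportProvider;
// Assumed package for the refactored streaming write interface.
import org.apache.spark.sql.sources.v2.writer.streaming.StreamingWriteSupport;
import org.apache.spark.sql.streaming.OutputMode;
import org.apache.spark.sql.types.StructType;

// Hypothetical sink, not part of this PR: shows how a data source would
// implement the renamed provider method.
public class ConsoleLikeProvider implements StreamingWriteSupportProvider {

  @Override
  public StreamingWriteSupport createStreamingWritSupport(
      String queryId,
      StructType schema,
      OutputMode mode,
      DataSourceOptions options) {
    // queryId lets the sink tell concurrent writing queries apart, e.g. when
    // keying per-query state. MyStreamingWriteSupport is a placeholder
    // implementation of StreamingWriteSupport defined elsewhere.
    return new MyStreamingWriteSupport(queryId, schema, mode, options);
  }
}
```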
For the batch API, I think we can remove the job id and ask data sources to generate a UUID themselves. But for streaming, I'm not sure. Maybe we need it for failure recovery or streaming restart, cc @jose-torres
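Roughly what I mean, as an illustration only (the class, paths, and names below are made up, not Spark APIs): a batch source can mint a fresh id per write job, while a streaming sink may want the stable, engine-provided query id so that a restarted query maps back onto the same sink-side state.

```java
import java.util.UUID;

// Illustration only, not code from this PR.
public final class WriteIdExample {

  // Batch: nothing needs to be recovered once the job finishes, so the data
  // source can simply generate a fresh id per write job.
  static String batchWriteId() {
    return UUID.randomUUID().toString();
  }

  // Streaming: the same query is expected to come back after a failure or a
  // restart, so keying sink-side state by the engine-provided query id keeps
  // the mapping stable across runs (hypothetical directory layout).
  static String streamingStateDir(String queryId) {
    return "/checkpoints/sink-state/" + queryId;
  }

  public static void main(String[] args) {
    System.out.println(batchWriteId());
    System.out.println(streamingStateDir("query-42"));
  }
}
```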