Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/22009#discussion_r211639298
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java
---
@@ -24,16 +24,17 @@
 import org.apache.spark.sql.sources.v2.reader.ScanConfigBuilder;
 /**
- * An interface that defines how to scan the data from data source for continuous streaming
+ * An interface that defines how to load the data from data source for continuous streaming
  * processing.
  *
- * The execution engine will create an instance of this interface at the start of a streaming query,
- * then call {@link #newScanConfigBuilder(Offset)} and create an instance of {@link ScanConfig} for
- * the duration of the streaming query or until {@link #needsReconfiguration(ScanConfig)} is true.
- * The {@link ScanConfig} will be used to create input partitions and reader factory to process data
- * for its duration. At the end {@link #stop()} will be called when the streaming execution is
- * completed. Note that a single query may have multiple executions due to restart or failure
- * recovery.
+ * The execution engine will get an instance of this interface from a data source provider
+ * (e.g. {@link org.apache.spark.sql.sources.v2.ContinuousReadSupportProvider}) at the start of a
+ * streaming query, then call {@link #newScanConfigBuilder(Offset)} to create an instance of
+ * {@link ScanConfig} for the duration of the streaming query or until
+ * {@link #needsReconfiguration(ScanConfig)} is true. The {@link ScanConfig} will be used to create
+ * input partitions and reader factory to scan data for its duration. At the end {@link #stop()}
+ * will be called when the streaming execution is completed. Note that a single query may have
+ * multiple executions due to restart or failure recovery.
--- End diff --
I would also add this documentation on the relevant methods. So getContinuousReadSupport and getMicroBatchReadSupport would say something like "Spark will call this method at the beginning of each streaming query to get a ReadSupport", and newScanConfigBuilder would say something like "Spark will get a ScanConfig once for each data scanning job".
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]