Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22009#discussion_r211639298

    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java ---
    @@ -24,16 +24,17 @@
     import org.apache.spark.sql.sources.v2.reader.ScanConfigBuilder;

     /**
    - * An interface that defines how to scan the data from data source for continuous streaming
    + * An interface that defines how to load the data from data source for continuous streaming
      * processing.
      *
    - * The execution engine will create an instance of this interface at the start of a streaming query,
    - * then call {@link #newScanConfigBuilder(Offset)} and create an instance of {@link ScanConfig} for
    - * the duration of the streaming query or until {@link #needsReconfiguration(ScanConfig)} is true.
    - * The {@link ScanConfig} will be used to create input partitions and reader factory to process data
    - * for its duration. At the end {@link #stop()} will be called when the streaming execution is
    - * completed. Note that a single query may have multiple executions due to restart or failure
    - * recovery.
    + * The execution engine will get an instance of this interface from a data source provider
    + * (e.g. {@link org.apache.spark.sql.sources.v2.ContinuousReadSupportProvider}) at the start of a
    + * streaming query, then call {@link #newScanConfigBuilder(Offset)} to create an instance of
    + * {@link ScanConfig} for the duration of the streaming query or until
    + * {@link #needsReconfiguration(ScanConfig)} is true. The {@link ScanConfig} will be used to create
    + * input partitions and reader factory to scan data for its duration. At the end {@link #stop()}
    + * will be called when the streaming execution is completed. Note that a single query may have
    + * multiple executions due to restart or failure recovery.
    --- End diff --

I would also add this documentation on the relevant methods.
So getContinuousReadSupport and getMicroBatchReadSupport would say something like "Spark will call this method at the beginning of each streaming query to get a ReadSupport", and newScanConfigBuilder would say something like "Spark will get a ScanConfig once for each data scanning job".
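A rough sketch of what that method-level Javadoc could look like. The interface shapes and parameter lists below are simplified assumptions based on this thread, not the actual Spark DataSourceV2 API:

```java
// Hypothetical, simplified stand-ins for the real Spark v2 types, included
// only so the sketch is self-contained.
interface Offset {}
interface ScanConfig {}
interface ScanConfigBuilder { ScanConfig build(); }

interface ContinuousReadSupport {
  /**
   * Spark will get a {@link ScanConfig} (via the builder returned here) once
   * for each data scanning job, starting the scan from the given offset.
   */
  ScanConfigBuilder newScanConfigBuilder(Offset start);
}

interface ContinuousReadSupportProvider {
  /**
   * Spark will call this method at the beginning of each streaming query to
   * get a {@link ContinuousReadSupport} instance for that query.
   */
  ContinuousReadSupport getContinuousReadSupport(String checkpointLocation);
}
```

Putting the lifecycle note on the method itself, rather than only in the class-level Javadoc, means an implementer reading just one method signature still sees when Spark invokes it.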