Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/22009#discussion_r211639298
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java
---
@@ -24,16 +24,17 @@
 import org.apache.spark.sql.sources.v2.reader.ScanConfigBuilder;
 /**
- * An interface that defines how to scan the data from data source for continuous streaming
+ * An interface that defines how to load the data from data source for continuous streaming
  * processing.
  *
- * The execution engine will create an instance of this interface at the start of a streaming query,
- * then call {@link #newScanConfigBuilder(Offset)} and create an instance of {@link ScanConfig} for
- * the duration of the streaming query or until {@link #needsReconfiguration(ScanConfig)} is true.
- * The {@link ScanConfig} will be used to create input partitions and reader factory to process data
- * for its duration. At the end {@link #stop()} will be called when the streaming execution is
- * completed. Note that a single query may have multiple executions due to restart or failure
- * recovery.
+ * The execution engine will get an instance of this interface from a data source provider
+ * (e.g. {@link org.apache.spark.sql.sources.v2.ContinuousReadSupportProvider}) at the start of a
+ * streaming query, then call {@link #newScanConfigBuilder(Offset)} to create an instance of
+ * {@link ScanConfig} for the duration of the streaming query or until
+ * {@link #needsReconfiguration(ScanConfig)} is true. The {@link ScanConfig} will be used to create
+ * input partitions and reader factory to scan data for its duration. At the end {@link #stop()}
+ * will be called when the streaming execution is completed. Note that a single query may have
+ * multiple executions due to restart or failure recovery.
--- End diff --
I would also add this documentation on the relevant methods. So getContinuousReadSupport and getMicroBatchReadSupport would say something like "Spark will call this method at the beginning of each streaming query to get a ReadSupport", and newScanConfigBuilder would say something like "Spark will get a ScanConfig once for each data scanning job".
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]