arunmahadevan commented on a change in pull request #23430: [SPARK-26520][SQL]
data source v2 API refactor (micro-batch read)
URL: https://github.com/apache/spark/pull/23430#discussion_r245091849
##########
File path:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/Scan.java
##########
@@ -65,4 +67,20 @@ default String description() {
default Batch toBatch() {
throw new UnsupportedOperationException("Batch scans are not supported");
}
+
+  /**
+   * Returns the physical representation of this scan for a streaming query in
+   * micro-batch mode. By default this method throws an exception; data sources
+   * must override this method to provide an implementation, if the
+   * {@link Table} that creates this scan implements
+   * {@link SupportsMicroBatchRead}.
+   *
+   * @param checkpointLocation a path to Hadoop FS scratch space that can be
+   *                           used for failure recovery. Data streams for the
+   *                           same logical source in the same query will be
+   *                           given the same checkpointLocation.
+   *
+   * @throws UnsupportedOperationException
+   */
+ default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
Review comment:
In "alternative1" there is no equivalent logical `Scan`? I was thinking we
need the `Scan` (the logical scan) kept separate from the physical scans.
Also, if they don't inherit from a common parent, can they be passed to
`DatasourceV2ScanExec`?
In any case, it would be better to revisit and rename as appropriate, so that
the different ones (batch/micro-batch/continuous) share common prefixes or
suffixes that make their meaning clear.
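
To illustrate the pattern the Javadoc above describes, here is a minimal,
self-contained sketch in plain Java. The interfaces (`Scan`,
`MicroBatchStream`) and the class `ExampleScan` are simplified stand-ins
mirroring the names in this PR, not the real Spark API: the default
`toMicroBatchStream` throws, and a source whose `Table` supports micro-batch
reads overrides it.

```java
// Hypothetical, simplified mirror of the interfaces discussed in this PR.
// Names follow the diff above but this is NOT the actual Spark code.
interface MicroBatchStream {
    String checkpointLocation();
}

interface Scan {
    default String description() {
        return this.getClass().getName();
    }

    // Mirrors the proposed default: sources that do not support
    // micro-batch reads inherit this and throw.
    default MicroBatchStream toMicroBatchStream(String checkpointLocation) {
        throw new UnsupportedOperationException(
            "Micro-batch scans are not supported");
    }
}

// A source whose Table implements SupportsMicroBatchRead would override
// toMicroBatchStream to return its streaming representation.
class ExampleScan implements Scan {
    @Override
    public MicroBatchStream toMicroBatchStream(String checkpointLocation) {
        // Lambda satisfies the one-method MicroBatchStream interface.
        return () -> checkpointLocation;
    }
}
```

A caller would then do `new ExampleScan().toMicroBatchStream(path)` and get a
stream, while a scan that keeps the default would throw
`UnsupportedOperationException`, matching the documented contract.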
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]