Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/22009#discussion_r208642275
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/StreamingWriteSupportProvider.java
---
@@ -29,24 +28,24 @@
* provide data writing ability for structured streaming.
*/
@InterfaceStability.Evolving
-public interface StreamWriteSupport extends DataSourceV2, BaseStreamingSink {
+public interface StreamingWriteSupportProvider extends DataSourceV2, BaseStreamingSink {
-  /**
-   * Creates an optional {@link StreamWriter} to save the data to this data source. Data
-   * sources can return None if there is no writing needed to be done.
-   *
-   * @param queryId A unique string for the writing query. It's possible that there are many
-   *                writing queries running at the same time, and the returned
-   *                {@link DataSourceWriter} can use this id to distinguish itself from others.
-   * @param schema the schema of the data to be written.
-   * @param mode the output mode which determines what successive epoch output means to this
-   *             sink, please refer to {@link OutputMode} for more details.
-   * @param options the options for the returned data source writer, which is an immutable
-   *                case-insensitive string-to-string map.
-   */
-  StreamWriter createStreamWriter(
-      String queryId,
-      StructType schema,
-      OutputMode mode,
-      DataSourceOptions options);
+  /**
+   * Creates an optional {@link StreamingWriteSupport} to save the data to this data source. Data
+   * sources can return None if there is no writing needed to be done.
+   *
+   * @param queryId A unique string for the writing query. It's possible that there are many
+   *                writing queries running at the same time, and the returned
+   *                {@link StreamingWriteSupport} can use this id to distinguish itself from others.
+   * @param schema the schema of the data to be written.
+   * @param mode the output mode which determines what successive epoch output means to this
+   *             sink, please refer to {@link OutputMode} for more details.
+   * @param options the options for the returned data source writer, which is an immutable
+   *                case-insensitive string-to-string map.
+   */
+  StreamingWriteSupport createStreamingWriteSupport(
+      String queryId,
--- End diff ---
If it needs to be there for streaming, then let's make sure it is in both APIs. It can help when debugging writes in batch, too.
One more thing: isn't the abstraction that a `WriteSupport` is something that can be written to, just as a `ReadSupport` is something that can be scanned? A Table fits both, as do Streams.
If that's the case, then why pass the query ID when creating the `WriteSupport` or stream? The stream doesn't need a UUID; the actual write does. On the read side, there's `ScanConfig` that is used to hold the state for a scan, but on the write side there is no equivalent, and we end up with odd uses of the abstraction like this.
What about creating an equivalent of `ScanConfig` for the write side?
@jose-torres, it would be great to hear your opinion on this, too.
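For concreteness, here is a minimal sketch of what a write-side equivalent of `ScanConfig` might look like. None of these names (`WriteConfig`, `newWriteConfig`, `DemoWriteSupport`) are from the PR; they are illustrative assumptions showing the shape of the proposal, where the long-lived `WriteSupport` has no query ID and each individual write carries its own:

```java
import java.util.UUID;

// Hypothetical sketch of a write-side analogue of ScanConfig.
// The WriteSupport stays a long-lived "thing that can be written to";
// per-write state, including the write's unique id, lives in WriteConfig.
public class WriteConfigSketch {

    // Holds the state for one logical write, mirroring ScanConfig on the read side.
    interface WriteConfig {
        String writeId();
    }

    // The long-lived writable entity: note no query id is needed at creation time.
    interface WriteSupport {
        WriteConfig newWriteConfig();
    }

    // Minimal in-memory implementation to show the flow.
    static class DemoWriteSupport implements WriteSupport {
        @Override
        public WriteConfig newWriteConfig() {
            String id = UUID.randomUUID().toString();
            return () -> id;  // each write gets its own id
        }
    }

    public static void main(String[] args) {
        WriteSupport support = new DemoWriteSupport();
        WriteConfig a = support.newWriteConfig();
        WriteConfig b = support.newWriteConfig();
        // Two writes against the same support get distinct ids.
        System.out.println(!a.writeId().equals(b.writeId()));
    }
}
```

With a split like this, the UUID question answers itself: the id is per-write state, so it belongs on the config object, not on the support passed in at creation.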
---