Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/20386#discussion_r165124614
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java
---
@@ -32,40 +32,44 @@
@InterfaceStability.Evolving
public interface StreamWriter extends DataSourceWriter {
/**
- * Commits this writing job for the specified epoch with a list of commit messages. The commit
- * messages are collected from successful data writers and are produced by
- * {@link DataWriter#commit()}.
+ * Commits this writing job for the specified epoch.
*
- * If this method fails (by throwing an exception), this writing job is considered to have been
- * failed, and the execution engine will attempt to call {@link #abort(WriterCommitMessage[])}.
+ * When this method is called, the number of commit messages added by
+ * {@link #add(WriterCommitMessage)} equals to the number of input data partitions.
+ *
+ * If this method fails (by throwing an exception), this writing job is considered to to have been
+ * failed, and {@link #abort()} would be called. The state of the destination
+ * is undefined and @{@link #abort()} may not be able to deal with it.
*
* To support exactly-once processing, writer implementations should ensure that this method is
* idempotent. The execution engine may call commit() multiple times for the same epoch
--- End diff --
What are the exact guarantees you're looking for when calling a system
"exactly-once"? I worry you're looking for something that isn't possible. In
particular, I don't know of any additional guarantee that check would allow us
to make.
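
For readers following the diff: the Javadoc above asks for commit() to be idempotent per epoch, since the engine may call it more than once for the same epoch. A minimal sketch of that property is below. It is only an illustration under stated assumptions: `IdempotentEpochSink`, its `commit(long, List<String>)` signature, and the in-memory storage are hypothetical and are not part of Spark's StreamWriter API. A real sink would also need to record the epoch and write the rows atomically (for example in one transaction), which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory sink used only to illustrate idempotent epoch commits.
// It is not a Spark class and does not implement StreamWriter.
class IdempotentEpochSink {
    // Epochs whose rows have already been applied to the sink.
    private final Set<Long> committedEpochs = new HashSet<>();
    // The "destination": rows that have been durably committed.
    private final List<String> rows = new ArrayList<>();

    /**
     * Applies the rows for an epoch at most once.
     * Returns true if the epoch was applied, false if it was a replayed commit.
     */
    synchronized boolean commit(long epochId, List<String> epochRows) {
        if (!committedEpochs.add(epochId)) {
            // The epoch was committed earlier; a repeated call changes nothing.
            return false;
        }
        rows.addAll(epochRows);
        return true;
    }

    /** Returns a copy of the committed rows. */
    synchronized List<String> rows() {
        return new ArrayList<>(rows);
    }
}
```

With this shape, a retried commit for an already-committed epoch leaves the destination unchanged, which is the behavior the Javadoc describes as needed for exactly-once processing.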