Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20386#discussion_r165137514

    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java ---
    @@ -32,40 +32,44 @@
     @InterfaceStability.Evolving
     public interface StreamWriter extends DataSourceWriter {
       /**
    -   * Commits this writing job for the specified epoch with a list of commit messages. The commit
    -   * messages are collected from successful data writers and are produced by
    -   * {@link DataWriter#commit()}.
    +   * Commits this writing job for the specified epoch.
        *
    -   * If this method fails (by throwing an exception), this writing job is considered to have
    -   * failed, and the execution engine will attempt to call {@link #abort(WriterCommitMessage[])}.
    +   * When this method is called, the number of commit messages added by
    +   * {@link #add(WriterCommitMessage)} equals the number of input data partitions.
    +   *
    +   * If this method fails (by throwing an exception), this writing job is considered to have
    +   * failed, and {@link #abort()} will be called. The state of the destination
    +   * is undefined and {@link #abort()} may not be able to deal with it.
        *
        * To support exactly-once processing, writer implementations should ensure that this method is
        * idempotent. The execution engine may call commit() multiple times for the same epoch
    --- End diff --

    It's true that there's no exactly-once behavior with respect to StreamWriter.commit(). "Exactly-once processing" refers to the promise that the remote sink will contain exactly one committed copy of each processed record.
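The idempotency requirement discussed above can be sketched as follows. This is a hypothetical, simplified illustration, not Spark's actual `StreamWriter` machinery: the class name `EpochCommitTracker` and method `commitEpoch` are invented for this example. The idea is that the writer durably records which epoch IDs have already been committed, so a replayed commit() for the same epoch becomes a no-op and the sink ends up with exactly one committed copy of each epoch's records.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of an idempotent epoch commit, assuming the sink can
// atomically record which epochs were already committed (e.g. in a metadata
// table). EpochCommitTracker and commitEpoch are illustrative, not Spark APIs.
public class EpochCommitTracker {
    // Stands in for durable, sink-side commit metadata.
    private final Set<Long> committedEpochs = new HashSet<>();
    private int writesPerformed = 0;

    /**
     * Commits the given epoch. Safe to call multiple times: a replayed
     * commit for an already-committed epoch is skipped, so no records
     * are written twice.
     */
    public synchronized boolean commitEpoch(long epochId) {
        if (committedEpochs.contains(epochId)) {
            return false; // epoch already committed; replay is a no-op
        }
        writesPerformed++; // stands in for durably writing the epoch's data
        committedEpochs.add(epochId);
        return true;
    }

    public synchronized int getWritesPerformed() {
        return writesPerformed;
    }

    public static void main(String[] args) {
        EpochCommitTracker tracker = new EpochCommitTracker();
        tracker.commitEpoch(0L);
        tracker.commitEpoch(0L); // the engine replays the same epoch
        tracker.commitEpoch(1L);
        // Despite the replayed call, epoch 0 was written exactly once.
        System.out.println(tracker.getWritesPerformed()); // prints 2
    }
}
```

In a real sink, the "have I committed this epoch?" check and the write would need to be atomic (e.g. a transactional insert keyed by epoch ID); the in-memory set above only illustrates the contract.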