[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

jose-torres Wed, 31 Jan 2018 09:21:07 -0800

Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20386#discussion_r165124614
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java ---
    @@ -32,40 +32,44 @@
     @InterfaceStability.Evolving
     public interface StreamWriter extends DataSourceWriter {
       /**
    -   * Commits this writing job for the specified epoch with a list of commit messages. The commit
    -   * messages are collected from successful data writers and are produced by
    -   * {@link DataWriter#commit()}.
    +   * Commits this writing job for the specified epoch.
        *
    -   * If this method fails (by throwing an exception), this writing job is considered to have been
    -   * failed, and the execution engine will attempt to call {@link #abort(WriterCommitMessage[])}.
    +   * When this method is called, the number of commit messages added by
    +   * {@link #add(WriterCommitMessage)} equals to the number of input data partitions.
    +   *
    +   * If this method fails (by throwing an exception), this writing job is considered to to have been
    +   * failed, and {@link #abort()} would be called. The state of the destination
    +   * is undefined and @{@link #abort()} may not be able to deal with it.
        *
        * To support exactly-once processing, writer implementations should ensure that this method is
        * idempotent. The execution engine may call commit() multiple times for the same epoch
    --- End diff --
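    The revised Javadoc describes a contract in which one commit message per input partition is passed to add(...) before commit(epochId) runs, and abort() is called if commit fails. A minimal sketch of that flow, assuming hypothetical types (CommitMessage, EpochWriter, InMemoryEpochWriter are illustrative only, not Spark's classes) and an in-memory destination; the explicit count check is one possible way to enforce the "messages equal partitions" rule, not necessarily what the PR does:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical per-partition commit message: the partition it came from
// plus the rows that partition staged.
final class CommitMessage {
    final int partitionId;
    final List<String> rows;

    CommitMessage(int partitionId, List<String> rows) {
        this.partitionId = partitionId;
        this.rows = new ArrayList<>(rows);
    }
}

// Hypothetical writer interface shaped after the add/commit/abort contract in the diff.
interface EpochWriter {
    void add(CommitMessage message); // called once per successful input partition
    void commit(long epochId);       // called after all partitions have reported
    void abort();                    // called if commit (or the job) fails
}

// Hypothetical in-memory implementation; staged rows become visible only on commit.
final class InMemoryEpochWriter implements EpochWriter {
    private final int numPartitions;
    private final List<CommitMessage> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    InMemoryEpochWriter(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    @Override
    public void add(CommitMessage message) {
        pending.add(message);
    }

    @Override
    public void commit(long epochId) {
        // Check the "one message per input partition" precondition and fail
        // the commit (leaving nothing visible) if it does not hold.
        if (pending.size() != numPartitions) {
            throw new IllegalStateException("epoch " + epochId + ": expected "
                + numPartitions + " commit messages, got " + pending.size());
        }
        for (CommitMessage m : pending) {
            committed.addAll(m.rows);
        }
        pending.clear();
    }

    @Override
    public void abort() {
        // Discard staged messages; nothing from this epoch becomes visible.
        pending.clear();
    }

    List<String> committedRows() {
        return Collections.unmodifiableList(committed);
    }
}
```

    In this sketch a failed commit throws before publishing anything, so abort() only has to drop staged state; against a real external system the destination state after a failed commit may be undefined, as the diff itself notes.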
    
    What are the exact guarantees you're looking for when calling a system "exactly-once"? I worry you're looking for something that isn't possible. In particular, I don't know of any additional guarantee that check would allow us to make.
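    For reference, the exactly-once wording in the quoted Javadoc rests on commit() being idempotent per epoch, because the engine may call it more than once for the same epoch. A minimal sketch of such idempotence, assuming a hypothetical sink (IdempotentEpochSink is illustrative only) that records committed epoch IDs alongside the data; a real destination would need to write both atomically:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sink illustrating an idempotent commit keyed by epoch ID.
// Assumption: recording the epoch ID and writing the rows happen atomically
// on the destination; here both simply live in memory.
final class IdempotentEpochSink {
    private final Set<Long> committedEpochs = new HashSet<>();
    private final List<String> rows = new ArrayList<>();

    // Returns true if rows were written, false if this epoch was already committed.
    boolean commit(long epochId, List<String> epochRows) {
        if (committedEpochs.contains(epochId)) {
            return false; // replayed epoch: skip so rows are not duplicated
        }
        rows.addAll(epochRows);
        committedEpochs.add(epochId);
        return true;
    }

    List<String> rows() {
        return new ArrayList<>(rows);
    }
}
```

    The deduplication here keys only on the epoch ID; it does not depend on how many commit messages were received for that epoch.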


