Github user jose-torres commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20710#discussion_r172265884
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java ---
    @@ -31,13 +31,17 @@
      * the {@link #write(Object)}, {@link #abort()} is called afterwards and the remaining records will
      * not be processed. If all records are successfully written, {@link #commit()} is called.
      *
    + * Once a data writer returns successfully from {@link #commit()} or {@link #abort()}, its lifecycle
    + * is over and Spark will not use it again.
    + *
      * If this data writer succeeds(all records are successfully written and {@link #commit()}
      * succeeds), a {@link WriterCommitMessage} will be sent to the driver side and pass to
      * {@link DataSourceWriter#commit(WriterCommitMessage[])} with commit messages from other data
      * writers. If this data writer fails(one record fails to write or {@link #commit()} fails), an
    - * exception will be sent to the driver side, and Spark will retry this writing task for some times,
    - * each time {@link DataWriterFactory#createDataWriter(int, int)} gets a different `attemptNumber`,
    - * and finally call {@link DataSourceWriter#abort(WriterCommitMessage[])} if all retry fail.
    + * exception will be sent to the driver side, and Spark may retry this writing task a few times.
    + * In each retry, {@link DataWriterFactory#createDataWriter(int, int, long)} will receive a
    + * different `attemptNumber`. Spark will call {@link DataSourceWriter#abort(WriterCommitMessage[])}
    --- End diff --
    
    The local abort, `DataWriter#abort()`, is called every time a task attempt fails. The global abort referenced here, `DataSourceWriter#abort(WriterCommitMessage[])`, is called only when the job fails.
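
    To make the distinction concrete, here is a minimal standalone sketch. It is not Spark code: `AbortLifecycleSketch`, `AttemptWriter`, and `runJob` are hypothetical names, a single task stands in for the whole job, and the retry limit is just a parameter. It only shows the ordering being described: one local abort per failed attempt, and a single global abort once every attempt has failed.

```java
import java.util.ArrayList;
import java.util.List;

public class AbortLifecycleSketch {

    /** Hypothetical stand-in for a per-attempt writer; this is not Spark's DataWriter. */
    static final class AttemptWriter {
        private final int attemptNumber;
        private final boolean failOnWrite;
        private final List<String> events;

        AttemptWriter(int attemptNumber, boolean failOnWrite, List<String> events) {
            this.attemptNumber = attemptNumber;
            this.failOnWrite = failOnWrite;
            this.events = events;
        }

        void write(String record) {
            if (failOnWrite) {
                throw new IllegalStateException("write failed in attempt " + attemptNumber);
            }
        }

        // Local commit: ends this writer's lifecycle on success.
        void commit() {
            events.add("task commit " + attemptNumber);
        }

        // Local abort: ends this writer's lifecycle after a failed attempt.
        void abort() {
            events.add("task abort " + attemptNumber);
        }
    }

    /**
     * Simulates a job with one task (an assumption made for brevity).
     * The first failingAttempts attempts fail; each attempt gets a fresh writer.
     */
    static List<String> runJob(int maxAttempts, int failingAttempts) {
        List<String> events = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            AttemptWriter writer = new AttemptWriter(attempt, attempt < failingAttempts, events);
            try {
                writer.write("record");
                writer.commit();
                events.add("job commit"); // global commit: the task succeeded
                return events;
            } catch (RuntimeException e) {
                writer.abort(); // local abort: runs for every failed attempt
            }
        }
        events.add("job abort"); // global abort: runs once, after all attempts failed
        return events;
    }

    public static void main(String[] args) {
        System.out.println(runJob(3, 2)); // two failed attempts, then success
        System.out.println(runJob(3, 3)); // every attempt fails
    }
}
```

    In this sketch, `runJob(3, 2)` records two task aborts and then a task commit and a job commit, while `runJob(3, 3)` records three task aborts followed by exactly one job abort.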
