Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/20710#discussion_r172265884
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java
---
@@ -31,13 +31,17 @@
* the {@link #write(Object)}, {@link #abort()} is called afterwards and the remaining records will
* not be processed. If all records are successfully written, {@link #commit()} is called.
*
+ * Once a data writer returns successfully from {@link #commit()} or {@link #abort()}, its lifecycle
+ * is over and Spark will not use it again.
+ *
* If this data writer succeeds(all records are successfully written and {@link #commit()}
* succeeds), a {@link WriterCommitMessage} will be sent to the driver side and pass to
* {@link DataSourceWriter#commit(WriterCommitMessage[])} with commit messages from other data
* writers. If this data writer fails(one record fails to write or {@link #commit()} fails), an
- * exception will be sent to the driver side, and Spark will retry this writing task for some times,
- * each time {@link DataWriterFactory#createDataWriter(int, int)} gets a different `attemptNumber`,
- * and finally call {@link DataSourceWriter#abort(WriterCommitMessage[])} if all retry fail.
+ * exception will be sent to the driver side, and Spark may retry this writing task a few times.
+ * In each retry, {@link DataWriterFactory#createDataWriter(int, int, long)} will receive a
+ * different `attemptNumber`. Spark will call {@link DataSourceWriter#abort(WriterCommitMessage[])}
--- End diff --
The local abort (`DataWriter#abort()`) is called every time a task attempt fails. The global
abort referenced here (`DataSourceWriter#abort(WriterCommitMessage[])`) is called only when
the job fails.
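
To illustrate the distinction between the local and the global abort, here is a minimal, self-contained sketch. It does not use the Spark interfaces: `AttemptWriter`, `runSingleTaskJob`, and the `maxAttempts` / `failingAttempts` parameters are hypothetical stand-ins, and the retry loop is an assumption about how a scheduler might drive a single task, not Spark's actual scheduling code.

```java
import java.util.ArrayList;
import java.util.List;

public class AbortLifecycleSketch {

    /** Hypothetical stand-in for a per-attempt writer; NOT the Spark DataWriter interface. */
    static final class AttemptWriter {
        private final int attemptNumber;
        private final List<String> log;

        AttemptWriter(int attemptNumber, List<String> log) {
            this.attemptNumber = attemptNumber;
            this.log = log;
        }

        /** Simulates writing records; throws when this attempt is configured to fail. */
        void write(boolean shouldFail) {
            if (shouldFail) {
                throw new IllegalStateException("write failed in attempt " + attemptNumber);
            }
        }

        /** Task-level commit for this attempt. */
        void commit() {
            log.add("task commit (attempt " + attemptNumber + ")");
        }

        /** Local abort: cleans up after this one failed attempt. */
        void abort() {
            log.add("local abort (attempt " + attemptNumber + ")");
        }
    }

    /**
     * Simulates a job with a single task. The first failingAttempts attempts fail.
     * Each failed attempt triggers a local abort; the global abort is recorded
     * only if every attempt fails, i.e. the job as a whole fails.
     */
    static List<String> runSingleTaskJob(int maxAttempts, int failingAttempts) {
        List<String> log = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // A fresh writer per attempt: a writer is not reused after commit or abort.
            AttemptWriter writer = new AttemptWriter(attempt, log);
            try {
                writer.write(attempt < failingAttempts);
                writer.commit();
                log.add("job commit");
                return log;
            } catch (IllegalStateException e) {
                writer.abort();
            }
        }
        log.add("global abort");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runSingleTaskJob(3, 1)); // one failed attempt, then success
        System.out.println(runSingleTaskJob(3, 3)); // every attempt fails
    }
}
```

In the first call the log contains one local abort followed by a task commit and a job commit, with no global abort. In the second call it contains three local aborts and then a single global abort, which matches the point above: the global abort is tied to job failure, not to an individual attempt failing.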
---