Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20386#discussion_r164735522
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java ---
@@ -40,16 +40,21 @@
 * 1. Create a writer factory by {@link #createWriterFactory()}, serialize and send it to all the
 *    partitions of the input data(RDD).
 * 2. For each partition, create the data writer, and write the data of the partition with this
- *    writer. If all the data are written successfully, call {@link DataWriter#commit()}. If
- *    exception happens during the writing, call {@link DataWriter#abort()}.
- * 3. If all writers are successfully committed, call {@link #commit(WriterCommitMessage[])}. If
- *    some writers are aborted, or the job failed with an unknown reason, call
- *    {@link #abort(WriterCommitMessage[])}.
+ *    writer. If one data writer finishes successfully, the commit message will be sent back to
+ *    the driver side and Spark will call {@link #add(WriterCommitMessage)}.
+ *    If exception happens during the writing, call {@link DataWriter#abort()}.
+ * 3. If all the data writers finish successfully, and {@link #add(WriterCommitMessage)} is
+ *    successfully called for all the commit messages, Spark will call {@link #commit()}.
+ *    If any of the data writers failed, or any of the {@link #add(WriterCommitMessage)}
+ *    calls failed, or the job failed with an unknown reason, call {@link #abort()}.
 *
 * While Spark will retry failed writing tasks, Spark won't retry failed writing jobs. Users should
 * do it manually in their Spark applications if they want to retry.
 *
- * Please refer to the documentation of commit/abort methods for detailed specifications.
+ * All these methods are guaranteed to be called in a single thread.
--- End diff --
nit: `... in a single thread at driver side`
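
For context, here is a minimal sketch of the interface shape the updated javadoc describes. The method names (`createWriterFactory()`, `add(WriterCommitMessage)`, `commit()`, `abort()`) come from the diff above; the `Row` type parameter and the comments are assumptions for illustration, not necessarily the PR's exact code:

```java
package org.apache.spark.sql.sources.v2.writer;

import org.apache.spark.sql.Row;

public interface DataSourceWriter {

  // Step 1: create a writer factory; Spark serializes it and sends it to
  // every partition of the input data (RDD).
  DataWriterFactory<Row> createWriterFactory();

  // Step 2 (driver side): invoked once per data writer that finishes
  // successfully, with the commit message that writer sent back.
  void add(WriterCommitMessage message);

  // Step 3: invoked after add() has succeeded for every commit message.
  void commit();

  // Invoked if any data writer fails, any add() call fails, or the job
  // fails for an unknown reason.
  void abort();
}
```

Under the single-thread guarantee discussed in the nit, an implementation could accumulate commit messages inside `add()` without extra synchronization.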