Github user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/20490#discussion_r166995080
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java
---
@@ -78,10 +78,11 @@ default void onDataWriterCommit(WriterCommitMessage message) {}
 * failed, and {@link #abort(WriterCommitMessage[])} would be called. The state of the destination
 * is undefined and @{@link #abort(WriterCommitMessage[])} may not be able to deal with it.
 *
- * Note that, one partition may have multiple committed data writers because of speculative tasks.
- * Spark will pick the first successful one and get its commit message. Implementations should be
- * aware of this and handle it correctly, e.g., have a coordinator to make sure only one data
- * writer can commit, or have a way to clean up the data of already-committed writers.
+ * Note that speculative execution may cause multiple tasks to run for a partition. By default,
+ * Spark uses the OutputCommitCoordinator to allow only one attempt to commit.
+ * {@link DataWriterFactory} implementations can disable this behavior. If disabled, multiple
--- End diff ---
It says that already: "DataWriterFactory implementations can disable this
behavior."
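For illustration only, the first-commit-wins behavior the diff describes can be sketched as a standalone simulation. This is not Spark's actual OutputCommitCoordinator API; the class and method names here are hypothetical, and a real coordinator runs on the driver and arbitrates per (stage, partition, attempt):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a per-partition commit arbiter where the first task
// attempt to ask for commit permission wins and all later attempts must abort.
public class CommitCoordinatorSketch {
    // compareAndSet flips the flag atomically, so exactly one caller sees true.
    private final AtomicBoolean committed = new AtomicBoolean(false);

    /** Returns true only for the first attempt that asks to commit this partition. */
    public boolean canCommit(int attemptNumber) {
        return committed.compareAndSet(false, true);
    }

    public static void main(String[] args) {
        CommitCoordinatorSketch coordinator = new CommitCoordinatorSketch();
        List<Integer> committedAttempts = new ArrayList<>();
        // Two speculative attempts for the same partition both try to commit.
        for (int attempt = 0; attempt < 2; attempt++) {
            if (coordinator.canCommit(attempt)) {
                committedAttempts.add(attempt); // this attempt's commit is accepted
            } else {
                // abort path: this attempt should clean up its own data instead
            }
        }
        System.out.println(committedAttempts); // prints [0]
    }
}
```

If a DataWriterFactory opted out of this coordination, both attempts would reach commit, which is why the new doc text warns that implementations must then clean up duplicate data themselves.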
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]