GitHub user rdblue commented on a diff in the pull request:
https://github.com/apache/spark/pull/20386#discussion_r164908529
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java
---
@@ -63,32 +68,42 @@
DataWriterFactory<Row> createWriterFactory();
/**
- * Commits this writing job with a list of commit messages. The commit messages are collected from
- * successful data writers and are produced by {@link DataWriter#commit()}.
+ * Handles a commit message which is collected from a successful data writer.
+ *
+ * Note that, implementations might need to cache all commit messages before calling
+ * {@link #commit()} or {@link #abort()}.
--- End diff --
In what case would an implementation not cache and commit all at once? What
is the point of a commit if not to make sure all of the data shows up at the
same time?
---