Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/20369#discussion_r164008791
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/streaming/writer/StreamWriter.java
---
@@ -37,8 +41,28 @@
*/
void commit(long epochId, WriterCommitMessage[] messages);
+ /**
+ * Aborts this writing job because some data writers are failed and keep
failing when retry, or
+ * the Spark job fails with some unknown reasons, or {@link
#commit(WriterCommitMessage[])} fails.
+ *
+ * If this method fails (by throwing an exception), the underlying data
source may require manual
+ * cleanup.
+ *
+ * Unless the abort is triggered by the failure of commit, the given
messages should have some
+ * null slots as there maybe only a few data writers that are committed
before the abort
+ * happens, or some data writers were committed but their commit
messages haven't reached the
+ * driver when the abort is triggered. So this is just a "best effort"
for data sources to
+ * clean up the data left by data writers.
+ */
+ void abort(long epochId, WriterCommitMessage[] messages);
+
default void commit(WriterCommitMessage[] messages) {
throw new UnsupportedOperationException(
- "Commit without epoch should not be called with ContinuousWriter");
+ "Commit without epoch should not be called with StreamWriter");
--- End diff --
nit. One more space at the start?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]