Github user jose-torres commented on a diff in the pull request:
https://github.com/apache/spark/pull/20710#discussion_r172321800
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java
---
@@ -48,6 +48,9 @@
* same task id but different attempt number, which
means there are multiple
* tasks with the same task id running at the same
time. Implementations can
* use this attempt number to distinguish writers
of different task attempts.
+ * @param epochId A monotonically increasing id for streaming queries
that are split in to
+ * discrete periods of execution. For non-streaming
queries,
+ * this ID will always be 0.
*/
- DataWriter<T> createDataWriter(int partitionId, int attemptNumber);
+ DataWriter<T> createDataWriter(int partitionId, int attemptNumber, long
epochId);
--- End diff --
The guarantees are identical, and in the current execution model, each
epoch is in fact processed by a single batch job.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]