Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/20710#discussion_r172321800 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java --- @@ -48,6 +48,9 @@ * same task id but different attempt number, which means there are multiple * tasks with the same task id running at the same time. Implementations can * use this attempt number to distinguish writers of different task attempts. + * @param epochId A monotonically increasing id for streaming queries that are split in to + * discrete periods of execution. For non-streaming queries, + * this ID will always be 0. */ - DataWriter<T> createDataWriter(int partitionId, int attemptNumber); + DataWriter<T> createDataWriter(int partitionId, int attemptNumber, long epochId); --- End diff -- The guarantees are identical, and in the current execution model, each epoch is in fact processed by a single batch job.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org