jerrypeng commented on code in PR #55676:
URL: https://github.com/apache/spark/pull/55676#discussion_r3192305578
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/runtime/AsyncProgressTrackingMicroBatchExecution.scala:
##########
@@ -57,6 +57,13 @@ class AsyncProgressTrackingMicroBatchExecution(
// used to check during the first batch if the pipeline is stateful
private var isFirstBatch: Boolean = true
+ // Records the first error seen by any async log write task. Subsequent async
+ // log writes short-circuit by failing with this error before touching storage.
+ // This prevents creating gaps on durable storage (e.g. offset N missing while
+ // offset N+1 is present, or commit N+1 written without offset N) when an
Review Comment:
> This prevents creating gaps on durable storage (e.g. offset N missing while
> // offset N+1 is present, or commit N+1 written without offset N
Gaps between batches can occur when the APT (async progress tracking) interval
is set. What cannot happen is the commit log for batch n being written without
the offset log for batch n.
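
To make the distinction concrete, here is a minimal sketch of the invariant as a
plain set check. `LogInvariant` and `commitsWithoutOffsets` are hypothetical
names used only for illustration; they are not part of Spark, and the batch IDs
are made-up sample values.

```scala
// Hypothetical helper, not part of Spark: it only illustrates the invariant.
object LogInvariant {
  // Returns the commit-log batch IDs that have no matching offset-log entry.
  // An empty result means the invariant holds.
  def commitsWithoutOffsets(
      offsetBatches: Set[Long],
      commitBatches: Set[Long]): Set[Long] =
    commitBatches.diff(offsetBatches)

  def main(args: Array[String]): Unit = {
    // Gaps between batch IDs are acceptable: batches 1, 2, 4 and 5 were
    // never persisted, yet every commit has a matching offset entry.
    val violationsWithGaps = commitsWithoutOffsets(Set(0L, 3L, 6L), Set(0L, 3L))
    assert(violationsWithGaps.isEmpty)

    // Commit 6 without offset 6 is the state that must never occur.
    val violationsMissingOffset = commitsWithoutOffsets(Set(0L, 3L), Set(0L, 3L, 6L))
    assert(violationsMissingOffset == Set(6L))
  }
}
```

Under this reading, the code comment would only need to rule out the second
case; the first case is expected whenever the interval skips batches.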