jerrypeng commented on code in PR #38517:
URL: https://github.com/apache/spark/pull/38517#discussion_r1049011457
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -342,17 +342,14 @@ class MicroBatchExecution(
isCurrentBatchConstructed = true
availableOffsets = nextOffsets.toStreamProgress(sources)
/* Initialize committed offsets to a committed batch, which at this
- * is the second latest batch id in the offset log. */
- if (latestBatchId != 0) {
- val secondLatestOffsets = offsetLog.get(latestBatchId - 1).getOrElse {
- logError(s"The offset log for batch ${latestBatchId - 1} doesn't exist, " +
- s"which is required to restart the query from the latest batch $latestBatchId " +
- "from the offset log. Please ensure there are two subsequent offset logs " +
- "available for the latest batch via manually deleting the offset file(s). " +
- "Please also ensure the latest batch for commit log is equal or one batch " +
- "earlier than the latest batch for offset log.")
- throw new IllegalStateException(s"batch ${latestBatchId - 1} doesn't exist")
- }
+ * is the second latest batch id in the offset log.
Review Comment:
> Do we have a goal to support smooth transition between normal microbatch
> execution and async progress tracking for a single query?

Yes. The existing behavior breaks async progress tracking, especially when the
user switches it on and off for the same query.
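
To illustrate the concern, here is a minimal sketch, not the code in this PR. It assumes the offset log can skip batch ids while async progress tracking is on, so batch `latestBatchId - 1` may be absent on restart. `SparseOffsetLogSketch`, `strictPrevious`, and `tolerantPrevious` are hypothetical names, and a plain `Map[Long, String]` stands in for the real offset log.

```scala
// Hypothetical stand-in for the offset log: batch id -> serialized offsets.
// Assumption: with async progress tracking the logged ids may have gaps.
object SparseOffsetLogSketch {
  type OffsetLog = Map[Long, String]

  /** Mirrors the removed check: batch (latestBatchId - 1) must exist. */
  def strictPrevious(log: OffsetLog, latestBatchId: Long): Option[(Long, String)] =
    if (latestBatchId == 0) None
    else log.get(latestBatchId - 1) match {
      case Some(offsets) => Some((latestBatchId - 1, offsets))
      case None =>
        throw new IllegalStateException(s"batch ${latestBatchId - 1} doesn't exist")
    }

  /** Gap-tolerant lookup: closest logged batch strictly before latestBatchId. */
  def tolerantPrevious(log: OffsetLog, latestBatchId: Long): Option[(Long, String)] = {
    val earlier = log.keys.filter(_ < latestBatchId)
    if (earlier.isEmpty) None
    else {
      val id = earlier.max
      Some((id, log(id)))
    }
  }
}
```

With a log that holds only batches 0, 3, and 7, the strict lookup throws when restarting from batch 7, while the tolerant lookup falls back to batch 3. A query that ran with async progress tracking and is then restarted without it would hit the strict path.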
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]