siying commented on code in PR #47895:
URL: https://github.com/apache/spark/pull/47895#discussion_r1755429152
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -130,6 +131,11 @@ class MicroBatchExecution(
protected var watermarkTracker: WatermarkTracker = _
+ // Store checkpointIDs for state store checkpoints to be committed or have been committed to
+ // the commit log.
+ // operatorID -> (partitionID -> uniqueID)
+ private val currentCheckpointUniqueId = MutableMap[Long, Array[String]]()
Review Comment:
Right now, the uniqueID is generated in the executor. As a potential
optimization, the driver could send a single uniqueID to all executors, but each
executor would still need to modify it to make it unique across all attempts of
the same task. Once that happens, the IDs are no longer the same across
partitions, so we still need a separate ID per partition.
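To make the optimization above concrete, here is a rough sketch. It assumes the driver generates one random base ID per batch and each task appends its partition ID and attempt number. `CheckpointIdSketch`, `newBatchBaseId` and `taskCheckpointId` are hypothetical names, not existing Spark APIs. On a real executor, the two numbers would come from `TaskContext.partitionId()` and `TaskContext.attemptNumber()`.

```scala
import java.util.UUID

// Hypothetical sketch, not Spark code: the driver picks one base ID per batch,
// and each task derives its own ID by appending its partition ID and attempt number.
object CheckpointIdSketch {
  // Driver side (assumption): one random base ID shared by the whole batch.
  def newBatchBaseId(): String = UUID.randomUUID().toString

  // Executor side (assumption): suffix the base ID so that every partition and
  // every attempt of the same task gets a distinct checkpoint ID.
  def taskCheckpointId(baseId: String, partitionId: Int, attemptNumber: Int): String =
    s"$baseId-p$partitionId-a$attemptNumber"
}
```

Two attempts of the same partition end up with different IDs under this scheme. The driver would therefore still have to record, per partition, whichever ID the successful attempt reported.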
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala:
##########
@@ -57,7 +59,9 @@ class IncrementalExecution(
val prevOffsetSeqMetadata: Option[OffsetSeqMetadata],
val offsetSeqMetadata: OffsetSeqMetadata,
val watermarkPropagator: WatermarkPropagator,
- val isFirstBatch: Boolean)
+ val isFirstBatch: Boolean,
+ val currentCheckpointUniqueId:
+ MutableMap[Long, Array[String]] = MutableMap[Long, Array[String]]())
Review Comment:
I'll add a comment, but it is basically a map of operatorID -> (partitionID -> checkpointID).
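The following is a minimal sketch of how that layout could be filled and read. It assumes the array is indexed by partition ID and sized to the operator's partition count. `CheckpointIdMapSketch`, `record` and `lookup` are hypothetical helpers, not code from this PR.

```scala
import scala.collection.mutable.{Map => MutableMap}

// Hypothetical sketch of the operatorID -> (partitionID -> checkpointID) layout:
// the map key is the operator ID, and the array index is the partition ID.
object CheckpointIdMapSketch {
  // Record the checkpoint ID one partition reported for one operator,
  // allocating the per-partition array on first use (entries start as null).
  def record(
      ids: MutableMap[Long, Array[String]],
      operatorId: Long,
      numPartitions: Int,
      partitionId: Int,
      checkpointId: String): Unit = {
    val perPartition = ids.getOrElseUpdate(operatorId, new Array[String](numPartitions))
    perPartition(partitionId) = checkpointId
  }

  // Look up a partition's checkpoint ID; None if the operator is unknown,
  // the partition index is out of range, or nothing was recorded yet.
  def lookup(
      ids: MutableMap[Long, Array[String]],
      operatorId: Long,
      partitionId: Int): Option[String] =
    ids.get(operatorId).flatMap { arr =>
      if (partitionId >= 0 && partitionId < arr.length) Option(arr(partitionId)) else None
    }
}
```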