Re: [PR] [SPARK-49411][SS] Communicate CheckpointID between driver and stateful operators [spark]

via GitHub Wed, 11 Sep 2024 12:16:34 -0700


siying commented on code in PR #47895:
URL: https://github.com/apache/spark/pull/47895#discussion_r1755429152



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala:
##########
@@ -130,6 +131,11 @@ class MicroBatchExecution(
 
   protected var watermarkTracker: WatermarkTracker = _
 
+  // Store checkpointIDs for state store checkpoints to be committed or have been committed to
+  // the commit log.
+  // operatorID -> (partitionID -> uniqueID)
+  private val currentCheckpointUniqueId = MutableMap[Long, Array[String]]()

Review Comment:
   Right now, the uniqueID is generated in the executor. As a potential
optimization, the driver could send one uniqueID to all executors, but executors
would still need to modify it to make it unique among all attempts of the same task.
After that modification, the IDs are no longer a single shared value, so we still
need a different ID per partition.
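To illustrate why a per-partition entry is still needed under that optimization, here is a minimal sketch. It assumes the executor derives its ID by appending the partition ID and task attempt number to a driver-provided base ID; `CheckpointIdSketch`, `newBaseId`, and `taskCheckpointId` are hypothetical names, not code from the PR.

```scala
import java.util.UUID

// Hypothetical sketch (not Spark's implementation) of the optimization discussed
// above: the driver hands out one base ID, and each task attempt derives its own
// checkpoint ID from it.
object CheckpointIdSketch {
  // Driver side: one random base ID, e.g. per operator and micro-batch.
  def newBaseId(): String = UUID.randomUUID().toString

  // Executor side: append the partition ID and task attempt number so that
  // different partitions and retried attempts never share a checkpoint ID.
  // In Spark, these two values could come from TaskContext.partitionId() and
  // TaskContext.attemptNumber().
  def taskCheckpointId(baseId: String, partitionId: Int, attemptNumber: Int): String =
    s"$baseId-p$partitionId-a$attemptNumber"
}
```

Because the derived ID differs for every partition, the driver cannot keep a single ID per operator; it needs one entry per partition, which is what the `Array[String]` value in `currentCheckpointUniqueId` provides.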



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala:
##########
@@ -57,7 +59,9 @@ class IncrementalExecution(
     val prevOffsetSeqMetadata: Option[OffsetSeqMetadata],
     val offsetSeqMetadata: OffsetSeqMetadata,
     val watermarkPropagator: WatermarkPropagator,
-    val isFirstBatch: Boolean)
+    val isFirstBatch: Boolean,
+    val currentCheckpointUniqueId:
+      MutableMap[Long, Array[String]] = MutableMap[Long, Array[String]]())

Review Comment:
   I'll add a comment, but it is basically operatorID -> (partitionID -> checkpointID).
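A minimal sketch of how a map with this shape could be populated and queried. It assumes the array index is the partition ID and that a `null` entry means the partition has not reported a checkpoint ID yet; `CheckpointIdMapSketch`, `record`, and `lookup` are hypothetical helpers, not methods in the PR.

```scala
import scala.collection.mutable.{Map => MutableMap}

// Hypothetical sketch of the bookkeeping described in the review:
// operatorID -> (partitionID -> checkpointID), where the array index is the partition ID.
object CheckpointIdMapSketch {
  type CheckpointIds = MutableMap[Long, Array[String]]

  // Record the checkpoint ID reported by one partition of one stateful operator.
  // The per-partition array is created on first use; null marks "not reported yet".
  def record(
      ids: CheckpointIds,
      operatorId: Long,
      numPartitions: Int,
      partitionId: Int,
      checkpointId: String): Unit = {
    val perPartition =
      ids.getOrElseUpdate(operatorId, Array.fill[String](numPartitions)(null))
    perPartition(partitionId) = checkpointId
  }

  // Look up the checkpoint ID for one operator/partition, if one was recorded.
  def lookup(ids: CheckpointIds, operatorId: Long, partitionId: Int): Option[String] =
    ids.get(operatorId)
      .filter(arr => partitionId >= 0 && partitionId < arr.length)
      .flatMap(arr => Option(arr(partitionId)))
}
```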


