WweiL commented on code in PR #48355:
URL: https://github.com/apache/spark/pull/48355#discussion_r1876847113


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala:
##########
@@ -1533,3 +1717,40 @@ case class AcquiredThreadInfo(
   }
 }
 
+/**
+ * A helper class to manage the lineage information when checkpoint unique id is enabled.
+ * "lineage" is an array of LineageItem (version, uniqueId) pair.
+ *
+ * The first item of "lineage" should normally be the version of a snapshot, except
+ * for the first few versions. Because they are solely loaded from changelog file.
+ * (i.e. with default minDeltasForSnapshot, there is only 1_uuid1.changelog, no 1_uuid1.zip)
+ *
+ * The last item of "lineage" corresponds to one version before the to-be-committed version.

Review Comment:
   Ah, no: this means the lineage contains only committed versions. In other words,
   we append a lineage item only during commit:
   
https://github.com/apache/spark/blob/91d5e815175b32275b34922790d9bb4ba5208e1f/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala#L866
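
To illustrate the invariant, here is a minimal sketch. It is not the actual `RocksDB.scala` code: `LineageSketch`, `commit`, `currentLineage`, `nextVersionToCommit`, and the `checkpointUniqueId` field name are hypothetical, and the starting version of 0 is an assumption. It only shows that when items are appended exclusively at commit time, the last lineage item is always one version behind the to-be-committed version.

```scala
// Hypothetical, simplified model of the lineage bookkeeping; not Spark's implementation.
final case class LineageItem(version: Long, checkpointUniqueId: String)

final class LineageSketch {
  // Only committed versions ever appear here.
  private var lineage: Vector[LineageItem] = Vector.empty
  // Assumption: the store starts at version 0 with nothing committed.
  private var loadedVersion: Long = 0L

  // The version that the next commit() call would produce.
  def nextVersionToCommit: Long = loadedVersion + 1

  // Commit the next version and only then record it in the lineage.
  def commit(uniqueId: String): Long = {
    val newVersion = nextVersionToCommit
    // ... a real store would write the changelog or snapshot for newVersion here ...
    lineage = lineage :+ LineageItem(newVersion, uniqueId)
    loadedVersion = newVersion
    newVersion
  }

  def currentLineage: Seq[LineageItem] = lineage
}
```

After `commit("uuid1")` and `commit("uuid2")`, the sketch's lineage holds versions 1 and 2, while the to-be-committed version is 3.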



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

