HeartSaVioR commented on code in PR #48853:
URL: https://github.com/apache/spark/pull/48853#discussion_r1858574964


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TTLState.scala:
##########
@@ -223,36 +323,16 @@ trait TTLState {
  * like (expirationMs, groupingKey) -> EMPTY_ROW. This way, we can quickly find all the
  * grouping keys that contain at least one element that has expired.
  *
- * There is some trickiness here, though. Suppose we have an element key `k` that
- * has a list with one value `v1` that expires at time `t1`. Our primary index looks like
- * k -> [v1]; our secondary index looks like [(t1, k) -> EMPTY_ROW]. Now, we add another
- * value to the list, `v2`, that expires at time `t2`. The primary index updates to be
- * k -> [v1, v2]. However, how do we update our secondary index? We already have an entry
- * in our secondary index for `k`, but it's prefixed with `t1`, which we don't know at the
- * time of inserting `v2`.
+ * To make sure that we aren't "late" in cleaning up expired values, this secondary index
+ * maps from the minimum expiration in a list and a grouping key to the EMPTY_VALUE. This
+ * index is called the "TTL index" in the code (to be consistent with [[OneToOneTTLState]]),
+ * though it behaves more like a work queue of lists that need to be cleaned up.
  *
- * So, do we:
- *    1. Blindly add (t2, k) -> EMPTY_ROW to the secondary index?
- *    2. Delete (t1, k) from the secondary index, and then add (t2, k) -> EMPTY_ROW?
+ * Since a grouping key may have a large list and we need to quickly know what the
+ * minimum expiration is, we need to reverse this work queue index. This reversed index
+ * maps from key to the minimum expiration in the list, and it is called the "min-index".

Review Comment:
   We still seem to refer to this as min-expiry in other places; let's use min-expiry.
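
For readers following the thread, here is a minimal in-memory sketch of how the two indexes in the doc comment could fit together. It is not the code in this PR: `ListTtlModel` and all of its members are hypothetical names, plain Scala collections stand in for the state store, and it assumes a value counts as expired once `expirationMs <= nowMs`.

```scala
import scala.collection.mutable

// Hypothetical in-memory model; not Spark's TTLState API.
final class ListTtlModel {
  // Primary index: grouping key -> list of (value, expirationMs).
  private val primary = mutable.Map.empty[String, Vector[(String, Long)]]
  // "TTL index": (minimum expirationMs in the list, grouping key), sorted by expiration.
  private val ttlIndex = mutable.TreeSet.empty[(Long, String)]
  // "min-expiry" index: grouping key -> minimum expirationMs in its list.
  private val minExpiry = mutable.Map.empty[String, Long]

  def append(key: String, value: String, expirationMs: Long): Unit = {
    primary(key) = primary.getOrElse(key, Vector.empty) :+ (value -> expirationMs)
    minExpiry.get(key) match {
      case Some(oldMin) if oldMin <= expirationMs =>
        () // The existing TTL index entry already carries the minimum.
      case previous =>
        // The min-expiry index tells us which TTL index entry to replace.
        previous.foreach(oldMin => ttlIndex -= ((oldMin, key)))
        ttlIndex += ((expirationMs, key))
        minExpiry(key) = expirationMs
    }
  }

  // Removes expired values and returns how many were removed.
  def evictExpired(nowMs: Long): Int = {
    // Only lists whose minimum expiration has passed need to be visited.
    val due = ttlIndex.iterator.takeWhile(_._1 <= nowMs).toList
    var removed = 0
    for ((oldMin, key) <- due) {
      val (expired, live) = primary(key).partition(_._2 <= nowMs)
      removed += expired.size
      ttlIndex -= ((oldMin, key))
      if (live.isEmpty) {
        primary -= key
        minExpiry -= key
      } else {
        val newMin = live.map(_._2).min
        primary(key) = live
        minExpiry(key) = newMin
        ttlIndex += ((newMin, key))
      }
    }
    removed
  }

  def values(key: String): Seq[String] =
    primary.getOrElse(key, Vector.empty).map(_._1)
  def minExpiration(key: String): Option[Long] = minExpiry.get(key)
  def ttlEntries: Seq[(Long, String)] = ttlIndex.toSeq
}
```

In this sketch, appending `v2` with an earlier expiration than `v1` replaces the single TTL index entry for the key instead of adding a second one, which is the problem the removed doc comment was describing.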




