HeartSaVioR commented on code in PR #53930:
URL: https://github.com/apache/spark/pull/53930#discussion_r2725458115
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala:
##########
@@ -328,6 +331,39 @@ trait DataEncoder {
    */
   def encodePrefixKeyForRangeScan(row: UnsafeRow): Array[Byte]

+  /**
+   * Encodes key and event time, ensuring prefix scan with key and also proper sort order with
+   * event time within the same key in RocksDB.
+   *
+   * This method handles the encoding as follows:
+   * - Encodes the key columns normally and puts them first
+   * - Appends the event time Long value in big-endian order as the last 8 bytes
+   *
+   * @param row An UnsafeRow denoting a key
+   * @param eventTime Long value representing the event time
+   * @return Serialized bytes that will maintain prefix scan with key and sort order with
+   *         event time
+   * @throws UnsupportedOperationException if called on an encoder that doesn't support event time
+   *         as postfix.
+   */
+  def encodeKeyForEventTimeAsPostfix(row: UnsafeRow, eventTime: Long): Array[Byte]
Review Comment:
   I'd say we shouldn't generalize too much here - this is coupled with a state
   store API change, and I'm not sure we want to introduce an API that merely
   says it handles an additional long value. The name should carry enough
   meaning on its own.
   While I think "event time" has enough potential for broader usage,
   "timestamp" is fine for me if "event time" sounds too narrow. I'd still want
   to keep the semantics of "time" here.