zeruibao opened a new pull request, #54298:
URL: https://github.com/apache/spark/pull/54298

   ### What changes were proposed in this pull request?
   This PR adds changelog writer support for deleteRange in the RocksDB state 
store. Previously, deleteRange only performed the RocksDB native range deletion 
but did not record the operation in the changelog file. The changes include:
   
   - Added a new DELETE_RANGE_RECORD record type (byte 0x20) to the RecordType 
enum in StateStoreChangelog.scala
   - Added an abstract deleteRange(beginKey, endKey) method to 
StateStoreChangelogWriter, implemented in V2/V4 writers (V1/V3 throw 
UnsupportedOperationException, consistent with merge)
   - Updated StateStoreChangelogReaderV2 to parse DELETE_RANGE_RECORD entries
   - Updated RocksDB.deleteRange to write to the changelog after the native 
db.deleteRange call, with an includesPrefix parameter for replay correctness
   - Updated RocksDB.replayChangelog to handle DELETE_RANGE_RECORD by calling 
deleteRange during recovery
   - Updated RocksDBStateStoreChangeDataReader to skip DELETE_RANGE_RECORD with 
a warning, since range deletions cannot be expanded into individual key-value 
change records
   - Added a test verifying that deleteRange is properly recorded and replayed 
via changelog checkpointing
   
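As a rough illustration of the new record type, the sketch below writes and reads one range-deletion entry. Only the tag value 0x20 comes from this description. The object name, the method names, and the length-prefixed key layout are assumptions made for the example and are not taken from StateStoreChangelog.scala.

```scala
import java.io.{DataInputStream, DataOutputStream}

// Hypothetical helper, not the actual Spark writer/reader classes.
object DeleteRangeRecordSketch {
  // 0x20 is the tag this PR description gives for DELETE_RANGE_RECORD.
  val DeleteRangeTag: Byte = 0x20.toByte

  // Assumed layout: tag byte, then each key as a 4-byte length followed by its bytes.
  def writeDeleteRange(
      out: DataOutputStream,
      beginKey: Array[Byte],
      endKey: Array[Byte]): Unit = {
    out.writeByte(DeleteRangeTag)
    out.writeInt(beginKey.length)
    out.write(beginKey)
    out.writeInt(endKey.length)
    out.write(endKey)
  }

  // Reads back one record written by writeDeleteRange and returns (beginKey, endKey).
  def readDeleteRange(in: DataInputStream): (Array[Byte], Array[Byte]) = {
    val tag = in.readByte()
    require(tag == DeleteRangeTag, s"unexpected record tag: $tag")
    val beginKey = new Array[Byte](in.readInt())
    in.readFully(beginKey)
    val endKey = new Array[Byte](in.readInt())
    in.readFully(endKey)
    (beginKey, endKey)
  }
}
```

A reader that does not know the tag cannot skip the entry safely, which is presumably why the reader and the change data reader are updated alongside the writer.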
   
   ### Why are the changes needed?
   When changelog checkpointing is enabled, the state store recovers by 
replaying changelog files rather than loading full snapshots. Since deleteRange 
was not recorded in the changelog, any range deletions were silently lost 
during changelog-based recovery, leading to data inconsistency -- keys that 
should have been deleted would reappear after a restart.
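
The failure mode can be reproduced with a small in-memory model. This is a sketch only: the Record types and the replay function are hypothetical, string keys in a TreeMap stand in for RocksDB's byte-array keys, and the range is treated as begin-inclusive, end-exclusive, matching RocksDB's deleteRange.

```scala
import java.util.TreeMap

// Hypothetical model of changelog replay, not the RocksDB.replayChangelog code.
object ReplaySketch {
  sealed trait Record
  final case class Put(key: String, value: String) extends Record
  final case class Delete(key: String) extends Record
  final case class DeleteRange(beginKey: String, endKey: String) extends Record

  // Rebuild state by applying the records in order, as changelog-based recovery does.
  def replay(log: Seq[Record]): TreeMap[String, String] = {
    val state = new TreeMap[String, String]()
    log.foreach {
      case Put(k, v) => state.put(k, v)
      case Delete(k) => state.remove(k)
      // Remove every key in [beginKey, endKey); subMap is a view backed by state.
      case DeleteRange(b, e) => state.subMap(b, e).clear()
    }
    state
  }
}
```

Replaying `Seq(Put("a", "1"), Put("b", "2"), Put("c", "3"), DeleteRange("a", "c"))` leaves only the key `c`. Replaying the same log with the DeleteRange entry dropped, which models the behavior before this change, leaves `a`, `b`, and `c`.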
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
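   Unit tests, including a new test verifying that deleteRange is recorded in 
the changelog and replayed correctly with changelog checkpointing enabled.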
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Yes, co-authored with Cursor


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

