zeruibao opened a new pull request, #54298: URL: https://github.com/apache/spark/pull/54298
### What changes were proposed in this pull request?

This PR adds changelog writer support for deleteRange in the RocksDB state store. Previously, deleteRange only performed the RocksDB native range deletion but did not record the operation in the changelog file.

The changes include:

- Added a new DELETE_RANGE_RECORD record type (byte 0x20) to the RecordType enum in StateStoreChangelog.scala
- Added an abstract deleteRange(beginKey, endKey) method to StateStoreChangelogWriter, implemented in V2/V4 writers (V1/V3 throw UnsupportedOperationException, consistent with merge)
- Updated StateStoreChangelogReaderV2 to parse DELETE_RANGE_RECORD entries
- Updated RocksDB.deleteRange to write to the changelog after the native db.deleteRange call, with an includesPrefix parameter for replay correctness
- Updated RocksDB.replayChangelog to handle DELETE_RANGE_RECORD by calling deleteRange during recovery
- Updated RocksDBStateStoreChangeDataReader to skip DELETE_RANGE_RECORD with a warning, since range deletions cannot be expanded into individual key-value change records
- Added a test verifying that deleteRange is properly recorded and replayed via changelog checkpointing

### Why are the changes needed?

When changelog checkpointing is enabled, the state store recovers by replaying changelog files rather than loading full snapshots. Since deleteRange was not recorded in the changelog, any range deletions were silently lost during changelog-based recovery, leading to data inconsistency -- keys that should have been deleted would reappear after a restart.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

### Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Cursor

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
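The recovery gap described in this PR can be illustrated with a toy model. The sketch below is not Spark code: `ToyStore`, `replay`, and `run` are hypothetical names, the store is a plain in-memory dict, and only the `0x20` byte value for the range-delete record comes from the PR description. It assumes RocksDB's half-open `[begin, end)` range semantics and compares the live state with the state rebuilt from the changelog, with and without range deletes being logged.

```python
from enum import IntEnum


class RecordType(IntEnum):
    PUT = 0x01           # placeholder value for this sketch
    DELETE = 0x10        # placeholder value for this sketch
    DELETE_RANGE = 0x20  # byte value stated in the PR description


class ToyStore:
    """In-memory stand-in for a state store with a changelog (hypothetical)."""

    def __init__(self, log_range_deletes=True):
        self.data = {}
        self.changelog = []
        self.log_range_deletes = log_range_deletes

    def put(self, key, value):
        self.data[key] = value
        self.changelog.append((RecordType.PUT, key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.changelog.append((RecordType.DELETE, key, None))

    def delete_range(self, begin, end):
        # Remove keys in [begin, end): begin inclusive, end exclusive.
        for k in [k for k in self.data if begin <= k < end]:
            del self.data[k]
        # Without this record, replay cannot know the range was deleted.
        if self.log_range_deletes:
            self.changelog.append((RecordType.DELETE_RANGE, begin, end))


def replay(changelog):
    """Rebuild state from changelog records alone, as recovery would."""
    data = {}
    for rtype, a, b in changelog:
        if rtype == RecordType.PUT:
            data[a] = b
        elif rtype == RecordType.DELETE:
            data.pop(a, None)
        elif rtype == RecordType.DELETE_RANGE:
            for k in [k for k in data if a <= k < b]:
                del data[k]
    return data


def run(log_range_deletes):
    """Return (live state, state recovered by replaying the changelog)."""
    store = ToyStore(log_range_deletes)
    for k in (b"a", b"b", b"c"):
        store.put(k, b"v")
    store.delete_range(b"a", b"c")
    return store.data, replay(store.changelog)
```

With `run(False)` the recovered state still contains the keys `a` and `b` that the live store deleted, which is the inconsistency the PR describes. With `run(True)` the live and recovered states match.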
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
