yadavay-amzn opened a new pull request, #56097:
URL: https://github.com/apache/spark/pull/56097

   ### What changes were proposed in this pull request?
   
   Add `validateStateRowFormat` to `multiGet()` in `RocksDBStateStoreProvider`, 
consistent with existing validation in `get()`, `iterator()`, `prefixScan()`, 
and `rangeScan()`.
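   
   As a rough illustration of the pattern, the sketch below models a batched 
lookup that validates each decoded value when a flag is set. It is not Spark's 
code: `Schema`, `DecodedRow`, `RowFormatValidationFailure`, `decodeValue`, 
`validateRowFormat`, and the `Map`-backed store are hypothetical stand-ins, and 
the real `validateStateRowFormat` may check different properties or run less 
often than once per row.

```scala
// Simplified stand-alone model; none of these types are Spark's real classes.
final case class Schema(fieldCount: Int)
final case class DecodedRow(fields: Vector[String])

final class RowFormatValidationFailure(msg: String) extends RuntimeException(msg)

object MultiGetSketch {
  // Hypothetical decoder: splits a stored comma-separated string into fields.
  def decodeValue(raw: String): DecodedRow =
    DecodedRow(raw.split(",", -1).toVector)

  // Hypothetical check standing in for validateStateRowFormat:
  // here it only compares the field count with the expected schema.
  def validateRowFormat(row: DecodedRow, expected: Schema): Unit =
    if (row.fields.length != expected.fieldCount) {
      throw new RowFormatValidationFailure(
        s"expected ${expected.fieldCount} fields, found ${row.fields.length}")
    }

  // Lazily decodes the value for each key; a missing key yields None.
  // Validation runs on every decoded value when validationEnabled is true.
  def multiGet(
      store: Map[String, String],
      keys: Seq[String],
      schema: Schema,
      validationEnabled: Boolean): Iterator[Option[DecodedRow]] =
    keys.iterator.map { key =>
      store.get(key).map { raw =>
        val row = decodeValue(raw)
        if (validationEnabled) validateRowFormat(row, schema)
        row
      }
    }
}
```

   Because the iterator is lazy in this sketch, a mismatch surfaces only when 
the caller consumes the offending element, not when `multiGet` is called.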
   
   ### Why are the changes needed?
   
   `multiGet()` decodes values via `kvEncoder._2.decodeValue` but never calls 
`validateStateRowFormat`. If a stateful operator's state schema changes between 
restarts, `multiGet()` can silently return corrupted data instead of failing 
fast with `StateStoreValueRowFormatValidationFailure`.
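   
   To make "silently return corrupted data" concrete, here is a small 
stand-alone sketch using plain `java.nio.ByteBuffer`. It does not use Spark's 
`UnsafeRow` layout; the two layouts and the names `encodeOld` and `decodeNew` 
are assumptions chosen only to show that bytes written under one schema can 
decode under another without any error.

```scala
import java.nio.ByteBuffer

object SilentCorruptionSketch {
  // Writer's layout: an Int count followed by a Long timestamp (12 bytes).
  def encodeOld(count: Int, timestamp: Long): Array[Byte] =
    ByteBuffer.allocate(12).putInt(count).putLong(timestamp).array()

  // Reader's changed layout: a Long timestamp followed by an Int count.
  // The byte length matches, so decoding succeeds but the fields are wrong.
  def decodeNew(bytes: Array[Byte]): (Long, Int) = {
    val buf = ByteBuffer.wrap(bytes)
    (buf.getLong(), buf.getInt())
  }
}

val stored = SilentCorruptionSketch.encodeOld(count = 7, timestamp = 1000L)
val decoded = SilentCorruptionSketch.decodeNew(stored)
// No exception is thrown, yet decoded is not (1000L, 7).
```

   A format check at read time turns this kind of quiet mismatch into an 
explicit failure.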
   
   All other read-path methods that decode rows already perform this 
validation; `multiGet()` is the only exception:
   
   | Method | Decodes rows? | Has validation? |
   |--------|:---:|:---:|
   | `get()` | Yes | ✅ |
   | `iterator()` | Yes | ✅ |
   | `prefixScan()` | Yes | ✅ (SPARK-56539) |
   | `rangeScan()` | Yes | ✅ (SPARK-56539) |
   | `multiGet()` | Yes | ❌ → ✅ (this PR) |
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. The validation is gated behind the existing 
`spark.sql.streaming.stateStore.formatValidation.enabled` config (default: true 
in testing mode).
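   
   For reference, a minimal sketch of setting this config explicitly on a 
session. The application name and `local[2]` master are placeholder 
assumptions; only the config key comes from this description.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder app name and master; adjust for a real deployment.
val spark = SparkSession.builder()
  .appName("format-validation-example")
  .master("local[2]")
  // Enable state row format validation for stateful streaming queries.
  .config("spark.sql.streaming.stateStore.formatValidation.enabled", "true")
  .getOrCreate()
```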
   
   ### How was this patch tested?
   
   Added a test in `RocksDBStateStoreSuite` that writes data with one schema, 
reopens the store with a mismatched schema, and verifies that `multiGet()` 
throws `StateStoreValueRowFormatValidationFailure`. The test follows the same 
pattern as the existing SPARK-56539 tests for `prefixScan` and `rangeScan`.
   
   - Without fix: `multiGet()` silently returns corrupted rows
   - With fix: `StateStoreValueRowFormatValidationFailure` thrown on first 
decoded value
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Yes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

