ericm-db commented on code in PR #56018:
URL: https://github.com/apache/spark/pull/56018#discussion_r3326386583
##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/state/StateDataSourceReadSuite.scala:
##########
@@ -600,37 +598,6 @@ class RocksDBWithCheckpointV2StateDataSourceReaderSuite
extends StateDataSourceR
"true")
}
- // TODO: Remove this test once we allow migrations from checkpoint v1 to v2
- test("reading checkpoint v2 store with version 1 should fail") {
Review Comment:
Yeah so I dug into the v2-store / v1-config direction and it can't actually
succeed, so I couldn't flip it to a success assertion. This PR makes the commit
log read across versions, but a v2 state store names its files with checkpoint
unique ids, and the reader only uses unique-id naming when
enableStateStoreCheckpointIds (driven by STATE_STORE_CHECKPOINT_FORMAT_VERSION)
is true.
Under v1 config it goes looking for the non-unique 1.changelog and fails
with CANNOT_LOAD_STATE_STORE, not INVALID_LOG_VERSION. So I kept a test for
this direction, but it asserts the new behavior. The read now gets past the
commit-log version check (the part this PR changes) and fails at the
state-store layer instead. I also added the reverse, reading a v1 checkpoint
under a v2-configured session, which does succeed end to end. I can also just
drop this test if you don't want to assert on the failure mode.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]