ericm-db commented on code in PR #56018:
URL: https://github.com/apache/spark/pull/56018#discussion_r3326386583


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/state/StateDataSourceReadSuite.scala:
##########
@@ -600,37 +598,6 @@ class RocksDBWithCheckpointV2StateDataSourceReaderSuite 
extends StateDataSourceR
       "true")
   }
 
-  // TODO: Remove this test once we allow migrations from checkpoint v1 to v2
-  test("reading checkpoint v2 store with version 1 should fail") {

Review Comment:
   Yeah so I dug into the v2-store / v1-config direction and it can't actually 
succeed, so I couldn't flip it to a success assertion. This PR makes the commit 
log read across versions, but a v2 state store names its files with checkpoint 
unique ids, and the reader only uses unique-id naming when 
enableStateStoreCheckpointIds (driven by STATE_STORE_CHECKPOINT_FORMAT_VERSION) 
is true. 
   Under v1 config it goes looking for the non-unique 1.changelog and fails 
with CANNOT_LOAD_STATE_STORE, not INVALID_LOG_VERSION. So I kept a test for 
this direction, but it asserts the new behavior. The read now gets past the 
commit-log version check (the part this PR changes) and fails at the 
state-store layer instead. I also added the reverse, reading a v1 checkpoint 
under a v2-configured session, which does succeed end to end. I can also just 
drop this test if you don't want to assert on the failure mode.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to