HyukjinKwon opened a new pull request, #56718: URL: https://github.com/apache/spark/pull/56718
## What changes were proposed in this pull request?

**[DO-NOT-MERGE]** CI-stabilization / observability change (draft).

When loading state from a snapshot (e.g. reading with the `snapshotStartBatchId` option), the snapshot zip for the requested version can be missing, most often because the asynchronous maintenance thread has not uploaded it yet. Today that surfaces only as:

```
[CANNOT_LOAD_STATE_STORE.UNCATEGORIZED] ...
Caused by: java.io.FileNotFoundException: .../state/0/1/2.zip does not exist
```

with no indication of whether the snapshot was never created or merely not uploaded in time.

This PR enriches the `FileNotFoundException` thrown from `RocksDBFileManager` with the snapshot (`.zip`) and changelog (`.changelog`) files that **are** present in the DFS checkpoint root, so the situation is self-diagnosing from logs (e.g. *asked for 2.zip, only 1.zip present* clearly indicates the async-upload race). The listing is best-effort and never throws. A unit test in `RocksDBSuite` deterministically validates the enriched message.

## Why are the changes needed?

The `snapshotStartBatchId ... transformWithState` tests have failed intermittently in scheduled Maven jobs (master, branch-4.x, branch-4.2; both plain and row-checksum variants) with exactly this opaque `FileNotFoundException`. The deterministic test-side fix was handled separately; this change makes any *future* recurrence, in any suite or scheduled job, immediately diagnosable instead of requiring artifact spelunking.

## Does this PR introduce any user-facing change?

No (improves an error message on an already-failing path).

## How was this patch tested?

New unit test `RocksDBFileManager: missing snapshot during load reports the available versions`, plus a focused fork workflow that runs `RocksDBSuite` once and repeats the snapshot suites 5× to confirm stability. The last commit (validation workflow) must be reverted before merge.

This pull request and its description were written by Isaac.
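For readers who want a concrete picture of the best-effort listing described above, here is a minimal sketch. It is not the code in this PR: `SnapshotDiagnostics`, `versionsWithSuffix`, and `enrich` are hypothetical names, the wording of the enriched message is invented for illustration, and the sketch lists a local directory with `java.nio.file` rather than going through the DFS, on the assumption that files are named `<version>.zip` and `<version>.changelog`.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; the real change lives in
// RocksDBFileManager and reads the DFS checkpoint root, not a local directory.
final class SnapshotDiagnostics {

    // Returns the sorted numeric versions of files named "<version><suffix>" in dir.
    // Best-effort: any failure while listing yields an empty list, never an exception.
    static List<Long> versionsWithSuffix(Path dir, String suffix) {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                .map(p -> p.getFileName().toString())
                .filter(name -> name.endsWith(suffix))
                .map(name -> name.substring(0, name.length() - suffix.length()))
                .filter(stem -> stem.matches("\\d+")) // assumed naming: digits only
                .map(Long::parseLong)
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException | RuntimeException e) {
            // Diagnostics must not mask the original failure.
            return Collections.emptyList();
        }
    }

    // Wraps the original exception in one whose message also names what is present.
    static FileNotFoundException enrich(Path dir, long requested, FileNotFoundException cause) {
        String message = cause.getMessage()
            + " (requested snapshot version: " + requested
            + "; snapshot versions present: " + versionsWithSuffix(dir, ".zip")
            + "; changelog versions present: " + versionsWithSuffix(dir, ".changelog") + ")";
        FileNotFoundException enriched = new FileNotFoundException(message);
        enriched.initCause(cause); // keep the original stack trace reachable
        return enriched;
    }
}
```

With only `1.zip` in the directory and version 2 requested, the enriched message lists `[1]` as the snapshot versions present, which is the signature of the async-upload race. Swallowing listing errors is deliberate: the diagnostic path should never replace the original `FileNotFoundException` with a different failure.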
