HyukjinKwon opened a new pull request, #56721: URL: https://github.com/apache/spark/pull/56721
### What changes were proposed in this pull request?

Replace the fixed `Thread.sleep(5000)` in the `"snapshotStartBatchId with transformWithState"` test of `StateDataSourceTransformWithStateSuite` with a deterministic `eventually(...)` wait that polls until the RocksDB snapshot files the `snapshotStartBatchId` reader needs (snapshot version 2 for the partitions read) have actually been uploaded by the asynchronous maintenance thread.

The root cause is known, but to guard against regressions the timeout assertion now prints the actual state-directory contents, so a recurrence in scheduled jobs is immediately diagnosable (snapshot still pending vs. cleaned up vs. wrong path) rather than surfacing as a bare failure.

### Why are the changes needed?

The snapshot upload is asynchronous (background maintenance thread), so the fixed sleep is racy under CI load. When it is too short, the snapshot `.zip` is not yet uploaded and the reader fails on the scheduled **Maven (Scala 2.13, JDK 21)** and **JDK 25** builds:

```
[CANNOT_LOAD_STATE_STORE.UNCATEGORIZED] An error occurred during loading state.
Caused by: java.io.FileNotFoundException: .../state/0/1/2.zip does not exist
```

### Does this PR introduce _any_ user-facing change?

No. Test-only.

### How was this patch tested?

- **Before (failing job):** [`Build / Maven (Scala 2.13, JDK 21)` → `sql#core - slow tests`](https://github.com/apache/spark/actions/runs/28048347820/job/83041196522): `snapshotStartBatchId with transformWithState ... *** FAILED ***` (`FileNotFoundException ... 2.zip`).
- **After (passing, run 10x to confirm the flake is gone):** [✅ 10/10 passed](https://github.com/HyukjinKwon/spark/actions/runs/28074724227/job/83116481122): the test was executed 10 consecutive times in one sbt session, all green.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8
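
The polling idea described above can be illustrated with a minimal, dependency-free sketch. This is not the code from the patch: the PR uses ScalaTest's `eventually(...)`, whereas the hand-rolled loop below only mimics that behavior so the example runs without a test framework. The object name `SnapshotWait`, the method names, the parameters, and the `<operatorId>/<partitionId>/<version>.zip` layout (inferred only from the path in the error message) are all assumptions made for illustration.

```scala
import java.nio.file.{Files, Path}
import scala.annotation.tailrec
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of Spark or of the patch.
object SnapshotWait {

  // Collect every regular file under `root` (relative paths) so a timeout
  // can report what is actually on disk instead of failing silently.
  def listing(root: Path): Seq[String] = {
    if (!Files.exists(root)) Seq.empty
    else {
      val stream = Files.walk(root)
      try {
        stream.iterator().asScala
          .filter(p => Files.isRegularFile(p))
          .map(p => root.relativize(p).toString)
          .toList
          .sorted
      } finally stream.close()
    }
  }

  // Poll until every expected snapshot file exists, or fail with diagnostics.
  // Assumed layout: <stateDir>/<operatorId>/<partitionId>/<version>.zip
  def awaitSnapshots(
      stateDir: Path,
      operatorId: Int,
      partitions: Seq[Int],
      version: Long,
      timeoutMs: Long,
      intervalMs: Long): Unit = {
    val expected = partitions.map(p => stateDir.resolve(s"$operatorId/$p/$version.zip"))
    val deadline = System.nanoTime() + timeoutMs * 1000000L

    @tailrec
    def loop(): Unit = {
      val missing = expected.filterNot(p => Files.exists(p))
      if (missing.isEmpty) {
        () // all snapshots are present; the reader can proceed
      } else if (System.nanoTime() - deadline >= 0) {
        throw new AssertionError(
          s"Snapshots still missing after $timeoutMs ms: " +
            missing.map(p => stateDir.relativize(p)).mkString(", ") +
            s"; state directory contains: ${listing(stateDir).mkString(", ")}")
      } else {
        Thread.sleep(intervalMs)
        loop()
      }
    }
    loop()
  }
}
```

A call such as `SnapshotWait.awaitSnapshots(stateDir, 0, Seq(1), 2L, 30000L, 100L)` returns as soon as the file appears, so the common case is faster than a fixed five-second sleep, while a slow CI machine gets the full timeout. The directory listing in the failure message is what distinguishes "snapshot still pending" from "cleaned up" or "wrong path".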
