HyukjinKwon opened a new pull request, #56721:
URL: https://github.com/apache/spark/pull/56721

   ### What changes were proposed in this pull request?
   Replace the fixed `Thread.sleep(5000)` in the `"snapshotStartBatchId with 
transformWithState"` test of `StateDataSourceTransformWithStateSuite` with a 
deterministic `eventually(...)` wait that polls until the RocksDB snapshot 
files the `snapshotStartBatchId` reader needs (snapshot version 2 for the 
partitions read) have actually been uploaded by the asynchronous maintenance 
thread.
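
A minimal sketch of the polling idea, using only the JDK so it runs on its own. The test itself uses ScalaTest's `eventually(...)`; the `waitUntil` helper below is a hypothetical stand-in for it, and the `<stateDir>/<operatorId>/<partitionId>/<version>.zip` layout is an assumption inferred from the `.../state/0/1/2.zip` path in the failure, not taken from the patch.

```scala
import java.nio.file.{Files, Path}

object SnapshotWait {
  /** Re-evaluates `condition` every `intervalMs` until it holds or `timeoutMs` elapses. */
  def waitUntil(timeoutMs: Long, intervalMs: Long)(condition: => Boolean): Boolean = {
    val deadline = System.nanoTime() + timeoutMs * 1000000L
    var ok = condition
    while (!ok && System.nanoTime() < deadline) {
      Thread.sleep(intervalMs)
      ok = condition
    }
    ok
  }

  /** Assumed layout (hypothetical): <stateDir>/<operatorId>/<partitionId>/<version>.zip */
  def snapshotFile(stateDir: Path, operatorId: Int, partitionId: Int, version: Long): Path =
    stateDir
      .resolve(operatorId.toString)
      .resolve(partitionId.toString)
      .resolve(s"$version.zip")

  /** True once every listed partition has its snapshot file for `version`. */
  def snapshotsUploaded(
      stateDir: Path,
      operatorId: Int,
      partitions: Seq[Int],
      version: Long): Boolean =
    partitions.forall(p => Files.exists(snapshotFile(stateDir, operatorId, p, version)))
}
```

Unlike a fixed sleep, the wait returns as soon as the files appear and only costs the full timeout when the upload never happens.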
   
   The root cause is known, but to guard against regressions, the failure 
message of the timed-out assertion now prints the actual state-directory 
contents. If the failure recurs in scheduled jobs, it can be diagnosed 
immediately (snapshot still pending vs. cleaned up vs. wrong path) instead of 
surfacing as a bare failure.
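
One way such a diagnostic message could be built is sketched below. This is an illustration under assumptions, not the code in the patch: `StateDirDiagnostics`, `listing`, and `requireSnapshot` are hypothetical names, and only JDK file APIs are used.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

object StateDirDiagnostics {
  /** Describes the regular files under `root`, or reports that `root` is missing. */
  def listing(root: Path): String = {
    if (!Files.exists(root)) {
      s"$root does not exist"
    } else {
      val stream = Files.walk(root)
      try {
        stream.iterator().asScala
          .filter(p => Files.isRegularFile(p))
          .map(p => root.relativize(p).toString)
          .toSeq
          .sorted
          .mkString(s"files under $root:\n  ", "\n  ", "")
      } finally {
        // Files.walk holds open directory handles until the stream is closed.
        stream.close()
      }
    }
  }

  /** Fails with the directory listing attached when the expected snapshot is absent. */
  def requireSnapshot(root: Path, expected: Path): Unit =
    assert(Files.exists(expected), s"snapshot $expected not found; ${listing(root)}")
}
```

With the listing in the message, a log showing only `1.zip` points at a pending upload, an empty listing at cleanup, and a missing root at a wrong path.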
   
   ### Why are the changes needed?
   The snapshot upload is asynchronous (background maintenance thread), so the 
fixed sleep is racy under CI load. When the sleep ends before the upload 
completes, the snapshot `.zip` does not exist yet and the reader fails on the 
scheduled **Maven (Scala 2.13, JDK 21)** and **JDK 25** builds:
   
   ```
   [CANNOT_LOAD_STATE_STORE.UNCATEGORIZED] An error occurred during loading 
state.
   Caused by: java.io.FileNotFoundException: .../state/0/1/2.zip does not exist
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   No. Test-only.
   
   ### How was this patch tested?
   - **Before (failing job):** [`Build / Maven (Scala 2.13, JDK 21)` → 
`sql#core - slow 
tests`](https://github.com/apache/spark/actions/runs/28048347820/job/83041196522)
 — `snapshotStartBatchId with transformWithState ... *** FAILED ***` 
(`FileNotFoundException ... 2.zip`).
   - **After (passing, run 10x to confirm the flake is gone):** [✅ 10/10 
passed](https://github.com/HyukjinKwon/spark/actions/runs/28074724227/job/83116481122)
 — the test was executed 10 consecutive times in one sbt session, all green.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   Generated-by: Claude Opus 4.8
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

