HyukjinKwon opened a new pull request, #56729: URL: https://github.com/apache/spark/pull/56729
### What changes were proposed in this pull request?

In `SparkConnectListenerBusListener.send`, retry `responseObserver.onNext` a small bounded number of times before tearing the listener down, instead of removing it on the first exception.

Also add diagnostic context to the `ClientStreamingQuerySuite."listener events"` assertions and make the test listener's fields `@volatile`.

### Why are the changes needed?

`ClientStreamingQuerySuite."listener events"` is flaky: the `QueryStarted`/`QueryProgress` events arrive but the terminal `QueryTerminatedEvent` is never received even though the query has stopped.

The server-side listener removes itself and stops sending **all** further events on the first `onNext` failure, so a single transient gRPC hiccup on a frequent progress event silently drops the later terminate event. A bounded retry keeps the listener alive across transient failures while still cleaning up when the client is genuinely unresponsive.

The connect server runs in a separate process whose logs are not captured in CI, so the exact failure is inferred; the added assertion diagnostics (`diag(stage)`) surface the client-side state if this test ever flakes again in a scheduled job, to confirm/refine the root cause.

**Before (failing in apache/spark CI):** `listener events` 90s timeout, `terminate` empty: https://github.com/apache/spark/actions/runs/28004202389/job/82884598238

**After (this change, validated on a fork):** full connect module green and `ClientStreamingQuerySuite."listener events"` re-run 8x with 0 failures: https://github.com/HyukjinKwon/spark/actions/runs/28074772169

### Does this PR introduce any user-facing change?

No. Server hardening + test diagnostics only.

### How was this patch tested?

Re-ran the full connect module and `ClientStreamingQuerySuite."listener events"` 8x on CI (link above); all green. Existing `SparkConnectListenerBusListenerSuite` onNext-throw tests still pass.
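For readers who want a picture of the bounded retry described under "What changes were proposed", here is a minimal, self-contained sketch. It is not the code from this PR: `Observer`, `BoundedRetry.sendWithRetry`, `FlakyObserver`, and the `maxAttempts` parameter are hypothetical names chosen for illustration, and `Observer` is only a stand-in for the `onNext` method of gRPC's `StreamObserver`. The PR text does not state the actual retry count or whether the real change waits between attempts, so this sketch retries immediately and takes the limit as an argument.

```scala
import scala.annotation.tailrec
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical stand-in for the onNext half of gRPC's StreamObserver.
trait Observer[T] {
  def onNext(value: T): Unit
}

object BoundedRetry {
  // Calls observer.onNext(event) up to maxAttempts times.
  // Returns true as soon as one attempt succeeds. Returns false when every
  // attempt threw, which is the point where a caller would tear the
  // listener down.
  @tailrec
  def sendWithRetry[T](observer: Observer[T], event: T, maxAttempts: Int): Boolean = {
    if (maxAttempts <= 0) {
      false
    } else {
      val delivered =
        try {
          observer.onNext(event)
          true
        } catch {
          // Assumption: any non-fatal exception is treated as possibly transient.
          case NonFatal(_) => false
        }
      if (delivered) true else sendWithRetry(observer, event, maxAttempts - 1)
    }
  }
}

// Test double (hypothetical): throws on the first `failures` calls, then records events.
final class FlakyObserver[T](failures: Int) extends Observer[T] {
  private var calls = 0
  val received: ArrayBuffer[T] = ArrayBuffer.empty[T]

  override def onNext(value: T): Unit = {
    calls += 1
    if (calls <= failures) throw new RuntimeException("simulated transient failure")
    received += value
  }
}
```

With a limit of 3, an observer that fails twice still receives the event, while one that keeps failing makes `sendWithRetry` return `false`, so the caller can remove the listener as before. That mirrors the trade-off stated above: survive a transient hiccup, but still clean up when the client is genuinely unresponsive.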
### Was this patch authored or co-authored using generative AI tooling?

Yes, drafted with Claude Code.
