HyukjinKwon opened a new pull request, #56676:
URL: https://github.com/apache/spark/pull/56676

   ### What changes were proposed in this pull request?
   
   `test_profile_before_sc_for_connect` creates a `ResourceProfile` over Spark Connect
   immediately after `SparkSession.builder.remote(...).getOrCreate()`. This PR makes the
   test wait for the Connect server to be ready before doing so, using the existing
   `pyspark.testing.utils.eventually` helper to retry a trivial job until it succeeds:
   
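   ```python
   from pyspark.testing.utils import eventually
   
   def _server_ready() -> bool:
       # A trivial job; it can fail while the Connect server is still starting up.
       spark.range(1).count()
       return True
   
   # Retry the probe for up to 120 seconds, then resolve the profile id.
   eventually(timeout=120, expected_exceptions=(Exception,))(_server_ready)()
   rp.id
   ```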
   
   ### Why are the changes needed?
   
   The scheduled "Build / Python-only, Connect-only (Python 3.11)" build runs this test in
   its `Run tests (local-cluster)` step, where the server is started with
   `start-connect-server.sh --master "local-cluster[2, 4, 1024]"`. That script returns
   before the local-cluster `SparkContext` is fully initialized, so the first command(s)
   issued against it can fail server-side. `test_connect_resources` is the first test in
   that step, so it races server startup and fails intermittently (~60% of runs). The
   failure surfaces either as a bare `java.lang.AssertionError` on `rp.id` or as
   `SparkConnectGrpcException: Application error processing RPC` on the first job. When
   the cluster happens to be ready, the test passes (~22-77s). Waiting for readiness
   first makes it deterministic.
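
The retry-until-ready idea can be sketched without a Spark dependency. This is an illustrative sketch only, not the code in this PR: `wait_until_ready` and `FlakyServer` are hypothetical names, and `FlakyServer` merely stands in for a server that rejects its first few commands while starting up.

```python
import time
from typing import Callable, Tuple, Type


def wait_until_ready(
    probe: Callable[[], object],
    timeout: float = 120.0,
    interval: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> int:
    """Call `probe` until it stops raising; return the number of attempts."""
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            probe()
            return attempts
        except retry_on as exc:
            # Give up once the deadline has passed; otherwise pause and retry.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"not ready after {attempts} attempt(s)") from exc
            time.sleep(interval)


class FlakyServer:
    """Hypothetical stand-in: fails a fixed number of calls, then succeeds."""

    def __init__(self, failures_before_ready: int) -> None:
        self.remaining = failures_before_ready

    def run_trivial_job(self) -> int:
        if self.remaining > 0:
            self.remaining -= 1
            raise RuntimeError("context not initialized yet")
        return 1


if __name__ == "__main__":
    server = FlakyServer(failures_before_ready=3)
    attempts = wait_until_ready(server.run_trivial_job, timeout=5.0, interval=0.01)
    print(f"server ready after {attempts} attempts")
```

The probe is deliberately cheap and has no side effects, so repeating it is safe. The operation under test runs only once the probe has succeeded.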
   
   This is a test-only stabilization. The underlying server behavior (an internal error
   leaking on a very early command before the context is ready) is a separate, deeper
   robustness concern and is not addressed here.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. Test-only change.
   
   ### How was this patch tested?
   
   Ran the scheduled workflow (`build_python_connect.yml`) on a fork. The Connect-only
   build is green end-to-end, including the previously-flaky `local-cluster` step, on two
   consecutive runs:
   
   - https://github.com/HyukjinKwon/spark/actions/runs/27969109059 (attempt 1 and
     attempt 2: both `Run tests (local)` and `Run tests (local-cluster)` green)
   
   The default `build_and_test` on this branch is also green:
   https://github.com/HyukjinKwon/spark/actions/runs/27973120689
   
   Note: the Connect-only build's `Run tests (local)` step also requires the import fix in
   #56644 (SPARK-57598); the validation runs above were performed on a branch carrying
   both changes so the `local-cluster` step is reached. This PR contains only the
   `test_connect_resources.py` change.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Opus 4.8)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

