zhengruifeng opened a new pull request, #55879:
URL: https://github.com/apache/spark/pull/55879

   ### What changes were proposed in this pull request?
   
   In `.github/workflows/build_and_test.yml`, add a step to the `precondition` 
job that captures `git rev-parse HEAD` right after the apache/spark checkout and 
exposes it as a `head_sha` output, then switch every downstream 
`actions/checkout` from `ref: ${{ inputs.branch }}` to `ref: ${{ 
needs.precondition.outputs.head_sha }}`. The `precondition` job's own checkout 
still resolves `inputs.branch`; the 11 downstream checkouts (`build`, 
`infra-image`, `precompile`, `pyspark`, `sparkr`, `buf`, `lint`, `docs`, 
`tpcds-1g`, `docker-integration-tests`, `k8s-integration-tests`) now all pin to 
the same SHA.
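   
   The shape of the change can be sketched roughly as follows. This is an 
illustrative fragment, not the actual diff: the step `id` (`head-sha`), the 
step names, the `runs-on` value, and the `actions/checkout` version are 
assumptions, and only one downstream job (`build`) is shown.
   
   ```yaml
   jobs:
     precondition:
       runs-on: ubuntu-latest   # assumed runner label
       outputs:
         # Expose the resolved commit to every job that lists `precondition` in `needs`.
         head_sha: ${{ steps.head-sha.outputs.head_sha }}
       steps:
         - name: Checkout Spark repository
           uses: actions/checkout@v4   # version is an assumption
           with:
             repository: apache/spark
             # Only this checkout still resolves the branch name.
             ref: ${{ inputs.branch }}
         - name: Record checked-out commit
           id: head-sha   # hypothetical step id
           run: echo "head_sha=$(git rev-parse HEAD)" >> "$GITHUB_OUTPUT"

     build:
       needs: precondition
       runs-on: ubuntu-latest   # assumed runner label
       steps:
         - name: Checkout Spark repository
           uses: actions/checkout@v4   # version is an assumption
           with:
             repository: apache/spark
             # Previously: ref: ${{ inputs.branch }}
             ref: ${{ needs.precondition.outputs.head_sha }}
   ```
   
   Because a job can only read `needs.precondition.outputs.*` when it lists 
`precondition` in its `needs`, each downstream job has to depend on it, 
directly or alongside its other dependencies.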
   
   ### Why are the changes needed?
   
   Today each `actions/checkout` step independently re-resolves `ref: ${{ 
inputs.branch }}` (default `master`) at the moment the runner picks it up. 
Different jobs in the same workflow run can therefore end up testing different 
commits.
   
   This bites hardest on the `pyspark` matrix because those jobs `needs: 
[precondition, infra-image, precompile]` and typically start ~17 minutes after 
the run is created (precompile takes that long). If a commit lands on master in 
that window, the `pyspark` job downloads a precompiled JAR built from the older 
commit but checks out Python sources from the newer commit. When the 
intervening commit adds a tightly coupled change (new Spark Connect relation, 
new proto field, new server planner, and new Python tests), every 
NEAREST-BY-style test then fails with:
   
   ```
   [CONNECT_INVALID_PLAN.INVALID_ONE_OF_FIELD_NOT_SET]
   The Spark Connect plan is invalid. This oneOf field in 
spark.connect.Relation is not set: RELTYPE_NOT_SET
   ```
   
   Concrete example from 2026-05-14:
   - Run 
[25835824862](https://github.com/apache/spark/actions/runs/25835824862) 
triggered by `e19bc35c` (SPARK-56844) — `pyspark-connect` failed with 19 
NEAREST BY errors.
   - Run 
[25835929554](https://github.com/apache/spark/actions/runs/25835929554) 
triggered ~3 minutes later by the next commit `13380e78` (SPARK-56395, which 
added the NEAREST BY feature) — same job passed.
   
   The first run's `precompile` checked out `e19bc35c` (no NEAREST BY server 
code), but by the time its `pyspark-connect` job actually started, master was 
at `13380e78` and `actions/checkout` resolved that newer commit (with the new 
Python test files). Pinning every job to the SHA `precondition` saw makes this 
impossible.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. CI infrastructure only.
   
   ### How was this patch tested?
   
   YAML syntax validated locally. CI will exercise the change end-to-end.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (claude-opus-4-7)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

