zhengruifeng opened a new pull request, #55879:
URL: https://github.com/apache/spark/pull/55879
### What changes were proposed in this pull request?
In `.github/workflows/build_and_test.yml`, add a step to the `precondition`
job that captures `git rev-parse HEAD` right after the apache/spark checkout
and exposes it as a `head_sha` output. Then switch every downstream
`actions/checkout` from `ref: ${{ inputs.branch }}` to `ref: ${{
needs.precondition.outputs.head_sha }}`. The `precondition` job's own checkout
still resolves `inputs.branch`; the 11 downstream checkouts (`build`,
`infra-image`, `precompile`, `pyspark`, `sparkr`, `buf`, `lint`, `docs`,
`tpcds-1g`, `docker-integration-tests`, `k8s-integration-tests`) now all pin to
the same SHA.
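A minimal sketch of the shape of this change, for illustration only. The step names, the `capture_sha` step id, the `runs-on` values, and the `actions/checkout` version are assumptions and are not taken from the actual diff; only `head_sha`, `inputs.branch`, `git rev-parse HEAD`, and the job names come from the description above.

```
jobs:
  precondition:
    runs-on: ubuntu-latest            # assumed runner
    outputs:
      # Expose the resolved commit to downstream jobs.
      head_sha: ${{ steps.capture_sha.outputs.head_sha }}
    steps:
      - name: Checkout Spark repository
        uses: actions/checkout@v4     # assumed version
        with:
          repository: apache/spark
          ref: ${{ inputs.branch }}   # the only place the branch name is resolved
      - name: Capture checked-out commit
        id: capture_sha               # hypothetical step id
        run: echo "head_sha=$(git rev-parse HEAD)" >> "$GITHUB_OUTPUT"

  build:
    needs: precondition
    runs-on: ubuntu-latest            # assumed runner
    steps:
      - name: Checkout Spark repository
        uses: actions/checkout@v4     # assumed version
        with:
          repository: apache/spark
          # Pin to the commit precondition saw instead of re-resolving the branch.
          ref: ${{ needs.precondition.outputs.head_sha }}
```

The other downstream jobs would follow the same pattern as `build`. Each one needs `precondition` in its `needs` list, directly, for `needs.precondition.outputs.head_sha` to be available.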
### Why are the changes needed?
Today each `actions/checkout` step independently re-resolves `ref: ${{
inputs.branch }}` (default `master`) at the moment the runner picks it up.
Different jobs in the same workflow run can therefore end up testing different
commits.
This bites hardest on the `pyspark` matrix: those jobs declare `needs:
[precondition, infra-image, precompile]` and typically start ~17 minutes after
the run is created (precompile takes that long). If a commit lands on master in
that window, the `pyspark` job downloads a precompiled JAR built from the older
commit but checks out Python sources from the newer commit. When the
intervening commit adds a tightly coupled change (new Spark Connect relation,
new proto field, new server planner, and new Python tests), every test of the
new feature, such as the NEAREST BY tests in the example below, then fails with:
```
[CONNECT_INVALID_PLAN.INVALID_ONE_OF_FIELD_NOT_SET]
The Spark Connect plan is invalid. This oneOf field in
spark.connect.Relation is not set: RELTYPE_NOT_SET
```
Concrete example from 2026-05-14:
- Run [25835824862](https://github.com/apache/spark/actions/runs/25835824862),
  triggered by `e19bc35c` (SPARK-56844): `pyspark-connect` failed with 19
  NEAREST BY errors.
- Run [25835929554](https://github.com/apache/spark/actions/runs/25835929554),
  triggered ~3 minutes later by the next commit `13380e78` (SPARK-56395, which
  added the NEAREST BY feature): same job passed.
The first run's `precompile` checked out `e19bc35c` (no NEAREST BY server
code), but by the time its `pyspark-connect` job actually started, master was
at `13380e78` and `actions/checkout` resolved that newer commit (with the new
Python test files). Pinning every job to the SHA that `precondition` resolved
rules out this kind of mismatch within a single workflow run.
### Does this PR introduce _any_ user-facing change?
No. CI infrastructure only.
### How was this patch tested?
YAML syntax validated locally. CI will exercise the change end-to-end.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-opus-4-7)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]