zhengruifeng opened a new pull request, #56118:
URL: https://github.com/apache/spark/pull/56118
### What changes were proposed in this pull request?
Add the `precompile-coursier-` cache as a restore-key fallback for the
`test`, `pyspark`, and `sparkr` jobs in
`.github/workflows/build_and_test.yml`,
so they can reuse the dependency JARs already resolved by the `precompile`
job when their own Coursier cache misses.
Concretely, the `Cache Coursier local repository` step in each of the three
jobs now lists two additional fallback restore-keys after the existing prefix
fallback (the primary key and that prefix fallback are unchanged):
```yaml
restore-keys: |
  <existing-prefix>-coursier-
  precompile-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
  precompile-coursier-
```
### Why are the changes needed?
The `precompile` job already resolves the full superset of dependencies
(it builds with `-Phadoop-3 -Pyarn -Pspark-ganglia-lgpl -Phadoop-cloud
-Phive -Pkubernetes -Pjvm-profiler -Pkinesis-asl -Phive-thriftserver
-Pdocker-integration-tests -Pvolcano`) and populates `~/.cache/coursier`,
but writes that cache under the key prefix `precompile-coursier-`. The
downstream test jobs read from `${{ matrix.java }}-${{ matrix.hadoop }}-coursier-`,
`pyspark-coursier-`, and `sparkr-coursier-` respectively, so they never
restore the precompile job's cache.
The precompile artifact tarball only bundles `target/` directories
(`.class` files and assemblies); it does not include the resolved JARs.
So when a test job's own Coursier cache is cold (new branch, modified
`pom.xml` / `plugins.sbt`), SBT and Coursier still have to re-resolve
and re-download the dependencies from scratch even though the
precompile job already downloaded them in this same workflow.
Adding the precompile cache as a restore-key fallback lets the test
jobs benefit from that work in the cold-cache case. The change is
purely additive: existing per-job caches still take precedence via the
primary key and the first restore-key entry.
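
The precedence described above can be illustrated with a small simulation. This is a simplified model of cache lookup, not the actual `actions/cache` implementation: it assumes an exact match on the primary key is tried first, then each restore-key is tried in order as a prefix with the newest matching entry winning, and it ignores branch scoping. The function name `restore_cache`, the hash `abc123`, and the cache entries are hypothetical.

```python
from typing import List, Optional, Tuple


def restore_cache(
    primary_key: str,
    restore_keys: List[str],
    existing: List[Tuple[str, int]],
) -> Optional[str]:
    """Return the key of the cache entry that would be restored, or None.

    `existing` holds (key, created_at) pairs; a larger created_at is newer.
    Simplified model: exact primary-key hit first, then ordered prefix fallbacks.
    """
    # 1. An exact hit on the primary key always wins.
    for key, _ in existing:
        if key == primary_key:
            return key
    # 2. Otherwise try each restore-key in order; the newest prefix match wins.
    for prefix in restore_keys:
        matches = [(created, key) for key, created in existing if key.startswith(prefix)]
        if matches:
            return max(matches)[1]
    return None


# Hypothetical restore-keys for the pyspark job after this change.
restore_keys = [
    "pyspark-coursier-",
    "precompile-coursier-abc123",
    "precompile-coursier-",
]

# Cold case: only the precompile job has written a cache.
cold = [("precompile-coursier-abc123", 20)]
# Warm case: an older per-job cache also exists.
warm = cold + [("pyspark-coursier-old999", 10)]

print(restore_cache("pyspark-coursier-abc123", restore_keys, cold))
print(restore_cache("pyspark-coursier-abc123", restore_keys, warm))
```

In the cold case the lookup falls through to the precompile entry; in the warm case the older per-job entry is still chosen because its prefix is listed first.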
### Does this PR introduce _any_ user-facing change?
No. CI-only.
### How was this patch tested?
The YAML validates with `python3 -c "import yaml; yaml.safe_load(...)"`. The
effectiveness of the cache fallback can only be observed on actual GitHub
Actions runs and will be evaluated by CI on this PR.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.7)