zhengruifeng opened a new pull request, #56118:
URL: https://github.com/apache/spark/pull/56118

   ### What changes were proposed in this pull request?
   
   Add the `precompile-coursier-` cache as a restore-key fallback for the
   `test`, `pyspark`, and `sparkr` jobs in `.github/workflows/build_and_test.yml`,
   so they can reuse the dependency JARs already resolved by the `precompile`
   job when their own Coursier cache misses.
   
   Concretely, the `Cache Coursier local repository` step in each of the
   three jobs now lists two additional fallback restore-keys after the
   existing prefix fallback (the primary key and the existing prefix
   fallback are unchanged):
   
   ```yaml
   restore-keys: |
     <existing-prefix>-coursier-
     precompile-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
     precompile-coursier-
   ```
   
   ### Why are the changes needed?
   
   The `precompile` job already resolves the full superset of dependencies
   (it builds with `-Phadoop-3 -Pyarn -Pspark-ganglia-lgpl -Phadoop-cloud
   -Phive -Pkubernetes -Pjvm-profiler -Pkinesis-asl -Phive-thriftserver
   -Pdocker-integration-tests -Pvolcano`) and populates `~/.cache/coursier`,
   but writes that cache under the key prefix `precompile-coursier-`. The
   downstream test jobs read from `${matrix.java}-${matrix.hadoop}-coursier-`,
   `pyspark-coursier-`, and `sparkr-coursier-` respectively, so they cannot
   see the precompile job's cache.
   
   The precompile artifact tarball only bundles `target/` directories
   (`.class` files and assemblies); it does not include the resolved JARs.
   So when a test job's own Coursier cache is cold (new branch, modified
   `pom.xml` / `plugins.sbt`), SBT and Coursier still have to re-resolve
   and re-download the dependencies from scratch even though the
   precompile job already downloaded them in this same workflow.
   
   Adding the precompile cache as a restore-key fallback lets the test
   jobs benefit from that work in the cold-cache case. The change is
   purely additive: existing per-job caches still take precedence via the
   primary key and the first restore-key entry.
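
The precedence described above can be sketched with a small simulation. This is a simplified model of the documented `actions/cache` lookup (exact match on the primary key, then each restore-key tried in order as a prefix, newest match wins); it ignores branch scoping. The function name `resolve_cache`, the hash values, the timestamps, and the assumption that the `pyspark` primary key is `pyspark-coursier-<hash>` are all illustrative, not taken from the workflow.

```python
from typing import Iterable, Optional, Tuple


def resolve_cache(primary_key: str,
                  restore_keys: Iterable[str],
                  existing: Iterable[Tuple[str, int]]) -> Optional[str]:
    """Return the cache key that would be restored, or None on a full miss.

    `existing` is a list of (key, created_at) pairs; a larger created_at
    means a more recently created cache.
    """
    caches = list(existing)
    # 1. An exact hit on the primary key always wins.
    if any(key == primary_key for key, _ in caches):
        return primary_key
    # 2. Otherwise try each restore-key in order, as a prefix match.
    for prefix in restore_keys:
        matches = [(created, key) for key, created in caches
                   if key.startswith(prefix)]
        if matches:
            # Among several prefix matches, pick the most recently created.
            return max(matches)[1]
    return None


# Hypothetical hashes standing in for hashFiles('**/pom.xml', '**/plugins.sbt').
NEW_HASH = "abc123"
OLD_HASH = "old999"


def pyspark_lookup(existing):
    """Lookup as the pyspark job would perform it after this change (assumed keys)."""
    return resolve_cache(
        primary_key=f"pyspark-coursier-{NEW_HASH}",
        restore_keys=[
            "pyspark-coursier-",                  # existing prefix fallback
            f"precompile-coursier-{NEW_HASH}",    # new: same-hash precompile cache
            "precompile-coursier-",               # new: any precompile cache
        ],
        existing=existing,
    )


# Cold per-job cache: only precompile caches exist.
cold = [(f"precompile-coursier-{OLD_HASH}", 1),
        (f"precompile-coursier-{NEW_HASH}", 2)]
print(pyspark_lookup(cold))   # precompile-coursier-abc123

# An older per-job cache exists: it still takes precedence.
warm = cold + [(f"pyspark-coursier-{OLD_HASH}", 0)]
print(pyspark_lookup(warm))   # pyspark-coursier-old999
```

In this model the precompile cache is consulted only when neither the primary key nor the job's own prefix matches anything, which is the additive behavior the PR describes.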
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. CI-only.
   
   ### How was this patch tested?
   
   YAML validates with `python3 -c "import yaml; yaml.safe_load(...)"`. The
   effectiveness of the cache fallback can only be observed on actual GHA
   runs and will be evaluated by the CI on this PR.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Claude Opus 4.7)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

