[PR] [INFRA] Share SBT precompile artifact with tpcds-1g CI job [spark]

via GitHub Fri, 29 May 2026 03:48:49 -0700


zhengruifeng opened a new pull request, #56200:
URL: https://github.com/apache/spark/pull/56200


   ### What changes were proposed in this pull request?
   
   This PR wires the `tpcds-1g` job in `.github/workflows/build_and_test.yml` 
to consume the shared `precompile` artifact, extending the pattern already 
applied to `docker-integration-tests` and `k8s-integration-tests` 
([SPARK-57069](https://issues.apache.org/jira/browse/SPARK-57069); parent 
[SPARK-56830](https://issues.apache.org/jira/browse/SPARK-56830)).
   
   Concretely:
   
   - The `precompile` job's `if:` gate is extended to also fire when `tpcds-1g 
== 'true'` in the precondition output, so the artifact is available whenever 
the job runs.
   - `tpcds-1g`:
     - `needs: precondition` -> `needs: [precondition, precompile]`
     - `if:` extended with `(!cancelled()) &&` so the job still runs when 
`precompile` does not succeed (for example, when it fails or is skipped).
     - Adds "Download precompiled artifact" + "Extract precompiled artifact" 
steps after Java install, with graceful fallback (`continue-on-error: true`).
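   
   As a rough illustration of the wiring listed above, the job-level changes 
might look like the fragment below. This is a sketch, not the actual diff: the 
`fromJson(...)` expression, the `runs-on` values, and the placeholder steps are 
assumptions, the `precondition` job is not shown, and the conditions for the 
other consumers of `precompile` are omitted.
   
   ```yaml
   jobs:
     precompile:
       needs: precondition
       # Assumed form of the gate; the real condition also covers the other consumers.
       if: fromJson(needs.precondition.outputs.required)['tpcds-1g'] == 'true'
       continue-on-error: true
       runs-on: ubuntu-latest
       steps:
         - run: echo "compile and upload the artifact (placeholder)"

     tpcds-1g:
       needs: [precondition, precompile]
       # (!cancelled()) replaces the implicit success() check on the needed jobs.
       if: (!cancelled()) && fromJson(needs.precondition.outputs.required)['tpcds-1g'] == 'true'
       runs-on: ubuntu-latest
       steps:
         - run: echo "download, extract, then run build/sbt (placeholder)"
   ```
   
   The parentheses around `!cancelled()` keep YAML from reading the leading `!` 
as a tag.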
   
   The `tpcds-1g` job drives SBT directly via `build/sbt "sql/testOnly ..."` 
(and `build/sbt "sql/Test/runMain org.apache.spark.sql.GenTPCDSData ..."` on a 
TPC-DS data cache miss), so it does not go through `dev/run-tests.py` and needs 
no `SKIP_SCALA_BUILD` flag -- the same situation as `k8s-integration-tests`. 
The first SBT invocation otherwise compiles `sql/core` (main + test) from 
scratch. The `precompile` job already runs `Test/package`, which compiles the 
`sql/core` test classes this job depends on (`TPCDSQueryTestSuite`, 
`TPCDSCollationQueryTestSuite`, `GenTPCDSData`, `TPCDSSchema`). Extracting the 
precompiled `target/` lets SBT skip that compile and run the test phase 
directly.
   
   ### Optional: graceful fallback if precompile fails
   
   Same pattern as the prior consumers:
   - `precompile` keeps `continue-on-error: true`.
   - The "Download precompiled artifact" step is gated on 
`needs.precompile.result == 'success'` and has `continue-on-error: true`.
   - "Extract precompiled artifact" is gated on the download succeeding and has 
`continue-on-error: true`.
   - If extraction fails or the artifact is missing, SBT compiles from scratch 
exactly as before.
   
   In the worst case the job degrades to the pre-PR behavior rather than 
failing the workflow.
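   
   The fallback chain described above could be sketched as the two steps 
below. The step names come from this PR; the step `id`, the artifact name, the 
tarball file name, and the `actions/download-artifact` version are hypothetical 
and only serve the illustration.
   
   ```yaml
   steps:
     - name: Download precompiled artifact
       id: download_precompiled            # hypothetical step id
       if: needs.precompile.result == 'success'
       continue-on-error: true
       uses: actions/download-artifact@v4  # version is an assumption
       with:
         name: precompiled-target          # hypothetical artifact name

     - name: Extract precompiled artifact
       # Runs only if the download step itself succeeded.
       if: steps.download_precompiled.outcome == 'success'
       continue-on-error: true
       run: |
         ls -lh precompiled.tar.gz         # hypothetical file name; logs the artifact size
         tar -xzf precompiled.tar.gz
   ```
   
   Gating on `outcome` rather than `conclusion` matters in a sketch like this: 
with `continue-on-error: true`, a failed step still reports a `conclusion` of 
`success`, so only `outcome` reflects whether the download actually worked.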
   
   Note: the existing `# Any TPC-DS related updates on this job need to be 
applied to tpcds-1g-gen job of benchmark.yml as well` comment refers to TPC-DS 
data-generation parameters (scale factor, `tpcds-kit` ref, `GenTPCDSData` 
args). This PR changes none of those -- it only adds build-artifact reuse, and 
`benchmark.yml` is a standalone workflow with no shared `precompile` job -- so 
no corresponding change is needed there.
   
   ### Why are the changes needed?
   
   Today every run of `build_and_test.yml` that requires `tpcds-1g` repeats the 
`sql/core` SBT compile that the `precompile` job has already performed for 
`pyspark` / `sparkr` / `build` / docker / k8s. Wiring `tpcds-1g` to the 
existing artifact removes that duplicate compile at no extra cost, since 
`precompile` is already running.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. CI infrastructure change only.
   
   ### How was this patch tested?
   
   The change is exercised by the CI run of this PR itself. The 
Download/Extract steps log the artifact size; if the precompile job is forced 
to fail (or its artifact is missing), the job falls back to the original local 
SBT build.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Opus 4.7)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

