zhengruifeng opened a new pull request, #56200: URL: https://github.com/apache/spark/pull/56200
### What changes were proposed in this pull request?

This PR wires the `tpcds-1g` job in `.github/workflows/build_and_test.yml` to consume the shared `precompile` artifact, extending the pattern already applied to `docker-integration-tests` and `k8s-integration-tests` ([SPARK-57069](https://issues.apache.org/jira/browse/SPARK-57069); parent [SPARK-56830](https://issues.apache.org/jira/browse/SPARK-56830)).

Concretely:

- The `precompile` job's `if:` gate is extended to also fire when `tpcds-1g == 'true'` in the precondition output, so the artifact is available whenever the job runs.
- `tpcds-1g`:
  - `needs: precondition` -> `needs: [precondition, precompile]`
  - `if:` extended with `(!cancelled()) &&` so the job still runs if precompile is cancelled.
  - Adds "Download precompiled artifact" + "Extract precompiled artifact" steps after Java install, with graceful fallback (`continue-on-error: true`).

The `tpcds-1g` job drives SBT directly via `build/sbt "sql/testOnly ..."` (and `build/sbt "sql/Test/runMain org.apache.spark.sql.GenTPCDSData ..."` on a TPC-DS data cache miss), so it does not go through `dev/run-tests.py` and needs no `SKIP_SCALA_BUILD` flag -- the same situation as `k8s-integration-tests`. The first SBT invocation otherwise compiles `sql/core` (main + test) from scratch.

The `precompile` job already runs `Test/package`, which compiles the `sql/core` test classes this job depends on (`TPCDSQueryTestSuite`, `TPCDSCollationQueryTestSuite`, `GenTPCDSData`, `TPCDSSchema`). Extracting the precompiled `target/` lets SBT skip that compile and run the test phase directly.

### Optional: graceful fallback if precompile fails

Same pattern as the prior consumers:

- `precompile` keeps `continue-on-error: true`.
- The "Download precompiled artifact" step is gated on `needs.precompile.result == 'success'` and has `continue-on-error: true`.
- "Extract precompiled artifact" is gated on the download succeeding and has `continue-on-error: true`.
- If extraction fails or the artifact is missing, SBT compiles from scratch exactly as before. Worst case is degraded to the pre-PR behavior, not a workflow failure.

Note: the existing `# Any TPC-DS related updates on this job need to be applied to tpcds-1g-gen job of benchmark.yml as well` comment refers to TPC-DS data-generation parameters (scale factor, `tpcds-kit` ref, `GenTPCDSData` args). This PR changes none of those -- it only adds build-artifact reuse, and `benchmark.yml` is a standalone workflow with no shared `precompile` job -- so no corresponding change is needed there.

### Why are the changes needed?

Today every run of `build_and_test.yml` that requires `tpcds-1g` re-runs the same `sql/core` SBT compile that the `precompile` job already produced for `pyspark` / `sparkr` / `build` / docker / k8s. Wiring `tpcds-1g` to the existing artifact removes that duplicate compile for free (precompile is already running).

### Does this PR introduce _any_ user-facing change?

No. CI infrastructure change only.

### How was this patch tested?

The change is exercised by the CI run of this PR itself. The Download/Extract steps log the artifact size; if the precompile job is forced to fail (or its artifact is missing), the job falls back to the original local SBT build.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)
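
For readers without the diff at hand, the `tpcds-1g` wiring and fallback gating described in this PR might look roughly like the fragment below. It is a sketch under stated assumptions, not the actual change. The `required` output name, the step id `download-precompiled`, the artifact name `precompiled-build`, the archive name `precompiled.tar.gz`, and the `actions/download-artifact` version are all hypothetical placeholders. Only the `needs`, `(!cancelled()) &&`, `needs.precompile.result == 'success'`, and `continue-on-error: true` pieces come from the description.

```yaml
# Illustrative sketch only; names marked "hypothetical" are not taken from the PR.
tpcds-1g:
  needs: [precondition, precompile]
  # (!cancelled()) keeps the job eligible to run even if precompile was cancelled.
  # The `required` output name is an assumption about the precondition job.
  if: (!cancelled()) && fromJson(needs.precondition.outputs.required).tpcds-1g == 'true'
  steps:
    # ... checkout and Java install steps unchanged ...
    - name: Download precompiled artifact
      id: download-precompiled              # hypothetical step id
      if: needs.precompile.result == 'success'
      continue-on-error: true               # a failed download must not fail the job
      uses: actions/download-artifact@v4    # version is an assumption
      with:
        name: precompiled-build             # hypothetical artifact name
    - name: Extract precompiled artifact
      # `outcome` is the step result before continue-on-error is applied,
      # so this is skipped when the download failed or was itself skipped.
      if: steps.download-precompiled.outcome == 'success'
      continue-on-error: true
      run: |
        ls -lh precompiled.tar.gz           # hypothetical archive name; logs its size
        tar -xzf precompiled.tar.gz
    # ... existing build/sbt "sql/testOnly ..." steps follow unchanged ...
```

Gating the extract step on `outcome` rather than `conclusion` matters here: with `continue-on-error: true`, a failed download still reports a `conclusion` of `success`, which would let extraction run against a missing file.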
