zhengruifeng opened a new pull request, #56110:
URL: https://github.com/apache/spark/pull/56110

   ### What changes were proposed in this pull request?
   
   This PR extends the SBT precompile-sharing pattern (parent: 
[SPARK-56830](https://issues.apache.org/jira/browse/SPARK-56830); prior 
sub-tasks: [SPARK-56768](https://issues.apache.org/jira/browse/SPARK-56768) 
pyspark, [SPARK-56831](https://issues.apache.org/jira/browse/SPARK-56831) 
sparkr, [SPARK-56943](https://issues.apache.org/jira/browse/SPARK-56943) JVM 
build) to the two remaining SBT-compiling jobs in 
`.github/workflows/build_and_test.yml` that still run their own full Spark 
compile:
   
   - `docker-integration-tests`
   - `k8s-integration-tests`
   
   Concretely:
   
   - The existing `precompile` job's `if:` gate is extended to also fire when 
`docker-integration-tests == 'true'` or `k8s-integration-tests == 'true'` in 
the precondition output, so the artifact is available whenever either job needs 
it.
   - `docker-integration-tests`:
     - `needs: precondition` -> `needs: [precondition, precompile]`
     - `if:` extended with `(!cancelled()) &&` so the job still runs when 
`precompile` fails or is skipped; it is suppressed only if the workflow run 
itself is cancelled.
     - Adds "Download precompiled artifact" + "Extract precompiled artifact" 
steps between Java setup and `Run tests`, with graceful fallback 
(`continue-on-error: true`).
     - `Run tests` exports `SKIP_SCALA_BUILD=true` when extraction succeeded; 
`dev/run-tests.py` already honors this flag and skips `build_apache_spark` + 
`build_spark_assembly_sbt`.
   - `k8s-integration-tests`:
     - Same `needs:` and `if:` change.
     - Adds the same Download/Extract steps after Java setup.
     - The actual test runs via a direct `build/sbt ... 
"kubernetes-integration-tests/test"` call rather than `dev/run-tests.py`, so no 
`SKIP_SCALA_BUILD` is set. SBT sees the extracted `target/` and skips 
compilation of the already-built modules (Spark Core, SQL, etc.); only the 
`kubernetes-integration-tests` test module itself compiles incrementally.
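
   The job-level wiring described above might look roughly like the fragment below. It is a sketch rather than the actual diff: it assumes the precondition job exposes its result as a JSON map in an output named `required`, it elides the existing `precompile` gate clauses for the other consumers, and it omits every job key not discussed here.

   ```yaml
   jobs:
     precompile:
       needs: precondition
       # Assumption: `required` is the name of the precondition output.
       # Only the two new clauses are shown; the existing clauses for the
       # other consumers are elided.
       if: >-
         fromJson(needs.precondition.outputs.required).docker-integration-tests == 'true' ||
         fromJson(needs.precondition.outputs.required).k8s-integration-tests == 'true'
       continue-on-error: true

     docker-integration-tests:
       # Previously: needs: precondition
       needs: [precondition, precompile]
       # A status-check function in `if:` replaces the implicit success()
       # check, so a failed or skipped precompile no longer skips this job.
       if: >-
         (!cancelled()) &&
         fromJson(needs.precondition.outputs.required).docker-integration-tests == 'true'
   ```

   The `k8s-integration-tests` job would take the same `needs:` and `if:` shape with its own key.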
   
   ### Graceful fallback if precompile fails
   
   Same pattern as the prior sub-tasks:
   - `precompile` keeps `continue-on-error: true`.
   - Both consumers' "Download precompiled artifact" step is gated on 
`needs.precompile.result == 'success'` and has `continue-on-error: true`.
   - "Extract precompiled artifact" is gated on the download succeeding and has 
`continue-on-error: true`.
   - For docker, `SKIP_SCALA_BUILD=true` is exported only when 
`steps.extract-precompiled.outcome == 'success'`; otherwise `dev/run-tests.py` 
runs the original local SBT build.
   - For k8s, if extraction fails, SBT compiles from scratch as before.
   
   In the worst case, both jobs degrade to the pre-PR behavior (a full local SBT build) rather than failing the workflow.
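
   For the docker job, the fallback chain in this section could be expressed along the following lines. This is a hedged sketch, not the PR's actual steps: the step names, the `extract-precompiled` id, `SKIP_SCALA_BUILD`, and the log message come from this description, while the `download-precompiled` id, the artifact name `spark-precompiled`, and the archive name `precompiled.tar.gz` are placeholders assumed for illustration, and the `run-tests` invocation is simplified.

   ```yaml
   steps:
     - name: Download precompiled artifact
       id: download-precompiled          # hypothetical id
       if: needs.precompile.result == 'success'
       continue-on-error: true
       uses: actions/download-artifact@v4
       with:
         name: spark-precompiled         # hypothetical artifact name

     - name: Extract precompiled artifact
       id: extract-precompiled
       # `outcome` is the step result before continue-on-error is applied,
       # so a skipped or failed download keeps this step from running.
       if: steps.download-precompiled.outcome == 'success'
       continue-on-error: true
       run: |
         # Hypothetical archive name; log its size, then unpack into the checkout.
         ls -lh precompiled.tar.gz
         tar -xzf precompiled.tar.gz

     - name: Run tests
       run: |
         # Take the fast path only when extraction actually succeeded.
         if [ "${{ steps.extract-precompiled.outcome }}" = "success" ]; then
           echo "Reusing precompiled artifact, skipping local SBT build."
           export SKIP_SCALA_BUILD=true
         fi
         # Simplified invocation (assumption); other arguments are omitted.
         ./dev/run-tests --modules docker-integration-tests
   ```

   Checking `outcome` rather than `conclusion` matters here: with `continue-on-error: true`, a failed step still reports a `conclusion` of `success`, which would wrongly enable the fast path.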
   
   ### Profile coverage
   
   The precompile job runs:
   ```
   ./build/sbt -Phadoop-3 -Pyarn -Pspark-ganglia-lgpl -Phadoop-cloud -Phive \
     -Pkubernetes -Pjvm-profiler -Pkinesis-asl -Phive-thriftserver \
     -Pdocker-integration-tests -Pvolcano \
     Test/package streaming-kinesis-asl-assembly/assembly connect/assembly \
     assembly/package
   ```
   
   - `docker-integration-tests`: profile is in the precompile invocation; the 
module's `target/` is pre-built, so `dev/run-tests --modules 
docker-integration-tests` only runs the test phase.
   - `k8s-integration-tests`: `-Pkubernetes` is in the precompile invocation, so the 
parent module is pre-built. The job itself adds `-Pkubernetes-integration-tests` to 
enable the integration test submodule, which SBT compiles incrementally on top 
of the reused `target/`. The compile work in this job drops from "all of Spark 
+ integration tests" to "only the integration-tests module".
   
   ### Why are the changes needed?
   
   Today, every scheduled or dispatched run of `build_and_test.yml` that requires 
`docker-integration-tests` or `k8s-integration-tests` repeats the SBT 
compile that `precompile` already performs for `pyspark` / `sparkr` / `build`. 
Wiring these two consumers to the existing artifact removes that duplicate work 
at no extra cost whenever `precompile` is already running for another consumer.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. CI infrastructure change only.
   
   ### How was this patch tested?
   
   The change is exercised by the CI run of this PR itself. The 
Download/Extract steps log artifact size; the Run tests step prints `Reusing 
precompiled artifact, skipping local SBT build.` for the docker job when the 
fast path is taken. If the precompile job is forced to fail (or its artifact is 
missing), both consumers fall back to the original local SBT build.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Opus 4.7)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

