zhengruifeng opened a new pull request, #56107:
URL: https://github.com/apache/spark/pull/56107

   ### What changes were proposed in this pull request?
   
   This PR extends the SBT precompile-sharing pattern (parent: 
[SPARK-56830](https://issues.apache.org/jira/browse/SPARK-56830), pyspark: 
[SPARK-56768](https://issues.apache.org/jira/browse/SPARK-56768)) to the 
python-only macOS / ARM workflows that run via 
`.github/workflows/python_hosted_runner_test.yml`.
   
   Concretely:
   
   - New `precompile` job in `python_hosted_runner_test.yml` runs Spark's SBT 
build once on `${{ inputs.os }}`:
     ```
     ./build/sbt -Phadoop-3 -Pyarn -Pspark-ganglia-lgpl -Phadoop-cloud -Phive \
       -Pkubernetes -Pjvm-profiler -Pkinesis-asl -Phive-thriftserver \
       -Pdocker-integration-tests -Pvolcano \
       Test/package streaming-kinesis-asl-assembly/assembly connect/assembly \
       assembly/package
     ```
     It tars every `target/` directory (excluding `./build/` and `./.git/`) 
with `tar -czf` and uploads the archive as 
`spark-compile-<os>-<branch>-<run_id>` with `retention-days: 1`.
   - The 9 pyspark matrix entries in the same workflow add `precompile` to 
`needs:`, set `if: (!cancelled())`, download and extract the artifact (with 
graceful fallback), and export `SKIP_SCALA_BUILD=true` so that 
`dev/run-tests.py` skips `build_apache_spark` and `build_spark_assembly_sbt`.
   - Cache steps in the new precompile job are gated with 
`if: ${{ runner.os != 'macOS' }}` to match the existing TODO(SPARK-54466) 
pattern in this file: on `macos-26` the precompile runs without the GHA cache, 
while on `ubuntu-24.04-arm` it caches as expected.
   - Artifact name includes `${{ inputs.os }}` so the two callers 
(`build_python_3.12_macos26.yml` and `build_python_3.12_arm.yml`) cannot 
collide.
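
   The archive step and the artifact naming described above could look roughly 
like the following. This is a minimal sketch, not the workflow's actual script: 
the function names `archive_targets` and `artifact_name` are hypothetical, and 
the exact `find` filters used in the PR are an assumption, since this 
description does not show them.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: build the artifact name from the pattern quoted above,
# spark-compile-<os>-<branch>-<run_id>.
artifact_name() {
  printf 'spark-compile-%s-%s-%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: archive every target/ directory under a repo root,
# skipping anything beneath ./build/ and ./.git/.
# $1 = repo root, $2 = absolute path of the archive to create.
archive_targets() {
  (
    cd "$1"
    # -prune stops find from descending into a target/ directory it has
    # already selected; tar recurses into each listed directory itself.
    find . -type d -name target \
      -not -path './build/*' -not -path './.git/*' \
      -prune -print0 |
      tar -czf "$2" --null -T -
  )
}
```

   Because `archive_targets` changes into the repo root before running `find`, 
the archive path should be absolute. Including the OS in the name is what keeps 
the `macos-26` and `ubuntu-24.04-arm` callers from overwriting each other's 
artifact.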
   
   This benefits both callers of the reusable workflow:
   - `.github/workflows/build_python_3.12_macos26.yml` (macos-26)
   - `.github/workflows/build_python_3.12_arm.yml` (ubuntu-24.04-arm)
   
   ### Optional: graceful fallback if precompile fails
   
   Same pattern as SPARK-56768:
   - `precompile` has `continue-on-error: true` so a failed or cancelled 
precompile does not fail the workflow run.
   - The matrix's "Download precompiled artifact" step is gated on 
`needs.precompile.result == 'success'` and itself has `continue-on-error: true`.
   - The "Extract precompiled artifact" step is gated on the download 
succeeding, and also has `continue-on-error: true`.
   - Inside the "Run tests" bash block, `SKIP_SCALA_BUILD=true` is exported 
only when `steps.extract-precompiled.outcome == 'success'`. Otherwise it stays 
unset and `dev/run-tests.py` falls back to the original local SBT build.
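
   The last bullet can be pictured as the small shell fragment below. It is a 
sketch under assumptions: the function name `maybe_skip_scala_build` is 
hypothetical, and its argument stands in for the value that GitHub Actions 
would substitute for `steps.extract-precompiled.outcome` before the "Run tests" 
script executes.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: $1 is the outcome of the "Extract precompiled artifact"
# step (for example "success", "failure", or "skipped").
maybe_skip_scala_build() {
  if [ "$1" = "success" ]; then
    # Fast path: tell dev/run-tests.py to reuse the extracted build output.
    export SKIP_SCALA_BUILD=true
    echo "Reusing precompiled artifact, skipping local SBT build."
  fi
  # Any other outcome leaves SKIP_SCALA_BUILD untouched, so the test runner
  # falls back to its original local SBT build.
}
```

   Keeping the decision inside the test script, rather than in a job-level 
`env:` block, means the flag is set only after the extract step is known to 
have succeeded.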
   
   ### Why are the changes needed?
   
   Today every one of the 9 pyspark matrix entries in 
`python_hosted_runner_test.yml` runs the same SBT build from scratch. Building 
once and sharing the compile artifact across the matrix avoids eight duplicate 
SBT builds per scheduled run of `build_python_3.12_macos26.yml` (and of 
`build_python_3.12_arm.yml`). This mirrors the savings already realized for the 
Linux pyspark matrix in SPARK-56768.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. CI infrastructure change only.
   
   ### How was this patch tested?
   
   The change is exercised by the CI run of this PR itself. If the precompile 
job is forced to fail (or its artifact is missing), the matrix entries should 
still pass via the fallback path. The "Run tests" step logs `Reusing 
precompiled artifact, skipping local SBT build.` to make the fast path visible 
per matrix entry.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Opus 4.7)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

