zhengruifeng opened a new pull request, #56393:
URL: https://github.com/apache/spark/pull/56393
### What changes were proposed in this pull request?
Run the "GitHub Pages deployment" documentation job inside the prebuilt
documentation container image
`ghcr.io/apache/spark/apache-spark-github-action-image-docs-cache:master-static`,
the same image that the documentation job in `build_and_test.yml` builds
and runs in. That image is produced from `dev/spark-test-image/docs/Dockerfile`
and published by `build_infra_images_cache.yml`.
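A minimal sketch of how a job can be pinned to that image with the GitHub Actions `container:` key. The job id `docs` and the `ubuntu-latest` runner label are hypothetical placeholders and may not match the actual `pages.yml`. With `container:` set, every subsequent `run:` step executes inside the image rather than directly on the runner.

```yaml
jobs:
  docs:  # hypothetical job id
    name: GitHub Pages deployment
    runs-on: ubuntu-latest  # assumed runner label
    container:
      # The prebuilt documentation image that also backs the
      # documentation job in build_and_test.yml
      image: ghcr.io/apache/spark/apache-spark-github-action-image-docs-cache:master-static
```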
As a result, the following steps now come from the image and are removed
from `pages.yml`:
- `Install Python 3.11` and `Install Python dependencies` (the pinned
Sphinx/pandas/grpcio pip list)
- `Install Ruby for documentation generation`
- `Install Pandoc`
Companion changes required to build inside a container, mirroring the
documentation job in `build_and_test.yml`:
- set `LC_ALL`/`LANG` to `C.UTF-8`
- add a `git config --global --add safe.directory ${GITHUB_WORKSPACE}` step
(the doc build invokes git as root inside the container)
- run `dev/free_disk_space_container` to reclaim runner disk space, since the
image now also occupies it
- keep `setup-java` (Java 17) so that `JAVA_HOME` is set for the Scala/SQL doc
generation
- align the Bundler install with `build_and_test.yml`
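The companion changes listed above might look roughly like the following job fragment. This is a sketch under assumptions: the job id, the step names, the `actions/checkout` and `actions/setup-java` version pins, and the `zulu` distribution are illustrative choices rather than values taken from this PR. The Bundler step is left out because its exact form is not given here.

```yaml
jobs:
  docs:  # hypothetical job id; runs-on and container keys omitted here
    env:
      # Force a UTF-8 locale inside the container
      LC_ALL: C.UTF-8
      LANG: C.UTF-8
    steps:
      - name: Checkout Spark repository
        uses: actions/checkout@v4  # version pin assumed
      # The doc build invokes git as root inside the container, so mark the
      # checkout as a safe directory to pass git's ownership check
      - name: Add GITHUB_WORKSPACE to git trust safe.directory
        run: git config --global --add safe.directory ${GITHUB_WORKSPACE}
      # Reclaim disk space now that the image also occupies it
      - name: Free up disk space
        run: ./dev/free_disk_space_container
      # Keep setup-java so JAVA_HOME is set for Scala/SQL doc generation
      - name: Install Java 17
        uses: actions/setup-java@v4  # version pin assumed
        with:
          distribution: zulu  # assumed distribution
          java-version: 17
```

The `safe.directory` step has to run after checkout and before any step that calls git. The disk-space script lives in the repository, so it also needs the checkout first.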
### Why are the changes needed?
`pages.yml` duplicated the documentation toolchain setup (a long pinned
Python dependency list, Ruby, and Pandoc) that is already captured in
`dev/spark-test-image/docs/Dockerfile` and published as a reusable image.
Reusing that image keeps the documentation dependencies in a single source of
truth, removes the duplicated install steps, and avoids reinstalling the
toolchain on every run.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This workflow only runs on push to `master` in `apache/spark` (guarded by
`if: github.repository == 'apache/spark'`), so it cannot be exercised from a
fork or a pull request. The workflow YAML was validated locally. The reused
image is the same one already proven by the documentation job in
`build_and_test.yml`.
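For context, the trigger and repository guard described above can be expressed as in the sketch below. The job id is a hypothetical placeholder, and the real `pages.yml` may declare additional triggers or keys not shown here.

```yaml
on:
  push:
    branches:
      - master

jobs:
  docs:  # hypothetical job id
    # Forks inherit the workflow file, but this condition skips the job
    # everywhere except the upstream repository
    if: github.repository == 'apache/spark'
```

Because pull requests do not fire a `push` event on `master` and forks fail the `if:` check, neither path reaches the deployment job.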
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (model: claude-opus-4-8)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
For additional commands, e-mail: [email protected]