zhengruifeng opened a new pull request, #48520: URL: https://github.com/apache/spark/pull/48520
### What changes were proposed in this pull request?
Add a separate Dockerfile for the doc build.

### Why are the changes needed?
Currently we have only a single test image, shared by `pyspark`, `sparkr`, `lint` and `docs`. It has two major issues:

1. Disk space limitation: we keep adding more packages to it, so the disk space left for testing is very limited, which causes `No space left on device` failures from time to time.
2. Environment conflicts: for example, even though we already install some packages for `docs` in the Dockerfile, we still need to install additional Python packages in `build_and_test` because of conflicts between `docs` and `pyspark`. This is hard to maintain because the related packages are installed in two different places.

So I am thinking of spinning off some installations (e.g. `docs`) from the base image, so that:

1. we can completely cache all the dependencies for `docs`;
2. the related installations are centralized;
3. we can free up disk space on the base image (after we spin off the other dependencies, we can remove unneeded packages from it).

Furthermore, if we adopt multiple images, we can easily support different environments, e.g. by adding a separate image for old versions of `pandas/pyarrow/etc`.

### Does this PR introduce _any_ user-facing change?
No, infra-only.

### How was this patch tested?
CI.

### Was this patch authored or co-authored using generative AI tooling?
No.
