dongjoon-hyun opened a new pull request #30253: URL: https://github.com/apache/spark/pull/30253
### What changes were proposed in this pull request?

This is a backport of https://github.com/apache/spark/pull/30059 .

This PR aims to use a `pre-built image` for the GitHub Action PySpark jobs. To isolate the changes, the `pyspark` jobs are split from the main job.

The Docker image is built as follows.

| Item | URL |
| --------------- | ------------- |
| Dockerfile | https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage/blob/main/Dockerfile |
| Builder | https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage/blob/main/.github/workflows/build.yml |
| Image Location | https://hub.docker.com/r/dongjoon/apache-spark-github-action-image |

Please note the following:

1. The community will still use `build_and_test.yml` to add new features, as we have until now. The `Dockerfile` will be updated regularly.
2. When Apache Spark gets an official Docker repository location, we will use it.
3. Ideally, this Dockerfile and builder script would live in a new Apache Spark dev branch instead of an outside GitHub repository.

### Why are the changes needed?

Currently, each of the two `pyspark` test jobs always takes over one and a half hours, for a total of 3 hours 14 minutes.

- https://github.com/apache/spark/runs/1240470628 (1 hour 35 mins)
- https://github.com/apache/spark/runs/1240470634 (1 hour 39 mins)

This PR removes the package installation steps, which take 16 minutes and cause flakiness. Note that `Python 3.6 package installation` is not included in the pre-built image; it only takes `20s`.

**BEFORE** / **AFTER** (timing screenshots)

In short, the `pyspark` GitHub jobs take less time: 2 hours 23 minutes in total, down from 3 hours 14 minutes previously.

- https://github.com/apache/spark/pull/30059/checks?check_run_id=1260512568 (1 hour 18 mins)
- https://github.com/apache/spark/pull/30059/checks?check_run_id=1260512582 (1 hour 5 mins)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action on this PR without the `package installation steps`.
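The duration totals quoted under "Why are the changes needed?" can be double-checked with a minimal Python sketch. The per-job times are copied from the run links above. Each is a single run, not an average. The helper names `total` and `fmt` are illustrative only.

```python
from datetime import timedelta

# Per-job durations of the two PySpark GitHub Actions jobs, as quoted in this PR.
before = [timedelta(hours=1, minutes=35), timedelta(hours=1, minutes=39)]
after = [timedelta(hours=1, minutes=18), timedelta(hours=1, minutes=5)]


def total(durations):
    """Sum a list of job durations (hypothetical helper)."""
    return sum(durations, timedelta())


def fmt(d):
    """Format a timedelta as 'Xh Ym' (hypothetical helper)."""
    minutes = int(d.total_seconds()) // 60
    return f"{minutes // 60}h {minutes % 60}m"


before_total = total(before)
after_total = total(after)
saved = before_total - after_total

print("before:", fmt(before_total))              # before: 3h 14m
print("after:", fmt(after_total))                # after: 2h 23m
print("saved:", fmt(saved))                      # saved: 0h 51m
print(f"reduction: {saved / before_total:.1%}")  # reduction: 26.3%
```

The combined wall-clock saving across both jobs is 51 minutes, about a quarter of the previous total.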
