zhengruifeng opened a new pull request, #55872:
URL: https://github.com/apache/spark/pull/55872

   ### What changes were proposed in this pull request?
   
   This PR consolidates the `python-ps-minimum` Docker image and its CI 
workflow into the existing `python-minimum` image, eliminating a near-duplicate.
   
   Specifically:
   - Updates the label on `dev/spark-test-image/python-minimum/Dockerfile` to 
cover both PySpark and Pandas API on Spark.
   - Deletes `dev/spark-test-image/python-ps-minimum/Dockerfile`.
   - Deletes `.github/workflows/build_python_ps_minimum.yml`.
   - Adds `"pyspark-pandas": "true"` to 
`.github/workflows/build_python_minimum.yml` so Pandas API on Spark 
minimum-deps coverage is preserved.
   - Drops the `python-ps-minimum` entries from 
`.github/workflows/build_infra_images_cache.yml` (the `paths` trigger and the 
build/push step).
   - Removes the `build_python_ps_minimum.yml` badge from `README.md`.
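
As a rough illustration of the `"pyspark-pandas": "true"` addition listed above, the sketch below treats the workflow's job selection as a JSON object of job name to `"true"`. That the selection is a JSON mapping, the `jobs_before` value, and the helper name `enable_job` are all assumptions made for this example; the snippet is not copied from `build_python_minimum.yml`.

```python
import json

# Assumption: before this PR the workflow enabled only the "pyspark" job.
jobs_before = '{"pyspark": "true"}'


def enable_job(jobs_json: str, job: str) -> str:
    """Return the job-selection JSON with `job` switched on (hypothetical helper)."""
    jobs = json.loads(jobs_json)
    jobs[job] = "true"  # the string "true", matching the entry quoted in the PR
    return json.dumps(jobs, indent=2)


jobs_after = enable_job(jobs_before, "pyspark-pandas")

# List every job whose flag is the string "true".
enabled = sorted(
    name for name, flag in json.loads(jobs_after).items() if flag == "true"
)
print(enabled)
```

With both keys present, a single workflow can cover the PySpark and Pandas API on Spark minimum-deps jobs against one image, which is the consolidation this PR describes.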
   
   ### Why are the changes needed?
   
   The two Dockerfiles were nearly identical. The only functional differences 
were in `BASIC_PIP_PKGS`:
   
   | Package | python-minimum | python-ps-minimum |
   |---|---|---|
   | `numpy` | pinned `==1.22.4` | unpinned |
   | `scikit-learn` | included | omitted |
   
   Everything else (base image, apt packages, Python version, venv setup, 
`CONNECT_PIP_PKGS`) was the same. Maintaining both images doubles the 
build/cache cost and surface area without commensurate test value. Reusing 
`python-minimum` (which has the stricter pin and a superset of packages) for 
the Pandas API on Spark minimum-deps job keeps coverage while halving the image 
footprint.
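
The comparison in the table above can be reproduced with a small script. This is a minimal sketch, not the tooling used for this PR: the two package strings are illustrative stand-ins for each image's `BASIC_PIP_PKGS`, only the `numpy` and `scikit-learn` entries reflect the table, `some-shared-pkg` is a made-up placeholder, and the helper names `parse_pins` and `diff_pins` are hypothetical.

```python
import re


def parse_pins(pkgs: str) -> dict:
    """Map each package name to its version specifier ('' when unpinned)."""
    pins = {}
    for spec in pkgs.split():
        # Split "name==1.2.3" into the name and the trailing specifier.
        match = re.match(r"^([A-Za-z0-9_.\-]+)(.*)$", spec)
        pins[match.group(1).lower()] = match.group(2)
    return pins


def diff_pins(left: str, right: str) -> dict:
    """Return {name: (left_spec, right_spec)} for entries that differ.

    A value of None means the package is not listed on that side.
    """
    left_pins, right_pins = parse_pins(left), parse_pins(right)
    return {
        name: (left_pins.get(name), right_pins.get(name))
        for name in sorted(set(left_pins) | set(right_pins))
        if left_pins.get(name) != right_pins.get(name)
    }


# Illustrative values only. "some-shared-pkg" is a placeholder, and the PR does
# not state whether scikit-learn carries a version pin, so none is shown here.
PYTHON_MINIMUM = "numpy==1.22.4 some-shared-pkg==1.0.0 scikit-learn"
PYTHON_PS_MINIMUM = "numpy some-shared-pkg==1.0.0"

differences = diff_pins(PYTHON_MINIMUM, PYTHON_PS_MINIMUM)
is_superset = set(parse_pins(PYTHON_MINIMUM)) >= set(parse_pins(PYTHON_PS_MINIMUM))
print(differences, is_superset)
```

Under these assumed inputs, the only differing entries are `numpy` (pinned on one side, unpinned on the other) and `scikit-learn` (present on one side only), and the `python-minimum` package names form a superset of the `python-ps-minimum` ones.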
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. CI-only change.
   
   ### How was this patch tested?
   
   Existing CI. The consolidated `build_python_minimum.yml` workflow now runs 
both the `pyspark` and `pyspark-pandas` jobs against the `python-minimum` image.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (model: claude-opus-4-7)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

