zhengruifeng commented on code in PR #54130:
URL: https://github.com/apache/spark/pull/54130#discussion_r2762327344
##########
dev/spark-test-image/python-312/Dockerfile:
##########
@@ -41,27 +41,24 @@ RUN apt-get update && apt-get install -y \
libopenblas-dev \
libssl-dev \
openjdk-17-jdk-headless \
+ python3.12 \
+ python3-pip \
+ python3-psutil \
+ python3-venv \
pkg-config \
tzdata \
software-properties-common \
zlib1g-dev
-# Install Python 3.12
-RUN add-apt-repository ppa:deadsnakes/ppa
-RUN apt-get update && apt-get install -y \
- python3.12 \
- && apt-get autoremove --purge -y \
- && apt-get clean \
- && rm -rf /var/lib/apt/lists/*
-
-
ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
# Python deps for Spark Connect
ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"
# Install Python 3.12 packages
-RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12
-RUN python3.12 -m pip install --ignore-installed 'blinker>=1.6.2' # mlflow needs this
+ENV VIRTUAL_ENV /opt/spark-venv
+RUN python3.12 -m venv $VIRTUAL_ENV
+ENV PATH="$VIRTUAL_ENV/bin:$PATH"
Review Comment:
The pip failure on Ubuntu 24.04 is
due to a change that marks the system Python environment as "externally
managed" (following [PEP 668](https://peps.python.org/pep-0668/)), which
prevents pip from installing packages into the system's global Python
environment by default.
To resolve this, I tried:
1. the `--break-system-packages` flag, but I still hit some issues;
2. this PR, which uses a virtual environment. This is also the recommended
approach;
3. using a different image, the official Docker Python 3.12 image, which is
based on Ubuntu 24.
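
For reference, a minimal sketch of the check behind this behavior, as I understand PEP 668: an installer such as pip refuses to install when it is not running inside a virtual environment and a marker file named `EXTERNALLY-MANAGED` exists in the interpreter's stdlib directory. The helper names below (`in_virtualenv`, `marker_path`, `pip_would_refuse`) are hypothetical and only for illustration; this is not pip's actual implementation.

```python
import sys
import sysconfig
from pathlib import Path


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


def marker_path(stdlib_dir=None) -> Path:
    # PEP 668 places the marker file in the interpreter's stdlib directory.
    # stdlib_dir is a hypothetical override so the check can be tried
    # against an arbitrary directory.
    base = Path(stdlib_dir) if stdlib_dir is not None else Path(sysconfig.get_path("stdlib"))
    return base / "EXTERNALLY-MANAGED"


def pip_would_refuse(stdlib_dir=None, venv=None) -> bool:
    # Approximation of the PEP 668 rule: refuse only when NOT in a
    # virtual environment AND the marker file is present.
    venv = in_virtualenv() if venv is None else venv
    return (not venv) and marker_path(stdlib_dir).is_file()


if __name__ == "__main__":
    print("in venv:", in_virtualenv())
    print("marker :", marker_path())
    print("refuse :", pip_would_refuse())
```

Run with the system `python3.12` on the Ubuntu 24.04 image, this should report that pip would refuse. Run with the interpreter under `$VIRTUAL_ENV/bin`, it should not, which is why approach 2 avoids the error without `--break-system-packages`.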
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]