zhengruifeng commented on code in PR #54130:
URL: https://github.com/apache/spark/pull/54130#discussion_r2762327344


##########
dev/spark-test-image/python-312/Dockerfile:
##########
@@ -41,27 +41,24 @@ RUN apt-get update && apt-get install -y \
     libopenblas-dev \
     libssl-dev \
     openjdk-17-jdk-headless \
+    python3.12 \
+    python3-pip \
+    python3-psutil \
+    python3-venv \
     pkg-config \
     tzdata \
     software-properties-common \
     zlib1g-dev
 
-# Install Python 3.12
-RUN add-apt-repository ppa:deadsnakes/ppa
-RUN apt-get update && apt-get install -y \
-    python3.12 \
-    && apt-get autoremove --purge -y \
-    && apt-get clean \
-    && rm -rf /var/lib/apt/lists/*
-
-
 ARG BASIC_PIP_PKGS="numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 mlflow>=2.8.1 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 protobuf==6.33.5 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20.3"
 
 # Install Python 3.12 packages
-RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12
-RUN python3.12 -m pip install --ignore-installed 'blinker>=1.6.2' # mlflow needs this
+ENV VIRTUAL_ENV /opt/spark-venv
+RUN python3.12 -m venv $VIRTUAL_ENV
+ENV PATH="$VIRTUAL_ENV/bin:$PATH"

Review Comment:
   The pip failure on Ubuntu 24.04 comes from a change that marks the system Python environment as "externally managed" (following [PEP 668](https://peps.python.org/pep-0668/)). As a result, pip by default refuses to install packages into the system's global Python environment and exits with `error: externally-managed-environment`.
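   To illustrate the PEP 668 rule, here is a minimal Python sketch of the check an installer performs. It approximates pip's behavior and is not pip's actual code. It assumes CPython on a distro such as Ubuntu 24.04 that ships an `EXTERNALLY-MANAGED` marker file in the stdlib directory. The helper names `in_virtualenv` and `is_externally_managed` are hypothetical.

```python
import sys
import sysconfig
from pathlib import Path


def in_virtualenv() -> bool:
    # PEP 405 venvs point sys.prefix at the venv directory, while
    # sys.base_prefix still points at the base interpreter's installation.
    return sys.prefix != sys.base_prefix


def is_externally_managed() -> bool:
    # PEP 668: a distro opts out of pip-managed installs by shipping an
    # EXTERNALLY-MANAGED file in the stdlib directory. Installers are meant
    # to ignore the marker when running inside a virtual environment.
    if in_virtualenv():
        return False
    marker = Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"
    return marker.is_file()


if __name__ == "__main__":
    print(f"in virtualenv:      {in_virtualenv()}")
    print(f"externally managed: {is_externally_managed()}")
```

   The marker is ignored whenever `sys.prefix` differs from `sys.base_prefix`. That is why running pip from a virtual environment avoids the restriction without modifying the system Python.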
   
   To resolve this, I tried:
   1. the `--break-system-packages` flag, but still hit some issues;
   
   2. this PR's approach, using a virtual environment. This is also the recommended approach;
   
   3. using a different base image, the Docker Official Python 3.12 image, which is based on Ubuntu 24.
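   The virtual-environment approach in this PR comes down to two steps: create a venv, then put its `bin` directory first on `PATH`. The `ENV VIRTUAL_ENV`, `RUN python3.12 -m venv` and `ENV PATH` lines in the diff do exactly this. The Python sketch below mirrors those steps with the standard-library `venv` module. It makes a few assumptions: a POSIX layout (`bin/`, not `Scripts/`), and no pip installation (to stay offline), whereas the Dockerfile's `python3.12 -m venv` does install pip. The helper names are hypothetical.

```python
import os
import shutil
import subprocess
import tempfile
import venv
from pathlib import Path


def create_venv(venv_dir: Path) -> Path:
    # Roughly `python3.12 -m venv $VIRTUAL_ENV`, except that pip is not
    # installed here (with_pip=False), so no ensurepip/network is needed.
    # symlinks=True matches the `python -m venv` default on POSIX.
    venv.EnvBuilder(with_pip=False, symlinks=True).create(venv_dir)
    return venv_dir / "bin"  # POSIX layout; Windows would use "Scripts"


def activated_env(bin_dir: Path) -> dict:
    # Mirrors `ENV VIRTUAL_ENV ...` and `ENV PATH="$VIRTUAL_ENV/bin:$PATH"`:
    # prepending bin/ makes `python` resolve to the venv's interpreter.
    env = dict(os.environ)
    env["VIRTUAL_ENV"] = str(bin_dir.parent)
    env["PATH"] = str(bin_dir) + os.pathsep + env.get("PATH", "")
    return env


def python_on_path_is_venv(env: dict) -> bool:
    # Resolve `python` the way a shell would, then ask that interpreter
    # whether it runs inside a venv (sys.prefix differs from sys.base_prefix).
    python = shutil.which("python", path=env["PATH"])
    if python is None:
        return False
    result = subprocess.run(
        [python, "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bin_dir = create_venv(Path(tmp) / "spark-venv")
        print(python_on_path_is_venv(activated_env(bin_dir)))
```

   `PATH` is searched in order, so every later `python`, `python3.12` or `pip` invocation in the image resolves to the venv. Packages then land in `/opt/spark-venv` rather than in the externally managed system environment.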
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]