rvesse opened a new pull request #23613: [SPARK-26687][K8S] Fix handling of 
custom Dockerfile paths
URL: https://github.com/apache/spark/pull/23613
 
 
   ## What changes were proposed in this pull request?
   
   With the changes from @vanzin's PR #23019 (SPARK-26025) we use a pared-down 
temporary Docker build context, which significantly improves build times.  
However, the way this is implemented leads to non-intuitive behaviour when 
supplying custom Dockerfile paths.  This is because of code snippets like the 
following:
   
   ```
   (cd $(img_ctx_dir base) && docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
       -t $(image_ref spark) \
       -f "$BASEDOCKERFILE" .)
   ```
   
   Since the script changes to the temporary build context directory and then 
runs `docker build` there, any path given for the Dockerfile is taken as 
relative to the temporary build context directory rather than to the directory 
where the user invoked the script.  This is unintuitive and produces 
unhelpful errors, e.g.
   
   ```
   > ./bin/docker-image-tool.sh -r rvesse -t badpath -p resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile build
   Sending build context to Docker daemon  218.4MB
   Step 1/15 : FROM openjdk:8-alpine
    ---> 5801f7d008e5
   Step 2/15 : ARG spark_uid=185
    ---> Using cache
    ---> 5fd63df1ca39
   ...
   Successfully tagged rvesse/spark:badpath
   unable to prepare context: unable to evaluate symlinks in Dockerfile path: lstat /Users/rvesse/Documents/Work/Code/spark/target/tmp/docker/pyspark/resource-managers: no such file or directory
   Failed to build PySpark Docker image, please refer to Docker build output for details.
   ```
   
   Here we can see that the relative path that was valid where the user typed 
the command was not valid inside the build context directory.
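
The failure mode can be reproduced outside the image tool. The following is a minimal sketch with made-up names: `custom/Dockerfile` stands in for the user-supplied path and `ctx` for the temporary build context. None of these names come from the script itself.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction; all directory and file names are illustrative.

work=$(mktemp -d)                      # stands in for the directory the user runs the tool from
mkdir -p "$work/custom" "$work/ctx"    # ctx stands in for the temporary build context
touch "$work/custom/Dockerfile"

cd "$work"
rel="custom/Dockerfile"                # relative path as the user would type it

# From the invocation directory the relative path resolves.
if [ -f "$rel" ]; then seen_from_cwd=yes; else seen_from_cwd=no; fi

# In a subshell that has changed into the context directory, the same
# relative path is looked up under ctx/ and is not found.
seen_from_ctx=$(cd ctx && if [ -f "$rel" ]; then echo yes; else echo no; fi)

echo "visible from invocation directory: $seen_from_cwd"
echo "visible from build context:        $seen_from_ctx"
```

The subshell mirrors the `(cd ... && docker build ...)` pattern quoted earlier: the path string is unchanged, but the directory it is resolved against is not.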
   
   To resolve this we need to resolve relative Dockerfile paths against the 
directory where the user invoked the script.  We do this by adding a 
`resolve_file` function to the script and invoking it on the supplied 
Dockerfile paths.
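
As a rough illustration of the approach, a helper along these lines would turn a relative path into an absolute one before the script changes directory. Only the name `resolve_file` comes from this description; the body below is an assumption and may differ from the code in the patch, and the demo paths are made up.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is from the PR description, the body is assumed.
resolve_file() {
  local file=$1
  if [ -n "$file" ]; then
    local dir
    # Resolve the containing directory against the current working directory.
    dir=$(cd "$(dirname "$file")" && pwd) || return 1
    file="$dir/$(basename "$file")"
  fi
  # An empty argument (no custom Dockerfile supplied) is passed through as-is.
  echo "$file"
}

# Demo with illustrative paths: resolve while still in the invocation directory.
demo=$(mktemp -d)
mkdir -p "$demo/custom"
touch "$demo/custom/Dockerfile"
cd "$demo"

resolved=$(resolve_file "custom/Dockerfile")
echo "$resolved"
```

Because the result is absolute, it stays valid after a later `cd` into the build context, so it can be handed to `docker build -f` unchanged.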
   
   ## How was this patch tested?
   
   Validated that relative paths now work as expected:
   
   ```
   > ./bin/docker-image-tool.sh -r rvesse -t badpath -p resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile build
   Sending build context to Docker daemon  218.4MB
   Step 1/15 : FROM openjdk:8-alpine
    ---> 5801f7d008e5
   Step 2/15 : ARG spark_uid=185
    ---> Using cache
    ---> 5fd63df1ca39
   Step 3/15 : RUN set -ex &&     apk upgrade --no-cache &&     apk add --no-cache bash tini libc6-compat linux-pam krb5 krb5-libs &&     mkdir -p /opt/spark &&     mkdir -p /opt/spark/examples &&     mkdir -p /opt/spark/work-dir &&     touch /opt/spark/RELEASE &&     rm /bin/sh &&     ln -sv /bin/bash /bin/sh &&     echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su &&     chgrp root /etc/passwd && chmod ug+rw /etc/passwd
    ---> Using cache
    ---> eb0a568e032f
   Step 4/15 : COPY jars /opt/spark/jars
   ...
   Successfully tagged rvesse/spark:badpath
   Sending build context to Docker daemon  6.599MB
   Step 1/13 : ARG base_img
   Step 2/13 : ARG spark_uid=185
   Step 3/13 : FROM $base_img
    ---> 8f4fff16f903
   Step 4/13 : WORKDIR /
    ---> Running in 25466e66f27f
   Removing intermediate container 25466e66f27f
    ---> 1470b6efae61
   Step 5/13 : USER 0
    ---> Running in b094b739df37
   Removing intermediate container b094b739df37
    ---> 6a27eb4acad3
   Step 6/13 : RUN mkdir ${SPARK_HOME}/python
    ---> Running in bc8002c5b17c
   Removing intermediate container bc8002c5b17c
    ---> 19bb12f4286a
   Step 7/13 : RUN apk add --no-cache python &&     apk add --no-cache python3 &&     python -m ensurepip &&     python3 -m ensurepip &&     rm -r /usr/lib/python*/ensurepip &&     pip install --upgrade pip setuptools &&     rm -r /root/.cache
    ---> Running in 12dcba5e527f
   ...
   Successfully tagged rvesse/spark-py:badpath
   ```
