Yikun opened a new pull request, #11:
URL: https://github.com/apache/spark-docker/pull/11

   ### What changes were proposed in this pull request?
   This patch:
   - Adds a `spark` uid/gid in the Dockerfile (used by the entrypoint).
   - Switches to the `spark` user in `entrypoint.sh` rather than in the Dockerfile, so the Spark process still runs as a non-root user.
   - Removes the `USER` setting from the Dockerfile, so the base image keeps root, which helps developers who build on top of it.
   - Chowns the scripts to `spark` instead of `root`, to avoid permission issues such as the one in standalone mode.
   - Adds the `gosu` dependency, a `sudo` replacement recommended by [docker](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user) and [docker official image](https://github.com/docker-library/official-images/blob/9a4d54f1a42ea82970baa4e6f3d0bc75e98fc961/README.md#consistency), and also used by other DOI images.
   
   This change also follows the rules of Docker official images; see [consistency](https://github.com/docker-library/official-images/blob/9a4d54f1a42ea82970baa4e6f3d0bc75e98fc961/README.md#consistency).
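
   As a rough illustration of the privilege drop described in the list above, here is a minimal sketch of how an entrypoint might decide whether to go through `gosu`. It is not the `entrypoint.sh` from this patch; `SPARK_USER` and `switch_user_prefix` are hypothetical names introduced only for this example.

   ```shell
   # Sketch only: not the entrypoint.sh from this PR.
   # SPARK_USER and switch_user_prefix are hypothetical names.
   SPARK_USER="${SPARK_USER:-spark}"

   # Print the command prefix needed to drop privileges.
   # $1 is the numeric uid of the current process, e.g. "$(id -u)".
   switch_user_prefix() {
     if [ "$1" -eq 0 ]; then
       # Running as root: hand the command to gosu so it runs as SPARK_USER.
       printf 'gosu %s' "$SPARK_USER"
     fi
     # Already non-root: print nothing, so the command runs as the current user.
   }

   # In an entrypoint this could be used as follows (left unquoted on purpose,
   # so that an empty prefix disappears):
   #   exec $(switch_user_prefix "$(id -u)") "$@"
   ```

   With this shape the image itself keeps root (no `USER` in the Dockerfile), while anything started through the entrypoint still ends up running as `spark`.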
   
   ### Why are the changes needed?
   
   These are the issues I have found so far:
   
   - **Irregular login username**
     The username in the Docker image is nonstandard: `docker run` logs you in as user `185`, which is a little weird.
   
     ```
     $ docker run -ti apache/spark bash
     185@d88a24357413:/opt/spark/work-dir$
     ```
   
   - **Permission issue of spark sbin**
     There are also permission issues when running some Spark scripts, for example in standalone mode:
   
     ```
     $ docker run -ti apache/spark /opt/spark/sbin/start-master.sh
     
     mkdir: cannot create directory ‘/opt/spark/logs’: Permission denied
     chown: cannot access '/opt/spark/logs': No such file or directory
     starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out
     /opt/spark/sbin/spark-daemon.sh: line 135: /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out: No such file or directory
     failed to launch: nice -n 0 /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master --host 1c345a00e312 --port 7077 --webui-port 8080
     tail: cannot open '/opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out' for reading: No such file or directory
     full log in /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-1c345a00e312.out
     ```
     
   
   - **Using spark as a base image is not well supported**
     
     ```
     $ cat Dockerfile
     FROM apache/spark
     RUN apt update
     
     $  docker build -t spark-test:1015 .
     // ... 
     ------
      > [2/2] RUN apt update:
     #5 0.405 E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
     #5 0.405 E: Unable to lock directory /var/lib/apt/lists/
     ------
     executor failed running [/bin/sh -c apt update]: exit code: 100
     
     ```
   
   ### Does this PR introduce _any_ user-facing change?
   Yes.
   
   
   ### How was this patch tested?
   - CI passed: all k8s tests
   
   - Regression tests:
   ```
   # Username is set to spark rather than 185
   $ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
   spark@27bbfca0a581:/opt/spark/work-dir$
   ```
   ```
   # start-master.sh no longer hits a permission issue
   $ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
   
   spark@8d1118e26766:~/work-dir$ /opt/spark/sbin/start-master.sh
   starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-8d1118e26766.out
   ```
   ```
   # Using the image as a parent image
   $ cat Dockerfile
   FROM spark:scala2.12-java11-python3-r-ubuntu
   RUN apt update
   $ docker build -t spark-test:1015 .
   [+] Building 7.8s (6/6) FINISHED
    => [1/2] FROM docker.io/library/spark:scala2.12-java11-python3-r-ubuntu    0.0s
    => [2/2] RUN apt update    7.7s
   ```
   
   - Other tests:
   ```
   # Test on pyspark
   $ cd spark-docker/3.3.0/scala2.12-java11-python3-r-ubuntu
   $ docker build -t spark-no-chrown:scala2.12-java11-python3-r-ubuntu .
   $ docker run -p 4040:4040 -ti spark:scala2.12-java11-python3-r-ubuntu /opt/spark/bin/pyspark
   ```
   
   ```
   # A simple test for `start-master.sh` (standalone mode)
   $ docker run -ti spark:scala2.12-java11-python3-r-ubuntu bash
   spark@8d1118e26766:~/work-dir$ /opt/spark/sbin/start-master.sh
   starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark--org.apache.spark.deploy.master.Master-1-8d1118e26766.out
   ```
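
   The username check above could also be scripted rather than eyeballed. This is only a sketch: `assert_runs_as` is a hypothetical helper, not part of this patch, and the Docker example in the comment assumes the entrypoint passes the command through unchanged.

   ```shell
   # Hypothetical helper, not part of this PR.
   # assert_runs_as EXPECTED CMD...: run CMD and succeed only if its output is EXPECTED.
   assert_runs_as() {
     expected="$1"
     shift
     # Capture the command's stdout; return 2 if the command itself fails.
     actual="$("$@")" || return 2
     [ "$actual" = "$expected" ]
   }

   # Example (needs Docker and the image built from this PR):
   #   assert_runs_as spark docker run --rm spark:scala2.12-java11-python3-r-ubuntu id -un
   ```

   The helper returns 0 when the reported username matches, 1 when it differs, and 2 when the command could not be run at all, so a failing `docker run` is not mistaken for a wrong user.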


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

