[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20192 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r161020380 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile --- @@ -1,35 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -FROM spark-base - -# Before building the docker image, first build and make a Spark distribution following -# the instructions in http://spark.apache.org/docs/latest/building-spark.html. -# If this docker file is being used in the context of building your images from a Spark -# distribution, the docker build command should be invoked from the top level directory -# of the Spark distribution. E.g.: -# docker build -t spark-executor:latest -f kubernetes/dockerfiles/executor/Dockerfile . - -COPY examples /opt/spark/examples - -CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \ -env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \ -readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \ -if ! [ -z ${SPARK_MOUNTED_CLASSPATH}+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \ -if ! 
[ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \ --- End diff -- Yes you do. The submission code sets `SPARK_CLASSPATH` to `spark.driver.extraClassPath` in the driver case, and to `spark.executor.extraClassPath` in the executor case.
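The scheme described above can be sketched in a few lines of bash. This is an illustrative reduction, not the production entrypoint: the variable names follow the diff, while the `/opt/spark` default and the final `echo` are assumptions for the sketch. The submission code is assumed to have exported `SPARK_CLASSPATH` with the value of `spark.driver.extraClassPath` or `spark.executor.extraClassPath`, depending on the container's role.

```shell
# One entrypoint serves both roles: the role-specific extra classpath
# arrives pre-resolved in SPARK_CLASSPATH, set by the submission code.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"   # assumed default for the sketch

# Append the distribution jars after any role-specific entries.
SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_HOME/jars/*"

# Jars mounted into the container (e.g. by an init-container) come last.
if [ -n "$SPARK_MOUNTED_CLASSPATH" ]; then
  SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_MOUNTED_CLASSPATH"
fi

echo "$SPARK_CLASSPATH"
```

Note that if `SPARK_CLASSPATH` starts out unset, the result carries a harmless leading colon, matching the behavior of the quoted script.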
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160874070 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile --- @@ -1,35 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -FROM spark-base - -# Before building the docker image, first build and make a Spark distribution following -# the instructions in http://spark.apache.org/docs/latest/building-spark.html. -# If this docker file is being used in the context of building your images from a Spark -# distribution, the docker build command should be invoked from the top level directory -# of the Spark distribution. E.g.: -# docker build -t spark-executor:latest -f kubernetes/dockerfiles/executor/Dockerfile . - -COPY examples /opt/spark/examples - -CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \ -env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \ -readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \ -if ! [ -z ${SPARK_MOUNTED_CLASSPATH}+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \ -if ! 
[ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \ --- End diff -- To clarify: by that I mean we no longer have the ability to customize different classpaths for the executor and driver. For reference, see `spark.driver.extraClassPath` vs `spark.executor.extraClassPath`.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160874300 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -0,0 +1,97 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# echo commands to the terminal output +set -ex + +# Check whether there is a passwd entry for the container UID +myuid=$(id -u) +mygid=$(id -g) +uidentry=$(getent passwd $myuid) + +# If there is no passwd entry for the container UID, attempt to create one +if [ -z "$uidentry" ] ; then +if [ -w /etc/passwd ] ; then +echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd +else +echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID" +fi +fi + +SPARK_K8S_CMD="$1" +if [ -z "$SPARK_K8S_CMD" ]; then + echo "No command to execute has been provided." 1>&2 --- End diff -- We can revisit when we have proper client support. Overriding the entrypoint won't do if I want everything else set (e.g. `SPARK_CLASSPATH`).
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160512758 --- Diff: docs/running-on-kubernetes.md --- @@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized and built for your usage. --- End diff -- OK, I am fine with adding this in a future change.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160512231 --- Diff: docs/running-on-kubernetes.md --- @@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized and built for your usage. --- End diff -- Still, that sounds like something that should be added in a separate change. I'm not changing the customizability of images in this change. And not having a contract means people will have no idea of how to customize images, so you can't even write proper documentation for that.
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160511180 --- Diff: docs/running-on-kubernetes.md --- @@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized and built for your usage. --- End diff -- I agree that we don't have a solid story around customizing images here. But I do think that we need something clearly telling people that we do support using custom images if they want to, and the properties they should use to configure custom images. It just doesn't need to be opinionated on things like the contract you mentioned.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160504833 --- Diff: docs/running-on-kubernetes.md --- @@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized and built for your usage. --- End diff -- Separate change. I don't even know what you'd write there. The whole "custom image" thing needs to be properly specified first - what exactly is the contract between the submission code and the images, for example.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160504887 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/DriverConfigOrchestratorSuite.scala --- @@ -75,8 +73,8 @@ class DriverConfigOrchestratorSuite extends SparkFunSuite { test("Submission steps with an init-container.") { val sparkConf = new SparkConf(false) - .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE) - .set(INIT_CONTAINER_IMAGE, IC_IMAGE) + .set(CONTAINER_IMAGE, DRIVER_IMAGE) + .set(INIT_CONTAINER_IMAGE.key, IC_IMAGE) --- End diff -- Yes, the test is checking different values for the default and init container images.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160503103 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -29,17 +29,23 @@ private[spark] object Config extends Logging { .stringConf .createWithDefault("default") + val CONTAINER_IMAGE = +ConfigBuilder("spark.kubernetes.container.image") + .doc("Container image to use for Spark containers. Individual container types " + +"(e.g. driver or executor) can also be configured to use different images if desired, " + +"by setting the container-specific image name.") --- End diff -- Why would I mention just one specific way of overriding this? I also have half a mind to just remove this since this documentation is not visible anywhere...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160502618 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -0,0 +1,97 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# echo commands to the terminal output +set -ex + +# Check whether there is a passwd entry for the container UID +myuid=$(id -u) +mygid=$(id -g) +uidentry=$(getent passwd $myuid) + +# If there is no passwd entry for the container UID, attempt to create one +if [ -z "$uidentry" ] ; then +if [ -w /etc/passwd ] ; then +echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd +else +echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID" +fi +fi + +SPARK_K8S_CMD="$1" +if [ -z "$SPARK_K8S_CMD" ]; then + echo "No command to execute has been provided." 1>&2 --- End diff -- You can do that with `docker container create --entrypoint blah`, right? Otherwise you have to add code here to specify what command to run when no arguments are provided. I'd rather have a proper error, since the entry point is tightly coupled with the submission code. 
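The workaround suggested above can be spelled out concretely. These are illustrative invocations, not commands from the PR; `spark:latest` is a placeholder image name.

```shell
# Bypass the Spark entrypoint at run time to get an interactive shell:
docker run --rm -it --entrypoint /bin/bash spark:latest

# Or, as suggested above, set the entrypoint at container-creation time:
docker container create --name spark-debug --entrypoint /bin/bash spark:latest
```

Either form sidesteps the "No command to execute" error, at the cost of losing the environment setup (such as `SPARK_CLASSPATH`) that the entrypoint would otherwise perform.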
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160502410 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile --- @@ -41,7 +41,8 @@ COPY ${spark_jars} /opt/spark/jars COPY bin /opt/spark/bin COPY sbin /opt/spark/sbin COPY conf /opt/spark/conf -COPY ${img_path}/spark-base/entrypoint.sh /opt/ +COPY ${img_path}/spark/entrypoint.sh /opt/ +COPY examples /opt/spark/examples --- End diff -- Didn't know about that directory, but sounds like it should be added.
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160502356 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile --- @@ -1,35 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -FROM spark-base - -# Before building the docker image, first build and make a Spark distribution following -# the instructions in http://spark.apache.org/docs/latest/building-spark.html. -# If this docker file is being used in the context of building your images from a Spark -# distribution, the docker build command should be invoked from the top level directory -# of the Spark distribution. E.g.: -# docker build -t spark-executor:latest -f kubernetes/dockerfiles/executor/Dockerfile . - -COPY examples /opt/spark/examples - -CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \ -env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \ -readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \ -if ! [ -z ${SPARK_MOUNTED_CLASSPATH}+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \ -if ! 
[ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \ --- End diff -- The difference is handled in the submission code; `SPARK_CLASSPATH` is set to the appropriate value.
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160324121 --- Diff: docs/running-on-kubernetes.md --- @@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized and built for your usage. --- End diff -- I think it's worth having a dedicated sub-section here on how to use custom images, e.g., named `Customizing Container Images`.
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160323223 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/DriverConfigOrchestratorSuite.scala --- @@ -75,8 +73,8 @@ class DriverConfigOrchestratorSuite extends SparkFunSuite { test("Submission steps with an init-container.") { val sparkConf = new SparkConf(false) - .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE) - .set(INIT_CONTAINER_IMAGE, IC_IMAGE) + .set(CONTAINER_IMAGE, DRIVER_IMAGE) + .set(INIT_CONTAINER_IMAGE.key, IC_IMAGE) --- End diff -- Do you still need to set this?
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160324036 --- Diff: docs/running-on-kubernetes.md --- @@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized --- End diff -- It should be called out that a single Dockerfile is shipped with Spark.
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160322848 --- Diff: bin/docker-image-tool.sh --- @@ -60,35 +42,53 @@ function image_ref { } function build { - docker build \ ---build-arg "spark_jars=$SPARK_JARS" \ ---build-arg "img_path=$IMG_PATH" \ --t spark-base \ --f "$IMG_PATH/spark-base/Dockerfile" . - for image in "${!path[@]}"; do -docker build -t "$(image_ref $image)" -f ${path[$image]} . - done + # Detect whether this is a git clone or a Spark distribution and adjust paths + # accordingly. + local BUILD_ARGS + local IMG_PATH + + if [ ! -f "$SPARK_HOME/RELEASE" ]; then +IMG_PATH=resource-managers/kubernetes/docker/src/main/dockerfiles +BUILD_ARGS=( + --build-arg + img_path=$IMG_PATH + --build-arg + spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars +) + else +# Not passed as an argument to docker, but used to validate the Spark directory. +IMG_PATH="kubernetes/dockerfiles" + fi + + local DOCKERFILE=${DOCKERFILE:-"$IMG_PATH/spark/Dockerfile"} + + if [ ! -d "$IMG_PATH" ]; then +error "Cannot find docker images. This script must be run from a runnable distribution of Apache Spark." + fi + + docker build "${BUILD_ARGS[@]}" \ +-t $(image_ref spark) \ +-f "$DOCKERFILE" . } function push { - for image in "${!path[@]}"; do -docker push "$(image_ref $image)" - done + docker push "$(image_ref spark)" } function usage { cat <
Github user liyinan926 commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160323806 --- Diff: bin/docker-image-tool.sh --- @@ -60,35 +42,53 @@ function image_ref { } function build { - docker build \ ---build-arg "spark_jars=$SPARK_JARS" \ ---build-arg "img_path=$IMG_PATH" \ --t spark-base \ --f "$IMG_PATH/spark-base/Dockerfile" . - for image in "${!path[@]}"; do -docker build -t "$(image_ref $image)" -f ${path[$image]} . - done + # Detect whether this is a git clone or a Spark distribution and adjust paths --- End diff -- "paths" => "values of the following variables"?
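The detection that the comment above describes can be sketched as a small helper. This is an illustrative reduction of the build function's logic (the function name `img_path` is made up for the sketch; the path strings mirror those in the diff): a runnable Spark distribution ships a `RELEASE` file at its root, while a git checkout does not.

```shell
# Return the dockerfile directory for a given Spark directory.
img_path() {
  if [ -f "$1/RELEASE" ]; then
    # Runnable distribution layout.
    echo "kubernetes/dockerfiles"
  else
    # In-tree (git checkout) layout.
    echo "resource-managers/kubernetes/docker/src/main/dockerfiles"
  fi
}
```

In the checkout case the script also has to pass `--build-arg spark_jars=...` so the Dockerfile can find the assembly jars, since a checkout lacks the distribution's `jars/` directory.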
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160316799 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -0,0 +1,97 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# echo commands to the terminal output +set -ex + +# Check whether there is a passwd entry for the container UID +myuid=$(id -u) +mygid=$(id -g) +uidentry=$(getent passwd $myuid) + +# If there is no passwd entry for the container UID, attempt to create one +if [ -z "$uidentry" ] ; then +if [ -w /etc/passwd ] ; then +echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd +else +echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID" +fi +fi + +SPARK_K8S_CMD="$1" +if [ -z "$SPARK_K8S_CMD" ]; then + echo "No command to execute has been provided." 
1>&2 + exit 1 +fi +shift 1 + SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*" +env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt +readarray -t SPARK_DRIVER_JAVA_OPTS < /tmp/java_opts.txt +if [ -n "$SPARK_MOUNTED_CLASSPATH" ]; then + SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_MOUNTED_CLASSPATH" +fi +if [ -n "$SPARK_MOUNTED_FILES_DIR" ]; then + cp -R "$SPARK_MOUNTED_FILES_DIR/." . +fi + +case "$SPARK_K8S_CMD" in + driver) +CMD=( + ${JAVA_HOME}/bin/java + "${SPARK_DRIVER_JAVA_OPTS[@]}" + -cp "$SPARK_CLASSPATH" + -Xms$SPARK_DRIVER_MEMORY + -Xmx$SPARK_DRIVER_MEMORY + -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS + $SPARK_DRIVER_CLASS + $SPARK_DRIVER_ARGS +) +;; + + executor) +CMD=( + ${JAVA_HOME}/bin/java + "${SPARK_EXECUTOR_JAVA_OPTS[@]}" + -Xms$SPARK_EXECUTOR_MEMORY + -Xmx$SPARK_EXECUTOR_MEMORY + -cp "$SPARK_CLASSPATH" + org.apache.spark.executor.CoarseGrainedExecutorBackend + --driver-url $SPARK_DRIVER_URL + --executor-id $SPARK_EXECUTOR_ID + --cores $SPARK_EXECUTOR_CORES + --app-id $SPARK_APPLICATION_ID + --hostname $SPARK_EXECUTOR_POD_IP +) +;; + + init) +CMD=( + "/opt/spark/bin/spark-class" --- End diff -- shouldn't this be under `SPARK_HOME`?
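The case-statement dispatch quoted above boils down to selecting a command array from the first argument and then executing it. A minimal, illustrative reduction (the `run_role` name is made up, and `echo` stands in for the real JVM invocations):

```shell
# Pick a command array based on the role argument; remaining arguments
# are passed through to the selected command.
run_role() {
  local role="$1"; shift
  local CMD
  case "$role" in
    driver)   CMD=(echo driver-jvm "$@") ;;
    executor) CMD=(echo executor-jvm "$@") ;;
    init)     CMD=(echo spark-class "$@") ;;
    *) echo "Unknown role: $role" 1>&2; return 1 ;;
  esac
  # The real entrypoint would exec here so the JVM becomes PID 1.
  "${CMD[@]}"
}
```

Building the command as a bash array (rather than a string) is what lets the quoted `"${SPARK_DRIVER_JAVA_OPTS[@]}"` entries survive with embedded spaces intact.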
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160316427 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile --- @@ -1,35 +0,0 @@ -# -# Licensed to the Apache Software Foundation (ASF) under one or more -# contributor license agreements. See the NOTICE file distributed with -# this work for additional information regarding copyright ownership. -# The ASF licenses this file to You under the Apache License, Version 2.0 -# (the "License"); you may not use this file except in compliance with -# the License. You may obtain a copy of the License at -# -#http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -FROM spark-base - -# Before building the docker image, first build and make a Spark distribution following -# the instructions in http://spark.apache.org/docs/latest/building-spark.html. -# If this docker file is being used in the context of building your images from a Spark -# distribution, the docker build command should be invoked from the top level directory -# of the Spark distribution. E.g.: -# docker build -t spark-executor:latest -f kubernetes/dockerfiles/executor/Dockerfile . - -COPY examples /opt/spark/examples - -CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \ -env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \ -readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \ -if ! [ -z ${SPARK_MOUNTED_CLASSPATH}+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \ -if ! 
[ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \ --- End diff -- The different classpath additions for the different roles have been taken out? `SPARK_EXECUTOR_EXTRA_CLASSPATH`, `SPARK_SUBMIT_EXTRA_CLASSPATH`
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160316166 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh --- @@ -0,0 +1,97 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# echo commands to the terminal output +set -ex + +# Check whether there is a passwd entry for the container UID +myuid=$(id -u) +mygid=$(id -g) +uidentry=$(getent passwd $myuid) + +# If there is no passwd entry for the container UID, attempt to create one +if [ -z "$uidentry" ] ; then +if [ -w /etc/passwd ] ; then +echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd +else +echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID" +fi +fi + +SPARK_K8S_CMD="$1" +if [ -z "$SPARK_K8S_CMD" ]; then + echo "No command to execute has been provided." 1>&2 --- End diff -- Starting the container without a command could be very useful for an in-cluster client, or just for ad hoc testing outside of k8s.
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160316590 --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala --- @@ -29,17 +29,23 @@ private[spark] object Config extends Logging { .stringConf .createWithDefault("default") + val CONTAINER_IMAGE = +ConfigBuilder("spark.kubernetes.container.image") + .doc("Container image to use for Spark containers. Individual container types " + +"(e.g. driver or executor) can also be configured to use different images if desired, " + +"by setting the container-specific image name.") --- End diff -- nit: `container-specific` => `container-type-specific`, and add an example, e.g. `spark.kubernetes.driver.container.image`?
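Following the naming pattern discussed above, a base image plus a per-type override would be configured roughly like this. This is an illustrative sketch: the master URL, image names, and jar path are placeholders, and the driver-specific property name is the one suggested in the review rather than a confirmed final name.

```shell
bin/spark-submit \
  --master k8s://https://<api-server-host>:<port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=myrepo/spark:v2.3.0 \
  --conf spark.kubernetes.driver.container.image=myrepo/spark-driver:v2.3.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```

Executors would fall back to the base `spark.kubernetes.container.image`, while the driver uses the type-specific override.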
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20192#discussion_r160315955 --- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile --- @@ -41,7 +41,8 @@ COPY ${spark_jars} /opt/spark/jars COPY bin /opt/spark/bin COPY sbin /opt/spark/sbin COPY conf /opt/spark/conf -COPY ${img_path}/spark-base/entrypoint.sh /opt/ +COPY ${img_path}/spark/entrypoint.sh /opt/ +COPY examples /opt/spark/examples --- End diff -- A lot of examples depend on `data/`; should that be included too?
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/20192 [SPARK-22994][k8s] Use a single image for all Spark containers. This change allows a user to submit a Spark application on kubernetes by providing a single image, instead of one image for each type of container. The image's entry point now takes an extra argument that identifies the process that is being started. The configuration still allows the user to provide different images for each container type if they so desire. On top of that, the entry point was simplified a bit to share more code; mainly, the same env variable is used to propagate the user-defined classpath to the different containers. Aside from being modified to match the new behavior, the 'build-push-docker-images.sh' script was renamed to 'docker-image-tool.sh' to more closely match its purpose; the old name was a little awkward and now also not entirely correct, since there is a single image. It was also moved to 'bin' since it's not necessarily an admin tool. Docs have been updated to match the new behavior. Tested locally with minikube. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vanzin/spark SPARK-22994 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20192.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20192 commit bfad831de9ddd330674d73ccca759c103af32f20 Author: Marcelo Vanzin Date: 2018-01-06T00:50:19Z [SPARK-22994][k8s] Use a single image for all Spark containers. This change allows a user to submit a Spark application on kubernetes by providing a single image, instead of one image for each type of container. The image's entry point now takes an extra argument that identifies the process that is being started.
The configuration still allows the user to provide different images for each container type if they so desire. On top of that, the entry point was simplified a bit to share more code; mainly, the same env variable is used to propagate the user-defined classpath to the different containers. Aside from being modified to match the new behavior, the 'build-push-docker-images.sh' script was renamed to 'docker-image-tool.sh' to more closely match its purpose; the old name was a little awkward and now also not entirely correct, since there is a single image. It was also moved to 'bin' since it's not necessarily an admin tool. Docs and scripts have been updated to match the new behavior.
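Per the description above, the renamed tool builds and pushes the single unified image. An example invocation, run from the top of a runnable Spark distribution (registry and tag are placeholders):

```shell
# Build the single Spark image used for driver, executor, and init containers.
./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 build

# Push it to the configured registry.
./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 push
```

The `-r`/`-t` values then correspond to the image reference passed to `spark.kubernetes.container.image` at submission time.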