[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20192


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-11 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r161020380
  
--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile ---
@@ -1,35 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-FROM spark-base
-
-# Before building the docker image, first build and make a Spark distribution following
-# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
-# If this docker file is being used in the context of building your images from a Spark
-# distribution, the docker build command should be invoked from the top level directory
-# of the Spark distribution. E.g.:
-# docker build -t spark-executor:latest -f kubernetes/dockerfiles/executor/Dockerfile .
-
-COPY examples /opt/spark/examples
-
-CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \
-    env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \
-    readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \
-    if ! [ -z ${SPARK_MOUNTED_CLASSPATH}+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
-    if ! [ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
--- End diff --

Yes you do.

The submission code sets `SPARK_CLASSPATH` to `spark.driver.extraClassPath` in the driver case, and to `spark.executor.extraClassPath` in the executor case.
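
A minimal sketch of the pattern described above (names hypothetical; the real logic lives in the Scala submission code, not in shell):

```bash
# Sketch only: the submission code exports a single SPARK_CLASSPATH env var
# into each pod, filled from the role-appropriate Spark conf, so the shared
# entrypoint never needs role-specific classpath variables.
# ROLE, DRIVER_EXTRA_CP, and EXECUTOR_EXTRA_CP are illustrative names.
if [ "$ROLE" = "driver" ]; then
  export SPARK_CLASSPATH="$DRIVER_EXTRA_CP"    # from spark.driver.extraClassPath
else
  export SPARK_CLASSPATH="$EXECUTOR_EXTRA_CP"  # from spark.executor.extraClassPath
fi
```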


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160874070
  
--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile ---
@@ -1,35 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-FROM spark-base
-
-# Before building the docker image, first build and make a Spark distribution following
-# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
-# If this docker file is being used in the context of building your images from a Spark
-# distribution, the docker build command should be invoked from the top level directory
-# of the Spark distribution. E.g.:
-# docker build -t spark-executor:latest -f kubernetes/dockerfiles/executor/Dockerfile .
-
-COPY examples /opt/spark/examples
-
-CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \
-    env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \
-    readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \
-    if ! [ -z ${SPARK_MOUNTED_CLASSPATH}+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
-    if ! [ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
--- End diff --

To clarify, by that I mean we no longer have the ability to customize different classpaths for the executor and the driver.
For reference, see spark.driver.extraClassPath vs spark.executor.extraClassPath.
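
For illustration, the two properties can still be set independently at submit time; a sketch (paths and jar illustrative):

```bash
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.driver.extraClassPath='/opt/extra/driver-libs/*' \
  --conf spark.executor.extraClassPath='/opt/extra/executor-libs/*' \
  examples/jars/spark-examples_2.11-2.3.0.jar
```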


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-10 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160874300
  
--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
@@ -0,0 +1,97 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# echo commands to the terminal output
+set -ex
+
+# Check whether there is a passwd entry for the container UID
+myuid=$(id -u)
+mygid=$(id -g)
+uidentry=$(getent passwd $myuid)
+
+# If there is no passwd entry for the container UID, attempt to create one
+if [ -z "$uidentry" ] ; then
+    if [ -w /etc/passwd ] ; then
+        echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd
+    else
+        echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
+    fi
+fi
+
+SPARK_K8S_CMD="$1"
+if [ -z "$SPARK_K8S_CMD" ]; then
+  echo "No command to execute has been provided." 1>&2
--- End diff --

We can revisit when we have proper client support. Overriding the entrypoint won't do if I want everything else set (e.g. `SPARK_CLASSPATH`).


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-09 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160512758
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a
 frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized
 and built for your usage.
--- End diff --

OK, I am fine with adding this in a future change.


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160512231
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a
 frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized
 and built for your usage.
--- End diff --

Still, that sounds like something that should be added in a separate change. I'm not changing the customizability of images in this change.

And not having a contract means people will have no idea of how to customize images, so you can't even write proper documentation for that.


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-09 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160511180
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a
 frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized
 and built for your usage.
--- End diff --

I agree that we don't have a solid story around customizing images here. But I do think that we need something clearly telling people that we do support using custom images if they want to, and the properties they should use to configure custom images. It just doesn't need to be opinionated on things like the contract you mentioned.
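
For example, the docs could show something like the following (image names and API server address illustrative; the driver-specific property name follows the pattern suggested elsewhere in this thread):

```bash
# One image for all containers, with an optional per-role override:
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=myrepo/spark:v2.3.0 \
  --conf spark.kubernetes.driver.container.image=myrepo/spark-driver:v2.3.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```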


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160504833
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a
 frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized
 and built for your usage.
--- End diff --

Separate change. I don't even know what you'd write there. The whole "custom image" thing needs to be properly specified first - what exactly is the contract between the submission code and the images, for example.


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160504887
  
--- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/DriverConfigOrchestratorSuite.scala ---
@@ -75,8 +73,8 @@ class DriverConfigOrchestratorSuite extends SparkFunSuite {
 
   test("Submission steps with an init-container.") {
     val sparkConf = new SparkConf(false)
-      .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE)
-      .set(INIT_CONTAINER_IMAGE, IC_IMAGE)
+      .set(CONTAINER_IMAGE, DRIVER_IMAGE)
+      .set(INIT_CONTAINER_IMAGE.key, IC_IMAGE)
--- End diff --

Yes, the test is checking different values for the default and init container images.


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160503103
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -29,17 +29,23 @@ private[spark] object Config extends Logging {
       .stringConf
       .createWithDefault("default")
 
+  val CONTAINER_IMAGE =
+    ConfigBuilder("spark.kubernetes.container.image")
+      .doc("Container image to use for Spark containers. Individual container types " +
+        "(e.g. driver or executor) can also be configured to use different images if desired, " +
+        "by setting the container-specific image name.")
--- End diff --

Why would I mention just one specific way of overriding this?

I also have half a mind to just remove this since this documentation is not visible anywhere...


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160502618
  
--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
@@ -0,0 +1,97 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# echo commands to the terminal output
+set -ex
+
+# Check whether there is a passwd entry for the container UID
+myuid=$(id -u)
+mygid=$(id -g)
+uidentry=$(getent passwd $myuid)
+
+# If there is no passwd entry for the container UID, attempt to create one
+if [ -z "$uidentry" ] ; then
+    if [ -w /etc/passwd ] ; then
+        echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd
+    else
+        echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
+    fi
+fi
+
+SPARK_K8S_CMD="$1"
+if [ -z "$SPARK_K8S_CMD" ]; then
+  echo "No command to execute has been provided." 1>&2
--- End diff --

You can do that with `docker container create --entrypoint blah`, right? Otherwise you have to add code here to specify what command to run when no arguments are provided. I'd rather have a proper error, since the entry point is tightly coupled with the submission code.
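
For instance (image name illustrative), either of these bypasses the image's entry point for ad hoc use:

```bash
# Create a container that ignores the Spark entry point entirely:
docker container create --entrypoint /bin/bash myrepo/spark:v2.3.0

# Or run an interactive shell for testing outside of k8s:
docker run -it --entrypoint /bin/bash myrepo/spark:v2.3.0
```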


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160502410
  
--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile ---
@@ -41,7 +41,8 @@ COPY ${spark_jars} /opt/spark/jars
 COPY bin /opt/spark/bin
 COPY sbin /opt/spark/sbin
 COPY conf /opt/spark/conf
-COPY ${img_path}/spark-base/entrypoint.sh /opt/
+COPY ${img_path}/spark/entrypoint.sh /opt/
+COPY examples /opt/spark/examples
--- End diff --

Didn't know about that directory, but sounds like it should be added.


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160502356
  
--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile ---
@@ -1,35 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-FROM spark-base
-
-# Before building the docker image, first build and make a Spark distribution following
-# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
-# If this docker file is being used in the context of building your images from a Spark
-# distribution, the docker build command should be invoked from the top level directory
-# of the Spark distribution. E.g.:
-# docker build -t spark-executor:latest -f kubernetes/dockerfiles/executor/Dockerfile .
-
-COPY examples /opt/spark/examples
-
-CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \
-    env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \
-    readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \
-    if ! [ -z ${SPARK_MOUNTED_CLASSPATH}+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
-    if ! [ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
--- End diff --

The difference is handled in the submission code; `SPARK_CLASSPATH` is set to the appropriate value.


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160324121
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a
 frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized
 and built for your usage.
--- End diff --

I think it's worth having a dedicated sub-section here on how to use custom images, e.g., one named `Customizing Container Images`.


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160323223
  
--- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/DriverConfigOrchestratorSuite.scala ---
@@ -75,8 +73,8 @@ class DriverConfigOrchestratorSuite extends SparkFunSuite {
 
   test("Submission steps with an init-container.") {
     val sparkConf = new SparkConf(false)
-      .set(DRIVER_CONTAINER_IMAGE, DRIVER_IMAGE)
-      .set(INIT_CONTAINER_IMAGE, IC_IMAGE)
+      .set(CONTAINER_IMAGE, DRIVER_IMAGE)
+      .set(INIT_CONTAINER_IMAGE.key, IC_IMAGE)
--- End diff --

Do you still need to set this?


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160324036
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -56,14 +56,13 @@ be run in a container runtime environment that Kubernetes supports. Docker is a
 frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized
--- End diff --

It should be called out that a single Dockerfile is shipped with Spark.
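
For example, building and pushing the single image could then be documented as (repo and tag illustrative, flags per the script's usage text):

```bash
./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 build
./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 push
```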


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160322848
  
--- Diff: bin/docker-image-tool.sh ---
@@ -60,35 +42,53 @@ function image_ref {
 }
 
 function build {
-  docker build \
-    --build-arg "spark_jars=$SPARK_JARS" \
-    --build-arg "img_path=$IMG_PATH" \
-    -t spark-base \
-    -f "$IMG_PATH/spark-base/Dockerfile" .
-  for image in "${!path[@]}"; do
-    docker build -t "$(image_ref $image)" -f ${path[$image]} .
-  done
+  # Detect whether this is a git clone or a Spark distribution and adjust paths
+  # accordingly.
+  local BUILD_ARGS
+  local IMG_PATH
+
+  if [ ! -f "$SPARK_HOME/RELEASE" ]; then
+    IMG_PATH=resource-managers/kubernetes/docker/src/main/dockerfiles
+    BUILD_ARGS=(
+      --build-arg
+      img_path=$IMG_PATH
+      --build-arg
+      spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars
+    )
+  else
+    # Not passed as an argument to docker, but used to validate the Spark directory.
+    IMG_PATH="kubernetes/dockerfiles"
+  fi
+
+  local DOCKERFILE=${DOCKERFILE:-"$IMG_PATH/spark/Dockerfile"}
+
+  if [ ! -d "$IMG_PATH" ]; then
+    error "Cannot find docker images. This script must be run from a runnable distribution of Apache Spark."
+  fi
+
+  docker build "${BUILD_ARGS[@]}" \
+    -t $(image_ref spark) \
+    -f "$DOCKERFILE" .
 }
 
 function push {
-  for image in "${!path[@]}"; do
-    docker push "$(image_ref $image)"
-  done
+  docker push "$(image_ref spark)"
 }
 
 function usage {
   cat <

[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-08 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160323806
  
--- Diff: bin/docker-image-tool.sh ---
@@ -60,35 +42,53 @@ function image_ref {
 }
 
 function build {
-  docker build \
-    --build-arg "spark_jars=$SPARK_JARS" \
-    --build-arg "img_path=$IMG_PATH" \
-    -t spark-base \
-    -f "$IMG_PATH/spark-base/Dockerfile" .
-  for image in "${!path[@]}"; do
-    docker build -t "$(image_ref $image)" -f ${path[$image]} .
-  done
+  # Detect whether this is a git clone or a Spark distribution and adjust paths
--- End diff --

"paths" => "values of the following variables?.


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-08 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160316799
  
--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
@@ -0,0 +1,97 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# echo commands to the terminal output
+set -ex
+
+# Check whether there is a passwd entry for the container UID
+myuid=$(id -u)
+mygid=$(id -g)
+uidentry=$(getent passwd $myuid)
+
+# If there is no passwd entry for the container UID, attempt to create one
+if [ -z "$uidentry" ] ; then
+    if [ -w /etc/passwd ] ; then
+        echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd
+    else
+        echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
+    fi
+fi
+
+SPARK_K8S_CMD="$1"
+if [ -z "$SPARK_K8S_CMD" ]; then
+  echo "No command to execute has been provided." 1>&2
+  exit 1
+fi
+shift 1
+
+SPARK_CLASSPATH="$SPARK_CLASSPATH:${SPARK_HOME}/jars/*"
+env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt
+readarray -t SPARK_DRIVER_JAVA_OPTS < /tmp/java_opts.txt
+if [ -n "$SPARK_MOUNTED_CLASSPATH" ]; then
+  SPARK_CLASSPATH="$SPARK_CLASSPATH:$SPARK_MOUNTED_CLASSPATH"
+fi
+if [ -n "$SPARK_MOUNTED_FILES_DIR" ]; then
+  cp -R "$SPARK_MOUNTED_FILES_DIR/." .
+fi
+
+case "$SPARK_K8S_CMD" in
+  driver)
+    CMD=(
+      ${JAVA_HOME}/bin/java
+      "${SPARK_DRIVER_JAVA_OPTS[@]}"
+      -cp "$SPARK_CLASSPATH"
+      -Xms$SPARK_DRIVER_MEMORY
+      -Xmx$SPARK_DRIVER_MEMORY
+      -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS
+      $SPARK_DRIVER_CLASS
+      $SPARK_DRIVER_ARGS
+    )
+    ;;
+
+  executor)
+    CMD=(
+      ${JAVA_HOME}/bin/java
+      "${SPARK_EXECUTOR_JAVA_OPTS[@]}"
+      -Xms$SPARK_EXECUTOR_MEMORY
+      -Xmx$SPARK_EXECUTOR_MEMORY
+      -cp "$SPARK_CLASSPATH"
+      org.apache.spark.executor.CoarseGrainedExecutorBackend
+      --driver-url $SPARK_DRIVER_URL
+      --executor-id $SPARK_EXECUTOR_ID
+      --cores $SPARK_EXECUTOR_CORES
+      --app-id $SPARK_APPLICATION_ID
+      --hostname $SPARK_EXECUTOR_POD_IP
+    )
+    ;;
+
+  init)
+    CMD=(
+      "/opt/spark/bin/spark-class"
--- End diff --

shouldn't this be under `SPARK_HOME`?
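
Presumably something like this (a sketch of the suggestion, not the committed code):

```bash
# Resolve spark-class via SPARK_HOME rather than the hard-coded /opt/spark prefix.
CMD=("${SPARK_HOME}/bin/spark-class" "$@")
```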


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-08 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160316427
  
--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile ---
@@ -1,35 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-FROM spark-base
-
-# Before building the docker image, first build and make a Spark distribution following
-# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
-# If this docker file is being used in the context of building your images from a Spark
-# distribution, the docker build command should be invoked from the top level directory
-# of the Spark distribution. E.g.:
-# docker build -t spark-executor:latest -f kubernetes/dockerfiles/executor/Dockerfile .
-
-COPY examples /opt/spark/examples
-
-CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \
-    env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \
-    readarray -t SPARK_EXECUTOR_JAVA_OPTS < /tmp/java_opts.txt && \
-    if ! [ -z ${SPARK_MOUNTED_CLASSPATH}+x} ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi && \
-    if ! [ -z ${SPARK_EXECUTOR_EXTRA_CLASSPATH+x} ]; then SPARK_CLASSPATH="$SPARK_EXECUTOR_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && \
--- End diff --

The different classpath additions for the different roles have been taken out?
SPARK_EXECUTOR_EXTRA_CLASSPATH
SPARK_SUBMIT_EXTRA_CLASSPATH


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-08 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160316166
  
--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh ---
@@ -0,0 +1,97 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# echo commands to the terminal output
+set -ex
+
+# Check whether there is a passwd entry for the container UID
+myuid=$(id -u)
+mygid=$(id -g)
+uidentry=$(getent passwd $myuid)
+
+# If there is no passwd entry for the container UID, attempt to create one
+if [ -z "$uidentry" ] ; then
+    if [ -w /etc/passwd ] ; then
+        echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd
+    else
+        echo "Container ENTRYPOINT failed to add passwd entry for anonymous UID"
+    fi
+fi
+
+SPARK_K8S_CMD="$1"
+if [ -z "$SPARK_K8S_CMD" ]; then
+  echo "No command to execute has been provided." 1>&2
--- End diff --

Starting the container without a command could be very useful for an in-cluster client, or just for ad hoc testing outside of k8s.


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-08 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160316590
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -29,17 +29,23 @@ private[spark] object Config extends Logging {
       .stringConf
       .createWithDefault("default")
 
+  val CONTAINER_IMAGE =
+    ConfigBuilder("spark.kubernetes.container.image")
+      .doc("Container image to use for Spark containers. Individual container types " +
+        "(e.g. driver or executor) can also be configured to use different images if desired, " +
+        "by setting the container-specific image name.")
--- End diff --

nit: `container-specific` => `container-type-specific`, and add e.g. `spark.kubernetes.driver.container.image`?


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-08 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20192#discussion_r160315955
  
--- Diff: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile ---
@@ -41,7 +41,8 @@ COPY ${spark_jars} /opt/spark/jars
 COPY bin /opt/spark/bin
 COPY sbin /opt/spark/sbin
 COPY conf /opt/spark/conf
-COPY ${img_path}/spark-base/entrypoint.sh /opt/
+COPY ${img_path}/spark/entrypoint.sh /opt/
+COPY examples /opt/spark/examples
--- End diff --

A lot of examples depend on data/; should that be included too?


---




[GitHub] spark pull request #20192: [SPARK-22994][k8s] Use a single image for all Spa...

2018-01-08 Thread vanzin
GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/20192

[SPARK-22994][k8s] Use a single image for all Spark containers.

This change allows a user to submit a Spark application on kubernetes
by providing a single image, instead of one image for each type
of container. The image's entry point now takes an extra argument that
identifies the process that is being started.

The configuration still allows the user to provide different images
for each container type if they so desire.

On top of that, the entry point was simplified a bit to share more
code; mainly, the same env variable is used to propagate the user-defined
classpath to the different containers.

Aside from being modified to match the new behavior, the
'build-push-docker-images.sh' script was renamed to 'docker-image-tool.sh'
to more closely match its purpose; the old name was a little awkward
and now also not entirely correct, since there is a single image. It
was also moved to 'bin' since it's not necessarily an admin tool.

Docs have been updated to match the new behavior.

Tested locally with minikube.
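
For context, a sketch of the new entry-point contract (the first argument selects the role, per the entrypoint.sh case statement; invocations illustrative, and in practice the submission code also injects the env vars each role needs):

```bash
docker run myrepo/spark:v2.3.0 driver    # starts the driver JVM
docker run myrepo/spark:v2.3.0 executor  # starts CoarseGrainedExecutorBackend
docker run myrepo/spark:v2.3.0 init      # starts the init-container logic
```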

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-22994

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20192.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20192


commit bfad831de9ddd330674d73ccca759c103af32f20
Author: Marcelo Vanzin 
Date:   2018-01-06T00:50:19Z

[SPARK-22994][k8s] Use a single image for all Spark containers.

This change allows a user to submit a Spark application on kubernetes
by providing a single image, instead of one image for each type
of container. The image's entry point now takes an extra argument that
identifies the process that is being started.

The configuration still allows the user to provide different images
for each container type if they so desire.

On top of that, the entry point was simplified a bit to share more
code; mainly, the same env variable is used to propagate the user-defined
classpath to the different containers.

Aside from being modified to match the new behavior, the
'build-push-docker-images.sh' script was renamed to 'docker-image-tool.sh'
to more closely match its purpose; the old name was a little awkward
and now also not entirely correct, since there is a single image. It
was also moved to 'bin' since it's not necessarily an admin tool.

Docs and scripts have been updated to match the new behavior.




---
