[jira] [Commented] (SPARK-26342) Support for NFS mount for Kubernetes

2018-12-14 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721997#comment-16721997
 ] 

Yinan Li commented on SPARK-26342:
--

Yes, that's true. Feel free to create a PR to add nfs and flex.
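For illustration, a minimal sketch of what such configuration could look like if it mirrors the naming scheme of the existing persistentVolumeClaim options; the nfs option names, server, and paths below are assumptions for discussion, not a shipped API:

{code:java}
# Hypothetical NFS volume options, modeled on the existing
# spark.kubernetes.*.volumes.persistentVolumeClaim.* properties.
--conf spark.kubernetes.driver.volumes.nfs.shared.mount.path=/srv/shared \
--conf spark.kubernetes.driver.volumes.nfs.shared.mount.readOnly=true \
--conf spark.kubernetes.driver.volumes.nfs.shared.options.server=nfs.example.com \
--conf spark.kubernetes.driver.volumes.nfs.shared.options.path=/exports/shared \
--conf spark.kubernetes.executor.volumes.nfs.shared.mount.path=/srv/shared
{code}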

> Support for NFS mount for Kubernetes
> 
>
> Key: SPARK-26342
> URL: https://issues.apache.org/jira/browse/SPARK-26342
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Eric Carlson
>Priority: Minor
>
> Currently only hostPath, emptyDir, and PVC volume types are accepted for 
> Kubernetes-deployed drivers and executors. The ability to mount NFS paths 
> would allow access to a common and easy-to-deploy shared storage solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26290) [K8s] Driver Pods no mounted volumes on submissions from older spark versions

2018-12-14 Thread Yinan Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li resolved SPARK-26290.
--
Resolution: Not A Bug

> [K8s] Driver Pods no mounted volumes on submissions from older spark versions
> -
>
> Key: SPARK-26290
> URL: https://issues.apache.org/jira/browse/SPARK-26290
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
> Environment: Kubernetes: 1.10.6
> Container: Spark 2.4.0 
> Spark containers are built from the archive served by 
> [www.apache.org/dist/spark/|http://www.apache.org/dist/spark/] 
> Submission done by older spark versions integrated e.g. in livy
>Reporter: Martin Buchleitner
>Priority: Major
>
> I want to use the volume feature to mount an existing PVC as a read-only 
> volume into the driver and also the executor. 
> The executor gets the PVC mounted, but the driver is missing the mount:
> {code:java}
> /opt/spark/bin/spark-submit \
> --deploy-mode cluster \
> --class org.apache.spark.examples.SparkPi \
> --conf spark.app.name=spark-pi \
> --conf spark.executor.instances=4 \
> --conf spark.kubernetes.namespace=spark-demo \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
> --conf spark.kubernetes.container.image.pullPolicy=Always \
> --conf spark.kubernetes.container.image=kube-spark:2.4.0 \
> --conf spark.master=k8s://https:// \
> --conf 
> spark.kubernetes.driver.volumes.persistentVolumeClaim.ddata.mount.path=/srv \
> --conf 
> spark.kubernetes.driver.volumes.persistentVolumeClaim.ddata.mount.readOnly=true
>  \
> --conf 
> spark.kubernetes.driver.volumes.persistentVolumeClaim.ddata.options.claimName=nfs-pvc
>  \
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path=/srv \
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.readOnly=true
>  \
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName=nfs-pvc
>  \
> /srv/spark-examples_2.11-2.4.0.jar
> {code}
> When I use the jar included in the container
> {code:java}
> local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
> {code}
> the call works and I can inspect the pod descriptions to check the behavior
> *Driver description*
> {code:java}
> Name: spark-pi-1544018157391-driver
> [...]
> Containers:
>   spark-kubernetes-driver:
> Container ID:   
> docker://3a31d867c140183247cb296e13a8b35d03835f7657dd7e625c59083024e51e28
> Image:  kube-spark:2.4.0
> Image ID:   [...]
> Port:   
> Host Port:  
> State:  Terminated
>   Reason:   Completed
>   Exit Code:0
>   Started:  Wed, 05 Dec 2018 14:55:59 +0100
>   Finished: Wed, 05 Dec 2018 14:56:08 +0100
> Ready:  False
> Restart Count:  0
> Limits:
>   memory:  1408Mi
> Requests:
>   cpu: 1
>   memory:  1Gi
> Environment:
>   SPARK_DRIVER_MEMORY:1g
>   SPARK_DRIVER_CLASS: org.apache.spark.examples.SparkPi
>   SPARK_DRIVER_ARGS:
>   SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
>   SPARK_MOUNTED_CLASSPATH:
> /opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
>   SPARK_JAVA_OPT_1:   
> -Dspark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path=/srv
>   SPARK_JAVA_OPT_3:   -Dspark.app.name=spark-pi
>   SPARK_JAVA_OPT_4:   
> -Dspark.kubernetes.driver.volumes.persistentVolumeClaim.ddata.mount.path=/srv
>   SPARK_JAVA_OPT_5:   -Dspark.submit.deployMode=cluster
>   SPARK_JAVA_OPT_6:   -Dspark.driver.blockManager.port=7079
>   SPARK_JAVA_OPT_7:   
> -Dspark.kubernetes.driver.volumes.persistentVolumeClaim.ddata.mount.readOnly=true
>   SPARK_JAVA_OPT_8:   
> -Dspark.kubernetes.authenticate.driver.serviceAccountName=spark
>   SPARK_JAVA_OPT_9:   
> -Dspark.driver.host=spark-pi-1544018157391-driver-svc.spark-demo.svc.cluster.local
>   SPARK_JAVA_OPT_10:  
> -Dspark.kubernetes.driver.pod.name=spark-pi-1544018157391-driver
>   SPARK_JAVA_OPT_11:  
> -Dspark.kubernetes.driver.volumes.persistentVolumeClaim.ddata.options.claimName=nfs-pvc
>   SPARK_JAVA_OPT_12:  
> -Dspark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.readOnly=true
>   SPARK_JAVA_OPT_13:  -Dspark.driver.port=7078
>   SPARK_JAVA_OPT_14:  
> -Dspark.jars=/opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
>   SPARK_JAVA_OPT_15:  
> -Dspark.kubernetes.executor.podNamePrefix=spark-pi-1544018157391
>   SPARK_JAVA_OPT_16:  -Dspark.local.dir=/tmp/spark-local
>   SPARK_JAVA_OPT_17:  -Dspark.master=k8s://https://
>   

[jira] [Commented] (SPARK-26342) Support for NFS mount for Kubernetes

2018-12-14 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721894#comment-16721894
 ] 

Yinan Li commented on SPARK-26342:
--

So basically what you want is a generic way to mount arbitrary types of 
volumes. This is covered by SPARK-24434, which enables using a pod template to 
configure the driver and/or executor pods.
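For context, a rough sketch of how the pod template approach is wired up from spark-submit; the property names follow what SPARK-24434 eventually introduced, while the file paths are illustrative assumptions (the template itself can declare an nfs volume, a flexVolume, or any other pod-level customization):

{code:java}
# Point the driver and executors at user-supplied pod templates.
--conf spark.kubernetes.driver.podTemplateFile=/path/to/driver-template.yaml \
--conf spark.kubernetes.executor.podTemplateFile=/path/to/executor-template.yaml
{code}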

> Support for NFS mount for Kubernetes
> 
>
> Key: SPARK-26342
> URL: https://issues.apache.org/jira/browse/SPARK-26342
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Eric Carlson
>Priority: Minor
>
> Currently only hostPath, emptyDir, and PVC volume types are accepted for 
> Kubernetes-deployed drivers and executors. The ability to mount NFS paths 
> would allow access to a common and easy-to-deploy shared storage solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26344) Support for flexVolume mount for Kubernetes

2018-12-14 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721893#comment-16721893
 ] 

Yinan Li commented on SPARK-26344:
--

This is covered by SPARK-24434, which enables using a pod template to configure 
the driver and/or executor pods.

> Support for flexVolume mount for Kubernetes
> ---
>
> Key: SPARK-26344
> URL: https://issues.apache.org/jira/browse/SPARK-26344
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Eric Carlson
>Priority: Minor
>
> Currently only hostPath, emptyDir, and PVC volume types are accepted for 
> Kubernetes-deployed drivers and executors.
> flexVolume types allow for pluggable volume drivers to be used in Kubernetes 
> - a widely used example of this is the Rook deployment of CephFS, which 
> provides a POSIX-compliant distributed filesystem integrated into K8s.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25515) Add a config property for disabling auto deletion of PODS for debugging.

2018-12-03 Thread Yinan Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li resolved SPARK-25515.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

> Add a config property for disabling auto deletion of PODS for debugging.
> 
>
> Key: SPARK-25515
> URL: https://issues.apache.org/jira/browse/SPARK-25515
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, if a pod fails to start due to some failure, it gets removed and a 
> new one is attempted. This sequence of events goes on until the app is 
> killed. Given the speed of creation and deletion, it becomes difficult to 
> debug the reason for failure.
> So adding a configuration parameter to disable auto-deletion of pods will be 
> helpful for debugging.
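For reference, the fix that went into 3.0.0 exposes this as a boolean property; the exact name below is quoted from memory, so treat it as a best-effort pointer rather than authoritative documentation:

{code:java}
# Keep failed/terminated executor pods around for post-mortem debugging
# (the default is true, i.e. pods are deleted automatically).
--conf spark.kubernetes.executor.deleteOnTermination=false
{code}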



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25922) [K8] Spark Driver/Executor "spark-app-selector" label mismatch

2018-11-06 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16677276#comment-16677276
 ] 

Yinan Li commented on SPARK-25922:
--

The application ID used to set the {{spark-app-selector}} label for the driver 
pod comes from this line: 
[https://github.com/apache/spark/blob/3404a73f4cf7be37e574026d08ad5cf82cfac871/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L217].
 The application ID used to set the {{spark-app-selector}} label for the 
executor pods comes from this line: 
[https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L87].
 Agreed that it's problematic that two different application IDs are used.

> [K8] Spark Driver/Executor "spark-app-selector" label mismatch
> --
>
> Key: SPARK-25922
> URL: https://issues.apache.org/jira/browse/SPARK-25922
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
> Environment: Spark 2.4.0 RC4
>Reporter: Anmol Khurana
>Priority: Major
>
> Hi,
> I have been testing Spark 2.4.0 RC4 on Kubernetes to run Python Spark 
> applications and am running into an issue where the application ID labels on 
> the driver and executors mismatch. I am using 
> [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator] to run these 
> applications. 
> I see a spark.app.id of the form spark-* as the "spark-app-selector" label on 
> the driver, as well as in the K8s ConfigMap which gets created for the driver 
> via spark-submit. My guess is this is coming from 
> [https://github.com/apache/spark/blob/f6cc354d83c2c9a757f9b507aadd4dbdc5825cca/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L211]
> But when the driver actually comes up and brings up executors etc., I see 
> that the "spark-app-selector" label on the executors, as well as the 
> spark.app.id config within the user code on the driver, is something of the 
> form spark-application-* (probably from 
> [https://github.com/apache/spark/blob/b19a28dea098c7d6188f8540429c50f42952d678/core/src/main/scala/org/apache/spark/SparkContext.scala#L511]
>  & 
> [https://github.com/apache/spark/blob/bfb74394a5513134ea1da9fcf4a1783b77dd64e4/core/src/main/scala/org/apache/spark/scheduler/SchedulerBackend.scala#L26]
>  )
> We were consuming this "spark-app-selector" label on the driver pod to get 
> the app ID and use it to look up the app in the Spark History Server (among 
> other use cases), but due to this mismatch, that logic no longer works. This 
> was working fine in the Spark 2.2 fork for Kubernetes which I was using 
> earlier. Is this expected behavior, and if yes, what's the correct way to 
> fetch the applicationId from outside the application?
> Let me know if I can provide any more details or if I am doing something 
> wrong. Here is an example run with different *spark-app-selector* labels on 
> the driver/executor: 
>  
> {code:java}
> Name: pyfiles-driver
> Namespace: default
> Priority: 0
> PriorityClassName: 
> Start Time: Thu, 01 Nov 2018 18:19:46 -0700
> Labels: spark-app-selector=spark-b78bb10feebf4e2d98c11d7b6320e18f
>  spark-role=driver
>  sparkoperator.k8s.io/app-name=pyfiles
>  sparkoperator.k8s.io/launched-by-spark-operator=true
>  version=2.4.0
> Status: Running
> Name: pyfiles-1541121585642-exec-1
> Namespace: default
> Priority: 0
> PriorityClassName: 
> Start Time: Thu, 01 Nov 2018 18:24:02 -0700
> Labels: spark-app-selector=spark-application-1541121829445
>  spark-exec-id=1
>  spark-role=executor
>  sparkoperator.k8s.io/app-name=pyfiles
>  sparkoperator.k8s.io/launched-by-spark-operator=true
>  version=2.4.0
> Status: Pending
> {code}
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25787) [K8S] Spark can't use data locality information

2018-10-22 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659593#comment-16659593
 ] 

Yinan Li commented on SPARK-25787:
--

Support for data locality on k8s has not been ported to the upstream Spark repo 
yet.

> [K8S] Spark can't use data locality information
> ---
>
> Key: SPARK-25787
> URL: https://issues.apache.org/jira/browse/SPARK-25787
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Maciej Bryński
>Priority: Major
>
> I started experimenting with Spark based on this presentation:
> https://www.slideshare.net/databricks/hdfs-on-kuberneteslessons-learned-with-kimoon-kim
> I'm using the excellent https://github.com/apache-spark-on-k8s/kubernetes-HDFS
> charts to deploy HDFS.
> Unfortunately, reading from HDFS gives ANY locality for every task.
> Is data locality working on a Kubernetes cluster?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25796) Enable external shuffle service for kubernetes mode.

2018-10-22 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659579#comment-16659579
 ] 

Yinan Li commented on SPARK-25796:
--

See https://issues.apache.org/jira/browse/SPARK-24432. 

> Enable external shuffle service for kubernetes mode.
> 
>
> Key: SPARK-25796
> URL: https://issues.apache.org/jira/browse/SPARK-25796
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
>
> This is required to support dynamic scaling for spark jobs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24432) Add support for dynamic resource allocation

2018-10-22 Thread Yinan Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li updated SPARK-24432:
-
Affects Version/s: 3.0.0

> Add support for dynamic resource allocation
> ---
>
> Key: SPARK-24432
> URL: https://issues.apache.org/jira/browse/SPARK-24432
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Yinan Li
>Priority: Major
>
> This is an umbrella ticket for work on adding support for dynamic resource 
> allocation into the Kubernetes mode. This requires a Kubernetes-specific 
> external shuffle service. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25742) Is there a way to pass the Azure blob storage credentials to the spark for k8s init-container?

2018-10-16 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652064#comment-16652064
 ] 

Yinan Li commented on SPARK-25742:
--

The k8s secrets you add through the {{spark.kubernetes.driver.secrets.}} config 
option will also get mounted into the init-container in the driver pod. You can 
use that to pass credentials for pulling dependencies into the driver 
init-container.
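As a concrete sketch (the secret name and mount path here are made up for illustration), a pre-created secret holding the credentials could be mounted like this and then read from the mounted files:

{code:java}
# Mount the pre-created k8s secret 'azure-blob-creds' into the driver pod
# (and therefore its init-container) and into the executors at /etc/azure-creds.
--conf spark.kubernetes.driver.secrets.azure-blob-creds=/etc/azure-creds \
--conf spark.kubernetes.executor.secrets.azure-blob-creds=/etc/azure-creds
{code}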

> Is there a way to pass the Azure blob storage credentials to the spark for 
> k8s init-container?
> --
>
> Key: SPARK-25742
> URL: https://issues.apache.org/jira/browse/SPARK-25742
> Project: Spark
>  Issue Type: Question
>  Components: Kubernetes
>Affects Versions: 2.3.2
>Reporter: Oscar Bonilla
>Priority: Minor
>
> I'm trying to run Spark on a Kubernetes cluster in Azure. The idea is to 
> store the Spark application jars and dependencies in a container in Azure 
> Blob Storage.
> I've tried to do this with a public container and this works OK, but with a 
> private Blob Storage container, the spark-init init-container doesn't 
> download the jars.
> The equivalent in AWS S3 is as simple as adding the key_id and secret as 
> environment variables, but I don't see how to do this for Azure Blob Storage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25682) Docker images generated from dev build and from dist tarball are different

2018-10-09 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644198#comment-16644198
 ] 

Yinan Li commented on SPARK-25682:
--

Cool, thanks!

> Docker images generated from dev build and from dist tarball are different
> --
>
> Key: SPARK-25682
> URL: https://issues.apache.org/jira/browse/SPARK-25682
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> There's at least one difference I noticed, because of this line:
> {noformat}
> COPY examples /opt/spark/examples
> {noformat}
> In a dev build, "examples" contains your usual source code and maven-style 
> directories, whereas in the dist version, it's this:
> {code}
> cp "$SPARK_HOME"/examples/target/scala*/jars/* "$DISTDIR/examples/jars"
> {code}
> So the path to the actual jar files ends up being different depending on how 
> you built the image.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25682) Docker images generated from dev build and from dist tarball are different

2018-10-09 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644157#comment-16644157
 ] 

Yinan Li commented on SPARK-25682:
--

That looks to me like the only difference. {{bin}}, {{sbin}}, and {{data}} are 
also hard-coded, but they appear to be the same between the source tree and a 
distribution. Are you working on a fix? 

> Docker images generated from dev build and from dist tarball are different
> --
>
> Key: SPARK-25682
> URL: https://issues.apache.org/jira/browse/SPARK-25682
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> There's at least one difference I noticed, because of this line:
> {noformat}
> COPY examples /opt/spark/examples
> {noformat}
> In a dev build, "examples" contains your usual source code and maven-style 
> directories, whereas in the dist version, it's this:
> {code}
> cp "$SPARK_HOME"/examples/target/scala*/jars/* "$DISTDIR/examples/jars"
> {code}
> So the path to the actual jar files ends up being different depending on how 
> you built the image.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-25500) Specify configmap and secrets in Spark driver and executor pods in Kubernetes

2018-09-23 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625404#comment-16625404
 ] 

Yinan Li edited comment on SPARK-25500 at 9/24/18 5:51 AM:
---

We don't plan to add more configuration properties for pod customization as we 
move to a pod template model. See 
https://issues.apache.org/jira/browse/SPARK-24434. It supports all use cases 
you mentioned above. BTW: we already have 
{{spark.kubernetes.[driver|executor].secrets.[SecretName]=[MountPath]}} since 
Spark 2.3.


was (Author: liyinan926):
We don't plan to add more configuration properties for pod customization as we 
move to a pod template model. See 
https://issues.apache.org/jira/browse/SPARK-24434. It supports all use cases 
you mentioned above. BTW: we already have 
{{spark.kubernetes.\{driver|executor}.secrets.[SecretName]=[MountPath] }}since 
Spark 2.3{{.}}

> Specify configmap and secrets in Spark driver and executor pods in Kubernetes
> -
>
> Key: SPARK-25500
> URL: https://issues.apache.org/jira/browse/SPARK-25500
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.1
>Reporter: Abhishek Rao
>Priority: Minor
>
> This uses SPARK-23529. Support for specifying ConfigMaps and secrets as 
> Spark configuration is requested.
> Using PR #22146, the above functionality can be achieved by passing a template 
> file. However, for Spark properties (like log4j.properties, fairscheduler.xml 
> and metrics.properties), we are proposing this approach as it is native to how 
> other configuration options are specified in Spark.
> The ConfigMaps and secrets have to be pre-created before using this as Spark 
> configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25500) Specify configmap and secrets in Spark driver and executor pods in Kubernetes

2018-09-23 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625404#comment-16625404
 ] 

Yinan Li commented on SPARK-25500:
--

We don't plan to add more configuration properties for pod customization as we 
move to a pod template model. See 
https://issues.apache.org/jira/browse/SPARK-24434. It supports all use cases 
you mentioned above. BTW: we already have 
{{spark.kubernetes.[driver|executor].secrets.[SecretName]=[MountPath]}} since 
Spark 2.3.

> Specify configmap and secrets in Spark driver and executor pods in Kubernetes
> -
>
> Key: SPARK-25500
> URL: https://issues.apache.org/jira/browse/SPARK-25500
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.1
>Reporter: Abhishek Rao
>Priority: Minor
>
> This uses SPARK-23529. Support for specifying ConfigMaps and secrets as 
> Spark configuration is requested.
> Using PR #22146, the above functionality can be achieved by passing a template 
> file. However, for Spark properties (like log4j.properties, fairscheduler.xml 
> and metrics.properties), we are proposing this approach as it is native to how 
> other configuration options are specified in Spark.
> The ConfigMaps and secrets have to be pre-created before using this as Spark 
> configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23200) Reset configuration when restarting from checkpoints

2018-09-18 Thread Yinan Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li resolved SPARK-23200.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22392
[https://github.com/apache/spark/pull/22392]

> Reset configuration when restarting from checkpoints
> 
>
> Key: SPARK-23200
> URL: https://issues.apache.org/jira/browse/SPARK-23200
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Major
> Fix For: 2.4.0
>
>
> Streaming workloads and restarting from checkpoints may need additional 
> changes, i.e. resetting properties -  see 
> https://github.com/apache-spark-on-k8s/spark/pull/516



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25291) Flakiness of tests in terms of executor memory (SecretsTestSuite)

2018-09-18 Thread Yinan Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li resolved SPARK-25291.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

> Flakiness of tests in terms of executor memory (SecretsTestSuite)
> -
>
> Key: SPARK-25291
> URL: https://issues.apache.org/jira/browse/SPARK-25291
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Ilan Filonenko
>Priority: Major
> Fix For: 2.4.0
>
>
> SecretsTestSuite shows flakiness in terms of correct setting of executor 
> memory: 
> Run SparkPi with env and mount secrets. *** FAILED ***
>  "[884]Mi" did not equal "[1408]Mi" (KubernetesSuite.scala:272)
> This occurs when run with default settings.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25295) Pod names conflicts in client mode, if previous submission was not a clean shutdown.

2018-09-12 Thread Yinan Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li resolved SPARK-25295.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

> Pod names conflicts in client mode, if previous submission was not a clean 
> shutdown.
> 
>
> Key: SPARK-25295
> URL: https://issues.apache.org/jira/browse/SPARK-25295
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Prashant Sharma
>Priority: Major
> Fix For: 2.4.0
>
>
> If the previous job was killed somehow, e.g. by disconnecting the client, it 
> leaves behind the executor pods named spark-exec-#, which cause naming 
> conflicts and failures for the next job submission.
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://:6443/api/v1/namespaces/default/pods. Message: pods 
> "spark-exec-4" already exists. Received status: Status(apiVersion=v1, 
> code=409, details=StatusDetails(causes=[], group=null, kind=pods, 
> name=spark-exec-4, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=pods "spark-exec-4" already 
> exists, metadata=ListMeta(resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=AlreadyExists, status=Failure, 
> additionalProperties={}).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-25295) Pod names conflicts in client mode, if previous submission was not a clean shutdown.

2018-09-06 Thread Yinan Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li updated SPARK-25295:
-
Comment: was deleted

(was: We made it clear in the documentation of the Kubernetes mode at 
[https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md#client-mode-executor-pod-garbage-collection]
 that when running the client mode, executor pods may be left behind. This is 
by design. If you want to have the executor pods deleted automatically, run the 
driver in a pod inside the cluster and set {{spark.driver.pod.name}} to the 
name of the driver pod so an {{OwnerReference}} pointing to the driver pod gets 
added to the executor pods. This way the executor pods get garbage collected 
when the driver pod is gone.)

> Pod names conflicts in client mode, if previous submission was not a clean 
> shutdown.
> 
>
> Key: SPARK-25295
> URL: https://issues.apache.org/jira/browse/SPARK-25295
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Prashant Sharma
>Priority: Major
>
> If the previous job was killed somehow, e.g. by disconnecting the client, it 
> leaves behind the executor pods named spark-exec-#, which cause naming 
> conflicts and failures for the next job submission.
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://:6443/api/v1/namespaces/default/pods. Message: pods 
> "spark-exec-4" already exists. Received status: Status(apiVersion=v1, 
> code=409, details=StatusDetails(causes=[], group=null, kind=pods, 
> name=spark-exec-4, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=pods "spark-exec-4" already 
> exists, metadata=ListMeta(resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=AlreadyExists, status=Failure, 
> additionalProperties={}).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25282) Fix support for spark-shell with K8s

2018-08-31 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599310#comment-16599310
 ] 

Yinan Li commented on SPARK-25282:
--

I'm not sure this is a bug, or how this could be enforced systematically. When 
you use client mode and run the driver outside the cluster on a host, you are 
using the Spark distribution on that host, which may or may not be the same 
version as the Spark jars in the image. I guess this is not even a problem 
unique to Spark on Kubernetes.

> Fix support for spark-shell with K8s
> 
>
> Key: SPARK-25282
> URL: https://issues.apache.org/jira/browse/SPARK-25282
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Spark shell, when run with a Kubernetes master, gives the following errors.
> {noformat}
> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local 
> class incompatible: stream classdesc serialVersionUID = -3720498261147521051, 
> local class serialVersionUID = -6655865447853211720
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
> {noformat}
> Special care was taken to ensure the same compiled jar was used both in the 
> images and on the host system, i.e. the system running the driver.
> This issue affects the PySpark and R interfaces as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25295) Pod names conflicts in client mode, if previous submission was not a clean shutdown.

2018-08-31 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599308#comment-16599308
 ] 

Yinan Li commented on SPARK-25295:
--

We made it clear in the documentation of the Kubernetes mode at 
[https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md#client-mode-executor-pod-garbage-collection]
 that when running in client mode, executor pods may be left behind. This is 
by design. If you want the executor pods deleted automatically, run the 
driver in a pod inside the cluster and set {{spark.kubernetes.driver.pod.name}} 
to the name of the driver pod so an {{OwnerReference}} pointing to the driver 
pod gets added to the executor pods. This way the executor pods get garbage 
collected when the driver pod is gone.
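A minimal sketch of the relevant setting for in-cluster client mode, assuming the driver runs in a pod whose name is exposed via the downward API (the environment variable name is illustrative):

{code:java}
# Client mode with the driver running in a pod: point Spark at that pod so an
# OwnerReference is added to the executor pods and they are garbage collected
# together with the driver pod ($DRIVER_POD_NAME injected via the downward API).
--conf spark.kubernetes.driver.pod.name=$DRIVER_POD_NAME
{code}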

> Pod names conflicts in client mode, if previous submission was not a clean 
> shutdown.
> 
>
> Key: SPARK-25295
> URL: https://issues.apache.org/jira/browse/SPARK-25295
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Prashant Sharma
>Priority: Major
>
> If the previous job was killed somehow, e.g. by disconnecting the client, it 
> leaves behind the executor pods named spark-exec-#, which cause naming 
> conflicts and failures for the next job submission.
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://:6443/api/v1/namespaces/default/pods. Message: pods 
> "spark-exec-4" already exists. Received status: Status(apiVersion=v1, 
> code=409, details=StatusDetails(causes=[], group=null, kind=pods, 
> name=spark-exec-4, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=pods "spark-exec-4" already 
> exists, metadata=ListMeta(resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=AlreadyExists, status=Failure, 
> additionalProperties={}).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599304#comment-16599304
 ] 

Yinan Li commented on SPARK-24434:
--

[~skonto] we understand your feelings and frustration about this, and we really 
appreciate your work driving the design. AFAIK, the PR created by [~onursatici] 
follows the design (you are helping review it, so you can judge if this is 
the case). I think the situation was that people wanted to move this forward 
(granted that you were driving this) while you were on vacation, and thought it 
would be good to get the ball rolling with a WIP PR that everyone could comment 
on and give early feedback on. The fact that no one knew how far you had gone on 
the implementation before you started your vacation is probably also a factor 
here. Anyway, with that being said, we really appreciate your work driving the 
design and reviewing the PR! If you want to have further discussion on this and 
have ideas on how to better coordinate on big features in the future, let us 
know and we can bring it up at the next SIG meeting. 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-27 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594038#comment-16594038
 ] 

Yinan Li commented on SPARK-24434:
--

It seemed I couldn't change the assignee.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25162) Kubernetes 'in-cluster' client mode and value of spark.driver.host

2018-08-22 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589286#comment-16589286
 ] 

Yinan Li commented on SPARK-25162:
--

> Where the driver is running in _outside-cluster client_ mode, would you 
> recommend a default behavior of deriving the IP address of the host on which 
> the driver is running (provided that IP address is routable from inside the 
> cluster) and giving the user the option to override and supply an FQDN or 
> routable IP address for the driver?

The philosophy behind client mode in the Kubernetes deployment is to not be 
opinionated about how users set up network connectivity from the executors 
to the driver. So it's really up to the users to decide the best way to 
provide such connectivity. Please check out 
https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md#client-mode.
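For an out-of-cluster client-mode driver, a typical setup is to advertise an address that the executors can reach and pin the ports; the hostname and port numbers below are illustrative assumptions, not recommended values:

{code:java}
# Driver running outside the cluster: advertise an address routable from the
# executor pods and fix the ports so they can be opened or forwarded.
--conf spark.driver.host=driver.example.com \
--conf spark.driver.port=7078 \
--conf spark.driver.blockManager.port=7079
{code}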

> Kubernetes 'in-cluster' client mode and value of spark.driver.host
> --
>
> Key: SPARK-25162
> URL: https://issues.apache.org/jira/browse/SPARK-25162
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
> Environment: A java program, deployed to kubernetes, that establishes 
> a Spark Context in client mode. 
> Not using spark-submit.
> Kubernetes 1.10
> AWS EKS
>  
>  
>Reporter: James Carter
>Priority: Minor
>
> When creating Kubernetes scheduler 'in-cluster' using client mode, the value 
> for spark.driver.host can be derived from the IP address of the driver pod.
> I observed that the value of _spark.driver.host_ defaulted to the value of 
> _spark.kubernetes.driver.pod.name_, which is not a valid hostname.  This 
> caused the executors to fail to establish a connection back to the driver.
> As a workaround, in my configuration I pass the driver's pod name _and_ the 
> driver's IP address to ensure that executors can establish a connection with 
> the driver.
> _spark.kubernetes.driver.pod.name_ := env.valueFrom.fieldRef.fieldPath: 
> metadata.name
> _spark.driver.host_ := env.valueFrom.fieldRef.fieldPath: status.podIp
> e.g.
> Deployment:
> {noformat}
> env:
> - name: DRIVER_POD_NAME
>   valueFrom:
> fieldRef:
>   fieldPath: metadata.name
> - name: DRIVER_POD_IP
>   valueFrom:
> fieldRef:
>   fieldPath: status.podIP
> {noformat}
>  
> Application Properties:
> {noformat}
> config[spark.kubernetes.driver.pod.name]: ${DRIVER_POD_NAME}
> config[spark.driver.host]: ${DRIVER_POD_IP}
> {noformat}
>  
> BasicExecutorFeatureStep.scala:
> {code:java}
> private val driverUrl = RpcEndpointAddress(
> kubernetesConf.get("spark.driver.host"),
> kubernetesConf.sparkConf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT),
> CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
> {code}
>  
> Ideally only _spark.kubernetes.driver.pod.name_ would need to be provided in 
> this deployment scenario.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25194) Kubernetes - Define cpu and memory limit to init container

2018-08-22 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589283#comment-16589283
 ] 

Yinan Li commented on SPARK-25194:
--

The upcoming Spark 2.4 gets rid of the init-container and switches to running 
{{spark-submit}} in client mode in the driver to download remote dependencies. 
Given that 2.4 is coming soon, I would suggest waiting for it and using it 
instead. 
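A sketch of how this looks with 2.4 (the API server address, dependency URL, and limits are illustrative): remote dependencies are fetched by spark-submit running inside the driver container itself, so only the driver's own resource settings apply and there is no separate init-container to size:

{code:java}
/opt/spark/bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://<api-server> \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.driver.memory=1g \
  --jars https://example.com/deps/external-dependency.jar \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
{code}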

> Kubernetes - Define cpu and memory limit to init container
> --
>
> Key: SPARK-25194
> URL: https://issues.apache.org/jira/browse/SPARK-25194
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.1
>Reporter: Daniel Majano
>Priority: Major
>  Labels: features
>
> Hi,
>  
> Recently I have started to work with spark under kubernetes. We have all our 
> kubernetes clusters with resources quotes, so if you want to do a deploy yo 
> need to define container cpu and memory limit.
>  
> With driver and executors this is ok due to with spark submit props you can 
> define this limits. But today for one of my projects, I need to load an 
> external dependency. I have tried to define the dependency with --jars and 
> the link with https so then, the init container will pop up and you don't 
> have the possibility to define limits and the submitter failed due to he 
> can't start the pod with driver + init container.
>  
>  
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25162) Kubernetes 'in-cluster' client mode and value of spark.driver.host

2018-08-21 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588014#comment-16588014
 ] 

Yinan Li commented on SPARK-25162:
--

We actually moved away from using the IP address of the driver pod to set 
{{spark.driver.host}}; instead we use a headless service to give the driver pod 
an FQDN and set {{spark.driver.host}} to that FQDN. Internally, we set 
{{spark.driver.bindAddress}} to the value of the environment variable 
{{SPARK_DRIVER_BIND_ADDRESS}}, which gets its value from the IP address of the 
pod via the downward API. We could do the same for 
{{spark.kubernetes.driver.pod.name}} as you suggested for in-cluster client 
mode.
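Spelled out as properties (the service name and namespace here are illustrative), that arrangement looks roughly like:

{code:java}
# Hostname from the headless service fronting the driver pod; bind address from
# the pod IP injected through the downward API.
spark.driver.host=spark-pi-driver-svc.spark-demo.svc.cluster.local
spark.driver.bindAddress=${SPARK_DRIVER_BIND_ADDRESS}
{code}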

> Kubernetes 'in-cluster' client mode and value of spark.driver.host
> --
>
> Key: SPARK-25162
> URL: https://issues.apache.org/jira/browse/SPARK-25162
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
> Environment: A java program, deployed to kubernetes, that establishes 
> a Spark Context in client mode. 
> Not using spark-submit.
> Kubernetes 1.10
> AWS EKS
>  
>  
>Reporter: James Carter
>Priority: Minor
>
> When creating Kubernetes scheduler 'in-cluster' using client mode, the value 
> for spark.driver.host can be derived from the IP address of the driver pod.
> I observed that the value of _spark.driver.host_ defaulted to the value of 
> _spark.kubernetes.driver.pod.name_, which is not a valid hostname.  This 
> caused the executors to fail to establish a connection back to the driver.
> As a workaround, in my configuration I pass the driver's pod name _and_ the 
> driver's IP address to ensure that executors can establish a connection with 
> the driver.
> _spark.kubernetes.driver.pod.name_ := env.valueFrom.fieldRef.fieldPath: 
> metadata.name
> _spark.driver.host_ := env.valueFrom.fieldRef.fieldPath: status.podIp
> e.g.
> Deployment:
> {noformat}
> env:
> - name: DRIVER_POD_NAME
>   valueFrom:
> fieldRef:
>   fieldPath: metadata.name
> - name: DRIVER_POD_IP
>   valueFrom:
> fieldRef:
>   fieldPath: status.podIP
> {noformat}
>  
> Application Properties:
> {noformat}
> config[spark.kubernetes.driver.pod.name]: ${DRIVER_POD_NAME}
> config[spark.driver.host]: ${DRIVER_POD_IP}
> {noformat}
>  
> BasicExecutorFeatureStep.scala:
> {code:java}
> private val driverUrl = RpcEndpointAddress(
> kubernetesConf.get("spark.driver.host"),
> kubernetesConf.sparkConf.getInt("spark.driver.port", DEFAULT_DRIVER_PORT),
> CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
> {code}
>  
> Ideally only _spark.kubernetes.driver.pod.name_ would need to be provided in 
> this deployment scenario.
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-20 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16586314#comment-16586314
 ] 

Yinan Li commented on SPARK-24434:
--

[~skonto] I will make sure the assignee gets properly set for future JIRAs. 
[~onursatici], if you would like to work on the implementation, please make 
sure you read through the design doc from [~skonto] and make the implementation 
follow what the design proposes. Thanks! 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25066) Provide Spark R image for deploying Spark on kubernetes.

2018-08-10 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576555#comment-16576555
 ] 

Yinan Li commented on SPARK-25066:
--

R support is still being worked on and will likely go into 2.4. Is this Jira 
for that work?

> Provide Spark R image for deploying Spark on kubernetes.
> 
>
> Key: SPARK-25066
> URL: https://issues.apache.org/jira/browse/SPARK-25066
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.1
>Reporter: Prashant Sharma
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24724) Discuss necessary info and access in barrier mode + Kubernetes

2018-07-27 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560436#comment-16560436
 ] 

Yinan Li commented on SPARK-24724:
--

Sorry, I haven't gotten a chance to look into this yet. What pieces of info and 
what kind of access do we need to provide? I saw some comments on the similar 
JIRA for YARN, and particularly the one quoted below:

"The main problem is how to provide necessary information for barrier tasks to 
start MPI job in a password-less manner".

Is the main problem the same for Kubernetes?

> Discuss necessary info and access in barrier mode + Kubernetes
> --
>
> Key: SPARK-24724
> URL: https://issues.apache.org/jira/browse/SPARK-24724
> Project: Spark
>  Issue Type: Story
>  Components: Kubernetes, ML, Spark Core
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Assignee: Yinan Li
>Priority: Major
>
> In barrier mode, to run hybrid distributed DL training jobs, we need to 
> provide users sufficient info and access so they can set up a hybrid 
> distributed training job, e.g., using MPI.
> This ticket limits the scope of discussion to Spark + Kubernetes. There were 
> some past and on-going attempts from the Kubenetes community. So we should 
> find someone with good knowledge to lead the discussion here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24894) Invalid DNS name due to hostname truncation

2018-07-24 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554602#comment-16554602
 ] 

Yinan Li commented on SPARK-24894:
--

[~mcheah], we need to make sure the truncation leads to a valid hostname.

> Invalid DNS name due to hostname truncation 
> 
>
> Key: SPARK-24894
> URL: https://issues.apache.org/jira/browse/SPARK-24894
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.1
>Reporter: Dharmesh Kakadia
>Priority: Major
>
> The truncation for hostname happening here 
> [https://github.com/apache/spark/blob/5ff1b9ba1983d5601add62aef64a3e87d07050eb/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala#L77]
>   is a problematic and can lead to DNS names starting with "-". 
> Originally filled here : 
> https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/229
> ```
> {{2018-07-23 21:21:42 ERROR Utils:91 - Uncaught exception in thread 
> kubernetes-pod-allocator 
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://kubernetes.default.svc/api/v1/namespaces/default/pods. 
> Message: Pod 
> "user-archetypes-all-weekly-1532380861251850404-1532380862321-exec-9" is 
> invalid: spec.hostname: Invalid value: 
> "-archetypes-all-weekly-1532380861251850404-1532380862321-exec-9": a DNS-1123 
> label must consist of lower case alphanumeric characters or '-', and must 
> start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', 
> regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?'). Received 
> status: Status(apiVersion=v1, code=422, 
> details=StatusDetails(causes=[StatusCause(field=spec.hostname, 
> message=Invalid value: 
> "-archetypes-all-weekly-1532380861251850404-1532380862321-exec-9": a DNS-1123 
> label must consist of lower case alphanumeric characters or '-', and must 
> start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', 
> regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), 
> reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Pod, 
> name=user-archetypes-all-weekly-1532380861251850404-1532380862321-exec-9, 
> retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
> message=Pod 
> "user-archetypes-all-weekly-1532380861251850404-1532380862321-exec-9" is 
> invalid: spec.hostname: Invalid value: 
> "-archetypes-all-weekly-1532380861251850404-1532380862321-exec-9": a DNS-1123 
> label must consist of lower case alphanumeric characters or '-', and must 
> start and end with an alphanumeric character (e.g. 'my-name', or '123-abc', 
> regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?'), 
> metadata=ListMeta(resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=Invalid, status=Failure, 
> additionalProperties={}). at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:470)
>  at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:409)
>  at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:379)
>  at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:343)
>  at 
> io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:226)
>  at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:769)
>  at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:356)
>  at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend$$anon$1$$anonfun$3$$anonfun$apply$3.apply(KubernetesClusterSchedulerBackend.scala:140)
>  at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend$$anon$1$$anonfun$3$$anonfun$apply$3.apply(KubernetesClusterSchedulerBackend.scala:140)
>  at org.apache.spark.util.Utils$.tryLog(Utils.scala:1922) at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend$$anon$1$$anonfun$3.apply(KubernetesClusterSchedulerBackend.scala:139)
>  at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend$$anon$1$$anonfun$3.apply(KubernetesClusterSchedulerBackend.scala:138)
>  at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>  at 
> scala.collection.MapLike$MappedValues$$anonfun$foreach$3.apply(MapLike.scala:245)
>  at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>  at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) 
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99) 
> at 

[jira] [Updated] (SPARK-24724) Discuss necessary info and access in barrier mode + Kubernetes

2018-07-12 Thread Yinan Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li updated SPARK-24724:
-
Component/s: Kubernetes

> Discuss necessary info and access in barrier mode + Kubernetes
> --
>
> Key: SPARK-24724
> URL: https://issues.apache.org/jira/browse/SPARK-24724
> Project: Spark
>  Issue Type: Story
>  Components: Kubernetes, ML, Spark Core
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Assignee: Yinan Li
>Priority: Major
>
> In barrier mode, to run hybrid distributed DL training jobs, we need to 
> provide users sufficient info and access so they can set up a hybrid 
> distributed training job, e.g., using MPI.
> This ticket limits the scope of discussion to Spark + Kubernetes. There were 
> some past and ongoing attempts from the Kubernetes community. So we should 
> find someone with good knowledge to lead the discussion here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24793) Make spark-submit more useful with k8s

2018-07-12 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542109#comment-16542109
 ] 

Yinan Li edited comment on SPARK-24793 at 7/12/18 7:11 PM:
---

Oh, yeah, {{kill}} and {{status}} are existing options of {{spark-submit}}. 
Agreed that we should add support for them in the k8s submission client. But 
options that are specific to the k8s backend probably need a better place.


was (Author: liyinan926):
Oh, yeah, {{kill}} and {{status}} are existing options of {{spark-submit}}. 
Agreed we should add support for them into the k8s submission client.

> Make spark-submit more useful with k8s
> --
>
> Key: SPARK-24793
> URL: https://issues.apache.org/jira/browse/SPARK-24793
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> Support controlling the lifecycle of Spark Application through spark-submit. 
> For example:
> {{ 
>   --kill app_name   If given, kills the driver specified.
>   --status app_name  If given, requests the status of the driver 
> specified.
> }}
> Potentially also --list to list all spark drivers running.
> Given that our submission client can actually launch jobs into many different 
> namespaces, we'll need an additional specification of the namespace through a 
> --namespace flag potentially.
> I think this is pretty useful to have instead of forcing a user to use 
> kubectl to manage the lifecycle of any k8s Spark Application.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24793) Make spark-submit more useful with k8s

2018-07-12 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542109#comment-16542109
 ] 

Yinan Li commented on SPARK-24793:
--

Oh, yeah, {{kill}} and {{status}} are existing options of {{spark-submit}}. 
Agreed we should add support for them into the k8s submission client.

> Make spark-submit more useful with k8s
> --
>
> Key: SPARK-24793
> URL: https://issues.apache.org/jira/browse/SPARK-24793
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> Support controlling the lifecycle of Spark Application through spark-submit. 
> For example:
> {{ 
>   --kill app_name   If given, kills the driver specified.
>   --status app_name  If given, requests the status of the driver 
> specified.
> }}
> Potentially also --list to list all spark drivers running.
> Given that our submission client can actually launch jobs into many different 
> namespaces, we'll need an additional specification of the namespace through a 
> --namespace flag potentially.
> I think this is pretty useful to have instead of forcing a user to use 
> kubectl to manage the lifecycle of any k8s Spark Application.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24793) Make spark-submit more useful with k8s

2018-07-12 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542003#comment-16542003
 ] 

Yinan Li commented on SPARK-24793:
--

Good points, Erik. I think 
[sparkctl|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/tree/master/sparkctl]
 is also a good alternative for supporting the proposed set of functionalities. 
It is better positioned than kubectl to operate on the driver pods, while still 
looking familiar to users who are used to kubectl. 

> Make spark-submit more useful with k8s
> --
>
> Key: SPARK-24793
> URL: https://issues.apache.org/jira/browse/SPARK-24793
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> Support controlling the lifecycle of Spark Application through spark-submit. 
> For example:
> {{ 
>   --kill app_name   If given, kills the driver specified.
>   --status app_name  If given, requests the status of the driver 
> specified.
> }}
> Potentially also --list to list all spark drivers running.
> Given that our submission client can actually launch jobs into many different 
> namespaces, we'll need an additional specification of the namespace through a 
> --namespace flag potentially.
> I think this is pretty useful to have instead of forcing a user to use 
> kubectl to manage the lifecycle of any k8s Spark Application.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24432) Add support for dynamic resource allocation

2018-07-11 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540252#comment-16540252
 ] 

Yinan Li commented on SPARK-24432:
--

No one is working on this right now, but I think foxish planned to work on it, 
although I'm not sure where he's at. The existing implementation in the fork 
has some issues that we need to solve. A redesign might be needed.

> Add support for dynamic resource allocation
> ---
>
> Key: SPARK-24432
> URL: https://issues.apache.org/jira/browse/SPARK-24432
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> This is an umbrella ticket for work on adding support for dynamic resource 
> allocation into the Kubernetes mode. This requires a Kubernetes-specific 
> external shuffle service. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24765) Add custom Kubernetes scheduler config parameter to spark-submit

2018-07-10 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538999#comment-16538999
 ] 

Yinan Li commented on SPARK-24765:
--

Check out https://issues.apache.org/jira/browse/SPARK-24434 and 
https://docs.google.com/document/d/1pcyH5f610X2jyJW9WbWHnj8jktQPLlbbmmUwdeK4fJk/edit#.
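
For illustration only, a sketch of how the property proposed in this ticket 
might be used once implemented; the property does not exist in Spark at this 
point, and the scheduler name is a placeholder:

{code:scala}
import org.apache.spark.SparkConf

// Hypothetical usage of the proposed spark.kubernetes.schedulerName property;
// "my-custom-scheduler" stands in for whatever custom scheduler is deployed.
val conf = new SparkConf()
  .set("spark.kubernetes.schedulerName", "my-custom-scheduler")
{code}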

> Add custom Kubernetes scheduler config parameter to spark-submit 
> -
>
> Key: SPARK-24765
> URL: https://issues.apache.org/jira/browse/SPARK-24765
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.1
>Reporter: Nihal Harish
>Priority: Minor
>
> spark submit currently does not accept any config parameter that can enable 
> the driver and executor pods to be scheduled by a custom scheduler as opposed 
> to just the default-scheduler.
> I propose the addition of a new config parameter:
> spark.kubernetes.schedulerName 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-06-14 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513064#comment-16513064
 ] 

Yinan Li commented on SPARK-24434:
--

[~skonto] Thanks! Will take a look at the design doc once I'm back from 
vacation.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-06-02 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499220#comment-16499220
 ] 

Yinan Li commented on SPARK-24434:
--

[~skonto] Thanks for the detailed thoughts! I agree with you that we can start 
by allowing users to pass in a YAML file that stores the pod template. YAML 
is more familiar to K8s users, and this is key to making the experience as 
idiomatic as possible for them: they are the ones who know what a pod template 
is, what purpose it serves, and what they would like to put into it. 
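
As a rough sketch of what consuming such a YAML file could look like, assuming 
the fabric8 client already used by the Kubernetes backend (this is not a 
proposed implementation, and the file name is a placeholder):

{code:scala}
import java.io.FileInputStream
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.utils.Serialization

// Parse a user-provided pod template into a fabric8 Pod object; the submission
// client could then overlay the Spark-managed bits (containers, config volumes,
// labels) on top of whatever the user specified.
val templateStream = new FileInputStream("driver-pod-template.yaml")
val template: Pod = Serialization.unmarshal(templateStream, classOf[Pod])
println(template.getSpec.getNodeSelector)
{code}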

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-06-01 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498307#comment-16498307
 ] 

Yinan Li edited comment on SPARK-24434 at 6/1/18 5:39 PM:
--

The pod template is basically a pod specification and can contain every 
possible piece of information about a pod. It should look similar to what the 
core workload types (deployments and statefulsets for example) use, which 
contains a 
{{[PodSpec|https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L2636]}}.
  

The problem is unique for the Kubernetes mode as there are many things to 
customize for a pod. Currently we basically just introduce a new Spark config 
property for each new customization aspect of a pod. Given the number of things 
to customize, this will soon become hard to maintain if we keep introducing new 
config properties. 


was (Author: liyinan926):
The pod template is basically a pod specification and can contain every 
possible piece of information about a pod. It should look similar to what the 
core workload types (deployments and statefulsets for example) use, which 
contains a 
{{[PodSpec|https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L2636]}}.
 

 

The problem is unique for the Kubernetes mode as there are many things to 
customize for a pod. Currently we basically just introduce a new Spark config 
property for each new customization aspect of a pod. Given the number of things 
to customize, this will soon become hard to maintain if we keep introducing new 
config properties. 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-06-01 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498307#comment-16498307
 ] 

Yinan Li edited comment on SPARK-24434 at 6/1/18 5:38 PM:
--

The pod template is basically a pod specification and can contain every 
possible piece of information about a pod. It should look similar to what the 
core workload types (deployments and statefulsets for example) use, which 
contains a 
{{[PodSpec|https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L2636]}}.
 

 

The problem is unique for the Kubernetes mode as there are many things to 
customize for a pod. Currently we basically just introduce a new Spark config 
property for each new customization aspect of a pod. Given the number of things 
to customize, this will soon become hard to maintain if we keep introducing new 
config properties. 


was (Author: liyinan926):
The pod template is basically a pod specification and can contain every 
possible piece of information about a pod. It should look similar to what the 
core workload types (deployments and statefulsets for example) use, which 
contains a 
{{[PodSpec|https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L2636]}}.
 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-06-01 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498307#comment-16498307
 ] 

Yinan Li commented on SPARK-24434:
--

The pod template is basically a pod specification and can contain every 
possible piece of information about a pod. It should look similar to what the 
core workload types (deployments and statefulsets for example) use, which 
contains a 
{{[PodSpec|https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L2636]}}.
 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-05-31 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496967#comment-16496967
 ] 

Yinan Li commented on SPARK-24434:
--

[~foxish] that sounds like the approach to go. 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-05-30 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495637#comment-16495637
 ] 

Yinan Li commented on SPARK-24434:
--

[~eje] That's a good question. I think we need to compare both and have a 
thorough discussion in the community once the design is out. There are pros and 
cons with each of them.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24434) Support user-specified driver and executor pod templates

2018-05-30 Thread Yinan Li (JIRA)
Yinan Li created SPARK-24434:


 Summary: Support user-specified driver and executor pod templates
 Key: SPARK-24434
 URL: https://issues.apache.org/jira/browse/SPARK-24434
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Yinan Li


With more requests for customizing the driver and executor pods coming, the 
current approach of adding new Spark configuration options has some serious 
drawbacks: 1) it means more Kubernetes specific configuration options to 
maintain, and 2) it widens the gap between the declarative model used by 
Kubernetes and the configuration model used by Spark. We should start designing 
a solution that allows users to specify pod templates as central places for all 
customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24433) Add Spark R support

2018-05-30 Thread Yinan Li (JIRA)
Yinan Li created SPARK-24433:


 Summary: Add Spark R support
 Key: SPARK-24433
 URL: https://issues.apache.org/jira/browse/SPARK-24433
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Yinan Li


This is the ticket to track work on adding support for R binding into the 
Kubernetes mode. The feature is available in our fork at 
github.com/apache-spark-on-k8s/spark and needs to be upstreamed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24432) Support for dynamic resource allocation

2018-05-30 Thread Yinan Li (JIRA)
Yinan Li created SPARK-24432:


 Summary: Support for dynamic resource allocation
 Key: SPARK-24432
 URL: https://issues.apache.org/jira/browse/SPARK-24432
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Yinan Li


This is an umbrella ticket for work on adding support for dynamic resource 
allocation into the Kubernetes mode. This requires a Kubernetes-specific 
external shuffle service. The feature is available in our fork at 
github.com/apache-spark-on-k8s/spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24432) Add support for dynamic resource allocation

2018-05-30 Thread Yinan Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li updated SPARK-24432:
-
Summary: Add support for dynamic resource allocation  (was: Support for 
dynamic resource allocation)

> Add support for dynamic resource allocation
> ---
>
> Key: SPARK-24432
> URL: https://issues.apache.org/jira/browse/SPARK-24432
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> This is an umbrella ticket for work on adding support for dynamic resource 
> allocation into the Kubernetes mode. This requires a Kubernetes-specific 
> external shuffle service. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24122) Allow automatic driver restarts on K8s

2018-05-25 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491338#comment-16491338
 ] 

Yinan Li commented on SPARK-24122:
--

The operator does cover automatic restart of an application with a configurable 
restart policy. For batch ETL jobs, this is probably sufficient for the common 
need to restart jobs on failure. For streaming jobs, checkpointing is needed. 
https://issues.apache.org/jira/browse/SPARK-23980 is also relevant. 
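
For example, a minimal Structured Streaming sketch with checkpointing enabled, 
which is what allows a restarted driver to resume; the source and paths are 
placeholders, and the checkpoint directory has to live on storage that survives 
the driver pod:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("restartable-stream").getOrCreate()

// Toy aggregation over the built-in rate source, purely for illustration.
val counts = spark.readStream.format("rate").load().groupBy("value").count()

// On restart, the query resumes from the checkpoint directory, so it must be
// on shared storage reachable from wherever the driver pod is rescheduled.
counts.writeStream
  .format("console")
  .outputMode("complete")
  .option("checkpointLocation", "/mnt/shared/checkpoints/restartable-stream")
  .start()
  .awaitTermination()
{code}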

> Allow automatic driver restarts on K8s
> --
>
> Key: SPARK-24122
> URL: https://issues.apache.org/jira/browse/SPARK-24122
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Oz Ben-Ami
>Priority: Minor
>
> [~foxish]
> Right now SparkSubmit creates the driver as a bare pod, rather than a managed 
> controller like a Deployment or a StatefulSet. This means there is no way to 
> guarantee automatic restarts, eg in case a node has an issue. Note Pod 
> RestartPolicy does not apply if a node fails. A StatefulSet would allow us to 
> guarantee that, and keep the ability for executors to find the driver using 
> DNS.
> This is particularly helpful for long-running streaming workloads, where we 
> currently use {{yarn.resourcemanager.am.max-attempts}} with YARN. I can 
> confirm that Spark Streaming and Structured Streaming applications can be 
> made to recover from such a restart, with the help of checkpointing. The 
> executors will have to be started again by the driver, but this should not be 
> a problem.
> For batch processing, we could alternatively use Kubernetes {{Job}} objects, 
> which restart pods on failure but not success. For example, note the 
> semantics provided by the {{kubectl run}} 
> [command|https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#run]
>  * {{--restart=Never}}: bare Pod
>  * {{--restart=Always}}: Deployment
>  * {{--restart=OnFailure}}: Job
> https://github.com/apache-spark-on-k8s/spark/issues/288



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24091) Internally used ConfigMap prevents use of user-specified ConfigMaps carrying Spark configs files

2018-05-25 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491279#comment-16491279
 ] 

Yinan Li commented on SPARK-24091:
--

Thanks [~tmckay]! I think the first approach is a good way of handling override 
and customization. 

> Internally used ConfigMap prevents use of user-specified ConfigMaps carrying 
> Spark configs files
> 
>
> Key: SPARK-24091
> URL: https://issues.apache.org/jira/browse/SPARK-24091
> Project: Spark
>  Issue Type: Brainstorming
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> The recent PR [https://github.com/apache/spark/pull/20669] for removing the 
> init-container introduced an internally used ConfigMap carrying Spark 
> configuration properties in a file for the driver. This ConfigMap gets 
> mounted under {{$SPARK_HOME/conf}} and the environment variable 
> {{SPARK_CONF_DIR}} is set to point to the mount path. This pretty much 
> prevents users from mounting their own ConfigMaps that carry custom Spark 
> configuration files, e.g., {{log4j.properties}} and {{spark-env.sh}} and 
> leaves users with only the option of building custom images. IMO, it is very 
> useful to support mounting user-specified ConfigMaps for custom Spark 
> configuration files. This warrants further discussion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted

2018-05-25 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491106#comment-16491106
 ] 

Yinan Li commented on SPARK-24383:
--

OK, then garbage collection should kick in and delete the service when the 
driver pod is gone unless there's some issue with the GC.

> spark on k8s: "driver-svc" are not getting deleted
> --
>
> Key: SPARK-24383
> URL: https://issues.apache.org/jira/browse/SPARK-24383
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Lenin
>Priority: Major
>
> When the driver pod exits, the "*driver-svc" services created for the driver 
> are not cleaned up. This causes accumulation of services in the k8s layer, at 
> one point no more services can be created. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted

2018-05-24 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489942#comment-16489942
 ] 

Yinan Li commented on SPARK-24383:
--

You can use {{kubectl get service <service name> -o=yaml}} to get a 
YAML-formatted representation of the service and check if the {{metadata}} 
section contains an {{OwnerReference}} pointing to the driver pod. 

> spark on k8s: "driver-svc" are not getting deleted
> --
>
> Key: SPARK-24383
> URL: https://issues.apache.org/jira/browse/SPARK-24383
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Lenin
>Priority: Major
>
> When the driver pod exits, the "*driver-svc" services created for the driver 
> are not cleaned up. This causes accumulation of services in the k8s layer, at 
> one point no more services can be created. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted

2018-05-24 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489776#comment-16489776
 ] 

Yinan Li commented on SPARK-24383:
--

Can you double check if the services have an {{OwnerReference}} pointing to a 
driver pod?

> spark on k8s: "driver-svc" are not getting deleted
> --
>
> Key: SPARK-24383
> URL: https://issues.apache.org/jira/browse/SPARK-24383
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Lenin
>Priority: Major
>
> When the driver pod exits, the "*driver-svc" services created for the driver 
> are not cleaned up. This causes accumulation of services in the k8s layer, at 
> one point no more services can be created. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24383) spark on k8s: "driver-svc" are not getting deleted

2018-05-24 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489601#comment-16489601
 ] 

Yinan Li commented on SPARK-24383:
--

The Kubernetes-specific submission client adds an {{OwnerReference}} 
referencing the driver pod to the service, so if you delete the driver pod, the 
corresponding service should be garbage collected. 
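
For illustration, a sketch with the fabric8 model classes of what such an 
{{OwnerReference}} looks like (not the actual submission client code; the names 
and UID are placeholders):

{code:scala}
import io.fabric8.kubernetes.api.model.{OwnerReferenceBuilder, ServiceBuilder}

// An OwnerReference pointing at the driver pod; attaching it to the driver
// service makes Kubernetes garbage-collect the service when the pod is deleted.
val driverOwnerRef = new OwnerReferenceBuilder()
  .withApiVersion("v1")
  .withKind("Pod")
  .withName("spark-pi-driver")                      // placeholder pod name
  .withUid("00000000-0000-0000-0000-000000000000")  // placeholder pod UID
  .withController(true)
  .build()

val driverService = new ServiceBuilder()
  .withNewMetadata()
    .withName("spark-pi-driver-svc")                // placeholder service name
    .addToOwnerReferences(driverOwnerRef)
  .endMetadata()
  .build()
{code}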

> spark on k8s: "driver-svc" are not getting deleted
> --
>
> Key: SPARK-24383
> URL: https://issues.apache.org/jira/browse/SPARK-24383
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Lenin
>Priority: Major
>
> When the driver pod exits, the "*driver-svc" services created for the driver 
> are not cleaned up. This causes accumulation of services in the k8s layer, at 
> one point no more services can be created. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-05-16 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477825#comment-16477825
 ] 

Yinan Li commented on SPARK-24248:
--

Re-sync is neither a fallback nor a replacement, but a complement to the 
watcher. Re-sync runs periodically. There won't be race conditions if we use a 
concurrent queue.

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up-to-date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can in 
> favor of using the Kubernetes cluster as the source of truth for pod status. 
> Maintaining less state in memory makes it so that there's a lower chance that 
> we accidentally miss updating one of these data structures and breaking the 
> lifecycle of executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24232) Allow referring to kubernetes secrets as env variable

2018-05-11 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472574#comment-16472574
 ] 

Yinan Li commented on SPARK-24232:
--

As long as we clearly document what it is for, I think it's OK, particularly 
given that `secretKeyRef` is a well-known field name used by k8s.
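
For context, this is roughly the pod-level construct such a property would map 
to, sketched with the fabric8 model (the env var, secret, and key names are 
placeholders):

{code:scala}
import io.fabric8.kubernetes.api.model.EnvVarBuilder

// An env var whose value is taken from a key in a Kubernetes Secret, which is
// what a spark.kubernetes.driver.secretKeyRef.* style property would produce.
val dbPassword = new EnvVarBuilder()
  .withName("DB_PASSWORD")          // placeholder env var name
  .withNewValueFrom()
    .withNewSecretKeyRef()
      .withName("db-credentials")   // placeholder secret name
      .withKey("password")          // placeholder key within the secret
    .endSecretKeyRef()
  .endValueFrom()
  .build()
{code}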

> Allow referring to kubernetes secrets as env variable
> -
>
> Key: SPARK-24232
> URL: https://issues.apache.org/jira/browse/SPARK-24232
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Dharmesh Kakadia
>Priority: Major
>
> Allow referring to kubernetes secrets in the driver process via environment 
> variables. This will allow developers to use secrets without leaking them in 
> the code and at the same time secrets can be decoupled and managed 
> separately. This can be used to refer to passwords, certificates etc while 
> talking to other service (jdbc passwords, storage keys etc).
> So, at the deployment time, something like 
> ``spark.kubernetes.driver.secretKeyRef.[EnvName]=`` can be specified 
> which will make [EnvName].[key] available as an environment variable and in 
> the code it is always referred to as env variable [key].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24232) Allow referring to kubernetes secrets as env variable

2018-05-11 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472561#comment-16472561
 ] 

Yinan Li edited comment on SPARK-24232 at 5/11/18 7:55 PM:
---

We should keep the current semantics of 
`spark.kubernetes.driver.secrets.[SecretName]=[MountPath]`. The proposal you have 
above is likely confusing to existing users who already use 
`spark.kubernetes.driver.secrets.[SecretName]=[MountPath]`. It also makes the code 
unnecessarily complicated. Like what I said on Slack, it's better to do this 
through a new property prefix, e.g., `spark.kubernetes.driver.secretKeyRef.`. 
We also need the same for executors. See 
[http://spark.apache.org/docs/latest/running-on-kubernetes.html#secret-management].


was (Author: liyinan926):
We should keep the current semantics of 
`spark.kubernetes.driver.secrets.[SecretName]=[MountPath]`. The proposal you have 
above is a breaking change for existing users who already use 
`spark.kubernetes.driver.secrets.[SecretName]=[MountPath]`. Like what I said on 
Slack, it's better to do this through a new property prefix, e.g., 
`spark.kubernetes.driver.secretKeyRef.`. We also need the same for executors. 
See 
http://spark.apache.org/docs/latest/running-on-kubernetes.html#secret-management.

> Allow referring to kubernetes secrets as env variable
> -
>
> Key: SPARK-24232
> URL: https://issues.apache.org/jira/browse/SPARK-24232
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Dharmesh Kakadia
>Priority: Major
>
> Allow referring to kubernetes secrets in the driver process via environment 
> variables. This will allow developers to use secrets without leaking them in 
> the code and at the same time secrets can be decoupled and managed 
> separately. This can be used to refer to passwords, certificates etc while 
> talking to other service (jdbc passwords, storage keys etc).
> So, at the deployment time, something like 
> ``spark.kubernetes.driver.secretKeyRef.[EnvName]=`` can be specified 
> which will make [EnvName].[key] available as an environment variable and in 
> the code it is always referred to as env variable [key].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24232) Allow referring to kubernetes secrets as env variable

2018-05-11 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16472561#comment-16472561
 ] 

Yinan Li commented on SPARK-24232:
--

We should keep the current semantics of 
`spark.kubernetes.driver.secrets.[SecretName]=[MountPath]`. The proposal you have 
above is a breaking change for existing users who already use 
`spark.kubernetes.driver.secrets.[SecretName]=[MountPath]`. Like what I said on 
Slack, it's better to do this through a new property prefix, e.g., 
`spark.kubernetes.driver.secretKeyRef.`. We also need the same for executors. 
See 
http://spark.apache.org/docs/latest/running-on-kubernetes.html#secret-management.

> Allow referring to kubernetes secrets as env variable
> -
>
> Key: SPARK-24232
> URL: https://issues.apache.org/jira/browse/SPARK-24232
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Dharmesh Kakadia
>Priority: Major
>
> Allow referring to kubernetes secrets in the driver process via environment 
> variables. This will allow developers to use secrets without leaking them in 
> the code and at the same time secrets can be decoupled and managed 
> separately. This can be used to refer to passwords, certificates etc while 
> talking to other service (jdbc passwords, storage keys etc).
> So, at the deployment time, something like 
> ``spark.kubernetes.driver.secretKeyRef.[EnvName]=`` can be specified 
> which will make [EnvName].[key] available as an environment variable and in 
> the code it is always referred to as env variable [key].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-05-10 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471479#comment-16471479
 ] 

Yinan Li commented on SPARK-24248:
--

I think it's both more robust and easier to implement with a periodic resync, 
which is what most of the core controllers use. With this setup, you can use a 
queue to hold executor pod updates to be processed. The resync and the watcher both 
enqueue pod updates, and a single thread dequeues and processes each update 
sequentially. This avoids the need for explicit synchronization. The queue also 
serves as a cache.
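
A minimal sketch of that setup (illustrative only, not the scheduler backend 
implementation): the watcher and a periodic resync both feed one queue, and a 
single consumer thread drains it, so the executor bookkeeping needs no explicit 
locking. The helper functions are placeholders.

{code:scala}
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}
import io.fabric8.kubernetes.api.model.Pod

// Placeholders for a full re-list from the API server and for the actual
// executor-lifecycle bookkeeping.
def listExecutorPods(): Seq[Pod] = Seq.empty
def processPodUpdate(pod: Pod): Unit = ()

// Single queue fed by both the watch callback and the periodic resync.
val podUpdates = new LinkedBlockingQueue[Pod]()
def enqueue(pod: Pod): Unit = podUpdates.put(pod) // called by watcher and resync

// Periodic resync: re-list executor pods and enqueue their latest snapshots.
val resync = Executors.newSingleThreadScheduledExecutor()
resync.scheduleWithFixedDelay(new Runnable {
  override def run(): Unit = listExecutorPods().foreach(enqueue)
}, 30, 30, TimeUnit.SECONDS)

// One consumer processes updates sequentially, so no extra synchronization.
val consumer = Executors.newSingleThreadExecutor()
consumer.submit(new Runnable {
  override def run(): Unit = while (true) processPodUpdate(podUpdates.take())
})
{code}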

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up-to-date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can in 
> favor of using the Kubernetes cluster as the source of truth for pod status. 
> Maintaining less state in memory makes it so that there's a lower chance that 
> we accidentally miss updating one of these data structures and breaking the 
> lifecycle of executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-05-10 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471288#comment-16471288
 ] 

Yinan Li commented on SPARK-24248:
--

Just realized one thing: solely relying on the watcher poses a risk of losing 
executor pod updates. This can potentially happen, for example, if the API server 
gets restarted or if the watch connection is interrupted temporarily while the 
pods are running. So periodic polling is still needed. This is referred to as 
resync in controller terms. Enabling resync is almost always a good thing. 

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up-to-date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can in 
> favor of using the Kubernetes cluster as the source of truth for pod status. 
> Maintaining less state in memory makes it so that there's a lower chance that 
> we accidentally miss updating one of these data structures and breaking the 
> lifecycle of executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-05-10 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471259#comment-16471259
 ] 

Yinan Li commented on SPARK-24248:
--

Actually even if the fabric8 client does not support caching, we can 
effectively achieve that and greatly simplify our code logic by doing the 
following:
 # Get rid of the existing in-memory data structures and replace them with a 
single in-memory cache of all live executor pod objects.
 # The cache is updated on every watch event. A new pod event adds one entry 
to the cache, a modification event updates an existing object, and a deletion 
event deletes the object.
 # Always get status of an executor pod by retrieving the pod object from the 
cache, falling back to talking to the API server if there's a cache miss (due 
to the delay of the watch event).

Thoughts?
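
A sketch of the cache idea above, assuming the fabric8 {{Watcher}} API (again 
illustrative, not a proposed patch):

{code:scala}
import java.util.concurrent.ConcurrentHashMap
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{KubernetesClientException, Watcher}

// In-memory cache of live executor pods, keyed by pod name and maintained
// purely from watch events; status reads hit the cache first and only fall
// back to the API server on a miss.
val executorPodCache = new ConcurrentHashMap[String, Pod]()

val cacheUpdatingWatcher = new Watcher[Pod] {
  override def eventReceived(action: Watcher.Action, pod: Pod): Unit =
    action match {
      case Watcher.Action.ADDED | Watcher.Action.MODIFIED =>
        executorPodCache.put(pod.getMetadata.getName, pod)
      case Watcher.Action.DELETED =>
        executorPodCache.remove(pod.getMetadata.getName)
      case _ => // ERROR events are left to the real backend's error handling
    }
  override def onClose(cause: KubernetesClientException): Unit = ()
}
// Registered with something like:
//   client.pods().withLabel("spark-role", "executor").watch(cacheUpdatingWatcher)
{code}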

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up-to-date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can in 
> favor of using the Kubernetes cluster as the source of truth for pod status. 
> Maintaining less state in memory makes it so that there's a lower chance that 
> we accidentally miss updating one of these data structures and breaking the 
> lifecycle of executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-05-10 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471244#comment-16471244
 ] 

Yinan Li commented on SPARK-24248:
--

It's potentially possible to get rid of the in-memory state in favor of getting 
pod state from the pod objects directly if we are fine with the performance 
penalty of communicating with the API server for each state check. One 
optimization is to cache executor pod objects so retrieving them doesn't 
involve network communication. This is possible with the golang client library, 
but I'm not sure about the Java client we use.  

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up-to-date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can in 
> favor of using the Kubernetes cluster as the source of truth for pod status. 
> Maintaining less state in memory makes it so that there's a lower chance that 
> we accidentally miss updating one of these data structures and breaking the 
> lifecycle of executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24137) [K8s] Mount temporary directories in emptydir volumes

2018-05-10 Thread Yinan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li updated SPARK-24137:
-
Fix Version/s: (was: 2.3.1)

> [K8s] Mount temporary directories in emptydir volumes
> -
>
> Key: SPARK-24137
> URL: https://issues.apache.org/jira/browse/SPARK-24137
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently the Spark local directories do not get any volumes and volume 
> mounts, which means we're writing Spark shuffle and cache contents to the 
> file system mounted by Docker. This can be terribly inefficient. We should 
> use emptydir volumes for these directories instead for significant 
> performance improvements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24137) [K8s] Mount temporary directories in emptydir volumes

2018-05-10 Thread Yinan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li updated SPARK-24137:
-
Fix Version/s: 2.3.1

> [K8s] Mount temporary directories in emptydir volumes
> -
>
> Key: SPARK-24137
> URL: https://issues.apache.org/jira/browse/SPARK-24137
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 2.3.1, 3.0.0
>
>
> Currently the Spark local directories do not get any volumes and volume 
> mounts, which means we're writing Spark shuffle and cache contents to the 
> file system mounted by Docker. This can be terribly inefficient. We should 
> use emptydir volumes for these directories instead for significant 
> performance improvements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-01 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460066#comment-16460066
 ] 

Yinan Li edited comment on SPARK-24135 at 5/1/18 7:53 PM:
--

I agree that we should add detection for initialization errors. But I'm not 
sure if requesting new executors to replace the ones that failed initialization 
is a good idea. External webhooks or initializers are typically installed by 
cluster admins, and there are always risks of bugs in the webhooks or initializers 
that cause pods to fail initialization. In the case of initializers, things are 
worse, as pods will not be able to get out of pending status if, for whatever 
reason, the controller that's handling a particular initializer is down. For 
the reasons [~mcheah] mentioned above, it's not obvious if initialization 
errors should count towards job failures. I think keeping track of how many 
initialization errors are seen and stopping requesting new executors after 
a certain threshold might be a good idea.


was (Author: liyinan926):
I agree that we should add detection for initialization errors. But I'm not 
sure if requesting new executors to replace the ones that failed initialization 
is a good idea. External webhooks or initializers are typically installed by 
cluster admins, and there are always risks of bugs in the webhooks or initializers 
that cause pods to fail initialization. In the case of initializers, things are 
worse, as pods will not be able to get out of pending status if, for whatever 
reason, the controller that's handling a particular initializer is down. For 
the reasons [~mcheah] mentioned above, it's not obvious if initialization 
errors should count towards job failures. I think keeping track of how many 
initialization errors are seen and stopping requesting new executors might be a 
good idea.

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom web hooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark should be handling the {{Init:Error}} cases regardless if 
> Spark itself is aware of init-containers or not.
> This class of error is particularly bad because when we hit this state, the 
> failed executor will never start, but it's still seen as pending by the 
> executor allocator. The executor allocator won't request more rounds of 
> executors because its current batch hasn't been resolved to either running or 
> failed. Therefore we end up being stuck with the number of executors 
> that successfully started before the faulty one failed to start, potentially 
> creating a fake resource bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-01 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460066#comment-16460066
 ] 

Yinan Li commented on SPARK-24135:
--

I agree that we should add detection for initialization errors. But I'm not 
sure if requesting new executors to replace the ones that failed initialization 
is a good idea. External webhooks or initializers are typically installed by 
cluster admins, and there are always risks of bugs in the webhooks or initializers 
that cause pods to fail initialization. In the case of initializers, things are 
worse, as pods will not be able to get out of pending status if, for whatever 
reason, the controller that's handling a particular initializer is down. For 
the reasons [~mcheah] mentioned above, it's not obvious if initialization 
errors should count towards job failures. I think keeping track of how many 
initialization errors are seen and stopping requesting new executors might be a 
good idea.
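
A sketch of the kind of detection and threshold bookkeeping being discussed 
here, using the fabric8 pod status model (the threshold and the exact 
waiting/terminated checks are only indicative):

{code:scala}
import io.fabric8.kubernetes.api.model.Pod
import scala.collection.JavaConverters._

// True if any init container of the pod is in an errored state, which is what
// surfaces as the Init:Error pod status.
def failedInitialization(pod: Pod): Boolean = {
  val initStatuses = Option(pod.getStatus)
    .map(_.getInitContainerStatuses.asScala.toSeq)
    .getOrElse(Seq.empty)
  initStatuses.exists { s =>
    val state = Option(s.getState)
    val waitingError = state.flatMap(st => Option(st.getWaiting))
      .exists(w => Option(w.getReason).exists(_.contains("Error")))
    val terminatedError = state.flatMap(st => Option(st.getTerminated))
      .exists(t => Option(t.getExitCode).exists(_.intValue != 0))
    waitingError || terminatedError
  }
}

// Indicative bookkeeping: stop asking for replacement executors once a
// threshold of initialization failures has been seen.
var initErrorCount = 0
val maxInitErrors = 3 // placeholder threshold
def shouldKeepRequestingExecutors(pod: Pod): Boolean = {
  if (failedInitialization(pod)) initErrorCount += 1
  initErrorCount < maxInitErrors
}
{code}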

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom web hooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark should be handling the {{Init:Error}} cases regardless if 
> Spark itself is aware of init-containers or not.
> This class of error is particularly bad because when we hit this state, the 
> failed executor will never start, but it's still seen as pending by the 
> executor allocator. The executor allocator won't request more rounds of 
> executors because its current batch hasn't been resolved to either running or 
> failed. Therefore we end up being stuck with the number of executors 
> that successfully started before the faulty one failed to start, potentially 
> creating an artificial resource bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24137) [K8s] Mount temporary directories in emptydir volumes

2018-05-01 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459900#comment-16459900
 ] 

Yinan Li commented on SPARK-24137:
--

Yeah, {{LocalDirectoryMountConfigurationStep}} was missed in the upstream PRs. 
We probably should try to get it into 2.3.1.

> [K8s] Mount temporary directories in emptydir volumes
> -
>
> Key: SPARK-24137
> URL: https://issues.apache.org/jira/browse/SPARK-24137
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> Currently the Spark local directories do not get any volumes and volume 
> mounts, which means we're writing Spark shuffle and cache contents to the 
> file system mounted by Docker. This can be terribly inefficient. We should 
> use emptydir volumes for these directories instead for significant 
> performance improvements.
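
For reference, a minimal way to check this on a running executor pod is sketched 
below (the pod name is a placeholder); before a fix along these lines, the output 
shows no emptyDir volumes backing the Spark local directories.

{code:java}
# Inspect the volumes and volume mounts of an executor pod to see whether the
# Spark local directories are backed by emptyDir volumes or only by the
# container's writable layer.
kubectl get pod <executor-pod-name> -o jsonpath='{.spec.volumes}'
kubectl get pod <executor-pod-name> -o jsonpath='{.spec.containers[0].volumeMounts}'
{code}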



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-01 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459892#comment-16459892
 ] 

Yinan Li commented on SPARK-24135:
--

I think it's fine to detect and delete executor pods that failed 
initialization. But I'm not sure how much this buys us, because the newly 
requested executors will very likely fail initialization as well, particularly 
if the init-container is added by an external webhook or an initializer. In 
that case the job won't be able to proceed, and the bottleneck effectively 
still exists.

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom web hooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark should handle the {{Init:Error}} cases regardless of whether 
> Spark itself is aware of init-containers.
> This class of error is particularly bad because when we hit this state, the 
> failed executor will never start, but it's still seen as pending by the 
> executor allocator. The executor allocator won't request more rounds of 
> executors because its current batch hasn't been resolved to either running or 
> failed. Therefore we end up being stuck with the number of executors 
> that successfully started before the faulty one failed to start, potentially 
> creating an artificial resource bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24091) Internally used ConfigMap prevents use of user-specified ConfigMaps carrying Spark configs files

2018-04-25 Thread Yinan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li updated SPARK-24091:
-
Affects Version/s: (was: 2.3.0)
   2.4.0

> Internally used ConfigMap prevents use of user-specified ConfigMaps carrying 
> Spark configs files
> 
>
> Key: SPARK-24091
> URL: https://issues.apache.org/jira/browse/SPARK-24091
> Project: Spark
>  Issue Type: Brainstorming
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> The recent PR [https://github.com/apache/spark/pull/20669] for removing the 
> init-container introduced an internally used ConfigMap carrying Spark 
> configuration properties in a file for the driver. This ConfigMap gets 
> mounted under {{$SPARK_HOME/conf}} and the environment variable 
> {{SPARK_CONF_DIR}} is set to point to the mount path. This pretty much 
> prevents users from mounting their own ConfigMaps that carry custom Spark 
> configuration files, e.g., {{log4j.properties}} and {{spark-env.sh}}, and 
> leaves users with only the option of building custom images. IMO, it is very 
> useful to support mounting user-specified ConfigMaps for custom Spark 
> configuration files. This warrants further discussion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24091) Internally used ConfigMap prevents use of user-specified ConfigMaps carrying Spark configs files

2018-04-25 Thread Yinan Li (JIRA)
Yinan Li created SPARK-24091:


 Summary: Internally used ConfigMap prevents use of user-specified 
ConfigMaps carrying Spark configs files
 Key: SPARK-24091
 URL: https://issues.apache.org/jira/browse/SPARK-24091
 Project: Spark
  Issue Type: Brainstorming
  Components: Kubernetes
Affects Versions: 2.3.0
Reporter: Yinan Li


The recent PR [https://github.com/apache/spark/pull/20669] for removing the 
init-container introduced an internally used ConfigMap carrying Spark 
configuration properties in a file for the driver. This ConfigMap gets mounted 
under {{$SPARK_HOME/conf}} and the environment variable {{SPARK_CONF_DIR}} is 
set to point to the mount path. This pretty much prevents users from mounting 
their own ConfigMaps that carry custom Spark configuration files, e.g., 
{{log4j.properties}} and {{spark-env.sh}}, and leaves users with only the 
option of building custom images. IMO, it is very useful to support mounting 
user-specified ConfigMaps for custom Spark configuration files. This warrants 
further discussion.
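
To make the use case concrete, here is a sketch of what a user would typically 
want to do; the ConfigMap name and file names are illustrative, and mounting the 
result into the driver is exactly what the internal ConfigMap currently gets in 
the way of.

{code:java}
# Build a ConfigMap from custom Spark configuration files. The goal described
# above is to be able to mount something like this into the driver pod under
# $SPARK_HOME/conf, which today conflicts with the submission client's own
# ConfigMap mounted at the same location.
kubectl create configmap spark-user-conf \
  --from-file=log4j.properties \
  --from-file=spark-env.sh
{code}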



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23638) Spark on k8s: spark.kubernetes.initContainer.image has no effect

2018-04-23 Thread Yinan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li resolved SPARK-23638.
--
Resolution: Not A Problem

> Spark on k8s: spark.kubernetes.initContainer.image has no effect
> 
>
> Key: SPARK-23638
> URL: https://issues.apache.org/jira/browse/SPARK-23638
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
> Environment: K8 server: Ubuntu 16.04
> Submission client: macOS Sierra 10.12.x
> Client Version: version.Info\{Major:"1", Minor:"9", GitVersion:"v1.9.3", 
> GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", 
> BuildDate:"2018-02-07T12:22:21Z", GoVersion:"go1.9.2", Compiler:"gc", 
> Platform:"darwin/amd64"}
> Server Version: version.Info\{Major:"1", Minor:"8", GitVersion:"v1.8.3", 
> GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", 
> BuildDate:"2017-11-08T18:27:48Z", GoVersion:"go1.8.3", Compiler:"gc", 
> Platform:"linux/amd64"}
>Reporter: maheshvra
>Priority: Major
>
> Hi all - I am trying to use initContainer to download remote dependencies. To 
> begin with, I ran a test with initContainer which basically "echo hello 
> world". However, when i triggered the pod deployment via spark-submit, I did 
> not see any trace of initContainer execution in my kubernetes cluster.
>  
> {code:java}
> SPARK_DRIVER_MEMORY: 1g 
> SPARK_DRIVER_CLASS: com.bigdata.App SPARK_DRIVER_ARGS: -c 
> /opt/spark/work-dir/app/main/environments/int -w 
> ./../../workflows/workflow_main.json -e prod -n features -v off 
> SPARK_DRIVER_BIND_ADDRESS:  
> SPARK_JAVA_OPT_0: -Dspark.submit.deployMode=cluster 
> SPARK_JAVA_OPT_1: -Dspark.driver.blockManager.port=7079 
> SPARK_JAVA_OPT_2: -Dspark.app.name=fg-am00-raw12 
> SPARK_JAVA_OPT_3: 
> -Dspark.kubernetes.container.image=docker.com/cmapp/fg-am00-raw:1.0.0 
> SPARK_JAVA_OPT_4: -Dspark.app.id=spark-4fa9a5ce1b1d401fa9c1e413ff030d44 
> SPARK_JAVA_OPT_5: 
> -Dspark.jars=/opt/spark/jars/aws-java-sdk-1.7.4.jar,/opt/spark/jars/hadoop-aws-2.7.3.jar,/opt/spark/jars/guava-14.0.1.jar,/opt/spark/jars/SparkApp.jar,/opt/spark/jars/datacleanup-component-1.0-SNAPSHOT.jar
>  
> SPARK_JAVA_OPT_6: -Dspark.driver.port=7078 
> SPARK_JAVA_OPT_7: 
> -Dspark.kubernetes.initContainer.image=docker.com/cmapp/custombusybox:1.0.0 
> SPARK_JAVA_OPT_8: 
> -Dspark.kubernetes.executor.podNamePrefix=fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615
>  
> SPARK_JAVA_OPT_9: 
> -Dspark.kubernetes.driver.pod.name=fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615-driver
>  
> SPARK_JAVA_OPT_10: 
> -Dspark.driver.host=fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615-driver-svc.experimental.svc
>  SPARK_JAVA_OPT_11: -Dspark.executor.instances=5 
> SPARK_JAVA_OPT_12: 
> -Dspark.hadoop.fs.s3a.server-side-encryption-algorithm=AES256 
> SPARK_JAVA_OPT_13: -Dspark.kubernetes.namespace=experimental 
> SPARK_JAVA_OPT_14: 
> -Dspark.kubernetes.authenticate.driver.serviceAccountName=experimental-service-account
>  SPARK_JAVA_OPT_15: -Dspark.master=k8s://https://bigdata
> {code}
>  
> Further, I did not see spec.initContainers section in the generated pod. 
> Please see the details below
>  
> {code:java}
>  
> {
> "kind": "Pod",
> "apiVersion": "v1",
> "metadata": {
> "name": "fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615-driver",
> "namespace": "experimental",
> "selfLink": 
> "/api/v1/namespaces/experimental/pods/fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615-driver",
> "uid": "adc5a50a-2342-11e8-87dc-12c5b3954044",
> "resourceVersion": "299054",
> "creationTimestamp": "2018-03-09T02:36:32Z",
> "labels": {
> "spark-app-selector": "spark-4fa9a5ce1b1d401fa9c1e413ff030d44",
> "spark-role": "driver"
> },
> "annotations": {
> "spark-app-name": "fg-am00-raw12"
> }
> },
> "spec": {
> "volumes": [
> {
> "name": "experimental-service-account-token-msmth",
> "secret": {
> "secretName": "experimental-service-account-token-msmth",
> "defaultMode": 420
> }
> }
> ],
> "containers": [
> {
> "name": "spark-kubernetes-driver",
> "image": "docker.com/cmapp/fg-am00-raw:1.0.0",
> "args": [
> "driver"
> ],
> "env": [
> {
> "name": "SPARK_DRIVER_MEMORY",
> "value": "1g"
> },
> {
> "name": "SPARK_DRIVER_CLASS",
> "value": "com.myapp.App"
> },
> {
> "name": "SPARK_DRIVER_ARGS",
> "value": "-c /opt/spark/work-dir/app/main/environments/int -w 
> ./../../workflows/workflow_main.json -e prod -n features -v off"
> },
> {
> "name": "SPARK_DRIVER_BIND_ADDRESS",
> "valueFrom": {
> "fieldRef": {
> "apiVersion": "v1",
> "fieldPath": "status.podIP"
> }
> }
> },
> {
> "name": "SPARK_MOUNTED_CLASSPATH",
> "value": 
> "/opt/spark/jars/aws-java-sdk-1.7.4.jar:/opt/spark/jars/hadoop-aws-2.7.3.jar:/opt/spark/jars/guava-14.0.1.jar:/opt/spark/jars/datacleanup-component-1.0-SNAPSHOT.jar:/opt/spark/jars/SparkApp.jar"
> },
> {

[jira] [Commented] (SPARK-24028) [K8s] Creating secrets and config maps before creating the driver pod has unpredictable behavior

2018-04-19 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444899#comment-16444899
 ] 

Yinan Li commented on SPARK-24028:
--

2.3.0 does create a configmap for the init-container if one is used. See 
[https://github.com/apache/spark/blob/branch-2.3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/DriverInitContainerBootstrapStep.scala#L54.]
 The content of this configmap is used when the init-container starts.

> [K8s] Creating secrets and config maps before creating the driver pod has 
> unpredictable behavior
> 
>
> Key: SPARK-24028
> URL: https://issues.apache.org/jira/browse/SPARK-24028
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Critical
>
> Currently we create the Kubernetes resources the driver depends on - such as 
> the properties config map and secrets to mount into the pod - only after we 
> create the driver pod. This is because we want these extra objects to 
> immediately have an owner reference to be tied to the driver pod.
> On our Kubernetes 1.9.4 cluster, we're seeing that sometimes this works 
> fine, but other times the driver ends up being started with empty volumes 
> instead of volumes with the contents of the secrets we expect. The result is 
> that sometimes the driver will start without these files mounted, which leads 
> to various failures if the driver requires these files to be present early on 
> in their code. Missing the properties file config map, for example, would 
> mean spark-submit doesn't have a properties file to read at all. See the 
> warning on [https://kubernetes.io/docs/concepts/storage/volumes/#secret.]
> Unfortunately we cannot link owner references to non-existent objects, so we 
> have to do this instead:
>  # Create the auxiliary resources without any owner references.
>  # Create the driver pod mounting these resources into volumes, as before.
>  # If #2 fails, clean up the resources created in #1.
>  # Edit the auxiliary resources to have an owner reference for the driver pod.
> The multi-step approach leaves a small chance for us to leak resources - for 
> example, if we fail to make the resource edits in #4 for some reason. This 
> also changes the permissioning mode required for spark-submit - credentials 
> provided to spark-submit need to be able to edit resources in addition to 
> creating them.
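
As a rough illustration of step #4 above (all names are made up and this is not 
what spark-submit itself does), adding an owner reference after the driver pod 
exists could look like the following.

{code:java}
# Look up the driver pod's UID, then patch a previously created ConfigMap so
# that it is garbage-collected together with the driver pod.
DRIVER_UID=$(kubectl get pod my-app-driver -o jsonpath='{.metadata.uid}')
kubectl patch configmap my-app-driver-conf-map --type=merge -p "{
  \"metadata\": {
    \"ownerReferences\": [{
      \"apiVersion\": \"v1\",
      \"kind\": \"Pod\",
      \"name\": \"my-app-driver\",
      \"uid\": \"${DRIVER_UID}\"
    }]
  }
}"
{code}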



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24028) [K8s] Creating secrets and config maps before creating the driver pod has unpredictable behavior

2018-04-19 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444890#comment-16444890
 ] 

Yinan Li edited comment on SPARK-24028 at 4/19/18 10:14 PM:


I run a 1.9.6 cluster. No, I was using the 2.3.0 release. The configmap I was 
referring to was for the init-container.


was (Author: liyinan926):
I run a 1.9.6 cluster. No, I was using the 2.3.0 release.

> [K8s] Creating secrets and config maps before creating the driver pod has 
> unpredictable behavior
> 
>
> Key: SPARK-24028
> URL: https://issues.apache.org/jira/browse/SPARK-24028
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Critical
>
> Currently we create the Kubernetes resources the driver depends on - such as 
> the properties config map and secrets to mount into the pod - only after we 
> create the driver pod. This is because we want these extra objects to 
> immediately have an owner reference to be tied to the driver pod.
> On our Kubernetes 1.9.4 cluster, we're seeing that sometimes this works 
> fine, but other times the driver ends up being started with empty volumes 
> instead of volumes with the contents of the secrets we expect. The result is 
> that sometimes the driver will start without these files mounted, which leads 
> to various failures if the driver requires these files to be present early on 
> in their code. Missing the properties file config map, for example, would 
> mean spark-submit doesn't have a properties file to read at all. See the 
> warning on [https://kubernetes.io/docs/concepts/storage/volumes/#secret.]
> Unfortunately we cannot link owner references to non-existent objects, so we 
> have to do this instead:
>  # Create the auxiliary resources without any owner references.
>  # Create the driver pod mounting these resources into volumes, as before.
>  # If #2 fails, clean up the resources created in #1.
>  # Edit the auxiliary resources to have an owner reference for the driver pod.
> The multi-step approach leaves a small chance for us to leak resources - for 
> example, if we fail to make the resource edits in #4 for some reason. This 
> also changes the permissioning mode required for spark-submit - credentials 
> provided to spark-submit need to be able to edit resources in addition to 
> creating them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24028) [K8s] Creating secrets and config maps before creating the driver pod has unpredictable behavior

2018-04-19 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444890#comment-16444890
 ] 

Yinan Li commented on SPARK-24028:
--

I run a 1.9.6 cluster. No, I was using the 2.3.0 release.

> [K8s] Creating secrets and config maps before creating the driver pod has 
> unpredictable behavior
> 
>
> Key: SPARK-24028
> URL: https://issues.apache.org/jira/browse/SPARK-24028
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Critical
>
> Currently we create the Kubernetes resources the driver depends on - such as 
> the properties config map and secrets to mount into the pod - only after we 
> create the driver pod. This is because we want these extra objects to 
> immediately have an owner reference to be tied to the driver pod.
> On our Kubernetes 1.9.4 cluster, we're seeing that sometimes this works 
> fine, but other times the driver ends up being started with empty volumes 
> instead of volumes with the contents of the secrets we expect. The result is 
> that sometimes the driver will start without these files mounted, which leads 
> to various failures if the driver requires these files to be present early on 
> in their code. Missing the properties file config map, for example, would 
> mean spark-submit doesn't have a properties file to read at all. See the 
> warning on [https://kubernetes.io/docs/concepts/storage/volumes/#secret.]
> Unfortunately we cannot link owner references to non-existent objects, so we 
> have to do this instead:
>  # Create the auxiliary resources without any owner references.
>  # Create the driver pod mounting these resources into volumes, as before.
>  # If #2 fails, clean up the resources created in #1.
>  # Edit the auxiliary resources to have an owner reference for the driver pod.
> The multi-step approach leaves a small chance for us to leak resources - for 
> example, if we fail to make the resource edits in #4 for some reason. This 
> also changes the permissioning mode required for spark-submit - credentials 
> provided to spark-submit need to be able to edit resources in addition to 
> creating them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24028) [K8s] Creating secrets and config maps before creating the driver pod has unpredictable behavior

2018-04-19 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444856#comment-16444856
 ] 

Yinan Li commented on SPARK-24028:
--

I am also running a 1.9 cluster on GKE and I have never run into the issue you 
mentioned above. I do often see events on the driver pod showing that the 
configmap failed to mount, but eventually the retries succeed. I believe a 
pod won't start running if any of the specified volumes (be it a secret 
volume, a configmap volume, or something else) fails to mount, and Kubernetes 
retries mounting any volume that it failed to mount when the pod first 
started.

> [K8s] Creating secrets and config maps before creating the driver pod has 
> unpredictable behavior
> 
>
> Key: SPARK-24028
> URL: https://issues.apache.org/jira/browse/SPARK-24028
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Critical
>
> Currently we create the Kubernetes resources the driver depends on - such as 
> the properties config map and secrets to mount into the pod - only after we 
> create the driver pod. This is because we want these extra objects to 
> immediately have an owner reference to be tied to the driver pod.
> On our Kubernetes 1.9.4 cluster, we're seeing that sometimes this works 
> fine, but other times the driver ends up being started with empty volumes 
> instead of volumes with the contents of the secrets we expect. The result is 
> that sometimes the driver will start without these files mounted, which leads 
> to various failures if the driver requires these files to be present early on 
> in their code. Missing the properties file config map, for example, would 
> mean spark-submit doesn't have a properties file to read at all. See the 
> warning on [https://kubernetes.io/docs/concepts/storage/volumes/#secret.]
> Unfortunately we cannot link owner references to non-existent objects, so we 
> have to do this instead:
>  # Create the auxiliary resources without any owner references.
>  # Create the driver pod mounting these resources into volumes, as before.
>  # If #2 fails, clean up the resources created in #1.
>  # Edit the auxiliary resources to have an owner reference for the driver pod.
> The multi-step approach leaves a small chance for us to leak resources - for 
> example, if we fail to make the resource edits in #4 for some reason. This 
> also changes the permissioning mode required for spark-submit - credentials 
> provided to spark-submit need to be able to edit resources in addition to 
> creating them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23638) Spark on k8s: spark.kubernetes.initContainer.image has no effect

2018-04-16 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440067#comment-16440067
 ] 

Yinan Li commented on SPARK-23638:
--

Can this be closed?

> Spark on k8s: spark.kubernetes.initContainer.image has no effect
> 
>
> Key: SPARK-23638
> URL: https://issues.apache.org/jira/browse/SPARK-23638
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
> Environment: K8 server: Ubuntu 16.04
> Submission client: macOS Sierra 10.12.x
> Client Version: version.Info\{Major:"1", Minor:"9", GitVersion:"v1.9.3", 
> GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", 
> BuildDate:"2018-02-07T12:22:21Z", GoVersion:"go1.9.2", Compiler:"gc", 
> Platform:"darwin/amd64"}
> Server Version: version.Info\{Major:"1", Minor:"8", GitVersion:"v1.8.3", 
> GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", 
> BuildDate:"2017-11-08T18:27:48Z", GoVersion:"go1.8.3", Compiler:"gc", 
> Platform:"linux/amd64"}
>Reporter: maheshvra
>Priority: Major
>
> Hi all - I am trying to use initContainer to download remote dependencies. To 
> begin with, I ran a test with initContainer which basically "echo hello 
> world". However, when i triggered the pod deployment via spark-submit, I did 
> not see any trace of initContainer execution in my kubernetes cluster.
>  
> {code:java}
> SPARK_DRIVER_MEMORY: 1g 
> SPARK_DRIVER_CLASS: com.bigdata.App SPARK_DRIVER_ARGS: -c 
> /opt/spark/work-dir/app/main/environments/int -w 
> ./../../workflows/workflow_main.json -e prod -n features -v off 
> SPARK_DRIVER_BIND_ADDRESS:  
> SPARK_JAVA_OPT_0: -Dspark.submit.deployMode=cluster 
> SPARK_JAVA_OPT_1: -Dspark.driver.blockManager.port=7079 
> SPARK_JAVA_OPT_2: -Dspark.app.name=fg-am00-raw12 
> SPARK_JAVA_OPT_3: 
> -Dspark.kubernetes.container.image=docker.com/cmapp/fg-am00-raw:1.0.0 
> SPARK_JAVA_OPT_4: -Dspark.app.id=spark-4fa9a5ce1b1d401fa9c1e413ff030d44 
> SPARK_JAVA_OPT_5: 
> -Dspark.jars=/opt/spark/jars/aws-java-sdk-1.7.4.jar,/opt/spark/jars/hadoop-aws-2.7.3.jar,/opt/spark/jars/guava-14.0.1.jar,/opt/spark/jars/SparkApp.jar,/opt/spark/jars/datacleanup-component-1.0-SNAPSHOT.jar
>  
> SPARK_JAVA_OPT_6: -Dspark.driver.port=7078 
> SPARK_JAVA_OPT_7: 
> -Dspark.kubernetes.initContainer.image=docker.com/cmapp/custombusybox:1.0.0 
> SPARK_JAVA_OPT_8: 
> -Dspark.kubernetes.executor.podNamePrefix=fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615
>  
> SPARK_JAVA_OPT_9: 
> -Dspark.kubernetes.driver.pod.name=fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615-driver
>  
> SPARK_JAVA_OPT_10: 
> -Dspark.driver.host=fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615-driver-svc.experimental.svc
>  SPARK_JAVA_OPT_11: -Dspark.executor.instances=5 
> SPARK_JAVA_OPT_12: 
> -Dspark.hadoop.fs.s3a.server-side-encryption-algorithm=AES256 
> SPARK_JAVA_OPT_13: -Dspark.kubernetes.namespace=experimental 
> SPARK_JAVA_OPT_14: 
> -Dspark.kubernetes.authenticate.driver.serviceAccountName=experimental-service-account
>  SPARK_JAVA_OPT_15: -Dspark.master=k8s://https://bigdata
> {code}
>  
> Further, I did not see spec.initContainers section in the generated pod. 
> Please see the details below
>  
> {code:java}
>  
> {
> "kind": "Pod",
> "apiVersion": "v1",
> "metadata": {
> "name": "fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615-driver",
> "namespace": "experimental",
> "selfLink": 
> "/api/v1/namespaces/experimental/pods/fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615-driver",
> "uid": "adc5a50a-2342-11e8-87dc-12c5b3954044",
> "resourceVersion": "299054",
> "creationTimestamp": "2018-03-09T02:36:32Z",
> "labels": {
> "spark-app-selector": "spark-4fa9a5ce1b1d401fa9c1e413ff030d44",
> "spark-role": "driver"
> },
> "annotations": {
> "spark-app-name": "fg-am00-raw12"
> }
> },
> "spec": {
> "volumes": [
> {
> "name": "experimental-service-account-token-msmth",
> "secret": {
> "secretName": "experimental-service-account-token-msmth",
> "defaultMode": 420
> }
> }
> ],
> "containers": [
> {
> "name": "spark-kubernetes-driver",
> "image": "docker.com/cmapp/fg-am00-raw:1.0.0",
> "args": [
> "driver"
> ],
> "env": [
> {
> "name": "SPARK_DRIVER_MEMORY",
> "value": "1g"
> },
> {
> "name": "SPARK_DRIVER_CLASS",
> "value": "com.myapp.App"
> },
> {
> "name": "SPARK_DRIVER_ARGS",
> "value": "-c /opt/spark/work-dir/app/main/environments/int -w 
> ./../../workflows/workflow_main.json -e prod -n features -v off"
> },
> {
> "name": "SPARK_DRIVER_BIND_ADDRESS",
> "valueFrom": {
> "fieldRef": {
> "apiVersion": "v1",
> "fieldPath": "status.podIP"
> }
> }
> },
> {
> "name": "SPARK_MOUNTED_CLASSPATH",
> "value": 
> 

[jira] [Commented] (SPARK-23638) Spark on k8s: spark.kubernetes.initContainer.image has no effect

2018-03-16 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402070#comment-16402070
 ] 

Yinan Li commented on SPARK-23638:
--

The Kubernetes-specific submission client only adds an init-container to 
the driver and executor pods if there are any remote dependencies to download. 
Otherwise, it won't, regardless of whether you specify 
{{spark.kubernetes.initContainer.image}}.
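
To illustrate (the image name is taken from the report above; jar locations are 
placeholders): a submission whose only dependencies are {{local://}} paths baked 
into the image gets no init-container, while one that references a remote jar 
does.

{code:java}
# No remote dependencies: the submission client adds no init-container, so the
# configured init-container image is never used.
bin/spark-submit \
  --conf spark.kubernetes.initContainer.image=docker.com/cmapp/custombusybox:1.0.0 \
  ... \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar

# A remote dependency (http://, hdfs://, ...) does trigger the init-container,
# which downloads the jar before the main container starts.
bin/spark-submit \
  --conf spark.kubernetes.initContainer.image=docker.com/cmapp/custombusybox:1.0.0 \
  ... \
  https://some-host/path/to/app.jar
{code}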

> Spark on k8s: spark.kubernetes.initContainer.image has no effect
> 
>
> Key: SPARK-23638
> URL: https://issues.apache.org/jira/browse/SPARK-23638
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
> Environment: K8 server: Ubuntu 16.04
> Submission client: macOS Sierra 10.12.x
> Client Version: version.Info\{Major:"1", Minor:"9", GitVersion:"v1.9.3", 
> GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", 
> BuildDate:"2018-02-07T12:22:21Z", GoVersion:"go1.9.2", Compiler:"gc", 
> Platform:"darwin/amd64"}
> Server Version: version.Info\{Major:"1", Minor:"8", GitVersion:"v1.8.3", 
> GitCommit:"f0efb3cb883751c5ffdbe6d515f3cb4fbe7b7acd", GitTreeState:"clean", 
> BuildDate:"2017-11-08T18:27:48Z", GoVersion:"go1.8.3", Compiler:"gc", 
> Platform:"linux/amd64"}
>Reporter: maheshvra
>Priority: Major
>
> Hi all - I am trying to use initContainer to download remote dependencies. To 
> begin with, I ran a test with initContainer which basically "echo hello 
> world". However, when i triggered the pod deployment via spark-submit, I did 
> not see any trace of initContainer execution in my kubernetes cluster.
>  
> {code:java}
> SPARK_DRIVER_MEMORY: 1g 
> SPARK_DRIVER_CLASS: com.bigdata.App SPARK_DRIVER_ARGS: -c 
> /opt/spark/work-dir/app/main/environments/int -w 
> ./../../workflows/workflow_main.json -e prod -n features -v off 
> SPARK_DRIVER_BIND_ADDRESS:  
> SPARK_JAVA_OPT_0: -Dspark.submit.deployMode=cluster 
> SPARK_JAVA_OPT_1: -Dspark.driver.blockManager.port=7079 
> SPARK_JAVA_OPT_2: -Dspark.app.name=fg-am00-raw12 
> SPARK_JAVA_OPT_3: 
> -Dspark.kubernetes.container.image=docker.com/cmapp/fg-am00-raw:1.0.0 
> SPARK_JAVA_OPT_4: -Dspark.app.id=spark-4fa9a5ce1b1d401fa9c1e413ff030d44 
> SPARK_JAVA_OPT_5: 
> -Dspark.jars=/opt/spark/jars/aws-java-sdk-1.7.4.jar,/opt/spark/jars/hadoop-aws-2.7.3.jar,/opt/spark/jars/guava-14.0.1.jar,/opt/spark/jars/SparkApp.jar,/opt/spark/jars/datacleanup-component-1.0-SNAPSHOT.jar
>  
> SPARK_JAVA_OPT_6: -Dspark.driver.port=7078 
> SPARK_JAVA_OPT_7: 
> -Dspark.kubernetes.initContainer.image=docker.com/cmapp/custombusybox:1.0.0 
> SPARK_JAVA_OPT_8: 
> -Dspark.kubernetes.executor.podNamePrefix=fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615
>  
> SPARK_JAVA_OPT_9: 
> -Dspark.kubernetes.driver.pod.name=fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615-driver
>  
> SPARK_JAVA_OPT_10: 
> -Dspark.driver.host=fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615-driver-svc.experimental.svc
>  SPARK_JAVA_OPT_11: -Dspark.executor.instances=5 
> SPARK_JAVA_OPT_12: 
> -Dspark.hadoop.fs.s3a.server-side-encryption-algorithm=AES256 
> SPARK_JAVA_OPT_13: -Dspark.kubernetes.namespace=experimental 
> SPARK_JAVA_OPT_14: 
> -Dspark.kubernetes.authenticate.driver.serviceAccountName=experimental-service-account
>  SPARK_JAVA_OPT_15: -Dspark.master=k8s://https://bigdata
> {code}
>  
> Further, I did not see spec.initContainers section in the generated pod. 
> Please see the details below
>  
> {code:java}
>  
> {
> "kind": "Pod",
> "apiVersion": "v1",
> "metadata": {
> "name": "fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615-driver",
> "namespace": "experimental",
> "selfLink": 
> "/api/v1/namespaces/experimental/pods/fg-am00-raw12-b1c8112b8536304ab0fc64fcc41e0615-driver",
> "uid": "adc5a50a-2342-11e8-87dc-12c5b3954044",
> "resourceVersion": "299054",
> "creationTimestamp": "2018-03-09T02:36:32Z",
> "labels": {
> "spark-app-selector": "spark-4fa9a5ce1b1d401fa9c1e413ff030d44",
> "spark-role": "driver"
> },
> "annotations": {
> "spark-app-name": "fg-am00-raw12"
> }
> },
> "spec": {
> "volumes": [
> {
> "name": "experimental-service-account-token-msmth",
> "secret": {
> "secretName": "experimental-service-account-token-msmth",
> "defaultMode": 420
> }
> }
> ],
> "containers": [
> {
> "name": "spark-kubernetes-driver",
> "image": "docker.com/cmapp/fg-am00-raw:1.0.0",
> "args": [
> "driver"
> ],
> "env": [
> {
> "name": "SPARK_DRIVER_MEMORY",
> "value": "1g"
> },
> {
> "name": "SPARK_DRIVER_CLASS",
> "value": "com.myapp.App"
> },
> {
> "name": "SPARK_DRIVER_ARGS",
> "value": "-c /opt/spark/work-dir/app/main/environments/int -w 
> ./../../workflows/workflow_main.json -e prod -n features -v off"
> },
> {
> "name": "SPARK_DRIVER_BIND_ADDRESS",
> "valueFrom": {
> "fieldRef": {
> "apiVersion": "v1",
> "fieldPath": "status.podIP"
> }
> }
> },
> 

[jira] [Updated] (SPARK-23571) Delete auxiliary Kubernetes resources upon application completion

2018-03-02 Thread Yinan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li updated SPARK-23571:
-
Affects Version/s: 2.3.1

> Delete auxiliary Kubernetes resources upon application completion
> -
>
> Key: SPARK-23571
> URL: https://issues.apache.org/jira/browse/SPARK-23571
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Yinan Li
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23571) Delete auxiliary Kubernetes resources upon application completion

2018-03-02 Thread Yinan Li (JIRA)
Yinan Li created SPARK-23571:


 Summary: Delete auxiliary Kubernetes resources upon application 
completion
 Key: SPARK-23571
 URL: https://issues.apache.org/jira/browse/SPARK-23571
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.0
Reporter: Yinan Li






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23485) Kubernetes should support node blacklist

2018-02-23 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374757#comment-16374757
 ] 

Yinan Li edited comment on SPARK-23485 at 2/23/18 6:22 PM:
---

It's not that I'm overly confident in Kubernetes' ability to detect node 
problems. I just don't think worrying about node problems at the application 
level is good practice in a containerized environment running on a container 
orchestration system. For that reason, yes, I don't think Spark on Kubernetes 
should really need to worry about blacklisting nodes.


was (Author: liyinan926):
It's not that I'm too confident on the capability of Kubernetes to detect node 
problems. I just don't see it as a good practice of worrying about node 
problems at application level in a containerized environment running on a 
container orchestration system. Yes, I don't think Spark on Kubernetes should 
really need to worry about blacklisting nodes.

> Kubernetes should support node blacklist
> 
>
> Key: SPARK-23485
> URL: https://issues.apache.org/jira/browse/SPARK-23485
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Scheduler
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not 
> use for running tasks (e.g., because of bad hardware).  When running in YARN, 
> this blacklist is used to avoid ever allocating resources on blacklisted 
> nodes: 
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this 
> is incorrect -- but I didn't see any references to 
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}} so it 
> seems this is missing.  Thought of this while looking at SPARK-19755, a 
> similar issue on mesos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23485) Kubernetes should support node blacklist

2018-02-23 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374757#comment-16374757
 ] 

Yinan Li commented on SPARK-23485:
--

It's not that I'm overly confident in Kubernetes' ability to detect node 
problems. I just don't think worrying about node problems at the application 
level is good practice in a containerized environment running on a container 
orchestration system. Yes, I don't think Spark on Kubernetes should really 
need to worry about blacklisting nodes.

> Kubernetes should support node blacklist
> 
>
> Key: SPARK-23485
> URL: https://issues.apache.org/jira/browse/SPARK-23485
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Scheduler
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not 
> use for running tasks (e.g., because of bad hardware).  When running in YARN, 
> this blacklist is used to avoid ever allocating resources on blacklisted 
> nodes: 
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this 
> is incorrect -- but I didn't see any references to 
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}} so it 
> seems this is missing.  Thought of this while looking at SPARK-19755, a 
> similar issue on mesos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23485) Kubernetes should support node blacklist

2018-02-23 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374708#comment-16374708
 ] 

Yinan Li commented on SPARK-23485:
--

In the YARN case, yes, it's possible that a node is missing a jar commonly 
needed by applications. In Kubernetes mode, this will never be the case 
because either all containers have a particular jar locally or none of them has 
it. An image missing a dependency is problematic by itself. This consistency is 
one of the benefits of being containerized. As for node problems, detecting 
them and avoiding scheduling pods onto problematic nodes are concerns of the 
kubelets and the scheduler. Applications should not need to worry about whether 
nodes are healthy or not. Node problems that happen at runtime cause pods to be 
evicted from the problematic nodes and rescheduled somewhere else. Making 
applications responsible for keeping track of problematic nodes and maintaining 
a blacklist means unnecessarily stepping into the business of the kubelets and 
the scheduler.

 

[~foxish]

> Kubernetes should support node blacklist
> 
>
> Key: SPARK-23485
> URL: https://issues.apache.org/jira/browse/SPARK-23485
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Scheduler
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not 
> use for running tasks (e.g., because of bad hardware).  When running in YARN, 
> this blacklist is used to avoid ever allocating resources on blacklisted 
> nodes: 
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this 
> is incorrect -- but I didn't see any references to 
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}} so it 
> seems this is missing.  Thought of this while looking at SPARK-19755, a 
> similar issue on mesos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23485) Kubernetes should support node blacklist

2018-02-22 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373620#comment-16373620
 ] 

Yinan Li commented on SPARK-23485:
--

The Kubernetes scheduler backend simply creates executor pods through the 
Kubernetes API server, and the pods are scheduled by the Kubernetes scheduler 
to run on the available nodes. The scheduler backend is not interested in, nor 
should it know about, the mapping from pods to nodes. Affinity and 
anti-affinity, or taints and tolerations, can be used to influence pod 
scheduling. But it's the responsibility of the Kubernetes scheduler and the 
kubelets to keep track of node problems and avoid scheduling pods onto 
problematic nodes.
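
For completeness, a sketch of the cluster-level tools alluded to above; the node 
name and taint key are placeholders.

{code:java}
# Keep new pods off a suspect node entirely, or taint it so that only pods with
# a matching toleration will be scheduled there.
kubectl cordon node-1
kubectl taint nodes node-1 hardware=degraded:NoSchedule

# Revert once the node is healthy again.
kubectl uncordon node-1
kubectl taint nodes node-1 hardware=degraded:NoSchedule-
{code}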

> Kubernetes should support node blacklist
> 
>
> Key: SPARK-23485
> URL: https://issues.apache.org/jira/browse/SPARK-23485
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Scheduler
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not 
> use for running tasks (e.g., because of bad hardware).  When running in YARN, 
> this blacklist is used to avoid ever allocating resources on blacklisted 
> nodes: 
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this 
> is incorrect -- but I didn't see any references to 
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}} so it 
> seems this is missing.  Thought of this while looking at SPARK-19755, a 
> similar issue on mesos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23485) Kubernetes should support node blacklist

2018-02-22 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373544#comment-16373544
 ] 

Yinan Li commented on SPARK-23485:
--

I'm not sure node blacklisting applies to Kubernetes. In Kubernetes mode, 
executors run in containers that in turn run in Kubernetes pods, scheduled 
onto available cluster nodes by the Kubernetes scheduler. The Kubernetes 
scheduler backend in Spark does not keep track of, nor really care about, which 
nodes the pods run on. That is a concern of the Kubernetes scheduler.

> Kubernetes should support node blacklist
> 
>
> Key: SPARK-23485
> URL: https://issues.apache.org/jira/browse/SPARK-23485
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Scheduler
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not 
> use for running tasks (e.g., because of bad hardware).  When running in YARN, 
> this blacklist is used to avoid ever allocating resources on blacklisted 
> nodes: 
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this 
> is incorrect -- but I didn't see any references to 
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}} so it 
> seems this is missing.  Thought of this while looking at SPARK-19755, a 
> similar issue on mesos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23285) Allow spark.executor.cores to be fractional

2018-02-08 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357500#comment-16357500
 ] 

Yinan Li edited comment on SPARK-23285 at 2/8/18 8:22 PM:
--

Given the complexity and significant impact of the changes proposed in 
[https://github.com/apache/spark/pull/20460] to the way Spark handles task 
scheduling, task parallelism, and dynamic resource allocation, etc., I'm 
wondering whether we should instead introduce a K8s-specific configuration property 
for specifying the executor cores that follows the Kubernetes 
[convention|https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu].
 It seems Mesos fine-grained mode does this with 
{{spark.mesos.mesosExecutor.cores}}. We can have something like 
{{spark.kubernetes.executor.cores}} that is only used for specifying the CPU 
core request for the executor pods. Existing configuration properties 
{{spark.executor.cores}} and {{spark.task.cpus}} still play their roles in task 
parallelism, task scheduling, etc. That is, {{spark.kubernetes.executor.cores}} 
only determines the physical CPU cores available to an executor. An executor 
can still run multiple tasks simultaneously if {{spark.executor.cores}} is a 
multiple of {{spark.task.cpus}}. If not set, 
{{spark.kubernetes.executor.cores}} falls back to {{spark.executor.cores}}. 
WDYT? 

 

[~felixcheung] [~jerryshao] [~jiangxb1987]


was (Author: liyinan926):
Given the complexity and significant impact of the changes proposed in 
[https://github.com/apache/spark/pull/20460] to the way Spark handles task 
scheduling, task parallelism, and dynamic resource allocation, etc., I'm 
thinking if we should instead introduce a K8s specific configuration property 
for specifying the executor cores that follows the Kubernetes 
[convention|https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu].
 It seems Mesos fine-grained mode does this with 
{{spark.mesos.mesosExecutor.cores}}. We can have something like 
{{spark.kubernetes.executor.cores}} that is only used for specifying the CPU 
core request for the executor pods. Existing configuration properties 
{{spark.executor.cores}} and {{spark.task.cpus}} still play their roles in task 
parallelism, task scheduling, etc. That is, {{spark.kubernetes.executor.cores}} 
only determines the physical CPU cores available to an executor. An executor 
can still run multiple tasks simultaneously if {{spark.executor.cores}} is a 
multiple of {{spark.task.cpus}}. If not set, 
{{spark.kubernetes.executor.cores}} falls back to {{spark.executor.cores}}. 
WDYT? 

> Allow spark.executor.cores to be fractional
> ---
>
> Key: SPARK-23285
> URL: https://issues.apache.org/jira/browse/SPARK-23285
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Scheduler, Spark Submit
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> There is a strong check for an integral number of cores per executor in 
> [SparkSubmitArguments.scala#L270-L272|https://github.com/apache/spark/blob/3f4060c340d6bac412e8819c4388ccba226efcf3/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272].
>  Given we're reusing that property in K8s, does it make sense to relax it?
>  
> K8s treats CPU as a "compressible resource" and can actually assign millicpus 
> to individual containers. Also to be noted - spark.driver.cores has no such 
> check in place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23285) Allow spark.executor.cores to be fractional

2018-02-08 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357500#comment-16357500
 ] 

Yinan Li commented on SPARK-23285:
--

Given the complexity and significant impact of the changes proposed in 
[https://github.com/apache/spark/pull/20460] to the way Spark handles task 
scheduling, task parallelism, and dynamic resource allocation, etc., I'm 
wondering whether we should instead introduce a K8s-specific configuration property 
for specifying the executor cores that follows the Kubernetes 
[convention|https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#meaning-of-cpu].
 It seems Mesos fine-grained mode does this with 
{{spark.mesos.mesosExecutor.cores}}. We can have something like 
{{spark.kubernetes.executor.cores}} that is only used for specifying the CPU 
core request for the executor pods. Existing configuration properties 
{{spark.executor.cores}} and {{spark.task.cpus}} still play their roles in task 
parallelism, task scheduling, etc. That is, {{spark.kubernetes.executor.cores}} 
only determines the physical CPU cores available to an executor. An executor 
can still run multiple tasks simultaneously if {{spark.executor.cores}} is a 
multiple of {{spark.task.cpus}}. If not set, 
{{spark.kubernetes.executor.cores}} falls back to {{spark.executor.cores}}. 
WDYT? 
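
A sketch of how the proposal would look from the user's side; note that 
{{spark.kubernetes.executor.cores}} is the proposed property, not an existing 
one, and the values are illustrative.

{code:java}
# Proposed usage: the K8s-specific property sets the executor pod's CPU request
# in Kubernetes units (e.g. millicpus), while spark.executor.cores and
# spark.task.cpus keep governing task parallelism inside the executor.
bin/spark-submit \
  --conf spark.executor.cores=2 \
  --conf spark.task.cpus=1 \
  --conf spark.kubernetes.executor.cores=500m \
  ...
{code}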

> Allow spark.executor.cores to be fractional
> ---
>
> Key: SPARK-23285
> URL: https://issues.apache.org/jira/browse/SPARK-23285
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Scheduler, Spark Submit
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> There is a strong check for an integral number of cores per executor in 
> [SparkSubmitArguments.scala#L270-L272|https://github.com/apache/spark/blob/3f4060c340d6bac412e8819c4388ccba226efcf3/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272].
>  Given we're reusing that property in K8s, does it make sense to relax it?
>  
> K8s treats CPU as a "compressible resource" and can actually assign millicpus 
> to individual containers. Also to be noted - spark.driver.cores has no such 
> check in place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23285) Allow spark.executor.cores to be fractional

2018-01-31 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347484#comment-16347484
 ] 

Yinan Li commented on SPARK-23285:
--

Another option is to bypass that check for Kubernetes mode. This minimizes the 
code changes. Thoughts?

> Allow spark.executor.cores to be fractional
> ---
>
> Key: SPARK-23285
> URL: https://issues.apache.org/jira/browse/SPARK-23285
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Scheduler, Spark Submit
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> There is a strong check for an integral number of cores per executor in 
> [SparkSubmitArguments.scala#L270-L272|https://github.com/apache/spark/blob/3f4060c340d6bac412e8819c4388ccba226efcf3/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272].
>  Given we're reusing that property in K8s, does it make sense to relax it?
>  
> K8s treats CPU as a "compressible resource" and can actually assign millicpus 
> to individual containers. Also to be noted - spark.driver.cores has no such 
> check in place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23285) Allow spark.executor.cores to be fractional

2018-01-31 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347267#comment-16347267
 ] 

Yinan Li commented on SPARK-23285:
--

FYI: we did this in our fork: 
https://github.com/apache-spark-on-k8s/spark/pull/361.

> Allow spark.executor.cores to be fractional
> ---
>
> Key: SPARK-23285
> URL: https://issues.apache.org/jira/browse/SPARK-23285
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Scheduler, Spark Submit
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> There is a strong check for an integral number of cores per executor in 
> [SparkSubmitArguments.scala#L270-L272|https://github.com/apache/spark/blob/3f4060c340d6bac412e8819c4388ccba226efcf3/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272].
>  Given we're reusing that property in K8s, does it make sense to relax it?
>  
> K8s treats CPU as a "compressible resource" and can actually assign millicpus 
> to individual containers. Also to be noted - spark.driver.cores has no such 
> check in place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23257) Implement Kerberos Support in Kubernetes resource manager

2018-01-30 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345488#comment-16345488
 ] 

Yinan Li commented on SPARK-23257:
--

[~RJKeevil] AFAIK, no one is working on upstreaming this yet. However, I think 
the consensus is that we need to first address 
https://issues.apache.org/jira/browse/SPARK-22839 before pushing more features 
upstream. The work in [https://github.com/apache-spark-on-k8s/spark/pull/540] 
adds more configuration steps to the mix, so it probably is not going to be 
upstreamed until the refactoring is done.

> Implement Kerberos Support in Kubernetes resource manager
> -
>
> Key: SPARK-23257
> URL: https://issues.apache.org/jira/browse/SPARK-23257
> Project: Spark
>  Issue Type: Wish
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Rob Keevil
>Priority: Major
>
> On the forked k8s branch of Spark at 
> [https://github.com/apache-spark-on-k8s/spark/pull/540] , Kerberos support 
> has been added to the Kubernetes resource manager.  The Kubernetes code 
> between these two repositories appears to have diverged, so this commit 
> cannot be merged in easily.  Are there any plans to re-implement this work on 
> the main Spark repository?
>  
> [ifilonenko|https://github.com/ifilonenko] [~liyinan926] I am happy to help 
> with the development and testing of this, but i wanted to confirm that this 
> isn't already in progress -  I could not find any discussion about this 
> specific topic online.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23153) Support application dependencies in submission client's local file system

2018-01-18 Thread Yinan Li (JIRA)
Yinan Li created SPARK-23153:


 Summary: Support application dependencies in submission client's 
local file system
 Key: SPARK-23153
 URL: https://issues.apache.org/jira/browse/SPARK-23153
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Yinan Li






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22962) Kubernetes app fails if local files are used

2018-01-18 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331132#comment-16331132
 ] 

Yinan Li commented on SPARK-22962:
--

I agree that before we upstream the staging server, we should fail the 
submission if a user uses local resources. [~vanzin], if it's not too late to 
get into 2.3, I'm gonna file a PR for this.
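
A minimal sketch of such a guard, just to make the idea concrete; the method 
name and the exact exception type are placeholders, not the eventual PR:

{code:scala}
// Sketch only: reject client-local resources (file:// or no scheme) at
// submission time until a file-staging mechanism is available upstream.
// Resources with local:// (already in the image), http(s):// or hdfs://
// schemes pass through unchanged.
import java.net.URI

def requireNonLocalResource(resource: String): Unit = {
  val scheme = Option(new URI(resource).getScheme).getOrElse("file")
  if (scheme == "file") {
    throw new IllegalArgumentException(
      s"Client-local dependencies are not yet supported on Kubernetes: $resource")
  }
}
{code}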

> Kubernetes app fails if local files are used
> 
>
> Key: SPARK-22962
> URL: https://issues.apache.org/jira/browse/SPARK-22962
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> If you try to start a Spark app on kubernetes using a local file as the app 
> resource, for example, it will fail:
> {code}
> ./bin/spark-submit [[bunch of arguments]] /path/to/local/file.jar
> {code}
> {noformat}
> + /sbin/tini -s -- /bin/sh -c 'SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && 
> env | grep SPARK_JAVA_OPT_ | sed '\''s/[^=]*=\(.*\)/\1/g'
> \'' > /tmp/java_opts.txt && readarray -t SPARK_DRIVER_JAVA_OPTS < 
> /tmp/java_opts.txt && if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x}
>  ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi &&   
>   if ! [ -z ${SPARK_SUBMIT_EXTRA_CLASSPATH+x} ]; then SP
> ARK_CLASSPATH="$SPARK_SUBMIT_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && if 
> ! [ -z ${SPARK_MOUNTED_FILES_DIR+x} ]; then cp -R "$SPARK
> _MOUNTED_FILES_DIR/." .; fi && ${JAVA_HOME}/bin/java 
> "${SPARK_DRIVER_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMOR
> Y -Xmx$SPARK_DRIVER_MEMORY 
> -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS 
> $SPARK_DRIVER_ARGS'
> Error: Could not find or load main class com.cloudera.spark.tests.Sleeper
> {noformat}
> Using an http server to provide the app jar solves the problem.
> The k8s backend should either somehow make these files available to the 
> cluster or error out with a more user-friendly message if that feature is not 
> yet available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23137) spark.kubernetes.executor.podNamePrefix is ignored

2018-01-17 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329713#comment-16329713
 ] 

Yinan Li commented on SPARK-23137:
--

It's actually marked as an {{internal}} config property. So the fix could be 
either to remove it from the docs, or to remove the {{internal}} mark and 
respect what users set.
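
For reference, the config entry would look roughly like the sketch below (the 
constant name and doc string are illustrative, not copied from Config.scala); 
the second option amounts to dropping the {{internal()}} call so the property 
becomes a documented, user-settable setting:

{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

// Sketch of the entry; internal() hides it from the public docs.
val KUBERNETES_EXECUTOR_POD_NAME_PREFIX =
  ConfigBuilder("spark.kubernetes.executor.podNamePrefix")
    .doc("Prefix to use in front of the executor pod names.")
    .internal()
    .stringConf
    .createOptional

// Dropping .internal() above (and honoring the value in the backend) would
// make the documented behavior and the actual behavior match.
{code}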

> spark.kubernetes.executor.podNamePrefix is ignored
> --
>
> Key: SPARK-23137
> URL: https://issues.apache.org/jira/browse/SPARK-23137
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Major
>
> [~liyinan926] is fixing this as we speak. Should be a very minor change.
> It's also a non-critical option, so, if we decide that the safer thing is to 
> just remove it, we can do that as well. Will leave that decision to the 
> release czar and reviewers.
>  
> [~vanzin] [~felixcheung] [~sameerag]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22998) Value for SPARK_MOUNTED_CLASSPATH in executor pods is not set

2018-01-08 Thread Yinan Li (JIRA)
Yinan Li created SPARK-22998:


 Summary: Value for SPARK_MOUNTED_CLASSPATH in executor pods is not 
set
 Key: SPARK-22998
 URL: https://issues.apache.org/jira/browse/SPARK-22998
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 2.3.0
Reporter: Yinan Li
 Fix For: 2.3.0


The environment variable {{SPARK_MOUNTED_CLASSPATH}} is referenced by the 
executor's Dockerfile, but is not set by the k8s scheduler backend.
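
A minimal sketch of the missing wiring, assuming the fabric8 ContainerBuilder 
API; the method name and the jars directory are placeholders, not the actual 
backend code:

{code:scala}
// Sketch only: ensure the executor container carries SPARK_MOUNTED_CLASSPATH
// so the Dockerfile entrypoint can assemble its classpath from the mounted jars.
import io.fabric8.kubernetes.api.model.{Container, ContainerBuilder}

def withMountedClasspath(container: Container, jarsDir: String): Container =
  new ContainerBuilder(container)
    .addNewEnv()
      .withName("SPARK_MOUNTED_CLASSPATH")
      .withValue(s"$jarsDir/*")
      .endEnv()
    .build()
{code}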



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22953) Duplicated secret volumes in Spark pods when init-containers are used

2018-01-03 Thread Yinan Li (JIRA)
Yinan Li created SPARK-22953:


 Summary: Duplicated secret volumes in Spark pods when 
init-containers are used
 Key: SPARK-22953
 URL: https://issues.apache.org/jira/browse/SPARK-22953
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 2.3.0
Reporter: Yinan Li
 Fix For: 2.3.0


User-specified secrets are mounted into both the main container and the 
init-container (when one is used) in a Spark driver/executor pod, using the 
{{MountSecretsBootstrap}}. Because {{MountSecretsBootstrap}} always adds the 
secret volumes to the pod, the same secret volumes get added twice: once when 
mounting the secrets into the main container, and once when mounting them into 
the init-container. See 
https://github.com/apache-spark-on-k8s/spark/issues/594.
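
One way to avoid the duplication is to make the volume addition idempotent. A 
rough sketch, not the actual MountSecretsBootstrap change; names are for 
illustration:

{code:scala}
// Sketch only: add the secret volume to the pod spec only if a volume with
// the same name is not already present, so applying the bootstrap to both
// the main container and the init-container does not duplicate volumes.
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}
import scala.collection.JavaConverters._

def addSecretVolumeIfAbsent(pod: Pod, volumeName: String, secretName: String): Pod = {
  val alreadyPresent = pod.getSpec.getVolumes.asScala.exists(_.getName == volumeName)
  if (alreadyPresent) {
    pod
  } else {
    new PodBuilder(pod)
      .editSpec()
        .addNewVolume()
          .withName(volumeName)
          .withNewSecret().withSecretName(secretName).endSecret()
        .endVolume()
      .endSpec()
      .build()
  }
}
{code}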



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22839) Refactor Kubernetes code for configuring driver/executor pods to use consistent and cleaner abstraction

2017-12-19 Thread Yinan Li (JIRA)
Yinan Li created SPARK-22839:


 Summary: Refactor Kubernetes code for configuring driver/executor 
pods to use consistent and cleaner abstraction
 Key: SPARK-22839
 URL: https://issues.apache.org/jira/browse/SPARK-22839
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.0
Reporter: Yinan Li


As discussed in https://github.com/apache/spark/pull/19954, the current code 
for configuring the driver pod and the code for configuring the executor pods 
are not using the same abstraction. Beyond that, the current code leaves a lot 
to be desired in terms of the level and cleanliness of abstraction. For 
example, it passes many pieces of information around different class 
hierarchies, which makes code review and maintenance challenging. We need a 
thorough refactoring of the current code to achieve a better, cleaner, and 
consistent abstraction.
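
Not a concrete proposal, but to make the target shape tangible, here is one 
possible form sketched with made-up names: each feature configures the pod 
independently, and the same steps compose for both driver and executor pods.

{code:scala}
// Sketch only: a single small interface that every pod feature implements,
// so driver and executor construction share one composition mechanism.
import io.fabric8.kubernetes.api.model.Pod

trait PodFeatureStep {
  def configurePod(pod: Pod): Pod
}

def buildPod(basePod: Pod, steps: Seq[PodFeatureStep]): Pod =
  steps.foldLeft(basePod)((pod, step) => step.configurePod(pod))
{code}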



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22778) Kubernetes scheduler at master failing to run applications successfully

2017-12-13 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289907#comment-16289907
 ] 

Yinan Li commented on SPARK-22778:
--

Just verified that the fix worked. I'm gonna send a PR soon.

> Kubernetes scheduler at master failing to run applications successfully
> ---
>
> Key: SPARK-22778
> URL: https://issues.apache.org/jira/browse/SPARK-22778
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Critical
>
> Building images based on master and deploying Spark PI results in the 
> following error.
> 2017-12-13 19:57:19 INFO  SparkContext:54 - Successfully stopped SparkContext
> Exception in thread "main" org.apache.spark.SparkException: Could not parse 
> Master URL: 'k8s:https://xx.yy.zz.ww'
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
>   at org.apache.spark.SparkContext.(SparkContext.scala:496)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2490)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:927)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:918)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:918)
>   at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
>   at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
> 2017-12-13 19:57:19 INFO  ShutdownHookManager:54 - Shutdown hook called
> 2017-12-13 19:57:19 INFO  ShutdownHookManager:54 - Deleting directory 
> /tmp/spark-b47515c2-6750-4a37-aa68-6ee12da5d2bd
> This is likely an artifact seen because of changes in master, or our 
> submission code in the reviews. We haven't seen this on our fork. Hopefully 
> once integration tests are ported against upstream/master, we will catch 
> these issues earlier. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22778) Kubernetes scheduler at master failing to run applications successfully

2017-12-13 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289876#comment-16289876
 ] 

Yinan Li commented on SPARK-22778:
--

Ah, yes, the PR missed that. OK, I'm gonna give that a try and submit a PR to 
fix it.

> Kubernetes scheduler at master failing to run applications successfully
> ---
>
> Key: SPARK-22778
> URL: https://issues.apache.org/jira/browse/SPARK-22778
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Critical
>
> Building images based on master and deploying Spark PI results in the 
> following error.
> 2017-12-13 19:57:19 INFO  SparkContext:54 - Successfully stopped SparkContext
> Exception in thread "main" org.apache.spark.SparkException: Could not parse 
> Master URL: 'k8s:https://xx.yy.zz.ww'
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
>   at org.apache.spark.SparkContext.(SparkContext.scala:496)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2490)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:927)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:918)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:918)
>   at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
>   at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
> 2017-12-13 19:57:19 INFO  ShutdownHookManager:54 - Shutdown hook called
> 2017-12-13 19:57:19 INFO  ShutdownHookManager:54 - Deleting directory 
> /tmp/spark-b47515c2-6750-4a37-aa68-6ee12da5d2bd
> This is likely an artifact seen because of changes in master, or our 
> submission code in the reviews. We haven't seen this on our fork. Hopefully 
> once integration tests are ported against upstream/master, we will catch 
> these issues earlier. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-22778) Kubernetes scheduler at master failing to run applications successfully

2017-12-13 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289822#comment-16289822
 ] 

Yinan Li edited comment on SPARK-22778 at 12/13/17 8:24 PM:


Just some background on this. The validation and parsing of the k8s master URL 
has been moved to SparkSubmit, as suggested in the review. The parsed master 
URL (https://... for example) gets a {{k8s}} prefix prepended after parsing to 
satisfy {{KubernetesClusterManager}}, whose {{canCreate}} method checks whether 
the master URL starts with {{k8s}}. That's why you see the {{k8s:}} prefix. The 
issue seems to be that, in the driver pod, {{SparkContext}} could not find 
{{KubernetesClusterManager}}, based on the debug messages I added. The code 
that triggered the error (with the debugging I added) is as follows:

{code:scala}
private def getClusterManager(url: String): Option[ExternalClusterManager] = {
  val loader = Utils.getContextOrSparkClassLoader
  val serviceLoaders =
    ServiceLoader.load(classOf[ExternalClusterManager], loader).asScala
  serviceLoaders.foreach { loader =>
    logInfo(s"Found the following external cluster manager: $loader")
  }

  val filteredServiceLoaders = serviceLoaders.filter(_.canCreate(url))
  if (filteredServiceLoaders.size > 1) {
    throw new SparkException(
      s"Multiple external cluster managers registered for the url $url: $serviceLoaders")
  } else if (filteredServiceLoaders.isEmpty) {
    logWarning(s"No external cluster manager registered for url $url")
  }
  filteredServiceLoaders.headOption
}
{code}

And I got the following:
{code:java}
No external cluster manager registered for url k8s:https://35.226.8.173
{code}
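
For context, {{ServiceLoader}} only discovers a cluster manager if the jars on 
the driver's classpath contain the corresponding provider-configuration file; 
if that file is missing from the image, the lookup comes back empty and you 
get exactly the warning above. The file path and class name below are the ones 
I would expect, shown only to illustrate the mechanism:

{code}
# META-INF/services/org.apache.spark.scheduler.ExternalClusterManager
org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager
{code}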



was (Author: liyinan926):
Just some background on this. The validation and parsing of the k8s master URL 
has been moved to SparkSubmit, as suggested in the review. The parsed master 
URL (https://... for example) gets a {{k8s}} prefix prepended after parsing to 
satisfy {{KubernetesClusterManager}}, whose {{canCreate}} method checks whether 
the master URL starts with {{k8s}}. That's why you see the {{k8s:}} prefix. The 
issue seems to be that, in the driver pod, {{SparkContext}} could not find 
{{KubernetesClusterManager}}, based on the debug messages I added:

{code:scala}
private def getClusterManager(url: String): Option[ExternalClusterManager] = {
  val loader = Utils.getContextOrSparkClassLoader
  val serviceLoaders =
    ServiceLoader.load(classOf[ExternalClusterManager], loader).asScala
  serviceLoaders.foreach { loader =>
    logInfo(s"Found the following external cluster manager: $loader")
  }

  val filteredServiceLoaders = serviceLoaders.filter(_.canCreate(url))
  if (filteredServiceLoaders.size > 1) {
    throw new SparkException(
      s"Multiple external cluster managers registered for the url $url: $serviceLoaders")
  } else if (filteredServiceLoaders.isEmpty) {
    logWarning(s"No external cluster manager registered for url $url")
  }
  filteredServiceLoaders.headOption
}
{code}

And I got the following:
{code:java}
No external cluster manager registered for url k8s:https://35.226.8.173
{code}


> Kubernetes scheduler at master failing to run applications successfully
> ---
>
> Key: SPARK-22778
> URL: https://issues.apache.org/jira/browse/SPARK-22778
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>
> Building images based on master and deploying Spark PI results in the 
> following error.
> 2017-12-13 19:57:19 INFO  SparkContext:54 - Successfully stopped SparkContext
> Exception in thread "main" org.apache.spark.SparkException: Could not parse 
> Master URL: 'k8s:https://xx.yy.zz.ww'
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
>   at org.apache.spark.SparkContext.(SparkContext.scala:496)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2490)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:927)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:918)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:918)
>   at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
>   at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
> 2017-12-13 19:57:19 INFO  ShutdownHookManager:54 - Shutdown hook called
> 2017-12-13 19:57:19 INFO  ShutdownHookManager:54 - Deleting directory 
> /tmp/spark-b47515c2-6750-4a37-aa68-6ee12da5d2bd
> This is likely an artifact seen because of changes in master, or our 
> submission code in the reviews. 

[jira] [Commented] (SPARK-22778) Kubernetes scheduler at master failing to run applications successfully

2017-12-13 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289822#comment-16289822
 ] 

Yinan Li commented on SPARK-22778:
--

Just some background on this. The validation and parsing of the k8s master URL 
has been moved to SparkSubmit, as suggested in the review. The parsed master 
URL (https://... for example) gets a {{k8s}} prefix prepended after parsing to 
satisfy {{KubernetesClusterManager}}, whose {{canCreate}} method checks whether 
the master URL starts with {{k8s}}. That's why you see the {{k8s:}} prefix. The 
issue seems to be that, in the driver pod, {{SparkContext}} could not find 
{{KubernetesClusterManager}}, based on the debug messages I added:

{code:scala}
private def getClusterManager(url: String): Option[ExternalClusterManager] = {
  val loader = Utils.getContextOrSparkClassLoader
  val serviceLoaders =
    ServiceLoader.load(classOf[ExternalClusterManager], loader).asScala
  serviceLoaders.foreach { loader =>
    logInfo(s"Found the following external cluster manager: $loader")
  }

  val filteredServiceLoaders = serviceLoaders.filter(_.canCreate(url))
  if (filteredServiceLoaders.size > 1) {
    throw new SparkException(
      s"Multiple external cluster managers registered for the url $url: $serviceLoaders")
  } else if (filteredServiceLoaders.isEmpty) {
    logWarning(s"No external cluster manager registered for url $url")
  }
  filteredServiceLoaders.headOption
}
{code}

And I got the following:
{code:java}
No external cluster manager registered for url k8s:https://35.226.8.173
{code}


> Kubernetes scheduler at master failing to run applications successfully
> ---
>
> Key: SPARK-22778
> URL: https://issues.apache.org/jira/browse/SPARK-22778
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>
> Building images based on master and deploying Spark PI results in the 
> following error.
> 2017-12-13 19:57:19 INFO  SparkContext:54 - Successfully stopped SparkContext
> Exception in thread "main" org.apache.spark.SparkException: Could not parse 
> Master URL: 'k8s:https://xx.yy.zz.ww'
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2741)
>   at org.apache.spark.SparkContext.(SparkContext.scala:496)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2490)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:927)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:918)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:918)
>   at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
>   at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
> 2017-12-13 19:57:19 INFO  ShutdownHookManager:54 - Shutdown hook called
> 2017-12-13 19:57:19 INFO  ShutdownHookManager:54 - Deleting directory 
> /tmp/spark-b47515c2-6750-4a37-aa68-6ee12da5d2bd
> This is likely an artifact seen because of changes in master, or our 
> submission code in the reviews. We haven't seen this on our fork. Hopefully 
> once integration tests are ported against upstream/master, we will catch 
> these issues earlier. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18278) SPIP: Support native submission of spark jobs to a kubernetes cluster

2017-12-13 Thread Yinan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li updated SPARK-18278:
-
Component/s: Kubernetes

> SPIP: Support native submission of spark jobs to a kubernetes cluster
> -
>
> Key: SPARK-18278
> URL: https://issues.apache.org/jira/browse/SPARK-18278
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Deploy, Documentation, Kubernetes, Scheduler, 
> Spark Core
>Affects Versions: 2.3.0
>Reporter: Erik Erlandson
>  Labels: SPIP
> Attachments: SPARK-18278 Spark on Kubernetes Design Proposal Revision 
> 2 (1).pdf
>
>
> A new Apache Spark sub-project that enables native support for submitting 
> Spark applications to a Kubernetes cluster. The submitted application runs 
> in a driver executing on a Kubernetes pod, and executor lifecycles are also 
> managed as pods.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22757) Init-container in the driver/executor pods for downloading remote dependencies

2017-12-13 Thread Yinan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16289378#comment-16289378
 ] 

Yinan Li commented on SPARK-22757:
--

Yes, this is also targeting 2.3. 

> Init-container in the driver/executor pods for downloading remote dependencies
> --
>
> Key: SPARK-22757
> URL: https://issues.apache.org/jira/browse/SPARK-22757
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Yinan Li
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22757) Init-container in the driver/executor pods for downloading remote dependencies

2017-12-11 Thread Yinan Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yinan Li updated SPARK-22757:
-
Component/s: Kubernetes

> Init-container in the driver/executor pods for downloading remote dependencies
> --
>
> Key: SPARK-22757
> URL: https://issues.apache.org/jira/browse/SPARK-22757
> Project: Spark
>  Issue Type: Sub-task
>  Components: Deploy, Kubernetes
>Affects Versions: 2.3.0
>Reporter: Yinan Li
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


