[jira] [Commented] (SPARK-24793) Make spark-submit more useful with k8s

2018-07-12 Thread Anirudh Ramanathan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542135#comment-16542135
 ] 

Anirudh Ramanathan commented on SPARK-24793:


Great! I'll take a stab at a PR in a few days.




> Make spark-submit more useful with k8s
> --
>
> Key: SPARK-24793
> URL: https://issues.apache.org/jira/browse/SPARK-24793
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> Support controlling the lifecycle of Spark Application through spark-submit. 
> For example:
> {{ 
>   --kill app_name   If given, kills the driver specified.
>   --status app_name  If given, requests the status of the driver 
> specified.
> }}
> Potentially also --list to list all spark drivers running.
> Given that our submission client can actually launch jobs into many different 
> namespaces, we'll need an additional specification of the namespace through a 
> --namespace flag potentially.
> I think this is pretty useful to have instead of forcing a user to use 
> kubectl to manage the lifecycle of any k8s Spark Application.
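
A rough sketch of what the proposed interface could look like from the shell; the lifecycle flags follow the existing Mesos/Standalone precedent described above, and the master URL, app name, and jar path are placeholders, not a merged interface:

{code:bash}
# Submit an application to a Kubernetes cluster (already supported today).
spark-submit \
  --master k8s://https://kube-apiserver:6443 \
  --deploy-mode cluster \
  --name my-spark-app \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar

# Proposed lifecycle operations, mirroring the Mesos/Standalone cluster-mode flags:
spark-submit --master k8s://https://kube-apiserver:6443 --status my-spark-app
spark-submit --master k8s://https://kube-apiserver:6443 --kill my-spark-app
# A --namespace flag (or spark.kubernetes.namespace) would scope these operations.
{code}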



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24793) Make spark-submit more useful with k8s

2018-07-12 Thread Anirudh Ramanathan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541990#comment-16541990
 ] 

Anirudh Ramanathan commented on SPARK-24793:


Good points, Erik. These options are not without precedent - they exist and work 
for Mesos and Standalone mode. I agree that the operator is the desired way to 
build more automation; this item is focused on the end user of spark-submit, who 
now has to learn two different command-line tools (spark-submit and kubectl) to 
use Spark on k8s effectively.
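
For comparison, a sketch of the kubectl workflow this proposal would fold into spark-submit; the {{spark-role}} label is one Spark already sets on driver and executor pods, while the namespace and pod name below are placeholders:

{code:bash}
# List all Spark drivers in a namespace (what a --list flag would wrap).
kubectl get pods -n spark-jobs -l spark-role=driver

# Inspect the status of one application's driver (what --status would wrap).
kubectl describe pod my-spark-app-driver -n spark-jobs

# Kill an application by deleting its driver pod (what --kill would wrap).
kubectl delete pod my-spark-app-driver -n spark-jobs
{code}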

> Make spark-submit more useful with k8s
> --
>
> Key: SPARK-24793
> URL: https://issues.apache.org/jira/browse/SPARK-24793
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> Support controlling the lifecycle of Spark Application through spark-submit. 
> For example:
> {{ 
>   --kill app_name   If given, kills the driver specified.
>   --status app_name  If given, requests the status of the driver 
> specified.
> }}
> Potentially also --list to list all spark drivers running.
> Given that our submission client can actually launch jobs into many different 
> namespaces, we'll need an additional specification of the namespace through a 
> --namespace flag potentially.
> I think this is pretty useful to have instead of forcing a user to use 
> kubectl to manage the lifecycle of any k8s Spark Application.






[jira] [Updated] (SPARK-24793) Make spark-submit more useful with k8s

2018-07-12 Thread Anirudh Ramanathan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-24793:
---
Description: 
Support controlling the lifecycle of Spark Application through spark-submit. 
For example:

{{ 
  --kill app_name   If given, kills the driver specified.
  --status app_name  If given, requests the status of the driver specified.
}}

Potentially also --list to list all spark drivers running.

Given that our submission client can actually launch jobs into many different 
namespaces, we'll need an additional specification of the namespace through a 
--namespace flag potentially.
I think this is pretty useful to have instead of forcing a user to use kubectl 
to manage the lifecycle of any k8s Spark Application.

  was:
Support controlling the lifecycle of Spark Application through spark-submit. 
For example:

```
  --kill app_name   If given, kills the driver specified.
  --status app_name  If given, requests the status of the driver specified.
```
Potentially also --list to list all spark drivers running.

Given that our submission client can actually launch jobs into many different 
namespaces, we'll need an additional specification of the namespace through a 
--namespace flag potentially.
I think this is pretty useful to have instead of forcing a user to use kubectl 
to manage the lifecycle of any k8s Spark Application.


> Make spark-submit more useful with k8s
> --
>
> Key: SPARK-24793
> URL: https://issues.apache.org/jira/browse/SPARK-24793
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> Support controlling the lifecycle of Spark Application through spark-submit. 
> For example:
> {{ 
>   --kill app_name   If given, kills the driver specified.
>   --status app_name  If given, requests the status of the driver 
> specified.
> }}
> Potentially also --list to list all spark drivers running.
> Given that our submission client can actually launch jobs into many different 
> namespaces, we'll need an additional specification of the namespace through a 
> --namespace flag potentially.
> I think this is pretty useful to have instead of forcing a user to use 
> kubectl to manage the lifecycle of any k8s Spark Application.






[jira] [Commented] (SPARK-24793) Make spark-submit more useful with k8s

2018-07-12 Thread Anirudh Ramanathan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541297#comment-16541297
 ] 

Anirudh Ramanathan commented on SPARK-24793:


This came up in our Spark Summit BoF session as a point of friction.
[~mccheah] [~liyinan926] [~eje] wdyt?

> Make spark-submit more useful with k8s
> --
>
> Key: SPARK-24793
> URL: https://issues.apache.org/jira/browse/SPARK-24793
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> Support controlling the lifecycle of Spark Application through spark-submit. 
> For example:
> {{ 
>   --kill app_name   If given, kills the driver specified.
>   --status app_name  If given, requests the status of the driver 
> specified.
> }}
> Potentially also --list to list all spark drivers running.
> Given that our submission client can actually launch jobs into many different 
> namespaces, we'll need an additional specification of the namespace through a 
> --namespace flag potentially.
> I think this is pretty useful to have instead of forcing a user to use 
> kubectl to manage the lifecycle of any k8s Spark Application.






[jira] [Updated] (SPARK-24793) Make spark-submit more useful with k8s

2018-07-12 Thread Anirudh Ramanathan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-24793:
---
Description: 
Support controlling the lifecycle of Spark Application through spark-submit. 
For example:

```
  --kill app_name   If given, kills the driver specified.
  --status app_name  If given, requests the status of the driver specified.
```
Potentially also --list to list all spark drivers running.

Given that our submission client can actually launch jobs into many different 
namespaces, we'll need an additional specification of the namespace through a 
--namespace flag potentially.
I think this is pretty useful to have instead of forcing a user to use kubectl 
to manage the lifecycle of any k8s Spark Application.

  was:
Support controlling the lifecycle of Spark Application through spark-submit. 
For example:

```
  --kill app_name   If given, kills the driver specified.
  --status app_name  If given, requests the status of the driver specified.
```

I think this is pretty useful to have instead of forcing a user to use kubectl 
to manage the lifecycle of any k8s Spark Application.


> Make spark-submit more useful with k8s
> --
>
> Key: SPARK-24793
> URL: https://issues.apache.org/jira/browse/SPARK-24793
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> Support controlling the lifecycle of Spark Application through spark-submit. 
> For example:
> ```
>   --kill app_name   If given, kills the driver specified.
>   --status app_name  If given, requests the status of the driver 
> specified.
> ```
> Potentially also --list to list all spark drivers running.
> Given that our submission client can actually launch jobs into many different 
> namespaces, we'll need an additional specification of the namespace through a 
> --namespace flag potentially.
> I think this is pretty useful to have instead of forcing a user to use 
> kubectl to manage the lifecycle of any k8s Spark Application.






[jira] [Created] (SPARK-24793) Make spark-submit more useful with k8s

2018-07-12 Thread Anirudh Ramanathan (JIRA)
Anirudh Ramanathan created SPARK-24793:
--

 Summary: Make spark-submit more useful with k8s
 Key: SPARK-24793
 URL: https://issues.apache.org/jira/browse/SPARK-24793
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.0
Reporter: Anirudh Ramanathan
Assignee: Anirudh Ramanathan


Support controlling the lifecycle of Spark Application through spark-submit. 
For example:

```
  --kill app_name   If given, kills the driver specified.
  --status app_name  If given, requests the status of the driver specified.
```

I think this is pretty useful to have instead of forcing a user to use kubectl 
to manage the lifecycle of any k8s Spark Application.






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-07-12 Thread Anirudh Ramanathan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541272#comment-16541272
 ] 

Anirudh Ramanathan commented on SPARK-24434:


Thanks for kicking this design off, Stavros.
The proposal is a great start - I left some comments on the details.


> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Commented] (SPARK-24432) Add support for dynamic resource allocation

2018-07-12 Thread Anirudh Ramanathan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541239#comment-16541239
 ] 

Anirudh Ramanathan commented on SPARK-24432:


Hi Mark, we did it once before in our fork in a certain way, but it seems like 
some major changes are afoot in the general structure of dynamic allocation and 
the external shuffle service (at the Spark level, and not just k8s). Given 
that, we're holding off on this for 2.4. cc/ [~mcheah] 

The weekly meetings we have on SIG Big Data might be a good place to engage if 
you want to meet the community 
(https://github.com/kubernetes/community/tree/master/sig-big-data). 
We have a weekly Zoom meeting. All technical discussions and decisions happen 
on the Apache mailing list.

> Add support for dynamic resource allocation
> ---
>
> Key: SPARK-24432
> URL: https://issues.apache.org/jira/browse/SPARK-24432
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> This is an umbrella ticket for work on adding support for dynamic resource 
> allocation into the Kubernetes mode. This requires a Kubernetes-specific 
> external shuffle service. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark.






[jira] [Resolved] (SPARK-23885) trying to spark submit 2.3.0 on minikube

2018-07-12 Thread Anirudh Ramanathan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-23885.

Resolution: Not A Bug

> trying to spark submit 2.3.0 on minikube
> 
>
> Key: SPARK-23885
> URL: https://issues.apache.org/jira/browse/SPARK-23885
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 2.3.0
>Reporter: anant pukale
>Assignee: Anirudh Ramanathan
>Priority: Major
>
>  spark-submit on minikube (Kubernetes) is failing.
> Kindly refer to the following link for details:
>  
> [https://stackoverflow.com/questions/49689298/exception-in-thread-main-org-apache-spark-sparkexception-must-specify-the-dri]






[jira] [Commented] (SPARK-24428) Remove unused code and fix any related doc in K8s module

2018-07-02 Thread Anirudh Ramanathan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530391#comment-16530391
 ] 

Anirudh Ramanathan commented on SPARK-24428:


Resolved by https://github.com/apache/spark/pull/21462

> Remove unused code and fix any related doc in K8s module
> 
>
> Key: SPARK-24428
> URL: https://issues.apache.org/jira/browse/SPARK-24428
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Stavros Kontopoulos
>Priority: Minor
> Fix For: 2.4.0
>
>
> There are some relics of previous refactoring like: 
> [https://github.com/apache/spark/blob/9e7bad0edd9f6c59c0af21c95e5df98cf82150d3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala#L63]
> The goal is to clean up anything that is not used.
>  






[jira] [Resolved] (SPARK-24428) Remove unused code and fix any related doc in K8s module

2018-07-02 Thread Anirudh Ramanathan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-24428.

   Resolution: Fixed
Fix Version/s: 2.4.0

> Remove unused code and fix any related doc in K8s module
> 
>
> Key: SPARK-24428
> URL: https://issues.apache.org/jira/browse/SPARK-24428
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Stavros Kontopoulos
>Priority: Minor
> Fix For: 2.4.0
>
>
> There are some relics of previous refactoring like: 
> [https://github.com/apache/spark/blob/9e7bad0edd9f6c59c0af21c95e5df98cf82150d3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala#L63]
> The goal is to clean up anything that is not used.
>  






[jira] [Resolved] (SPARK-24547) Spark on K8s docker-image-tool.sh improvements

2018-06-20 Thread Anirudh Ramanathan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-24547.

   Resolution: Fixed
Fix Version/s: 2.4.0

> Spark on K8s docker-image-tool.sh improvements
> --
>
> Key: SPARK-24547
> URL: https://issues.apache.org/jira/browse/SPARK-24547
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Ray Burgemeestre
>Priority: Minor
>  Labels: docker, kubernetes, spark
> Fix For: 2.4.0
>
>
> *Context*
> PySpark support for Spark on k8s was merged with 
> [https://github.com/apache/spark/pull/21092/files] a few days ago.
> There is a helper script that can be used to create Docker containers to run 
> Java and now also Python jobs. It works like this:
> {{/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 
> build}}
>  {{/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 
> push}}
> *Problem*
> I ran into two issues. The first time I generated images for 2.4.0, Docker 
> was using its cache, so when running jobs old jars were still in 
> the Docker image. This produces errors like these in the executors:
> {code:java}
> 2018-06-13 10:27:52 INFO NettyBlockTransferService:54 - Server created on 
> 172.29.3.4:44877^M 2018-06-13 10:27:52 INFO BlockManager:54 - Using 
> org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
> policy^M 2018-06-13 10:27:52 INFO BlockManagerMaster:54 - Registering 
> BlockManager BlockManagerId(1, 172.29.3.4, 44877, None)^M 2018-06-13 10:27:52 
> ERROR CoarseGrainedExecutorBackend:91 - Executor self-exiting due to : Unable 
> to create executor due to Exception thrown in awaitResult: ^M 
> org.apache.spark.SparkException: Exception thrown in awaitResult: ^M ^Iat 
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)^M ^Iat 
> org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)^M ^Iat 
> org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)^M ^Iat 
> org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)^M ^Iat 
> org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:64)^M
>  ^Iat 
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:241)^M 
> ^Iat org.apache.spark.executor.Executor.(Executor.scala:116)^M ^Iat 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)^M
>  ^Iat 
> org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)^M
>  ^Iat org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)^M ^Iat 
> org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)^M ^Iat 
> org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)^M 
> ^Iat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)^M
>  ^Iat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)^M
>  ^Iat java.lang.Thread.run(Thread.java:748)^M Caused by: 
> java.lang.RuntimeException: java.io.InvalidClassException: 
> org.apache.spark.storage.BlockManagerId; local class incompatible: stream 
> classdesc serialVersionUID = 6155820641931972169, local class 
> serialVersionUID = -3720498261147521051^M ^Iat 
> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)^M ^Iat 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)^M 
> ^Iat java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)^M
> {code}
> To avoid that, Docker has to build without its cache, but only if you have 
> built for an older version in the past.
> The second problem was that the spark container is pushed, but the spark-py 
> container wasn't yet. This was just forgotten in the initial PR.
> (A third problem I also ran into, because I had an older Docker, was 
> [https://github.com/apache/spark/pull/21551], so I have not included a fix for 
> that in this ticket.)
> Other than that it works great!
> *Solution*
> I've added an extra flag so it's possible to call build with `-n` for 
> `--no-cache`.
> And I've added the extra push for the spark-py container.
> *Example*
> ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 -n build
> Snippet from the help output:
> {code:java}
> Options:
> -f file Dockerfile to build for JVM based Jobs. By default builds the 
> Dockerfile shipped with Spark.
> -p file Dockerfile with Python baked in. By default builds the Dockerfile 
> shipped with Spark.
> -r repo Repository address.
> -t tag Tag to apply to the built image, or to identify the image to be pushed.
> -m Use minikube's Docker daemon.
> -n Build docker image with --no-cache{code}
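
Putting the flags from the help output together, the intended workflow with this change might look like the following (registry and tag are placeholders):

{code:bash}
# Rebuild both the JVM (spark) and Python (spark-py) images, bypassing the Docker
# layer cache so jars from a previously built version cannot leak into the image.
./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.4.0 -n build

# Push both images; with the fix, spark-py is pushed alongside spark.
./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.4.0 push
{code}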




[jira] [Commented] (SPARK-24547) Spark on K8s docker-image-tool.sh improvements

2018-06-20 Thread Anirudh Ramanathan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518738#comment-16518738
 ] 

Anirudh Ramanathan commented on SPARK-24547:


Resolved by https://github.com/apache/spark/pull/21555

> Spark on K8s docker-image-tool.sh improvements
> --
>
> Key: SPARK-24547
> URL: https://issues.apache.org/jira/browse/SPARK-24547
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Ray Burgemeestre
>Priority: Minor
>  Labels: docker, kubernetes, spark
> Fix For: 2.4.0
>
>
> *Context*
> PySpark support for Spark on k8s was merged with 
> [https://github.com/apache/spark/pull/21092/files] a few days ago.
> There is a helper script that can be used to create Docker containers to run 
> Java and now also Python jobs. It works like this:
> {{/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 
> build}}
>  {{/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 
> push}}
> *Problem*
> I ran into two issues. The first time I generated images for 2.4.0, Docker 
> was using its cache, so when running jobs old jars were still in 
> the Docker image. This produces errors like these in the executors:
> {code:java}
> 2018-06-13 10:27:52 INFO NettyBlockTransferService:54 - Server created on 
> 172.29.3.4:44877^M 2018-06-13 10:27:52 INFO BlockManager:54 - Using 
> org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
> policy^M 2018-06-13 10:27:52 INFO BlockManagerMaster:54 - Registering 
> BlockManager BlockManagerId(1, 172.29.3.4, 44877, None)^M 2018-06-13 10:27:52 
> ERROR CoarseGrainedExecutorBackend:91 - Executor self-exiting due to : Unable 
> to create executor due to Exception thrown in awaitResult: ^M 
> org.apache.spark.SparkException: Exception thrown in awaitResult: ^M ^Iat 
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)^M ^Iat 
> org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)^M ^Iat 
> org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)^M ^Iat 
> org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)^M ^Iat 
> org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:64)^M
>  ^Iat 
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:241)^M 
> ^Iat org.apache.spark.executor.Executor.(Executor.scala:116)^M ^Iat 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)^M
>  ^Iat 
> org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)^M
>  ^Iat org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)^M ^Iat 
> org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)^M ^Iat 
> org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)^M 
> ^Iat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)^M
>  ^Iat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)^M
>  ^Iat java.lang.Thread.run(Thread.java:748)^M Caused by: 
> java.lang.RuntimeException: java.io.InvalidClassException: 
> org.apache.spark.storage.BlockManagerId; local class incompatible: stream 
> classdesc serialVersionUID = 6155820641931972169, local class 
> serialVersionUID = -3720498261147521051^M ^Iat 
> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)^M ^Iat 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)^M 
> ^Iat java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)^M
> {code}
> To avoid that, Docker has to build without its cache, but only if you have 
> built for an older version in the past.
> The second problem was that the spark container is pushed, but the spark-py 
> container wasn't yet. This was just forgotten in the initial PR.
> (A third problem I also ran into, because I had an older Docker, was 
> [https://github.com/apache/spark/pull/21551], so I have not included a fix for 
> that in this ticket.)
> Other than that it works great!
> *Solution*
> I've added an extra flag so it's possible to call build with `-n` for 
> `--no-cache`.
> And I've added the extra push for the spark-py container.
> *Example*
> ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 -n build
> Snippet from the help output:
> {code:java}
> Options:
> -f file Dockerfile to build for JVM based Jobs. By default builds the 
> Dockerfile shipped with Spark.
> -p file Dockerfile with Python baked in. By default builds the Dockerfile 
> shipped with Spark.
> -r repo Repository address.
> -t tag Tag to apply to the built image, or to identify the image to be pushed.
> -m Use minikube's Docker daemon.
> -n Build docker image with --no-cache{code}

[jira] [Assigned] (SPARK-24547) Spark on K8s docker-image-tool.sh improvements

2018-06-20 Thread Anirudh Ramanathan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-24547:
--

Assignee: (was: Anirudh Ramanathan)

> Spark on K8s docker-image-tool.sh improvements
> --
>
> Key: SPARK-24547
> URL: https://issues.apache.org/jira/browse/SPARK-24547
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Ray Burgemeestre
>Priority: Minor
>  Labels: docker, kubernetes, spark
> Fix For: 2.4.0
>
>
> *Context*
> PySpark support for Spark on k8s was merged with 
> [https://github.com/apache/spark/pull/21092/files] a few days ago.
> There is a helper script that can be used to create Docker containers to run 
> Java and now also Python jobs. It works like this:
> {{/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 
> build}}
>  {{/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 
> push}}
> *Problem*
> I ran into two issues. The first time I generated images for 2.4.0, Docker 
> was using its cache, so when running jobs old jars were still in 
> the Docker image. This produces errors like these in the executors:
> {code:java}
> 2018-06-13 10:27:52 INFO NettyBlockTransferService:54 - Server created on 
> 172.29.3.4:44877^M 2018-06-13 10:27:52 INFO BlockManager:54 - Using 
> org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
> policy^M 2018-06-13 10:27:52 INFO BlockManagerMaster:54 - Registering 
> BlockManager BlockManagerId(1, 172.29.3.4, 44877, None)^M 2018-06-13 10:27:52 
> ERROR CoarseGrainedExecutorBackend:91 - Executor self-exiting due to : Unable 
> to create executor due to Exception thrown in awaitResult: ^M 
> org.apache.spark.SparkException: Exception thrown in awaitResult: ^M ^Iat 
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)^M ^Iat 
> org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)^M ^Iat 
> org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)^M ^Iat 
> org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)^M ^Iat 
> org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:64)^M
>  ^Iat 
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:241)^M 
> ^Iat org.apache.spark.executor.Executor.(Executor.scala:116)^M ^Iat 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)^M
>  ^Iat 
> org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)^M
>  ^Iat org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)^M ^Iat 
> org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)^M ^Iat 
> org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)^M 
> ^Iat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)^M
>  ^Iat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)^M
>  ^Iat java.lang.Thread.run(Thread.java:748)^M Caused by: 
> java.lang.RuntimeException: java.io.InvalidClassException: 
> org.apache.spark.storage.BlockManagerId; local class incompatible: stream 
> classdesc serialVersionUID = 6155820641931972169, local class 
> serialVersionUID = -3720498261147521051^M ^Iat 
> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)^M ^Iat 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)^M 
> ^Iat java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)^M
> {code}
> To avoid that, Docker has to build without its cache, but only if you have 
> built for an older version in the past.
> The second problem was that the spark container is pushed, but the spark-py 
> container wasn't yet. This was just forgotten in the initial PR.
> (A third problem I also ran into, because I had an older Docker, was 
> [https://github.com/apache/spark/pull/21551], so I have not included a fix for 
> that in this ticket.)
> Other than that it works great!
> *Solution*
> I've added an extra flag so it's possible to call build with `-n` for 
> `--no-cache`.
> And I've added the extra push for the spark-py container.
> *Example*
> ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 -n build
> Snippet from the help output:
> {code:java}
> Options:
> -f file Dockerfile to build for JVM based Jobs. By default builds the 
> Dockerfile shipped with Spark.
> -p file Dockerfile with Python baked in. By default builds the Dockerfile 
> shipped with Spark.
> -r repo Repository address.
> -t tag Tag to apply to the built image, or to identify the image to be pushed.
> -m Use minikube's Docker daemon.
> -n Build docker image with --no-cache{code}




[jira] [Assigned] (SPARK-24547) Spark on K8s docker-image-tool.sh improvements

2018-06-20 Thread Anirudh Ramanathan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-24547:
--

Assignee: Anirudh Ramanathan

> Spark on K8s docker-image-tool.sh improvements
> --
>
> Key: SPARK-24547
> URL: https://issues.apache.org/jira/browse/SPARK-24547
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Ray Burgemeestre
>Assignee: Anirudh Ramanathan
>Priority: Minor
>  Labels: docker, kubernetes, spark
>
> *Context*
> PySpark support for Spark on k8s was merged with 
> [https://github.com/apache/spark/pull/21092/files] a few days ago.
> There is a helper script that can be used to create Docker containers to run 
> Java and now also Python jobs. It works like this:
> {{/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 
> build}}
>  {{/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 
> push}}
> *Problem*
> I ran into two issues. The first time I generated images for 2.4.0, Docker 
> was using its cache, so when running jobs old jars were still in 
> the Docker image. This produces errors like these in the executors:
> {code:java}
> 2018-06-13 10:27:52 INFO NettyBlockTransferService:54 - Server created on 
> 172.29.3.4:44877^M 2018-06-13 10:27:52 INFO BlockManager:54 - Using 
> org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
> policy^M 2018-06-13 10:27:52 INFO BlockManagerMaster:54 - Registering 
> BlockManager BlockManagerId(1, 172.29.3.4, 44877, None)^M 2018-06-13 10:27:52 
> ERROR CoarseGrainedExecutorBackend:91 - Executor self-exiting due to : Unable 
> to create executor due to Exception thrown in awaitResult: ^M 
> org.apache.spark.SparkException: Exception thrown in awaitResult: ^M ^Iat 
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)^M ^Iat 
> org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)^M ^Iat 
> org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)^M ^Iat 
> org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)^M ^Iat 
> org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:64)^M
>  ^Iat 
> org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:241)^M 
> ^Iat org.apache.spark.executor.Executor.(Executor.scala:116)^M ^Iat 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)^M
>  ^Iat 
> org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)^M
>  ^Iat org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)^M ^Iat 
> org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)^M ^Iat 
> org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)^M 
> ^Iat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)^M
>  ^Iat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)^M
>  ^Iat java.lang.Thread.run(Thread.java:748)^M Caused by: 
> java.lang.RuntimeException: java.io.InvalidClassException: 
> org.apache.spark.storage.BlockManagerId; local class incompatible: stream 
> classdesc serialVersionUID = 6155820641931972169, local class 
> serialVersionUID = -3720498261147521051^M ^Iat 
> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)^M ^Iat 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)^M 
> ^Iat java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)^M
> {code}
> To avoid that, Docker has to build without its cache, but only if you have 
> built for an older version in the past.
> The second problem was that the spark container is pushed, but the spark-py 
> container wasn't yet. This was just forgotten in the initial PR.
> (A third problem I also ran into, because I had an older Docker, was 
> [https://github.com/apache/spark/pull/21551], so I have not included a fix for 
> that in this ticket.)
> Other than that it works great!
> *Solution*
> I've added an extra flag so it's possible to call build with `-n` for 
> `--no-cache`.
> And I've added the extra push for the spark-py container.
> *Example*
> ./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 -n build
> Snippet from the help output:
> {code:java}
> Options:
> -f file Dockerfile to build for JVM based Jobs. By default builds the 
> Dockerfile shipped with Spark.
> -p file Dockerfile with Python baked in. By default builds the Dockerfile 
> shipped with Spark.
> -r repo Repository address.
> -t tag Tag to apply to the built image, or to identify the image to be pushed.
> -m Use minikube's Docker daemon.
> -n Build docker image with --no-cache{code}




[jira] [Updated] (SPARK-24600) Improve support for building different types of images in dockerfile

2018-06-20 Thread Anirudh Ramanathan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-24600:
---
Description: 
Our docker image tooling currently builds and pushes images for both PySpark and 
Java/Scala.
We should be able to build/push either one of them. In the future, we'll have 
this extended to SparkR, the shuffle service, etc.

  was:
Our docker images currently build and push docker images for pyspark and 
java/scala.
We should be able to build/push either one of them. 


> Improve support for building different types of images in dockerfile
> 
>
> Key: SPARK-24600
> URL: https://issues.apache.org/jira/browse/SPARK-24600
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Major
>
> Our docker image tooling currently builds and pushes images for both PySpark and 
> Java/Scala.
> We should be able to build/push either one of them. In the future, we'll have 
> this extended to SparkR, the shuffle service, etc.
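
A sketch of what building just one of the image types could look like today with plain docker build against the Dockerfiles shipped in the Spark distribution; the Dockerfile paths and the {{base_img}} build argument are assumptions about the distribution layout rather than a documented interface:

{code:bash}
# Build only the JVM image.
docker build -t docker.io/myrepo/spark:v2.4.0 \
  -f kubernetes/dockerfiles/spark/Dockerfile .

# Build only the PySpark image, layered on top of the JVM image.
docker build -t docker.io/myrepo/spark-py:v2.4.0 \
  --build-arg base_img=docker.io/myrepo/spark:v2.4.0 \
  -f kubernetes/dockerfiles/spark/bindings/python/Dockerfile .
{code}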






[jira] [Created] (SPARK-24600) Improve support for building subset of images in dockerfile

2018-06-20 Thread Anirudh Ramanathan (JIRA)
Anirudh Ramanathan created SPARK-24600:
--

 Summary: Improve support for building subset of images in 
dockerfile
 Key: SPARK-24600
 URL: https://issues.apache.org/jira/browse/SPARK-24600
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Anirudh Ramanathan


Our docker image tooling currently builds and pushes images for both PySpark and 
Java/Scala.
We should be able to build/push either one of them. 






[jira] [Updated] (SPARK-24600) Improve support for building different types of images in dockerfile

2018-06-20 Thread Anirudh Ramanathan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-24600:
---
Summary: Improve support for building different types of images in 
dockerfile  (was: Improve support for building subset of images in dockerfile)

> Improve support for building different types of images in dockerfile
> 
>
> Key: SPARK-24600
> URL: https://issues.apache.org/jira/browse/SPARK-24600
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Major
>
> Our docker image tooling currently builds and pushes images for both PySpark and 
> Java/Scala.
> We should be able to build/push either one of them. 






[jira] [Resolved] (SPARK-24232) Allow referring to kubernetes secrets as env variable

2018-05-31 Thread Anirudh Ramanathan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-24232.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 21317
[https://github.com/apache/spark/pull/21317]

> Allow referring to kubernetes secrets as env variable
> -
>
> Key: SPARK-24232
> URL: https://issues.apache.org/jira/browse/SPARK-24232
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Dharmesh Kakadia
>Assignee: Stavros Kontopoulos
>Priority: Major
> Fix For: 3.0.0
>
>
> Allow referring to Kubernetes secrets in the driver process via environment 
> variables. This will allow developers to use secrets without leaking them in 
> the code, and at the same time secrets can be decoupled and managed 
> separately. This can be used to refer to passwords, certificates, etc. while 
> talking to other services (JDBC passwords, storage keys, etc.).
> So, at deployment time, something like 
> ``spark.kubernetes.driver.secretKeyRef.[EnvName]=`` can be specified, 
> which will make [EnvName].[key] available as an environment variable, and in 
> the code it is always referred to as the env variable [key].
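
A sketch of what this could look like at submission time; the {{name:key}} value format and the {{db-secret}}/{{password}}/job names are illustrative assumptions, with the Kubernetes Secret assumed to already exist in the job's namespace:

{code:bash}
# Expose key "password" of Kubernetes secret "db-secret" as env var DB_PASSWORD
# in the driver and executors.
spark-submit \
  --master k8s://https://kube-apiserver:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.secretKeyRef.DB_PASSWORD=db-secret:password \
  --conf spark.kubernetes.executor.secretKeyRef.DB_PASSWORD=db-secret:password \
  --class com.example.MyJdbcJob \
  local:///opt/spark/jars/my-jdbc-job.jar
{code}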






[jira] [Assigned] (SPARK-24232) Allow referring to kubernetes secrets as env variable

2018-05-31 Thread Anirudh Ramanathan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-24232:
--

Assignee: Stavros Kontopoulos

> Allow referring to kubernetes secrets as env variable
> -
>
> Key: SPARK-24232
> URL: https://issues.apache.org/jira/browse/SPARK-24232
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Dharmesh Kakadia
>Assignee: Stavros Kontopoulos
>Priority: Major
>
> Allow referring to Kubernetes secrets in the driver process via environment 
> variables. This will allow developers to use secrets without leaking them in 
> the code, and at the same time secrets can be decoupled and managed 
> separately. This can be used to refer to passwords, certificates, etc. while 
> talking to other services (JDBC passwords, storage keys, etc.).
> So, at deployment time, something like 
> ``spark.kubernetes.driver.secretKeyRef.[EnvName]=`` can be specified, 
> which will make [EnvName].[key] available as an environment variable, and in 
> the code it is always referred to as the env variable [key].






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-05-31 Thread Anirudh Ramanathan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497014#comment-16497014
 ] 

Anirudh Ramanathan commented on SPARK-24434:


Open to suggestions on what could be intuitive in this particular case. Perhaps 
there's also precedent for multi-line low-level configuration in other parts of 
Spark. 

cc/ [~felixcheung] 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-05-31 Thread Anirudh Ramanathan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497009#comment-16497009
 ] 

Anirudh Ramanathan edited comment on SPARK-24434 at 5/31/18 6:54 PM:
-

I was basing my suggestion of JSON on allowing users to specify JSON strings inline 
as configuration, but I guess one could also specify a YAML file with the 
template and have a Spark configuration property point to that file. [~skonto], you make a 
good point: it is another configuration mechanism that people may have to 
learn. This decision should be based more on UX and consistency with what Spark 
users expect in general. [~eje], to your point, I think we could support both 
if needed, but it might be prudent to identify the one that's more intuitive to 
users and do that first.

Sidenote: There's also [jsonpath|https://kubernetes.io/docs/reference/kubectl/], 
which kubectl supports, but that could be overkill here.


was (Author: foxish):
Good point. I was basing my suggestion of JSON on allowing specifying JSON 
strings inline as configuration, but I guess one could also specify a YAML file 
with the template and have spark configuration point to that file. [~skonto], 
you make a good point, it is another configuration mechanism that people may 
have to learn. This decision should be based more on UX and consistency with 
what Spark users expect in general. [~eje], to your point, I think we could 
support both if needed, but it might be prudent to find the one that's more 
intuitive to users in order to do first.

Sidenote: There's also [jsonpath|https://kubernetes.io/docs/reference/kubectl/] 
that kubectl supports but that could be overkill here.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-05-31 Thread Anirudh Ramanathan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497009#comment-16497009
 ] 

Anirudh Ramanathan commented on SPARK-24434:


Good point. I was basing my suggestion of JSON on allowing users to specify JSON 
strings inline as configuration, but I guess one could also specify a YAML file 
with the template and have a Spark configuration property point to that file. [~skonto], 
you make a good point: it is another configuration mechanism that people may 
have to learn. This decision should be based more on UX and consistency with 
what Spark users expect in general. [~eje], to your point, I think we could 
support both if needed, but it might be prudent to identify the one that's more 
intuitive to users and do that first.

Sidenote: There's also [jsonpath|https://kubernetes.io/docs/reference/kubectl/], 
which kubectl supports, but that could be overkill here.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-05-31 Thread Anirudh Ramanathan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496962#comment-16496962
 ] 

Anirudh Ramanathan commented on SPARK-24434:


The way several custom APIs have done this before is by having a PodTemplate field 
that uses the Kubernetes API to provide a rich, type-safe interface for adding 
arbitrary modifications to pods. It's typically easier to do that with Golang structs, 
but we should investigate whether, via OpenAPI, there's a way for the Java 
client to expose the same. Given that we will want it to map back to 
stringified configuration, supporting JSON strings seems like a good choice 
there.

So the flow I see is: JSON strings converted into valid (type-checked) and 
supported PodTemplate specifications that are eventually added to the driver and 
executor pods.
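
To make the flow concrete, a minimal sketch under the assumption that the template is a YAML file referenced from a Spark property; the property name {{spark.kubernetes.driver.podTemplateFile}} and the merge semantics are illustrative of the proposal under discussion, not an interface that exists at this point:

{code:bash}
# A pod template holding customizations that would otherwise each need their own
# Spark config option (extra labels, tolerations, a sidecar, ...).
cat > driver-pod-template.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  labels:
    team: data-platform
spec:
  tolerations:
    - key: dedicated
      operator: Equal
      value: spark
      effect: NoSchedule
EOF

# Point Spark at the template; Spark would merge its own settings on top of it.
spark-submit \
  --master k8s://https://kube-apiserver:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.podTemplateFile=driver-pod-template.yaml \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
{code}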

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-05-14 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474774#comment-16474774
 ] 

Anirudh Ramanathan edited comment on SPARK-24248 at 5/14/18 8:44 PM:
-

Somewhat related - https://issues.apache.org/jira/browse/SPARK-24266
As I understood it, there were a few in-memory data structures, but I agree 
with the general notion that in case there is a disconnect with the API, the 
driver should be resilient enough to recompute state when it can reconnect and 
proceed. That should be the end goal here IMO.


was (Author: foxish):
Somewhat related - https://issues.apache.org/jira/browse/SPARK-24266
As I understood it, there were a few in-memory data structures, but I agree 
with the general notion that in case there is a disconnect with the API, the 
driver should be resilient enough to recompute state and proceed. that should 
be the end goal here IMO.

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up-to-date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can, in 
> favor of using the Kubernetes cluster as the source of truth for pod status. 
> Maintaining less state in memory means there's a lower chance that 
> we accidentally miss updating one of these data structures and break the 
> lifecycle of executors.
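
As an illustration of treating the cluster as the source of truth, the same information is always recoverable from the API server; the {{spark-role}} and {{spark-app-selector}} labels are the ones Spark already applies to its pods, and the application id below is a placeholder:

{code:bash}
# The API server always holds the authoritative view of executor state, so the
# scheduler backend can re-derive it on demand instead of mirroring it in memory.
kubectl get pods \
  -l spark-role=executor,spark-app-selector=spark-application-1526300000000 \
  -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
{code}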






[jira] [Commented] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-05-14 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474774#comment-16474774
 ] 

Anirudh Ramanathan commented on SPARK-24248:


Somewhat related - https://issues.apache.org/jira/browse/SPARK-24266
As I understood it, there were a few in-memory data structures, but I agree 
with the general notion that in case there is a disconnect with the API, the 
driver should be resilient enough to recompute state and proceed. That should 
be the end goal here IMO.

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up-to-date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can, in 
> favor of using the Kubernetes cluster as the source of truth for pod status. 
> Maintaining less state in memory means there's a lower chance that 
> we accidentally miss updating one of these data structures and break the 
> lifecycle of executors.






[jira] [Commented] (SPARK-24249) Spark on kubernetes, pods crashes with spark sql job.

2018-05-14 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474772#comment-16474772
 ] 

Anirudh Ramanathan commented on SPARK-24249:


cc/ [~mccheah]

> Spark on kubernetes, pods crashes with spark sql job.
> -
>
> Key: SPARK-24249
> URL: https://issues.apache.org/jira/browse/SPARK-24249
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.2.0
> Environment: Spark version : spark-2.2.0-k8s-0.5.0-bin-2.7.3
> Kubernetes version : Kubernetes 1.9.7
> Spark sql configuration :
> Set 1 :
> spark.executor.heartbeatInterval 20s
> spark.executor.cores 4
> spark.driver.cores 4
> spark.driver.memory 15g
> spark.executor.memory 15g
> spark.cores.max 220
> spark.rpc.numRetries 5
> spark.rpc.retry.wait 5
> spark.network.timeout 1800
> spark.sql.broadcastTimeout 1200
> spark.sql.crossJoin.enabled true
> spark.sql.starJoinOptimization true
> spark.eventLog.enabled true
> spark.eventLog.dir hdfs://namenodeHA/tmp/spark-history
> spark.sql.codegen true
> spark.kubernetes.allocation.batch.size 30
> Set 2 :
> spark.executor.heartbeatInterval 20s
> spark.executor.cores 4
> spark.driver.cores 4
> spark.driver.memory 11g
> spark.driver.memoryOverhead 4g
> spark.executor.memory 11g
> spark.executor.memoryOverhead 4g
> spark.cores.max 220
> spark.rpc.numRetries 5
> spark.rpc.retry.wait 5
> spark.network.timeout 1800
> spark.sql.broadcastTimeout 1200
> spark.sql.crossJoin.enabled true
> spark.sql.starJoinOptimization true
> spark.eventLog.enabled true
> spark.eventLog.dir hdfs://namenodeHA/tmp/spark-history
> spark.sql.codegen true
> spark.kubernetes.allocation.batch.size 30
> Kryo serializer is being used with a "spark.kryoserializer.buffer.mb" value 
> of 64mb.
> 50 executors are being spawned using spark.executor.instances=50 submit 
> argument.
>Reporter: kaushik srinivas
>Priority: Major
> Attachments: StackTrace1.txt, StackTrace2.txt, StackTrace3.txt, 
> StackTrace4.txt
>
>
> Below is the scenario being tested,
> Job :
>  Spark sql job is written in scala, and to run on 1TB TPCDS BENCHMARK DATA 
> which is in parquet,snappy format and hive tables created on top of it.
> Cluster manager :
>  Kubernetes
> Spark sql configuration :
> Set 1 :
>  spark.executor.heartbeatInterval 20s
>  spark.executor.cores 4
>  spark.driver.cores 4
>  spark.driver.memory 15g
>  spark.executor.memory 15g
>  spark.cores.max 220
>  spark.rpc.numRetries 5
>  spark.rpc.retry.wait 5
>  spark.network.timeout 1800
>  spark.sql.broadcastTimeout 1200
>  spark.sql.crossJoin.enabled true
>  spark.sql.starJoinOptimization true
>  spark.eventLog.enabled true
>  spark.eventLog.dir hdfs://namenodeHA/tmp/spark-history
>  spark.sql.codegen true
>  spark.kubernetes.allocation.batch.size 30
> Set 2 :
>  spark.executor.heartbeatInterval 20s
>  spark.executor.cores 4
>  spark.driver.cores 4
>  spark.driver.memory 11g
>  spark.driver.memoryOverhead 4g
>  spark.executor.memory 11g
>  spark.executor.memoryOverhead 4g
>  spark.cores.max 220
>  spark.rpc.numRetries 5
>  spark.rpc.retry.wait 5
>  spark.network.timeout 1800
>  spark.sql.broadcastTimeout 1200
>  spark.sql.crossJoin.enabled true
>  spark.sql.starJoinOptimization true
>  spark.eventLog.enabled true
>  spark.eventLog.dir hdfs://namenodeHA/tmp/spark-history
>  spark.sql.codegen true
>  spark.kubernetes.allocation.batch.size 30
> Kryo serializer is being used with a "spark.kryoserializer.buffer.mb" value 
> of 64mb.
>  50 executors are being spawned using spark.executor.instances=50 submit 
> argument.
> Issues Observed:
> The Spark SQL job is terminating abruptly and the driver and executors are being 
> killed randomly.
>  Driver and executor pods get killed suddenly and the job fails.
> Few different stack traces are found across different runs,
> Stack Trace 1:
>  "2018-05-10 06:31:28 ERROR ContextCleaner:91 - Error cleaning broadcast 136
>  org.apache.spark.SparkException: Exception thrown in awaitResult:
>  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)"
>  File attached : [^StackTrace1.txt]
> Stack Trace 2: 
>  "org.apache.spark.shuffle.FetchFailedException: Failed to connect to 
> /192.178.1.105:38039^M
>  at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:442)^M
>  at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:418)"
>  File attached : [^StackTrace2.txt]
> Stack Trace 3:
>  "18/05/10 11:21:17 WARN KubernetesTaskSetManager: Lost task 3.0 in stage 
> 48.0 (TID 16486, 192.178.1.35, executor 41): FetchFailed(null, shuffleId=29, 
> mapId=-1, reduceId=3, message=^M
>  org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output 
> location for shuffle 29^M
>  at 
> 

[jira] [Commented] (SPARK-24266) Spark client terminates while driver is still running

2018-05-14 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474770#comment-16474770
 ] 

Anirudh Ramanathan commented on SPARK-24266:


Thanks for filing this issue and digging in. It does seem like the client 
should be responsible for reconnecting transparently when it sees this 
particular error, because the compaction has run and the window of valid 
resource versions has changed. I'd double-check what the go-client does. 
If it does reconnect transparently (as I suspect it does), we should change the 
fabric8 client-side logic to do the same.
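
For reference, a rough sketch of what transparent re-watching could look like on 
the caller's side with the fabric8 client (not the client's actual internals):

{code}
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException, Watcher}

// Sketch only: re-establish the watch when it is closed abnormally, e.g. after
// etcd compaction makes the old resourceVersion too old (HTTP 410 Gone).
// Real code would also re-list to pick up events missed while disconnected.
def watchDriver(client: KubernetesClient, ns: String, podName: String): Unit = {
  client.pods().inNamespace(ns).withName(podName).watch(new Watcher[Pod] {
    override def eventReceived(action: Watcher.Action, pod: Pod): Unit = {
      // handle the state change, e.g. log the new phase
      println(s"$action -> ${pod.getStatus.getPhase}")
    }
    override def onClose(cause: KubernetesClientException): Unit = {
      if (cause != null) {
        // watch dropped (possibly "resource version too old"): start a new one
        watchDriver(client, ns, podName)
      }
    }
  })
}
{code}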

> Spark client terminates while driver is still running
> -
>
> Key: SPARK-24266
> URL: https://issues.apache.org/jira/browse/SPARK-24266
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Chun Chen
>Priority: Major
>
> {code}
> Warning: Ignoring non-spark config property: Default=system properties 
> included when running spark-submit.
> 18/05/11 14:50:12 WARN Config: Error reading service account token from: 
> [/var/run/secrets/kubernetes.io/serviceaccount/token]. Ignoring.
> 18/05/11 14:50:12 INFO HadoopStepsOrchestrator: Hadoop Conf directory: 
> Some(/data/tesla/spark-2.2.0-k8s-0.5.0-bin-2.7.3/hadoop-conf)
> 18/05/11 14:50:15 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 18/05/11 14:50:15 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> 18/05/11 14:50:16 INFO HadoopConfBootstrapImpl: HADOOP_CONF_DIR defined. 
> Mounting Hadoop specific files
> 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: N/A
>start time: N/A
>container images: N/A
>phase: Pending
>status: []
> 18/05/11 14:50:17 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: tbds-100-98-45-69
>start time: N/A
>container images: N/A
>phase: Pending
>status: []
> 18/05/11 14:50:18 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> spark-2843da19c690485b93780ad7992a101e, spark-role -> driver
>pod uid: 90558303-54e7-11e8-9e64-525400da65d8
>creation time: 2018-05-11T06:50:17Z
>service account name: default
>volumes: spark-local-dir-0-spark-local, spark-init-properties, 
> download-jars-volume, download-files, spark-init-secret, hadoop-properties, 
> default-token-xvjt9
>node name: tbds-100-98-45-69
>start time: 2018-05-11T06:50:17Z
>container images: docker.oa.com:8080/gaia/spark-driver-cos:20180503_9
>phase: Pending
>status: [ContainerStatus(containerID=null, 
> image=docker.oa.com:8080/gaia/spark-driver-cos:20180503_9, imageID=, 
> lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=false, 
> restartCount=0, state=ContainerState(running=null, terminated=null, 
> waiting=ContainerStateWaiting(message=null, reason=PodInitializing, 
> additionalProperties={}), additionalProperties={}), additionalProperties={})]
> 18/05/11 14:50:19 INFO Client: Waiting for application spark-64-293-980 to 
> finish...
> 18/05/11 14:50:25 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-64-293-980-1526021412180-driver
>namespace: tione-603074457
>labels: network -> FLOATINGIP, spark-app-selector -> 
> 

[jira] [Assigned] (SPARK-24137) [K8s] Mount temporary directories in emptydir volumes

2018-05-10 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-24137:
--

Assignee: Matt Cheah

> [K8s] Mount temporary directories in emptydir volumes
> -
>
> Key: SPARK-24137
> URL: https://issues.apache.org/jira/browse/SPARK-24137
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently the Spark local directories do not get any volumes and volume 
> mounts, which means we're writing Spark shuffle and cache contents to the 
> file system mounted by Docker. This can be terribly inefficient. We should 
> use emptydir volumes for these directories instead for significant 
> performance improvements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24137) [K8s] Mount temporary directories in emptydir volumes

2018-05-10 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-24137.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 21238
[https://github.com/apache/spark/pull/21238]

> [K8s] Mount temporary directories in emptydir volumes
> -
>
> Key: SPARK-24137
> URL: https://issues.apache.org/jira/browse/SPARK-24137
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently the Spark local directories do not get any volumes and volume 
> mounts, which means we're writing Spark shuffle and cache contents to the 
> file system mounted by Docker. This can be terribly inefficient. We should 
> use emptydir volumes for these directories instead for significant 
> performance improvements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24179) History Server for Kubernetes

2018-05-04 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-24179:
---
Issue Type: New Feature  (was: Task)

> History Server for Kubernetes
> -
>
> Key: SPARK-24179
> URL: https://issues.apache.org/jira/browse/SPARK-24179
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Eric Charles
>Priority: Major
>
> The History server is missing when running on Kubernetes, with the side 
> effect we can not debug post-mortem or analyze after-the-fact.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-03 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462145#comment-16462145
 ] 

Anirudh Ramanathan commented on SPARK-24135:


cc/ [~mridulm80] [~irashid] for thoughts on whether this behavior would be 
intuitive to an existing Spark user.

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom web hooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark should handle the {{Init:Error}} cases regardless of whether 
> Spark itself is aware of init-containers or not.
> This class of error is particularly bad because when we hit this state, the 
> failed executor will never start, but it's still seen as pending by the 
> executor allocator. The executor allocator won't request more rounds of 
> executors because its current batch hasn't been resolved to either running or 
> failed. Therefore we end up with being stuck with the number of executors 
> that successfully started before the faulty one failed to start, potentially 
> creating a fake resource bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-03 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462139#comment-16462139
 ] 

Anirudh Ramanathan edited comment on SPARK-24135 at 5/3/18 9:01 AM:


It is increasingly common for people to write custom controllers and custom 
resources and not use the built-in controllers, especially when the workloads 
have special characteristics. This is the whole reason why people are working 
on tooling like the [operator 
framework|https://coreos.com/blog/introducing-operator-framework]. I don't 
think the future lies in shoehorning applications to use the existing 
controllers. The existing controllers are a good starting point but for any 
custom orchestration, the recommendation from the k8s community at large would 
be to write an operator which in some sense is what we've done. So, I think 
moving towards the built-in controllers doesn't give us anything more. 

Also, replication controllers and deployments are not used for applications 
with termination semantics. They're suitable for long running services. That's 
the reason why they never give up after seeing failures. However, if you see 
the "batch" type built-in controller, the [job 
controller|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion],
 it does implement a [backoff 
policy|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#pod-backoff-failure-policy]
 that covers the initialization and runtime errors in containers. As I see it, 
we should have safe limits so that all kinds of failures eventually give up. I'm 
ok with having this limit be similar to the job controller's, as a configurable 
number; one might want to set it very high in your case to do near-infinite 
retries, but I'm not convinced that behavior is a safe choice in the 
general case. 

Also, flakiness due to admission webhooks seems like it should be handled by 
retries in the init container, or by some other automation, since it's outside 
Spark land. That makes me apprehensive about handling such specific cases 
within Spark, instead of dealing with it as "framework error" and "app error".
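
To make the comparison concrete, a rough sketch of what a configurable cap on 
executor startup failures could look like on the Spark side (the class and the 
config key mentioned below are made up purely for illustration):

{code}
import java.util.concurrent.atomic.AtomicInteger

// Sketch: count executor pods that never started (Init:Error, image pull
// failures, ...) against a configurable cap, similar in spirit to the Job
// controller's backoffLimit. The config key below is hypothetical.
class ExecutorStartupFailureTracker(maxFailures: Int) {
  private val failures = new AtomicInteger(0)

  // Call this whenever a pending executor pod resolves to a startup error.
  def recordStartupFailure(): Unit = {
    if (failures.incrementAndGet() > maxFailures) {
      throw new IllegalStateException(
        s"Gave up launching executors after $maxFailures startup failures")
    }
  }
}

// maxFailures could be driven by something like
// spark.kubernetes.executor.maxStartupFailures (hypothetical key).
{code}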


was (Author: foxish):
It is increasingly common for people to write custom controllers and custom 
resources and not use the built-in controllers, especially when the workloads 
have special characteristics. This is the whole reason why people are working 
on tooling like the [operator 
framework|https://coreos.com/blog/introducing-operator-framework]. I don't 
think the future lies in shoehorning applications to use the existing 
controllers. The existing controllers are a good starting point but for any 
custom orchestration, the recommendation from the k8s community at large would 
be to write an operator which in some sense is what we've done. So, I think 
moving towards the built-in controllers doesn't give us anything more. 

Also, replication controllers and deployments are not used for applications 
with termination semantics. They're suitable for long running services. The 
only "batch" type built-in controller is the [job 
controller|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion],
 which does implement a [backoff 
policy|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#pod-backoff-failure-policy]
 that covers the initialization and runtime errors in containers. As I see it, 
we should have safe limits for all kinds of failures to eventually give up; 
it's more a question of whether this should be treated differently as a 
framework error.

Also, flakiness due to admission webhooks seems like it should be handled by 
retries in the init container, or by some other automation, since it's outside 
Spark land. That makes me apprehensive about handling such specific cases 
within Spark, instead of dealing with it as "framework error" and "app error".

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up 

[jira] [Comment Edited] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-03 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462139#comment-16462139
 ] 

Anirudh Ramanathan edited comment on SPARK-24135 at 5/3/18 8:58 AM:


It is increasingly common for people to write custom controllers and custom 
resources and not use the built-in controllers, especially when the workloads 
have special characteristics. This is the whole reason why people are working 
on tooling like the [operator 
framework|https://coreos.com/blog/introducing-operator-framework]. I don't 
think the future lies in shoehorning applications to use the existing 
controllers. The existing controllers are a good starting point but for any 
custom orchestration, the recommendation from the k8s community at large would 
be to write an operator which in some sense is what we've done. So, I think 
moving towards the built-in controllers doesn't give us anything more. 

Also, replication controllers and deployments are not used for applications 
with termination semantics. They're suitable for long running services. The 
only "batch" type built-in controller is the [job 
controller|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion],
 which does implement a [backoff 
policy|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#pod-backoff-failure-policy]
 that covers the initialization and runtime errors in containers. As I see it, 
we should have safe limits for all kinds of failures to eventually give up; 
it's more a question of whether this should be treated differently as a 
framework error.

Also, flakiness due to admission webhooks seems like it should be handled by 
retries in the init container, or by some other automation, since it's outside 
Spark land. That makes me apprehensive about handling such specific cases 
within Spark, instead of dealing with it as "framework error" and "app error".


was (Author: foxish):
It is increasingly common for people to write custom controllers and custom 
resources and not use the built-in controllers, especially when the workloads 
have special characteristics. This is the whole reason why people are working 
on tooling like the [operator 
framework|https://coreos.com/blog/introducing-operator-framework]. I don't 
think the future lies in shoehorning applications to use the existing 
controllers. The existing controllers are a good starting point but for any 
custom orchestration, the recommendation from the k8s community at large would 
be to write an operator which in some sense is what we've done. So, I think 
moving towards the built-in controllers doesn't give us anything more. 

Also, replication controllers and deployments are not used for applications 
with termination semantics. They're suitable for long running services. The 
only "batch" type built-in controller is the [job 
controller|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion],
 which does implement a backoff policy that covers the initialization and 
runtime errors in containers. As I see it, we should have safe limits for all 
kinds of failures to eventually give up; it's more a question of whether this 
should be treated differently as a framework error.

Also, flakiness due to admission webhooks seems like it should be handled by 
retries in the init container, or by some other automation, since it's outside 
Spark land. That makes me apprehensive about handling such specific cases 
within Spark, instead of dealing with it as "framework error" and "app error".

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom web hooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark 

[jira] [Commented] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-03 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462139#comment-16462139
 ] 

Anirudh Ramanathan commented on SPARK-24135:


It is increasingly common for people to write custom controllers and custom 
resources and not use the built-in controllers, especially when the workloads 
have special characteristics. This is the whole reason why people are working 
on tooling like the [operator 
framework|https://coreos.com/blog/introducing-operator-framework]. I don't 
think the future lies in shoehorning applications to use the existing 
controllers. The existing controllers are a good starting point but for any 
custom orchestration, the recommendation from the k8s community at large would 
be to write an operator which in some sense is what we've done. So, I think 
moving towards the built-in controllers doesn't give us anything more. 

Also, replication controllers and deployments are not used for applications 
with termination semantics. They're suitable for long running services. The 
only "batch" type built-in controller is the [job 
controller|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion],
 which does implement a backoff policy that covers the initialization and 
runtime errors in containers. As I see it, we should have safe limits for all 
kinds of failures to eventually give up; it's more a question of whether this 
should be treated differently as a framework error.

Also, flakiness due to admission webhooks seems like it should be handled by 
retries in the init container, or by some other automation, since it's outside 
Spark land. That makes me apprehensive about handling such specific cases 
within Spark, instead of dealing with it as "framework error" and "app error".

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom web hooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark should handle the {{Init:Error}} cases regardless of whether 
> Spark itself is aware of init-containers or not.
> This class of error is particularly bad because when we hit this state, the 
> failed executor will never start, but it's still seen as pending by the 
> executor allocator. The executor allocator won't request more rounds of 
> executors because its current batch hasn't been resolved to either running or 
> failed. Therefore we end up with being stuck with the number of executors 
> that successfully started before the faulty one failed to start, potentially 
> creating a fake resource bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-02 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460559#comment-16460559
 ] 

Anirudh Ramanathan commented on SPARK-24135:


+1 to detecting all pod error states and doing something about them. We should 
try to account for as many error conditions as possible. For example, there are 
many types of [image pull 
errors|https://github.com/kubernetes/kubernetes/blob/886e04f1fffbb04faf8a9f9ee141143b2684ae68/pkg/kubelet/images/types.go#L25-L43]
 alone. It is sometimes unclear whether they are framework or application errors. 
I think making them count towards job failure is the easiest and most conservative 
behavior to start with. 
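
As a concrete illustration, a sketch (against the fabric8 pod model, not the 
scheduler backend's actual code) of reading those error signals off a pod's status:

{code}
import io.fabric8.kubernetes.api.model.Pod
import scala.collection.JavaConverters._

// Sketch only: an init container that terminated with a non-zero exit code
// corresponds to the Init:Error state; image pull problems surface as
// "waiting" reasons on any container.
def executorStartupFailed(pod: Pod): Boolean = {
  val status = Option(pod.getStatus)
  val initStatuses = status.map(_.getInitContainerStatuses.asScala.toList).getOrElse(Nil)
  val appStatuses  = status.map(_.getContainerStatuses.asScala.toList).getOrElse(Nil)

  val initFailed = initStatuses.exists { cs =>
    Option(cs.getState).flatMap(s => Option(s.getTerminated))
      .exists(t => t.getExitCode != null && t.getExitCode.intValue() != 0)
  }
  val imagePullFailed = (initStatuses ++ appStatuses).exists { cs =>
    Option(cs.getState).flatMap(s => Option(s.getWaiting)).map(_.getReason)
      .exists(r => r == "ErrImagePull" || r == "ImagePullBackOff")
  }
  initFailed || imagePullFailed
}
{code}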

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom web hooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark should handle the {{Init:Error}} cases regardless of whether 
> Spark itself is aware of init-containers or not.
> This class of error is particularly bad because when we hit this state, the 
> failed executor will never start, but it's still seen as pending by the 
> executor allocator. The executor allocator won't request more rounds of 
> executors because its current batch hasn't been resolved to either running or 
> failed. Therefore we end up being stuck with the number of executors 
> that successfully started before the faulty one failed to start, potentially 
> creating a fake resource bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24137) [K8s] Mount temporary directories in emptydir volumes

2018-05-01 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460554#comment-16460554
 ] 

Anirudh Ramanathan commented on SPARK-24137:


SGTM. Let's try and share as much code as possible with 
https://issues.apache.org/jira/browse/SPARK-23529.
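
For reference, a minimal sketch (fabric8 builder API; names and mount path are 
illustrative only) of backing a Spark local dir with an emptyDir volume:

{code}
import io.fabric8.kubernetes.api.model.{Volume, VolumeBuilder, VolumeMount, VolumeMountBuilder}

// Sketch: back a Spark local directory with an emptyDir volume instead of the
// container's writable layer. Names and the mount path are illustrative.
def sparkLocalDirVolume(index: Int): (Volume, VolumeMount) = {
  val name = s"spark-local-dir-$index"
  val volume = new VolumeBuilder()
    .withName(name)
    .withNewEmptyDir()
    .endEmptyDir()
    .build()
  val mount = new VolumeMountBuilder()
    .withName(name)
    .withMountPath(s"/var/data/$name")
    .build()
  (volume, mount)
}

// The pair would then be added via PodBuilder#editOrNewSpec().addToVolumes(volume)
// and ContainerBuilder#addToVolumeMounts(mount).
{code}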

> [K8s] Mount temporary directories in emptydir volumes
> -
>
> Key: SPARK-24137
> URL: https://issues.apache.org/jira/browse/SPARK-24137
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> Currently the Spark local directories do not get any volumes and volume 
> mounts, which means we're writing Spark shuffle and cache contents to the 
> file system mounted by Docker. This can be terribly inefficient. We should 
> use emptydir volumes for these directories instead for significant 
> performance improvements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24105) Spark 2.3.0 on kubernetes

2018-04-26 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454847#comment-16454847
 ] 

Anirudh Ramanathan commented on SPARK-24105:


> To avoid this deadlock, it's required to support node selector (and in future 
> affinity/anti-affinity) configuration per driver & executor.

Would inter-pod anti-affinity be a better bet here for this use-case?
In the extreme case, this is a gang scheduling issue IMO, where we don't want 
to schedule drivers if there are no executors that can be scheduled.
There's some work on gang scheduling ongoing in 
https://github.com/kubernetes/kubernetes/issues/61012 under sig-scheduling.
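
To make the node-selector side of this concrete, a rough sketch with the fabric8 
builders (the per-role config prefixes implied in the usage comment are 
hypothetical; today only spark.kubernetes.node.selector.* exists):

{code}
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}

// Sketch: apply a role-specific node selector so drivers and executors land on
// different node pools. Per-role config prefixes such as
// spark.kubernetes.driver.node.selector.* vs spark.kubernetes.executor.node.selector.*
// are hypothetical here, purely for illustration.
def withNodeSelector(pod: Pod, selector: Map[String, String]): Pod = {
  val spec = new PodBuilder(pod).editOrNewSpec()
  selector.foreach { case (k, v) => spec.addToNodeSelector(k, v) }
  spec.endSpec().build()
}

// e.g. withNodeSelector(driverPod, Map("spark-pool" -> "drivers"))
//      withNodeSelector(executorPod, Map("spark-pool" -> "executors"))
{code}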

> Spark 2.3.0 on kubernetes
> -
>
> Key: SPARK-24105
> URL: https://issues.apache.org/jira/browse/SPARK-24105
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Lenin
>Priority: Major
>
> Right now it's only possible to define node selector configurations through 
> spark.kubernetes.node.selector.[labelKey]. This gets used for both driver 
> & executor pods. Without the capability to isolate driver & executor pods, 
> the cluster can run into a livelock scenario, where a lot of 
> spark submits can cause the driver pods to fill up the cluster capacity, 
> with no room for executor pods to do any work.
>  
> To avoid this deadlock, it's required to support node selector (and in future 
> affinity/anti-affinity) configuration per driver & executor.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24090) Kubernetes Backend Hotlist for Spark 2.4

2018-04-25 Thread Anirudh Ramanathan (JIRA)
Anirudh Ramanathan created SPARK-24090:
--

 Summary: Kubernetes Backend Hotlist for Spark 2.4
 Key: SPARK-24090
 URL: https://issues.apache.org/jira/browse/SPARK-24090
 Project: Spark
  Issue Type: Umbrella
  Components: Kubernetes, Scheduler
Affects Versions: 2.4.0
Reporter: Anirudh Ramanathan
Assignee: Anirudh Ramanathan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24055) Add e2e test for using kubectl proxy for submitting spark jobs

2018-04-23 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-24055:
---
Summary: Add e2e test for using kubectl proxy for submitting spark jobs  
(was: Add e2e test for using kubectl proxy for submission)

> Add e2e test for using kubectl proxy for submitting spark jobs
> --
>
> Key: SPARK-24055
> URL: https://issues.apache.org/jira/browse/SPARK-24055
> Project: Spark
>  Issue Type: Test
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Anirudh Ramanathan
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24055) Add e2e test for using kubectl proxy for submission

2018-04-23 Thread Anirudh Ramanathan (JIRA)
Anirudh Ramanathan created SPARK-24055:
--

 Summary: Add e2e test for using kubectl proxy for submission
 Key: SPARK-24055
 URL: https://issues.apache.org/jira/browse/SPARK-24055
 Project: Spark
  Issue Type: Test
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Anirudh Ramanathan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24028) [K8s] Creating secrets and config maps before creating the driver pod has unpredictable behavior

2018-04-19 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444902#comment-16444902
 ] 

Anirudh Ramanathan commented on SPARK-24028:


My suspicion here is that this has to do with timing. An easy way to check may 
be to add a sleep() of a few seconds during driver pod startup and see if 
the issue resolves itself. Looks like there may have been a race condition with 
the storage mounting logic in the past, but if you're seeing this fresh in 
1.9.4, that is something we should file a bug about in upstream. 

All the recent runs of 
https://k8s-testgrid.appspot.com/sig-big-data#spark-periodic-latest-gke on 
v1.9.6 have been green. Any ideas on how we can reproduce this?

> [K8s] Creating secrets and config maps before creating the driver pod has 
> unpredictable behavior
> 
>
> Key: SPARK-24028
> URL: https://issues.apache.org/jira/browse/SPARK-24028
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Critical
>
> Currently we create the Kubernetes resources the driver depends on - such as 
> the properties config map and secrets to mount into the pod - only after we 
> create the driver pod. This is because we want these extra objects to 
> immediately have an owner reference to be tied to the driver pod.
> On our Kubernetes 1.9.4 cluster, we're seeing that sometimes this works 
> fine, but other times the driver ends up being started with empty volumes 
> instead of volumes with the contents of the secrets we expect. The result is 
> that sometimes the driver will start without these files mounted, which leads 
> to various failures if the driver requires these files to be present early on 
> in their code. Missing the properties file config map, for example, would 
> mean spark-submit doesn't have a properties file to read at all. See the 
> warning on [https://kubernetes.io/docs/concepts/storage/volumes/#secret.]
> Unfortunately we cannot link owner references to non-existent objects, so we 
> have to do this instead:
>  # Create the auxiliary resources without any owner references.
>  # Create the driver pod mounting these resources into volumes, as before.
>  # If #2 fails, clean up the resources created in #1.
>  # Edit the auxiliary resources to have an owner reference for the driver pod.
> The multi-step approach leaves a small chance for us to leak resources - for 
> example, if we fail to make the resource edits in #4 for some reason. This 
> also changes the permissioning mode required for spark-submit - credentials 
> provided to spark-submit need to be able to edit resources in addition to 
> creating them.
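
For step 4 above, attaching the owner reference once the driver pod exists could 
look roughly like this (a sketch against the fabric8 model, not the actual 
submission client code):

{code}
import io.fabric8.kubernetes.api.model.{ConfigMap, OwnerReferenceBuilder, Pod}
import io.fabric8.kubernetes.client.KubernetesClient
import scala.collection.JavaConverters._

// Sketch for step 4: once the driver pod exists, point the auxiliary ConfigMap
// at it so Kubernetes garbage-collects it together with the driver.
def addDriverOwnerReference(client: KubernetesClient,
                            ns: String,
                            driver: Pod,
                            configMap: ConfigMap): Unit = {
  val ownerRef = new OwnerReferenceBuilder()
    .withApiVersion(driver.getApiVersion)
    .withKind(driver.getKind)
    .withName(driver.getMetadata.getName)
    .withUid(driver.getMetadata.getUid)
    .build()
  configMap.getMetadata.setOwnerReferences(Seq(ownerRef).asJava)
  // This edit/replace is what requires the extra permissions noted above.
  client.configMaps().inNamespace(ns).createOrReplace(configMap)
}
{code}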



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24028) [K8s] Creating secrets and config maps before creating the driver pod has unpredictable behavior

2018-04-19 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444902#comment-16444902
 ] 

Anirudh Ramanathan edited comment on SPARK-24028 at 4/19/18 10:22 PM:
--

My suspicion here is that this has to do with timing. An easy way to check may 
be to add a sleep() of a few seconds during driver pod startup and see if 
the issue resolves itself. Looks like there may have been a race condition with 
the storage mounting logic in the past, but if you're seeing this fresh in 
1.9.4, that is something we should file a bug about in upstream Kubernetes. 

All the recent runs of 
https://k8s-testgrid.appspot.com/sig-big-data#spark-periodic-latest-gke on 
v1.9.6 have been green. Any ideas on how we can reproduce this?


was (Author: foxish):
My suspicion here is that this has to do with timing. An easy way to check may 
be to add a sleep() of a few seconds during driver pod startup and seeing if 
the issue resolves itself. Looks like there may have been a race condition with 
the storage mounting logic in the past, but if you're seeing this fresh in 
1.9.4, that is something we should file a bug about in upstream. 

All the recent runs of 
https://k8s-testgrid.appspot.com/sig-big-data#spark-periodic-latest-gke on 
v1.9.6 have been green. Any ideas on how we can reproduce this?

> [K8s] Creating secrets and config maps before creating the driver pod has 
> unpredictable behavior
> 
>
> Key: SPARK-24028
> URL: https://issues.apache.org/jira/browse/SPARK-24028
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Critical
>
> Currently we create the Kubernetes resources the driver depends on - such as 
> the properties config map and secrets to mount into the pod - only after we 
> create the driver pod. This is because we want these extra objects to 
> immediately have an owner reference to be tied to the driver pod.
> On our Kubernetes 1.9.4 cluster, we're seeing that sometimes this works 
> fine, but other times the driver ends up being started with empty volumes 
> instead of volumes with the contents of the secrets we expect. The result is 
> that sometimes the driver will start without these files mounted, which leads 
> to various failures if the driver requires these files to be present early on 
> in their code. Missing the properties file config map, for example, would 
> mean spark-submit doesn't have a properties file to read at all. See the 
> warning on [https://kubernetes.io/docs/concepts/storage/volumes/#secret.]
> Unfortunately we cannot link owner references to non-existent objects, so we 
> have to do this instead:
>  # Create the auxiliary resources without any owner references.
>  # Create the driver pod mounting these resources into volumes, as before.
>  # If #2 fails, clean up the resources created in #1.
>  # Edit the auxiliary resources to have an owner reference for the driver pod.
> The multi-step approach leaves a small chance for us to leak resources - for 
> example, if we fail to make the resource edits in #4 for some reason. This 
> also changes the permissioning mode required for spark-submit - credentials 
> provided to spark-submit need to be able to edit resources in addition to 
> creating them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24028) [K8s] Creating secrets and config maps before creating the driver pod has unpredictable behavior

2018-04-19 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444881#comment-16444881
 ] 

Anirudh Ramanathan edited comment on SPARK-24028 at 4/19/18 10:09 PM:
--

This is unexpected. Is this a recent change? You mention a 1.9.4 cluster has 
this issue. 
Like Yinan said, it doesn't sound like the right behavior if an empty file is 
found instead of what you expect - the expectation is that the secret/configmap 
will exist and be mounted, or not exist and cause the pod to go pending and 
retry until mounting succeeds.


was (Author: foxish):
This is curious. Is this a recent change? You mention a 1.9.4 cluster has this 
issue. 
Like Yinan said, it doesn't sound like the right behavior if an empty file is 
found instead of what you expect - the expectation is that the secret/configmap 
will exist and be mounted, or not exist and cause the pod to go pending and 
retrying till mounting succeeds.

> [K8s] Creating secrets and config maps before creating the driver pod has 
> unpredictable behavior
> 
>
> Key: SPARK-24028
> URL: https://issues.apache.org/jira/browse/SPARK-24028
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Critical
>
> Currently we create the Kubernetes resources the driver depends on - such as 
> the properties config map and secrets to mount into the pod - only after we 
> create the driver pod. This is because we want these extra objects to 
> immediately have an owner reference to be tied to the driver pod.
> On our Kubernetes 1.9.4 cluster, we're seeing that sometimes this works 
> fine, but other times the driver ends up being started with empty volumes 
> instead of volumes with the contents of the secrets we expect. The result is 
> that sometimes the driver will start without these files mounted, which leads 
> to various failures if the driver requires these files to be present early on 
> in their code. Missing the properties file config map, for example, would 
> mean spark-submit doesn't have a properties file to read at all. See the 
> warning on [https://kubernetes.io/docs/concepts/storage/volumes/#secret.]
> Unfortunately we cannot link owner references to non-existent objects, so we 
> have to do this instead:
>  # Create the auxiliary resources without any owner references.
>  # Create the driver pod mounting these resources into volumes, as before.
>  # If #2 fails, clean up the resources created in #1.
>  # Edit the auxiliary resources to have an owner reference for the driver pod.
> The multi-step approach leaves a small chance for us to leak resources - for 
> example, if we fail to make the resource edits in #4 for some reason. This 
> also changes the permissioning mode required for spark-submit - credentials 
> provided to spark-submit need to be able to edit resources in addition to 
> creating them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24028) [K8s] Creating secrets and config maps before creating the driver pod has unpredictable behavior

2018-04-19 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444881#comment-16444881
 ] 

Anirudh Ramanathan commented on SPARK-24028:


This is curious. Is this a recent change? You mention a 1.9.4 cluster has this 
issue. 
Like Yinan said, it doesn't sound like the right behavior if an empty file is 
found instead of what you expect - the expectation is that the secret/configmap 
will exist and be mounted, or not exist and cause the pod to go pending and 
retry until mounting succeeds.

> [K8s] Creating secrets and config maps before creating the driver pod has 
> unpredictable behavior
> 
>
> Key: SPARK-24028
> URL: https://issues.apache.org/jira/browse/SPARK-24028
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Critical
>
> Currently we create the Kubernetes resources the driver depends on - such as 
> the properties config map and secrets to mount into the pod - only after we 
> create the driver pod. This is because we want these extra objects to 
> immediately have an owner reference to be tied to the driver pod.
> On our Kubernetes 1.9.4 cluster, we're seeing that sometimes this works 
> fine, but other times the driver ends up being started with empty volumes 
> instead of volumes with the contents of the secrets we expect. The result is 
> that sometimes the driver will start without these files mounted, which leads 
> to various failures if the driver requires these files to be present early on 
> in their code. Missing the properties file config map, for example, would 
> mean spark-submit doesn't have a properties file to read at all. See the 
> warning on [https://kubernetes.io/docs/concepts/storage/volumes/#secret.]
> Unfortunately we cannot link owner references to non-existent objects, so we 
> have to do this instead:
>  # Create the auxiliary resources without any owner references.
>  # Create the driver pod mounting these resources into volumes, as before.
>  # If #2 fails, clean up the resources created in #1.
>  # Edit the auxiliary resources to have an owner reference for the driver pod.
> The multi-step approach leaves a small chance for us to leak resources - for 
> example, if we fail to make the resource edits in #4 for some reason. This 
> also changes the permissioning mode required for spark-submit - credentials 
> provided to spark-submit need to be able to edit resources in addition to 
> creating them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22839) Refactor Kubernetes code for configuring driver/executor pods to use consistent and cleaner abstraction

2018-04-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-22839.

   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 20910
[https://github.com/apache/spark/pull/20910]

> Refactor Kubernetes code for configuring driver/executor pods to use 
> consistent and cleaner abstraction
> ---
>
> Key: SPARK-22839
> URL: https://issues.apache.org/jira/browse/SPARK-22839
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Yinan Li
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 3.0.0
>
>
> As discussed in https://github.com/apache/spark/pull/19954, the current code 
> for configuring the driver pod and the code for configuring the executor pods 
> are not using the same abstraction. Besides that, the current code leaves a 
> lot to be desired in terms of the level and cleanness of abstraction. For 
> example, the current code passes many pieces of information around 
> different class hierarchies, which makes code review and maintenance 
> challenging. We need some thorough refactoring of the current code to achieve 
> better, cleaner, and consistent abstraction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22839) Refactor Kubernetes code for configuring driver/executor pods to use consistent and cleaner abstraction

2018-04-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-22839:
--

Assignee: Matt Cheah

> Refactor Kubernetes code for configuring driver/executor pods to use 
> consistent and cleaner abstraction
> ---
>
> Key: SPARK-22839
> URL: https://issues.apache.org/jira/browse/SPARK-22839
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Yinan Li
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 3.0.0
>
>
> As discussed in https://github.com/apache/spark/pull/19954, the current code 
> for configuring the driver pod and the code for configuring the executor pods 
> are not using the same abstraction. Besides that, the current code leaves a 
> lot to be desired in terms of the level and cleanness of abstraction. For 
> example, the current code passes many pieces of information around 
> different class hierarchies, which makes code review and maintenance 
> challenging. We need some thorough refactoring of the current code to achieve 
> better, cleaner, and consistent abstraction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23885) trying to spark submit 2.3.0 on minikube

2018-04-12 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436094#comment-16436094
 ] 

Anirudh Ramanathan edited comment on SPARK-23885 at 4/12/18 6:21 PM:
-

Please post questions to the spark-user mailing list or stackoverflow.


was (Author: foxish):
Please post questions to the spark-user mailing list.

> trying to spark submit 2.3.0 on minikube
> 
>
> Key: SPARK-23885
> URL: https://issues.apache.org/jira/browse/SPARK-23885
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 2.3.0
>Reporter: anant pukale
>Assignee: Anirudh Ramanathan
>Priority: Major
>
>  spark-submit on minikube (Kubernetes) is failing.
> Kindly refer to the link for details 
>  
> [https://stackoverflow.com/questions/49689298/exception-in-thread-main-org-apache-spark-sparkexception-must-specify-the-dri|http://example.com]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23885) trying to spark submit 2.3.0 on minikube

2018-04-12 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436094#comment-16436094
 ] 

Anirudh Ramanathan commented on SPARK-23885:


Please post questions to the spark-user mailing list.

> trying to spark submit 2.3.0 on minikube
> 
>
> Key: SPARK-23885
> URL: https://issues.apache.org/jira/browse/SPARK-23885
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 2.3.0
>Reporter: anant pukale
>Priority: Major
>
>  spark-submit on minikube (Kubernetes) is failing.
> Kindly refer to the link for details 
>  
> [https://stackoverflow.com/questions/49689298/exception-in-thread-main-org-apache-spark-sparkexception-must-specify-the-dri|http://example.com]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23885) trying to spark submit 2.3.0 on minikube

2018-04-12 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-23885:
--

Assignee: Anirudh Ramanathan

> trying to spark submit 2.3.0 on minikube
> 
>
> Key: SPARK-23885
> URL: https://issues.apache.org/jira/browse/SPARK-23885
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Submit
>Affects Versions: 2.3.0
>Reporter: anant pukale
>Assignee: Anirudh Ramanathan
>Priority: Major
>
>  spark-submit on minikube (Kubernetes) is failing.
> Kindly refer to the link for details 
>  
> [https://stackoverflow.com/questions/49689298/exception-in-thread-main-org-apache-spark-sparkexception-must-specify-the-dri|http://example.com]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23891) Debian based Dockerfile

2018-04-12 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436092#comment-16436092
 ] 

Anirudh Ramanathan commented on SPARK-23891:


[~eje] has done a lot of research on these images and dependencies. PTAL

> Debian based Dockerfile
> ---
>
> Key: SPARK-23891
> URL: https://issues.apache.org/jira/browse/SPARK-23891
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Sercan Karaoglu
>Priority: Major
>
> The current Dockerfile inherits from Alpine Linux, which causes the netty-tcnative SSL 
> bindings to fail to load; this is the case when we use the Google Cloud 
> Platform Bigtable client on top of a Spark cluster. It would be better to have 
> an additional Debian-based Dockerfile.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23668) Support for imagePullSecrets k8s option

2018-04-04 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-23668.

Resolution: Fixed

> Support for imagePullSecrets k8s option
> ---
>
> Key: SPARK-23668
> URL: https://issues.apache.org/jira/browse/SPARK-23668
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Andrew Korzhuev
>Assignee: Andrew Korzhuev
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In an enterprise setting it's likely that the image registry k8s pulls images 
> from is private.
> Credentials can be passed with the Pod specification through the 
> `imagePullSecrets` parameter, which refers to the k8s secret by name (see 
> [https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/]
>  ).
> Implementation-wise we only need to expose a configuration option to the user 
> and then pass it along to k8s.
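For context, a hedged sketch of the intended flow; the secret name, registry
details, and the Spark property name are illustrative assumptions rather than
the committed implementation.

{code:bash}
# Create the registry credential as a k8s secret (names/values are placeholders).
kubectl create secret docker-registry my-registry-key \
  --docker-server=registry.example.com \
  --docker-username=<user> \
  --docker-password=<password>

# Expose it through a Spark conf that gets passed along to the pod spec's
# imagePullSecrets field (property name is an assumption, for illustration).
./bin/spark-submit \
  --conf spark.kubernetes.container.image.pullSecrets=my-registry-key \
  ...
{code}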



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23668) Support for imagePullSecrets k8s option

2018-04-03 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-23668:
--

Assignee: Andrew Korzhuev

> Support for imagePullSecrets k8s option
> ---
>
> Key: SPARK-23668
> URL: https://issues.apache.org/jira/browse/SPARK-23668
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Andrew Korzhuev
>Assignee: Andrew Korzhuev
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In an enterprise setting it's likely that the image registry k8s pulls images 
> from is private.
> Credentials can be passed with the Pod specification through the 
> `imagePullSecrets` parameter, which refers to the k8s secret by name (see 
> [https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/]
>  ).
> Implementation-wise we only need to expose a configuration option to the user 
> and then pass it along to k8s.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23285) Allow spark.executor.cores to be fractional

2018-04-02 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-23285.

Resolution: Fixed

> Allow spark.executor.cores to be fractional
> ---
>
> Key: SPARK-23285
> URL: https://issues.apache.org/jira/browse/SPARK-23285
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Scheduler, Spark Submit
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Assignee: Yinan Li
>Priority: Minor
>
> There is a strong check for an integral number of cores per executor in 
> [SparkSubmitArguments.scala#L270-L272|https://github.com/apache/spark/blob/3f4060c340d6bac412e8819c4388ccba226efcf3/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272].
>  Given we're reusing that property in K8s, does it make sense to relax it?
>  
> K8s treats CPU as a "compressible resource" and can actually assign millicpus 
> to individual containers. Also to be noted - spark.driver.cores has no such 
> check in place.
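For illustration, a hedged sketch of the two notions of "cores"; the k8s-style
request property shown is the kind of knob that addresses this, but treat the
exact name as an assumption for your Spark version.

{code:bash}
# Rejected today by the integral-cores check in SparkSubmitArguments:
./bin/spark-submit --conf spark.executor.cores=0.5 ...

# k8s itself accepts fractional CPU as millicpus, e.g. a 500m request per
# executor (property name is an assumption; check your version's docs):
./bin/spark-submit --conf spark.kubernetes.executor.request.cores=500m ...
{code}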



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23285) Allow spark.executor.cores to be fractional

2018-04-02 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-23285:
--

Assignee: Yinan Li

> Allow spark.executor.cores to be fractional
> ---
>
> Key: SPARK-23285
> URL: https://issues.apache.org/jira/browse/SPARK-23285
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Scheduler, Spark Submit
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Assignee: Yinan Li
>Priority: Minor
>
> There is a strong check for an integral number of cores per executor in 
> [SparkSubmitArguments.scala#L270-L272|https://github.com/apache/spark/blob/3f4060c340d6bac412e8819c4388ccba226efcf3/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272].
>  Given we're reusing that property in K8s, does it make sense to relax it?
>  
> K8s treats CPU as a "compressible resource" and can actually assign millicpus 
> to individual containers. Also to be noted - spark.driver.cores has no such 
> check in place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22865) Publish Official Apache Spark Docker images

2018-04-02 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-22865:
---
Priority: Major  (was: Minor)

> Publish Official Apache Spark Docker images
> ---
>
> Key: SPARK-22865
> URL: https://issues.apache.org/jira/browse/SPARK-22865
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22865) Publish Official Apache Spark Docker images

2018-04-02 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422992#comment-16422992
 ] 

Anirudh Ramanathan commented on SPARK-22865:


[~eje] was working on getting the licensing issues sorted. Erik - do you have 
an update on this?

> Publish Official Apache Spark Docker images
> ---
>
> Key: SPARK-22865
> URL: https://issues.apache.org/jira/browse/SPARK-22865
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23680) entrypoint.sh does not accept arbitrary UIDs, returning as an error

2018-04-02 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422987#comment-16422987
 ] 

Anirudh Ramanathan commented on SPARK-23680:


[~felixcheung] helped me set up the right permissions in JIRA to edit that 
field.

> entrypoint.sh does not accept arbitrary UIDs, returning as an error
> ---
>
> Key: SPARK-23680
> URL: https://issues.apache.org/jira/browse/SPARK-23680
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
> Environment: OpenShift
>Reporter: Ricardo Martinelli de Oliveira
>Priority: Major
>  Labels: easyfix
>
> Openshift supports running pods using arbitrary UIDs 
> ([https://docs.openshift.com/container-platform/3.7/creating_images/guidelines.html#openshift-specific-guidelines)]
>   to improve security. Although entrypoint.sh was developed to cover this 
> feature, the script is returning an error[1].
> The issue is that the script uses getent to find the passwd entry of the 
> current UID, and if the entry is not found it creates an entry in 
> /etc/passwd. According to the getent man page:
> {code:java}
> EXIT STATUS
>    One of the following exit values can be returned by getent:
>   0 Command completed successfully.
>   1 Missing arguments, or database unknown.
>   2 One or more supplied key could not be found in the 
> database.
>   3 Enumeration not supported on this database.
> {code}
> And since the script begins with a "set -ex" command, it turns on tracing and 
> aborts the script if any command pipeline returns an exit code other than 0.
> That said, the line below must be changed to remove the "-e" flag from the 
> set command:
> https://github.com/apache/spark/blob/v2.3.0/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L20
>  
>  
> [1]https://github.com/apache/spark/blob/v2.3.0/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L25-L34
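A minimal sketch of the workaround described above (not the exact upstream
script): keep tracing, but tolerate a missing passwd entry instead of letting
"set -e" abort on getent's non-zero exit status.

{code:bash}
# Sketch only; variable names and the passwd line format are illustrative.
set -x                                      # keep tracing, drop -e
myuid=$(id -u)
mygid=$(id -g)
uidentry=$(getent passwd "$myuid" || true)  # getent exits 2 when the key is missing
if [ -z "$uidentry" ]; then
  echo "$myuid:x:$myuid:$mygid:anonymous uid:$SPARK_HOME:/bin/false" >> /etc/passwd
fi
{code}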



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23680) entrypoint.sh does not accept arbitrary UIDs, returning as an error

2018-04-02 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422987#comment-16422987
 ] 

Anirudh Ramanathan edited comment on SPARK-23680 at 4/2/18 7:04 PM:


[~felixcheung] helped me get the right permissions in JIRA to edit that field.


was (Author: foxish):
[~felixcheung] helped me set up the right permissions in JIRA to edit that 
field.

> entrypoint.sh does not accept arbitrary UIDs, returning as an error
> ---
>
> Key: SPARK-23680
> URL: https://issues.apache.org/jira/browse/SPARK-23680
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
> Environment: OpenShift
>Reporter: Ricardo Martinelli de Oliveira
>Priority: Major
>  Labels: easyfix
>
> Openshift supports running pods using arbitrary UIDs 
> ([https://docs.openshift.com/container-platform/3.7/creating_images/guidelines.html#openshift-specific-guidelines)]
>   to improve security. Although entrypoint.sh was developed to cover this 
> feature, the script is returning an error[1].
> The issue is that the script uses getent to find the passwd entry of the 
> current UID, and if the entry is not found it creates an entry in 
> /etc/passwd. According to the getent man page:
> {code:java}
> EXIT STATUS
>    One of the following exit values can be returned by getent:
>   0 Command completed successfully.
>   1 Missing arguments, or database unknown.
>   2 One or more supplied key could not be found in the 
> database.
>   3 Enumeration not supported on this database.
> {code}
> And since the script begins with a "set -ex" command, it turns on tracing and 
> aborts the script if any command pipeline returns an exit code other than 0.
> That said, the line below must be changed to remove the "-e" flag from the 
> set command:
> https://github.com/apache/spark/blob/v2.3.0/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L20
>  
>  
> [1]https://github.com/apache/spark/blob/v2.3.0/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L25-L34



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23200) Reset configuration when restarting from checkpoints

2018-03-13 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397432#comment-16397432
 ] 

Anirudh Ramanathan commented on SPARK-23200:


Looks like the change was reverted. [~ssaavedra], can you propose this change 
again with the cleanup? We should target it for 2.4.

> Reset configuration when restarting from checkpoints
> 
>
> Key: SPARK-23200
> URL: https://issues.apache.org/jira/browse/SPARK-23200
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Major
>
> Streaming workloads and restarting from checkpoints may need additional 
> changes, i.e. resetting properties -  see 
> https://github.com/apache-spark-on-k8s/spark/pull/516



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18278) SPIP: Support native submission of spark jobs to a kubernetes cluster

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-18278.

Resolution: Fixed

Closing this. Sub-issues have been created for remaining items with the 
component tag "Kubernetes".

> SPIP: Support native submission of spark jobs to a kubernetes cluster
> -
>
> Key: SPARK-18278
> URL: https://issues.apache.org/jira/browse/SPARK-18278
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Deploy, Documentation, Kubernetes, Scheduler, 
> Spark Core
>Affects Versions: 2.3.0
>Reporter: Erik Erlandson
>Assignee: Anirudh Ramanathan
>Priority: Major
>  Labels: SPIP
> Attachments: SPARK-18278 Spark on Kubernetes Design Proposal Revision 
> 2 (1).pdf
>
>
> A new Apache Spark sub-project that enables native support for submitting 
> Spark applications to a kubernetes cluster.   The submitted application runs 
> in a driver executing on a kubernetes pod, and executors lifecycles are also 
> managed as pods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22865) Publish Official Apache Spark Docker images

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-22865:
---
Issue Type: New Feature  (was: Sub-task)
Parent: (was: SPARK-18278)

> Publish Official Apache Spark Docker images
> ---
>
> Key: SPARK-22865
> URL: https://issues.apache.org/jira/browse/SPARK-22865
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23067) Allow for easier debugging of the docker container

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-23067:
---
Issue Type: Bug  (was: Sub-task)
Parent: (was: SPARK-18278)

> Allow for easier debugging of the docker container
> --
>
> Key: SPARK-23067
> URL: https://issues.apache.org/jira/browse/SPARK-23067
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> `docker run -it foxish/spark:v2.3.0 /bin/bash` fails because we don't accept 
> any command except driver, executor, and init. Consider piping unknown 
> commands through to the container.
> It is still possible to do 
> `docker run -it --entrypoint=/bin/bash foxish/spark:v2.3.0` for debugging, 
> but it's common to try to run a different command as shown above. Also 
> consider documenting how to debug/inspect the docker images.
> [~vanzin] [~kimoonkim]
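A hedged sketch of the pass-through idea (not the actual entrypoint.sh): treat
anything other than the known commands as something to exec directly, so ad-hoc
debugging shells work.

{code:bash}
# Sketch only; the real Spark launch logic is elided.
case "$1" in
  driver|executor|init)
    # existing Spark-specific launch paths would go here
    ;;
  *)
    # pass unknown commands straight through, so
    # `docker run -it foxish/spark:v2.3.0 /bin/bash` just works
    exec "$@"
    ;;
esac
{code}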



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23010) Add integration testing for Kubernetes backend into the apache/spark repository

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-23010:
---
Issue Type: New Feature  (was: Sub-task)
Parent: (was: SPARK-18278)

> Add integration testing for Kubernetes backend into the apache/spark 
> repository
> ---
>
> Key: SPARK-23010
> URL: https://issues.apache.org/jira/browse/SPARK-23010
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Major
>
> Add tests for the scheduler backend into apache/spark
> /xref: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Integration-testing-and-Scheduler-Backends-td23105.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23324) Announce new Kubernetes back-end for 2.3 release notes

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-23324.

Resolution: Fixed

> Announce new Kubernetes back-end for 2.3 release notes
> --
>
> Key: SPARK-23324
> URL: https://issues.apache.org/jira/browse/SPARK-23324
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 2.3.0
>Reporter: Erik Erlandson
>Assignee: Erik Erlandson
>Priority: Major
>  Labels: documentation, kubernetes, releasenotes
>
> This is an issue to request that the new Kubernetes scheduler back-end gets 
> called out in the 2.3 release notes, as it is a prominent new feature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23324) Announce new Kubernetes back-end for 2.3 release notes

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-23324:
--

Assignee: Erik Erlandson

> Announce new Kubernetes back-end for 2.3 release notes
> --
>
> Key: SPARK-23324
> URL: https://issues.apache.org/jira/browse/SPARK-23324
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, Kubernetes
>Affects Versions: 2.3.0
>Reporter: Erik Erlandson
>Assignee: Erik Erlandson
>Priority: Major
>  Labels: documentation, kubernetes, releasenotes
>
> This is an issue to request that the new Kubernetes scheduler back-end gets 
> called out in the 2.3 release notes, as it is a prominent new feature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23529) Specify hostpath volume and mount the volume in Spark driver and executor pods in Kubernetes

2018-03-13 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397421#comment-16397421
 ] 

Anirudh Ramanathan commented on SPARK-23529:


Can this wait till we have local PVs? Mounting hostpath volumes is error prone 
because if the executor fails and comes up on a different node, we might be 
looking at a different disk entirely. Can you explain some more about your 
use-case?

> Specify hostpath volume and mount the volume in Spark driver and executor 
> pods in Kubernetes
> 
>
> Key: SPARK-23529
> URL: https://issues.apache.org/jira/browse/SPARK-23529
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Suman Somasundar
>Assignee: Anirudh Ramanathan
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23618) docker-image-tool.sh Fails While Building Image

2018-03-13 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397417#comment-16397417
 ] 

Anirudh Ramanathan commented on SPARK-23618:


Thanks Felix. The mechanism appears to work, but I'm unable to assign the PR 
author in the JIRA. 

> docker-image-tool.sh Fails While Building Image
> ---
>
> Key: SPARK-23618
> URL: https://issues.apache.org/jira/browse/SPARK-23618
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Ninad Ingole
>Priority: Major
>
> I am trying to build the Kubernetes image for version 2.3.0 using 
> {code:java}
> ./bin/docker-image-tool.sh -r ninadingole/spark-docker -t v2.3.0 build
> {code}
> which gives me a docker build error:
> {code:java}
> "docker build" requires exactly 1 argument.
> See 'docker build --help'.
> Usage: docker build [OPTIONS] PATH | URL | - [flags]
> Build an image from a Dockerfile
> {code}
>  
> I am executing the command within the Spark distribution directory. Please let 
> me know what the issue is.
>  
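For reference, a minimal sketch of the documented build-and-push flow, assuming
an unpacked binary distribution so the bundled Dockerfile paths resolve;
directory and repository names are placeholders, and whether this avoids the
error above depends on its root cause.

{code:bash}
# Sketch only; directory and repo names are placeholders.
cd spark-2.3.0-bin-hadoop2.7
./bin/docker-image-tool.sh -r ninadingole/spark-docker -t v2.3.0 build
./bin/docker-image-tool.sh -r ninadingole/spark-docker -t v2.3.0 push
{code}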



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23618) docker-image-tool.sh Fails While Building Image

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-23618:
--

Assignee: (was: Anirudh Ramanathan)

> docker-image-tool.sh Fails While Building Image
> ---
>
> Key: SPARK-23618
> URL: https://issues.apache.org/jira/browse/SPARK-23618
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Ninad Ingole
>Priority: Major
>
> I am trying to build the Kubernetes image for version 2.3.0 using 
> {code:java}
> ./bin/docker-image-tool.sh -r ninadingole/spark-docker -t v2.3.0 build
> {code}
> which gives me a docker build error:
> {code:java}
> "docker build" requires exactly 1 argument.
> See 'docker build --help'.
> Usage: docker build [OPTIONS] PATH | URL | - [flags]
> Build an image from a Dockerfile
> {code}
>  
> I am executing the command within the Spark distribution directory. Please let 
> me know what the issue is.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23618) docker-image-tool.sh Fails While Building Image

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-23618:
--

Assignee: Anirudh Ramanathan

> docker-image-tool.sh Fails While Building Image
> ---
>
> Key: SPARK-23618
> URL: https://issues.apache.org/jira/browse/SPARK-23618
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Ninad Ingole
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> I am trying to build the Kubernetes image for version 2.3.0 using 
> {code:java}
> ./bin/docker-image-tool.sh -r ninadingole/spark-docker -t v2.3.0 build
> {code}
> which gives me a docker build error:
> {code:java}
> "docker build" requires exactly 1 argument.
> See 'docker build --help'.
> Usage: docker build [OPTIONS] PATH | URL | - [flags]
> Build an image from a Dockerfile
> {code}
>  
> I am executing the command within the Spark distribution directory. Please let 
> me know what the issue is.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23136) Mark packages as experimental

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-23136:
--

Assignee: Anirudh Ramanathan

> Mark packages as experimental
> -
>
> Key: SPARK-23136
> URL: https://issues.apache.org/jira/browse/SPARK-23136
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> This is tracking marking the packages and indicating in docs that it's 
> experimental. Based on conversation in 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Kubernetes-why-use-init-containers-td23113.html
>  
> [~vanzin] [~felixcheung] [~sameerag]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23016) Spark UI access and documentation

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-23016:
--

Assignee: Anirudh Ramanathan

> Spark UI access and documentation
> -
>
> Key: SPARK-23016
> URL: https://issues.apache.org/jira/browse/SPARK-23016
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Minor
>
> We should have instructions to access the spark driver UI, or instruct users 
> to create a service to expose it.
> Also might need an integration test to verify that the driver UI works as 
> expected.
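A minimal sketch of the kind of instructions this asks for, assuming the driver
pod can be located by the backend's default labels (treat the label and pod
name as assumptions); 4040 is Spark's default UI port.

{code:bash}
# Find the driver pod, then forward the UI port locally (pod name is a placeholder).
kubectl get pods -l spark-role=driver
kubectl port-forward <driver-pod-name> 4040:4040
# The UI is then reachable at http://localhost:4040
{code}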



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22758) New Spark Jira component for Kubernetes

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-22758:
--

Assignee: Marcelo Vanzin

> New Spark Jira component for Kubernetes
> ---
>
> Key: SPARK-22758
> URL: https://issues.apache.org/jira/browse/SPARK-22758
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Kubernetes
>Affects Versions: 2.3.0
>Reporter: Yinan Li
>Assignee: Marcelo Vanzin
>Priority: Major
>
> Given that we now have the first bits of code for adding Kubernetes as a 
> native scheduler backend merged in 
> https://github.com/apache/spark/pull/19468, we need a new Jira component for 
> Kubernetes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22647) Docker files for image creation

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-22647:
--

Assignee: Yinan Li

> Docker files for image creation
> ---
>
> Key: SPARK-22647
> URL: https://issues.apache.org/jira/browse/SPARK-22647
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Yinan Li
>Priority: Major
>
> This covers the dockerfiles that need to be shipped to enable the Kubernetes 
> backend for Spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22645) Add Scheduler Backend with static allocation of executors

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-22645:
--

Assignee: Anirudh Ramanathan

> Add Scheduler Backend with static allocation of executors
> -
>
> Key: SPARK-22645
> URL: https://issues.apache.org/jira/browse/SPARK-22645
> Project: Spark
>  Issue Type: Sub-task
>  Components: Scheduler
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Major
>
> This is a stripped down version of the KubernetesClusterSchedulerBackend for 
> Spark with the following components:
> * Static Allocation of Executors
> * Executor Pod Factory
> * Executor Recovery Semantics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18278) SPIP: Support native submission of spark jobs to a kubernetes cluster

2018-03-13 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397403#comment-16397403
 ] 

Anirudh Ramanathan commented on SPARK-18278:


I think we should close this JIRA out and move to create specific sub-issues 
for the remaining tasks. 

> SPIP: Support native submission of spark jobs to a kubernetes cluster
> -
>
> Key: SPARK-18278
> URL: https://issues.apache.org/jira/browse/SPARK-18278
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Deploy, Documentation, Kubernetes, Scheduler, 
> Spark Core
>Affects Versions: 2.3.0
>Reporter: Erik Erlandson
>Assignee: Anirudh Ramanathan
>Priority: Major
>  Labels: SPIP
> Attachments: SPARK-18278 Spark on Kubernetes Design Proposal Revision 
> 2 (1).pdf
>
>
> A new Apache Spark sub-project that enables native support for submitting 
> Spark applications to a kubernetes cluster.   The submitted application runs 
> in a driver executing on a kubernetes pod, and executors lifecycles are also 
> managed as pods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23083) Adding Kubernetes as an option to https://spark.apache.org/

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-23083:
--

Assignee: Anirudh Ramanathan

> Adding Kubernetes as an option to https://spark.apache.org/
> ---
>
> Key: SPARK-23083
> URL: https://issues.apache.org/jira/browse/SPARK-23083
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Assignee: Anirudh Ramanathan
>Priority: Minor
>
> [https://spark.apache.org/] can now include a reference to, and the k8s logo.
> I think this is not tied to the docs.
> cc/ [~rxin] [~sameer]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18278) SPIP: Support native submission of spark jobs to a kubernetes cluster

2018-03-13 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan reassigned SPARK-18278:
--

Assignee: Anirudh Ramanathan

> SPIP: Support native submission of spark jobs to a kubernetes cluster
> -
>
> Key: SPARK-18278
> URL: https://issues.apache.org/jira/browse/SPARK-18278
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Deploy, Documentation, Kubernetes, Scheduler, 
> Spark Core
>Affects Versions: 2.3.0
>Reporter: Erik Erlandson
>Assignee: Anirudh Ramanathan
>Priority: Major
>  Labels: SPIP
> Attachments: SPARK-18278 Spark on Kubernetes Design Proposal Revision 
> 2 (1).pdf
>
>
> A new Apache Spark sub-project that enables native support for submitting 
> Spark applications to a kubernetes cluster.   The submitted application runs 
> in a driver executing on a kubernetes pod, and executors lifecycles are also 
> managed as pods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23618) docker-image-tool.sh Fails While Building Image

2018-03-12 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395686#comment-16395686
 ] 

Anirudh Ramanathan commented on SPARK-23618:


[~felixcheung], just merged this PR. But I'm unable to add an assignee to the 
JIRA. Am I missing some permissions?

> docker-image-tool.sh Fails While Building Image
> ---
>
> Key: SPARK-23618
> URL: https://issues.apache.org/jira/browse/SPARK-23618
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Ninad Ingole
>Priority: Major
>
> I am trying to build the Kubernetes image for version 2.3.0 using 
> {code:java}
> ./bin/docker-image-tool.sh -r ninadingole/spark-docker -t v2.3.0 build
> {code}
> which gives me a docker build error:
> {code:java}
> "docker build" requires exactly 1 argument.
> See 'docker build --help'.
> Usage: docker build [OPTIONS] PATH | URL | - [flags]
> Build an image from a Dockerfile
> {code}
>  
> I am executing the command within the Spark distribution directory. Please let 
> me know what the issue is.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23618) docker-image-tool.sh Fails While Building Image

2018-03-12 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-23618.

Resolution: Fixed

> docker-image-tool.sh Fails While Building Image
> ---
>
> Key: SPARK-23618
> URL: https://issues.apache.org/jira/browse/SPARK-23618
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Ninad Ingole
>Priority: Major
>
> I am trying to build the Kubernetes image for version 2.3.0 using 
> {code:java}
> ./bin/docker-image-tool.sh -r ninadingole/spark-docker -t v2.3.0 build
> {code}
> which gives me a docker build error:
> {code:java}
> "docker build" requires exactly 1 argument.
> See 'docker build --help'.
> Usage: docker build [OPTIONS] PATH | URL | - [flags]
> Build an image from a Dockerfile
> {code}
>  
> I am executing the command within the Spark distribution directory. Please let 
> me know what the issue is.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23083) Adding Kubernetes as an option to https://spark.apache.org/

2018-02-28 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-23083.

Resolution: Fixed

This has been merged, closing.

> Adding Kubernetes as an option to https://spark.apache.org/
> ---
>
> Key: SPARK-23083
> URL: https://issues.apache.org/jira/browse/SPARK-23083
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> [https://spark.apache.org/] can now include a reference to, and the k8s logo.
> I think this is not tied to the docs.
> cc/ [~rxin] [~sameer]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23485) Kubernetes should support node blacklist

2018-02-23 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374884#comment-16374884
 ] 

Anirudh Ramanathan edited comment on SPARK-23485 at 2/23/18 7:36 PM:
-

Stavros - we [do currently 
differentiate|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L386-L398]
 between kubernetes causing an executor to disappear (node failure) and exit 
caused by the application itself. 

Here's some detail on node issues and k8s:

The node level problem detection is split between the Kubelet and the [Node 
Problem Detector|https://github.com/kubernetes/node-problem-detector]. This 
works for some common errors and in future, will taint nodes upon detecting 
them. Some of these errors are listed 
[here|https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json#L30:15].
 However, there are some categories of errors this setup won't detect. For 
example: if we have a node that has firewall rules/networking that prevents an 
executor running on it accessing a particular external service, to say - 
download/stream data. Or, a node with issues in its local disk which makes a 
spark executor on it throw read/write errors. These error conditions may only 
affect certain kinds of pods on that node and not others.

Yinan's point I think is that it is uncommon for applications on k8s to try and 
incorporate reasoning about node level conditions. I think this is because the 
general expectation is that a failure on a given node will just cause new 
executors to spin up on different nodes and eventually the application will 
succeed. However, I can see this being an issue in large-scale production 
deployments, where we'd see transient errors like above. Given the existence of 
a blacklist mechanism and anti-affinity primitives, it wouldn't be too complex 
to incorporate it I think. 

[~aash] [~mcheah], have you guys seen this in practice thus far? 


was (Author: foxish):
Stavros - we [do currently 
differentiate|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L386-L398]
 between kubernetes causing an executor to disappear (node failure) and exit 
caused by the application itself. 

Here's some detail on node issues and k8s:

The node level problem detection is split between the Kubelet and the [Node 
Problem Detector|https://github.com/kubernetes/node-problem-detector]. This 
works for some common errors and in future, will taint nodes upon detecting 
them. Some of these errors are listed 
[here|https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json#L30:15].
 However, there are some categories of errors this setup won't detect. For 
example: if we have a node that has firewall rules/networking that prevents it 
from accessing a particular external service, to say - download/stream data. 
Or, a node with issues in its local disk which makes it throw read/write 
errors. These error conditions may only affect certain kinds of pods on that 
node and not others.

Yinan's point I think is that it is uncommon for applications on k8s to try and 
incorporate reasoning about node level conditions. I think this is because the 
general expectation is that a failure on a given node will just cause new 
executors to spin up on different nodes and eventually the application will 
succeed. However, I can see this being an issue in large-scale production 
deployments, where we'd see transient errors like above. Given the existence of 
a blacklist mechanism and anti-affinity primitives, it wouldn't be too complex 
to incorporate it I think. 

[~aash] [~mcheah], have you guys seen this in practice thus far? 

> Kubernetes should support node blacklist
> 
>
> Key: SPARK-23485
> URL: https://issues.apache.org/jira/browse/SPARK-23485
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Scheduler
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not 
> use for running tasks (eg., because of bad hardware).  When running in yarn, 
> this blacklist is used to avoid ever allocating resources on blacklisted 
> nodes: 
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this 
> is incorrect -- but I didn't see any references to 
> {{scheduler.nodeBlacklist()}} in 

[jira] [Commented] (SPARK-23485) Kubernetes should support node blacklist

2018-02-23 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374884#comment-16374884
 ] 

Anirudh Ramanathan commented on SPARK-23485:


Stavros - we [do currently 
differentiate|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L386-L398]
 between kubernetes causing an executor to disappear (node failure) and exit 
caused by the application itself. 

Here's some detail on node issues and k8s:

The node level problem detection is split between the Kubelet and the [Node 
Problem Detector|https://github.com/kubernetes/node-problem-detector]. This 
works for some common errors and in future, will taint nodes upon detecting 
them. Some of these errors are listed 
[here|https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json#L30:15].
 However, there are some categories of errors this setup won't detect. For 
example: if we have a node that has firewall rules/networking that prevents it 
from accessing a particular external service, to say - download/stream data. 
Or, a node with issues in its local disk which makes it throw read/write 
errors. These error conditions may only affect certain kinds of pods on that 
node and not others.

Yinan's point I think is that it is uncommon for applications on k8s to try and 
incorporate reasoning about node level conditions. I think this is because the 
general expectation is that a failure on a given node will just cause new 
executors to spin up on different nodes and eventually the application will 
succeed. However, I can see this being an issue in large-scale production 
deployments, where we'd see transient errors like above. Given the existence of 
a blacklist mechanism and anti-affinity primitives, it wouldn't be too complex 
to incorporate it I think. 

[~aash] [~mcheah], have you guys seen this in practice thus far? 

> Kubernetes should support node blacklist
> 
>
> Key: SPARK-23485
> URL: https://issues.apache.org/jira/browse/SPARK-23485
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Scheduler
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not 
> use for running tasks (eg., because of bad hardware).  When running in yarn, 
> this blacklist is used to avoid ever allocating resources on blacklisted 
> nodes: 
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this 
> is incorrect -- but I didn't see any references to 
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}} so it 
> seems this is missing.  Thought of this while looking at SPARK-19755, a 
> similar issue on mesos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23485) Kubernetes should support node blacklist

2018-02-23 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374745#comment-16374745
 ] 

Anirudh Ramanathan edited comment on SPARK-23485 at 2/23/18 6:00 PM:
-

While mostly I think that K8s would be better suited to make the decision to 
blacklist nodes, I think we will see that there are causes to consider nodes 
problematic beyond just the kubelet health checks, so, using Spark's 
blacklisting sounds like a good idea to me. 

Tainting nodes isn't the right solution given it's one Spark application's 
notion of a blacklist and we don't want it to be applied at a cluster level. We 
could however, use [node 
anti-affinity|https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity]
 to communicate said blacklist and ensure that certain nodes are avoided by 
executors of that application.


was (Author: foxish):
While mostly I think that K8s would be better suited to make the decision to 
blacklist nodes, I think we will see that there are causes to consider nodes 
problematic beyond just the kubelet health checks, so, using Spark's 
blacklisting sounds like a good idea to me. 

Tainting nodes aren't the right solution given it's one Spark application's 
notion of a blacklist and we don't want it to be applied at a cluster level. We 
could however, use [node 
anti-affinity|https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity]
 to communicate said blacklist and ensure that certain nodes are avoided by 
executors of that application.

> Kubernetes should support node blacklist
> 
>
> Key: SPARK-23485
> URL: https://issues.apache.org/jira/browse/SPARK-23485
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Scheduler
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not 
> use for running tasks (eg., because of bad hardware).  When running in yarn, 
> this blacklist is used to avoid ever allocating resources on blacklisted 
> nodes: 
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this 
> is incorrect -- but I didn't see any references to 
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}} so it 
> seems this is missing.  Thought of this while looking at SPARK-19755, a 
> similar issue on mesos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23485) Kubernetes should support node blacklist

2018-02-23 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374745#comment-16374745
 ] 

Anirudh Ramanathan commented on SPARK-23485:


While mostly I think that K8s would be better suited to make the decision to 
blacklist nodes, I think we will see that there are causes to consider nodes 
problematic beyond just the kubelet health checks, so, using Spark's 
blacklisting sounds like a good idea to me. 

Tainting nodes isn't the right solution given it's one Spark application's 
notion of a blacklist and we don't want it to be applied at a cluster level. We 
could however, use [node 
anti-affinity|https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity]
 to communicate said blacklist and ensure that certain nodes are avoided by 
executors of that application.

> Kubernetes should support node blacklist
> 
>
> Key: SPARK-23485
> URL: https://issues.apache.org/jira/browse/SPARK-23485
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, Scheduler
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Major
>
> Spark's BlacklistTracker maintains a list of "bad nodes" which it will not 
> use for running tasks (eg., because of bad hardware).  When running in yarn, 
> this blacklist is used to avoid ever allocating resources on blacklisted 
> nodes: 
> https://github.com/apache/spark/blob/e836c27ce011ca9aef822bef6320b4a7059ec343/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
> I'm just beginning to poke around the kubernetes code, so apologies if this 
> is incorrect -- but I didn't see any references to 
> {{scheduler.nodeBlacklist()}} in {{KubernetesClusterSchedulerBackend}} so it 
> seems this is missing.  Thought of this while looking at SPARK-19755, a 
> similar issue on mesos.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23285) Allow spark.executor.cores to be fractional

2018-01-31 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-23285:
---
Description: 
There is a strong check for an integral number of cores per executor in 
[#SparkSubmitArguments.scala#L270-L272](https://github.com/apache/spark/blob/3f4060c340d6bac412e8819c4388ccba226efcf3/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272).
 Given we're reusing that property in K8s, does it make sense to relax it?

 

K8s treats CPU as a "compressible resource" and can actually assign millicpus 
to individual containers. Also to be noted - spark.driver.cores has no such 
check in place.

  was:
There is a strong check for an integral number of cores per executor in 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272.]
 Given we're reusing that property in K8s, does it make sense to relax it?

 

K8s treats CPU as a "compressible resource" and can actually assign millicpus 
to individual containers. Also to be noted - spark.driver.cores has no such 
check in place.


> Allow spark.executor.cores to be fractional
> ---
>
> Key: SPARK-23285
> URL: https://issues.apache.org/jira/browse/SPARK-23285
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Scheduler, Spark Submit
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> There is a strong check for an integral number of cores per executor in 
> [#SparkSubmitArguments.scala#L270-L272](https://github.com/apache/spark/blob/3f4060c340d6bac412e8819c4388ccba226efcf3/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272).
>  Given we're reusing that property in K8s, does it make sense to relax it?
>  
> K8s treats CPU as a "compressible resource" and can actually assign millicpus 
> to individual containers. Also to be noted - spark.driver.cores has no such 
> check in place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23285) Allow spark.executor.cores to be fractional

2018-01-31 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-23285:
---
Description: 
There is a strong check for an integral number of cores per executor in 
[SparkSubmitArguments.scala#L270-L272|https://github.com/apache/spark/blob/3f4060c340d6bac412e8819c4388ccba226efcf3/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272].
 Given we're reusing that property in K8s, does it make sense to relax it?

 

K8s treats CPU as a "compressible resource" and can actually assign millicpus 
to individual containers. Also to be noted - spark.driver.cores has no such 
check in place.

  was:
There is a strong check for an integral number of cores per executor in 
[#SparkSubmitArguments.scala#L270-L272](https://github.com/apache/spark/blob/3f4060c340d6bac412e8819c4388ccba226efcf3/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272).
 Given we're reusing that property in K8s, does it make sense to relax it?

 

K8s treats CPU as a "compressible resource" and can actually assign millicpus 
to individual containers. Also to be noted - spark.driver.cores has no such 
check in place.


> Allow spark.executor.cores to be fractional
> ---
>
> Key: SPARK-23285
> URL: https://issues.apache.org/jira/browse/SPARK-23285
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Scheduler, Spark Submit
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> There is a strong check for an integral number of cores per executor in 
> [SparkSubmitArguments.scala#L270-L272|https://github.com/apache/spark/blob/3f4060c340d6bac412e8819c4388ccba226efcf3/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272].
>  Given we're reusing that property in K8s, does it make sense to relax it?
>  
> K8s treats CPU as a "compressible resource" and can actually assign millicpus 
> to individual containers. Also to be noted - spark.driver.cores has no such 
> check in place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23285) Allow spark.executor.cores to be fractional

2018-01-31 Thread Anirudh Ramanathan (JIRA)
Anirudh Ramanathan created SPARK-23285:
--

 Summary: Allow spark.executor.cores to be fractional
 Key: SPARK-23285
 URL: https://issues.apache.org/jira/browse/SPARK-23285
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes, Scheduler, Spark Submit
Affects Versions: 2.4.0
Reporter: Anirudh Ramanathan


There is a strong check for an integral number of cores per executor in 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L270-L272.]
 Given we're reusing that property in K8s, does it make sense to relax it?

 

K8s treats CPU as a "compressible resource" and can actually assign millicpus 
to individual containers. Also to be noted - spark.driver.cores has no such 
check in place.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23200) Reset configuration when restarting from checkpoints

2018-01-24 Thread Anirudh Ramanathan (JIRA)
Anirudh Ramanathan created SPARK-23200:
--

 Summary: Reset configuration when restarting from checkpoints
 Key: SPARK-23200
 URL: https://issues.apache.org/jira/browse/SPARK-23200
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Anirudh Ramanathan


Streaming workloads and restarting from checkpoints may need additional 
changes, i.e. resetting properties -  see 
https://github.com/apache-spark-on-k8s/spark/pull/516



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22962) Kubernetes app fails if local files are used

2018-01-18 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331153#comment-16331153
 ] 

Anirudh Ramanathan commented on SPARK-22962:


I think this isn't super critical for this release, mostly a usability thing. 
If it's small enough, it makes sense, but if it introduces risk and we have to 
redo manual testing, I'd vote against getting this into 2.3.

> Kubernetes app fails if local files are used
> 
>
> Key: SPARK-22962
> URL: https://issues.apache.org/jira/browse/SPARK-22962
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> If you try to start a Spark app on kubernetes using a local file as the app 
> resource, for example, it will fail:
> {code}
> ./bin/spark-submit [[bunch of arguments]] /path/to/local/file.jar
> {code}
> {noformat}
> + /sbin/tini -s -- /bin/sh -c 'SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && 
> env | grep SPARK_JAVA_OPT_ | sed '\''s/[^=]*=\(.*\)/\1/g'
> \'' > /tmp/java_opts.txt && readarray -t SPARK_DRIVER_JAVA_OPTS < 
> /tmp/java_opts.txt && if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x}
>  ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi &&   
>   if ! [ -z ${SPARK_SUBMIT_EXTRA_CLASSPATH+x} ]; then SP
> ARK_CLASSPATH="$SPARK_SUBMIT_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && if 
> ! [ -z ${SPARK_MOUNTED_FILES_DIR+x} ]; then cp -R "$SPARK
> _MOUNTED_FILES_DIR/." .; fi && ${JAVA_HOME}/bin/java 
> "${SPARK_DRIVER_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMOR
> Y -Xmx$SPARK_DRIVER_MEMORY 
> -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS 
> $SPARK_DRIVER_ARGS'
> Error: Could not find or load main class com.cloudera.spark.tests.Sleeper
> {noformat}
> Using an http server to provide the app jar solves the problem.
> The k8s backend should either somehow make these files available to the 
> cluster or error out with a more user-friendly message if that feature is not 
> yet available.
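
As a hedged illustration of the "error out with a more user-friendly message" 
option mentioned in the description (this is not actual Spark submission-client 
code, and the accepted schemes are assumptions), a submission-time check might 
look like this:

{code}
// Hypothetical sketch of failing fast with a clearer message; not the actual
// submission-client code, and the accepted schemes below are illustrative only.
object AppResourceCheck {
  private val supportedSchemes = Set("http", "https", "hdfs", "local")

  def validate(appResource: String): Unit = {
    val scheme = appResource.split("://", 2) match {
      case Array(s, _) => s
      case _           => "file"   // bare paths behave like local files
    }
    if (!supportedSchemes.contains(scheme)) {
      throw new IllegalArgumentException(
        s"Application resource '$appResource' is a local file, which the " +
          "Kubernetes backend cannot yet ship to the cluster. Serve it over " +
          "http(s):// or hdfs://, or bake it into the image and use local://.")
    }
  }

  def main(args: Array[String]): Unit = {
    validate("https://example.com/jars/app.jar")   // passes
    validate("/path/to/local/file.jar")            // throws with a clearer message
  }
}
{code}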



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23146) Support client mode for Kubernetes cluster backend

2018-01-18 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-23146:
---
Target Version/s: 2.4.0

> Support client mode for Kubernetes cluster backend
> --
>
> Key: SPARK-23146
> URL: https://issues.apache.org/jira/browse/SPARK-23146
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Major
>
> This issue tracks client mode support within Spark when running in the 
> Kubernetes cluster backend.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22962) Kubernetes app fails if local files are used

2018-01-18 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-22962:
---
Affects Version/s: (was: 2.4.0)
   2.3.0

> Kubernetes app fails if local files are used
> 
>
> Key: SPARK-22962
> URL: https://issues.apache.org/jira/browse/SPARK-22962
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> If you try to start a Spark app on kubernetes using a local file as the app 
> resource, for example, it will fail:
> {code}
> ./bin/spark-submit [[bunch of arguments]] /path/to/local/file.jar
> {code}
> {noformat}
> + /sbin/tini -s -- /bin/sh -c 'SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && 
> env | grep SPARK_JAVA_OPT_ | sed '\''s/[^=]*=\(.*\)/\1/g'
> \'' > /tmp/java_opts.txt && readarray -t SPARK_DRIVER_JAVA_OPTS < 
> /tmp/java_opts.txt && if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x}
>  ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi &&   
>   if ! [ -z ${SPARK_SUBMIT_EXTRA_CLASSPATH+x} ]; then SP
> ARK_CLASSPATH="$SPARK_SUBMIT_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && if 
> ! [ -z ${SPARK_MOUNTED_FILES_DIR+x} ]; then cp -R "$SPARK
> _MOUNTED_FILES_DIR/." .; fi && ${JAVA_HOME}/bin/java 
> "${SPARK_DRIVER_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMOR
> Y -Xmx$SPARK_DRIVER_MEMORY 
> -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS 
> $SPARK_DRIVER_ARGS'
> Error: Could not find or load main class com.cloudera.spark.tests.Sleeper
> {noformat}
> Using an http server to provide the app jar solves the problem.
> The k8s backend should either somehow make these files available to the 
> cluster or error out with a more user-friendly message if that feature is not 
> yet available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23146) Support client mode for Kubernetes cluster backend

2018-01-18 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-23146:
---
Affects Version/s: (was: 2.4.0)
   2.3.0

> Support client mode for Kubernetes cluster backend
> --
>
> Key: SPARK-23146
> URL: https://issues.apache.org/jira/browse/SPARK-23146
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Major
>
> This issue tracks client mode support within Spark when running in the 
> Kubernetes cluster backend.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22962) Kubernetes app fails if local files are used

2018-01-18 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan updated SPARK-22962:
---
Affects Version/s: (was: 2.3.0)
   2.4.0

> Kubernetes app fails if local files are used
> 
>
> Key: SPARK-22962
> URL: https://issues.apache.org/jira/browse/SPARK-22962
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> If you try to start a Spark app on kubernetes using a local file as the app 
> resource, for example, it will fail:
> {code}
> ./bin/spark-submit [[bunch of arguments]] /path/to/local/file.jar
> {code}
> {noformat}
> + /sbin/tini -s -- /bin/sh -c 'SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && 
> env | grep SPARK_JAVA_OPT_ | sed '\''s/[^=]*=\(.*\)/\1/g'
> \'' > /tmp/java_opts.txt && readarray -t SPARK_DRIVER_JAVA_OPTS < 
> /tmp/java_opts.txt && if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x}
>  ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi &&   
>   if ! [ -z ${SPARK_SUBMIT_EXTRA_CLASSPATH+x} ]; then SP
> ARK_CLASSPATH="$SPARK_SUBMIT_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && if 
> ! [ -z ${SPARK_MOUNTED_FILES_DIR+x} ]; then cp -R "$SPARK
> _MOUNTED_FILES_DIR/." .; fi && ${JAVA_HOME}/bin/java 
> "${SPARK_DRIVER_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMOR
> Y -Xmx$SPARK_DRIVER_MEMORY 
> -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS 
> $SPARK_DRIVER_ARGS'
> Error: Could not find or load main class com.cloudera.spark.tests.Sleeper
> {noformat}
> Using an http server to provide the app jar solves the problem.
> The k8s backend should either somehow make these files available to the 
> cluster or error out with a more user-friendly message if that feature is not 
> yet available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22962) Kubernetes app fails if local files are used

2018-01-18 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331085#comment-16331085
 ] 

Anirudh Ramanathan commented on SPARK-22962:


This is the resource staging server use-case. We'll upstream this in the 2.4.0 
timeframe.

> Kubernetes app fails if local files are used
> 
>
> Key: SPARK-22962
> URL: https://issues.apache.org/jira/browse/SPARK-22962
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> If you try to start a Spark app on kubernetes using a local file as the app 
> resource, for example, it will fail:
> {code}
> ./bin/spark-submit [[bunch of arguments]] /path/to/local/file.jar
> {code}
> {noformat}
> + /sbin/tini -s -- /bin/sh -c 'SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && 
> env | grep SPARK_JAVA_OPT_ | sed '\''s/[^=]*=\(.*\)/\1/g'
> \'' > /tmp/java_opts.txt && readarray -t SPARK_DRIVER_JAVA_OPTS < 
> /tmp/java_opts.txt && if ! [ -z ${SPARK_MOUNTED_CLASSPATH+x}
>  ]; then SPARK_CLASSPATH="$SPARK_MOUNTED_CLASSPATH:$SPARK_CLASSPATH"; fi &&   
>   if ! [ -z ${SPARK_SUBMIT_EXTRA_CLASSPATH+x} ]; then SP
> ARK_CLASSPATH="$SPARK_SUBMIT_EXTRA_CLASSPATH:$SPARK_CLASSPATH"; fi && if 
> ! [ -z ${SPARK_MOUNTED_FILES_DIR+x} ]; then cp -R "$SPARK
> _MOUNTED_FILES_DIR/." .; fi && ${JAVA_HOME}/bin/java 
> "${SPARK_DRIVER_JAVA_OPTS[@]}" -cp "$SPARK_CLASSPATH" -Xms$SPARK_DRIVER_MEMOR
> Y -Xmx$SPARK_DRIVER_MEMORY 
> -Dspark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS $SPARK_DRIVER_CLASS 
> $SPARK_DRIVER_ARGS'
> Error: Could not find or load main class com.cloudera.spark.tests.Sleeper
> {noformat}
> Using an http server to provide the app jar solves the problem.
> The k8s backend should either somehow make these files available to the 
> cluster or error out with a more user-friendly message if that feature is not 
> yet available.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23016) Spark UI access and documentation

2018-01-18 Thread Anirudh Ramanathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Ramanathan resolved SPARK-23016.

Resolution: Fixed

This is resolved and we've verified it for 2.3.0.

> Spark UI access and documentation
> -
>
> Key: SPARK-23016
> URL: https://issues.apache.org/jira/browse/SPARK-23016
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> We should have instructions to access the spark driver UI, or instruct users 
> to create a service to expose it.
> Also might need an integration test to verify that the driver UI works as 
> expected.
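
As one hedged way to follow the "create a service to expose it" suggestion, the 
sketch below uses the fabric8 client that the Kubernetes backend already depends 
on; the service name, the "default" namespace and the spark-role=driver selector 
label are assumptions for illustration.

{code}
// Hedged sketch of exposing the driver UI behind a Service via the fabric8
// client. Service name, namespace and the selector label are assumptions.
import io.fabric8.kubernetes.api.model.ServiceBuilder
import io.fabric8.kubernetes.client.DefaultKubernetesClient

object ExposeDriverUi {
  def main(args: Array[String]): Unit = {
    val client = new DefaultKubernetesClient()
    try {
      val uiService = new ServiceBuilder()
        .withNewMetadata()
          .withName("spark-driver-ui")
          .endMetadata()
        .withNewSpec()
          .addToSelector("spark-role", "driver")   // label assumed on the driver pod
          .addNewPort()
            .withName("spark-ui")
            .withPort(4040)                        // default Spark UI port
            .withNewTargetPort(4040)
            .endPort()
          .endSpec()
        .build()
      client.services().inNamespace("default").create(uiService)
    } finally {
      client.close()
    }
  }
}
{code}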



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23082) Allow separate node selectors for driver and executors in Kubernetes

2018-01-18 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331081#comment-16331081
 ] 

Anirudh Ramanathan edited comment on SPARK-23082 at 1/18/18 7:52 PM:
-

This is an interesting feature request. We had some discussion about this. I 
think it's a bit too late for a feature request in 2.3, so, we can revisit this 
in the 2.4 timeframe.

 

[~mcheah]


was (Author: foxish):
This is an interesting feature request. We had some discussion about this. I 
think it's a bit too late for a feature request in 2.3, so, we can revisit this 
in the 2.4 timeframe.

> Allow separate node selectors for driver and executors in Kubernetes
> 
>
> Key: SPARK-23082
> URL: https://issues.apache.org/jira/browse/SPARK-23082
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Submit
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Oz Ben-Ami
>Priority: Minor
>
> In YARN, we can use {{spark.yarn.am.nodeLabelExpression}} to submit the Spark 
> driver to a different set of nodes from its executors. In Kubernetes, we can 
> specify {{spark.kubernetes.node.selector.[labelKey]}}, but we can't use 
> separate options for the driver and executors.
> This would be useful for the particular use case where executors can go on 
> more ephemeral nodes (e.g., with cluster autoscaling or preemptible/spot 
> instances), but the driver should use a more persistent machine.
> The required change would be minimal, essentially just using different config 
> keys for the 
> [driver|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala#L90]
>  and 
> [executor|https://github.com/apache/spark/blob/0b2eefb674151a0af64806728b38d9410da552ec/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala#L73]
>  instead of {{KUBERNETES_NODE_SELECTOR_PREFIX}} for both.
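
A hedged sketch of the separate-config-keys idea follows; the driver/executor 
prefixes shown are hypothetical, not existing Spark configuration.

{code}
// Hedged sketch of per-role node-selector prefixes. The two prefixes below are
// hypothetical, not existing Spark configuration.
object NodeSelectorPrefixes {
  val DriverSelectorPrefix = "spark.kubernetes.driver.node.selector."
  val ExecutorSelectorPrefix = "spark.kubernetes.executor.node.selector."

  // Collect labelKey -> value pairs carrying the given prefix, similar to how
  // the shared KUBERNETES_NODE_SELECTOR_PREFIX is resolved today.
  def selectorsFor(conf: Map[String, String], prefix: String): Map[String, String] =
    conf.collect { case (k, v) if k.startsWith(prefix) => k.stripPrefix(prefix) -> v }

  def main(args: Array[String]): Unit = {
    val conf = Map(
      DriverSelectorPrefix + "disktype" -> "ssd",
      ExecutorSelectorPrefix + "lifecycle" -> "preemptible")
    println(selectorsFor(conf, DriverSelectorPrefix))    // Map(disktype -> ssd)
    println(selectorsFor(conf, ExecutorSelectorPrefix))  // Map(lifecycle -> preemptible)
  }
}
{code}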



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23082) Allow separate node selectors for driver and executors in Kubernetes

2018-01-18 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331081#comment-16331081
 ] 

Anirudh Ramanathan commented on SPARK-23082:


This is an interesting feature request. We had some discussion about this. I 
think it's a bit too late for a feature request in 2.3, so, we can revisit this 
in the 2.4 timeframe.

> Allow separate node selectors for driver and executors in Kubernetes
> 
>
> Key: SPARK-23082
> URL: https://issues.apache.org/jira/browse/SPARK-23082
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Submit
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Oz Ben-Ami
>Priority: Minor
>
> In YARN, we can use {{spark.yarn.am.nodeLabelExpression}} to submit the Spark 
> driver to a different set of nodes from its executors. In Kubernetes, we can 
> specify {{spark.kubernetes.node.selector.[labelKey]}}, but we can't use 
> separate options for the driver and executors.
> This would be useful for the particular use case where executors can go on 
> more ephemeral nodes (e.g., with cluster autoscaling or preemptible/spot 
> instances), but the driver should use a more persistent machine.
> The required change would be minimal, essentially just using different config 
> keys for the 
> [driver|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/BasicDriverConfigurationStep.scala#L90]
>  and 
> [executor|https://github.com/apache/spark/blob/0b2eefb674151a0af64806728b38d9410da552ec/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodFactory.scala#L73]
>  instead of {{KUBERNETES_NODE_SELECTOR_PREFIX}} for both.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23133) Spark options are not passed to the Executor in Docker context

2018-01-18 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331074#comment-16331074
 ] 

Anirudh Ramanathan commented on SPARK-23133:


Thanks for submitting the PR fixing this.

> Spark options are not passed to the Executor in Docker context
> --
>
> Key: SPARK-23133
> URL: https://issues.apache.org/jira/browse/SPARK-23133
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
> Environment: Running Spark on K8s using supplied Docker image.
>Reporter: Andrew Korzhuev
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Reproduce:
>  # Build image with `bin/docker-image-tool.sh`.
>  # Submit application to k8s. Set executor options, e.g. `--conf 
> "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./jaas.conf"`
>  # Visit Spark UI on executor and notice that option is not set.
> Expected behavior: options from spark-submit should be correctly passed to 
> executor.
> Cause:
> `SPARK_EXECUTOR_JAVA_OPTS` is not defined in `entrypoint.sh`
> https://github.com/apache/spark/blob/branch-2.3/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L70
> [https://github.com/apache/spark/blob/branch-2.3/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L44-L45]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23016) Spark UI access and documentation

2018-01-18 Thread Anirudh Ramanathan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16330389#comment-16330389
 ] 

Anirudh Ramanathan commented on SPARK-23016:


Good point - yeah, we don't recommend the API server proxy in the docs anymore 
- 
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc1-docs/_site/running-on-kubernetes.html.

> Spark UI access and documentation
> -
>
> Key: SPARK-23016
> URL: https://issues.apache.org/jira/browse/SPARK-23016
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> We should have instructions to access the spark driver UI, or instruct users 
> to create a service to expose it.
> Also might need an integration test to verify that the driver UI works as 
> expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


