[jira] [Commented] (SPARK-32259) tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s

2020-07-13 Thread Rob Vesse (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156627#comment-17156627
 ] 

Rob Vesse commented on SPARK-32259:
---

bq. We use the Spark launcher to do spark-submit in K8s. Since the pod is evicted, its logs with the stack trace are not available; we only have the pod events given in the attachment.

You should still be able to use {{kubectl logs}} to retrieve the logs of terminated pods, unless these are executor pods that are being evicted, since I believe Spark cleans those up automatically.  You can add {{spark.kubernetes.executor.deleteOnTermination=false}} to your configuration to disable this behaviour so that you can retrieve those logs later.
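
For example, a minimal sketch (the master URL, application arguments and pod name below are placeholders for whatever your job actually uses):

{noformat}
# Keep executor pods around after termination so their logs stay retrievable
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.executor.deleteOnTermination=false \
  <your existing arguments>

# Later, list the finished executor pods and pull the logs for one of them
kubectl get pods -l spark-role=executor
kubectl logs <executor-pod-name>
{noformat}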

> tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s
> ---
>
> Key: SPARK-32259
> URL: https://issues.apache.org/jira/browse/SPARK-32259
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Prakash Rajendran
>Priority: Blocker
> Attachments: Capture.PNG
>
>
> In spark-submit I have set the config 
> "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", but Spark 
> is still not directing its spill data to the SPARK_LOCAL_DIRS path.
> K8s is evicting the pod with the error "{color:#de350b}*Pod ephemeral local 
> storage usage exceeds the total limit of containers.*{color}"
>  
> We use the Spark launcher to do spark-submit in K8s. Since the pod is evicted, 
> its logs with the stack trace are not available; we only have the pod events 
> given in the attachment.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-32259) tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s

2020-07-13 Thread Rob Vesse (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156626#comment-17156626
 ] 

Rob Vesse edited comment on SPARK-32259 at 7/13/20, 10:32 AM:
--

[~prakki79] Ideally you'd also include the following in your report:

* The full {{spark-submit}} command
* The {{spark-defaults.conf}} or whatever configuration file you are using (if 
any)
* The {{kubectl describe pod}} output for the relevant pod(s)
* The {{kubectl get pod -o=yaml}} output for the relevant pod(s)

bq. I have set the config "spark.kubernetes.local.dirs.tmpfs=true", but Spark is still not pointing its spill data to the SPARK_LOCAL_DIRS path.

Nothing you have shown so far suggests that this is true; all that configuration setting does is change how Spark configures the relevant {{emptyDir}} volume used for ephemeral storage (and that's assuming you haven't supplied other configuration that explicitly configures local directories).

You can exhaust an in-memory volume in exactly the same way as you exhaust a disk-based volume and get your pod evicted.  Note that when using in-memory volumes you may need to adjust the amount of memory allocated to your pod per the documentation - 
http://spark.apache.org/docs/latest/running-on-kubernetes.html#using-ram-for-local-storage
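
For reference, a rough sketch of that combination (the overhead factor and memory values here are purely illustrative and need to be sized to your actual spill volume):

{noformat}
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.local.dirs.tmpfs=true \
  --conf spark.kubernetes.memoryOverheadFactor=0.4 \
  --conf spark.executor.memory=4g \
  <your existing arguments>
{noformat}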




was (Author: rvesse):
[~prakki79] Ideally you'd also include the following in your report:

* The full {{spark-submit}} command
* The {{kubectl describe pod}} output for the relevant pod(s)
* The {{kubectl get pod -o=yaml}} output for the relevant pod(s)

bq. I have set the config "spark.kubernetes.local.dirs.tmpfs=true", but Spark is still not pointing its spill data to the SPARK_LOCAL_DIRS path.

Nothing you have shown so far suggests that this is true; all that configuration setting does is change how Spark configures the relevant {{emptyDir}} volume used for ephemeral storage (and that's assuming you haven't supplied other configuration that explicitly configures local directories).

You can exhaust an in-memory volume in exactly the same way as you exhaust a disk-based volume and get your pod evicted.  Note that when using in-memory volumes you may need to adjust the amount of memory allocated to your pod per the documentation - 
http://spark.apache.org/docs/latest/running-on-kubernetes.html#using-ram-for-local-storage



> tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s
> ---
>
> Key: SPARK-32259
> URL: https://issues.apache.org/jira/browse/SPARK-32259
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Prakash Rajendran
>Priority: Blocker
> Attachments: Capture.PNG
>
>
> In spark-submit I have set the config 
> "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", but Spark 
> is still not directing its spill data to the SPARK_LOCAL_DIRS path.
> K8s is evicting the pod with the error "{color:#de350b}*Pod ephemeral local 
> storage usage exceeds the total limit of containers.*{color}"
>  
> We use the Spark launcher to do spark-submit in K8s. Since the pod is evicted, 
> its logs with the stack trace are not available; we only have the pod events 
> given in the attachment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32259) tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s

2020-07-13 Thread Rob Vesse (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156626#comment-17156626
 ] 

Rob Vesse commented on SPARK-32259:
---

[~prakki79] Ideally you'd also include the following in your report:

* The full {{spark-submit}} command
* The {{kubectl describe pod}} output for the relevant pod(s)
* The {{kubectl get pod -o=yaml}} output for the relevant pod(s)

bq. I have set the config "spark.kubernetes.local.dirs.tmpfs=true", but Spark is still not pointing its spill data to the SPARK_LOCAL_DIRS path.

Nothing you have shown so far suggests that this is true; all that configuration setting does is change how Spark configures the relevant {{emptyDir}} volume used for ephemeral storage (and that's assuming you haven't supplied other configuration that explicitly configures local directories).

You can exhaust an in-memory volume in exactly the same way as you exhaust a disk-based volume and get your pod evicted.  Note that when using in-memory volumes you may need to adjust the amount of memory allocated to your pod per the documentation - 
http://spark.apache.org/docs/latest/running-on-kubernetes.html#using-ram-for-local-storage



> tmpfs=true, not pointing to SPARK_LOCAL_DIRS in k8s
> ---
>
> Key: SPARK-32259
> URL: https://issues.apache.org/jira/browse/SPARK-32259
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Prakash Rajendran
>Priority: Blocker
> Attachments: Capture.PNG
>
>
> In spark-submit I have set the config 
> "{color:#4c9aff}*spark.kubernetes.local.dirs.tmpfs=true*{color}", but Spark 
> is still not directing its spill data to the SPARK_LOCAL_DIRS path.
> K8s is evicting the pod with the error "{color:#de350b}*Pod ephemeral local 
> storage usage exceeds the total limit of containers.*{color}"
>  
> We use the Spark launcher to do spark-submit in K8s. Since the pod is evicted, 
> its logs with the stack trace are not available; we only have the pod events 
> given in the attachment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28649) Git Ignore does not ignore python/.eggs

2019-08-07 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-28649:
-

 Summary: Git Ignore does not ignore python/.eggs
 Key: SPARK-28649
 URL: https://issues.apache.org/jira/browse/SPARK-28649
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 2.4.3
Reporter: Rob Vesse


Currently the {{python/.eggs}} folder is not in the {{.gitignore}} file.  If you are building a Spark distribution from your working copy and enabling the Python distribution as part of that, you'll end up with this folder present and Git will always warn you about untracked files as a result.  Since this directory contains transient build artifacts it should be ignored.
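
As a sketch, the fix is likely a one-line addition (exact placement within the file is a matter of taste):

{noformat}
# Ignore the transient Python build directory
echo "python/.eggs/" >> .gitignore

# Verify the folder no longer shows up as untracked
git status --short python/
{noformat}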



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes

2019-04-24 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825132#comment-16825132
 ] 

Rob Vesse commented on SPARK-25262:
---

[~Udbhav Agrawal] Yes, I think an approach like that would be acceptable to the community (and if not then I don't know what would be).  If you want to take a stab at doing this, please feel free.

> Make Spark local dir volumes configurable with Spark on Kubernetes
> --
>
> Key: SPARK-25262
> URL: https://issues.apache.org/jira/browse/SPARK-25262
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Rob Vesse
>Priority: Major
>
> As discussed during review of the design document for SPARK-24434 while 
> providing pod templates will provide more in-depth customisation for Spark on 
> Kubernetes there are some things that cannot be modified because Spark code 
> generates pod specs in very specific ways.
> The particular issue identified relates to the handling of {{spark.local.dirs}} 
> which is done by {{LocalDirsFeatureStep.scala}}.  For each directory
> specified, or a single default if no explicit specification, it creates a 
> Kubernetes {{emptyDir}} volume.  As noted in the Kubernetes documentation 
> this will be backed by the node storage 
> (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).  In some 
> compute environments this may be extremely undesirable.  For example with 
> diskless compute resources the node storage will likely be a non-performant 
> remote mounted disk, often with limited capacity.  For such environments it 
> would likely be better to set {{medium: Memory}} on the volume per the K8S 
> documentation to use a {{tmpfs}} volume instead.
> Another closely related issue is that users might want to use a different 
> volume type to back the local directories and there is no possibility to do 
> that.
> Pod templates will not really solve either of these issues because Spark is 
> always going to attempt to generate a new volume for each local directory and 
> always going to set these as {{emptyDir}}.
> Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:
> * Provide a new config setting to enable using {{tmpfs}} backed {{emptyDir}} 
> volumes
> * Modify the logic to check if there is a volume already defined with the 
> name and if so skip generating a volume definition for it



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27063) Spark on K8S Integration Tests timeouts are too short for some test clusters

2019-03-06 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786051#comment-16786051
 ] 

Rob Vesse commented on SPARK-27063:
---

[~skonto] Yes, we have experienced the same problem; I think my next PR for this will look to make that overall timeout user-configurable.

> Spark on K8S Integration Tests timeouts are too short for some test clusters
> 
>
> Key: SPARK-27063
> URL: https://issues.apache.org/jira/browse/SPARK-27063
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Rob Vesse
>Priority: Minor
>
> As noted during development for SPARK-26729 there are a couple of integration 
> test timeouts that are too short when running on slower clusters, e.g. 
> developers' laptops, small CI clusters etc.
> [~skonto] confirmed that he has also experienced this behaviour in the 
> discussion on [PR 
> 23846|https://github.com/apache/spark/pull/23846#discussion_r262564938]
> We should up the defaults of these timeouts as an initial step and, longer 
> term, consider making the timeouts themselves configurable



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27063) Spark on K8S Integration Tests timeouts are too short for some test clusters

2019-03-05 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-27063:
-

 Summary: Spark on K8S Integration Tests timeouts are too short for 
some test clusters
 Key: SPARK-27063
 URL: https://issues.apache.org/jira/browse/SPARK-27063
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Rob Vesse


As noted during development for SPARK-26729 there are a couple of integration test timeouts that are too short when running on slower clusters, e.g. developers' laptops, small CI clusters etc.

[~skonto] confirmed that he has also experienced this behaviour in the discussion on [PR 23846|https://github.com/apache/spark/pull/23846#discussion_r262564938]

We should up the defaults of these timeouts as an initial step and, longer term, consider making the timeouts themselves configurable




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26833) Kubernetes RBAC documentation is unclear on exact RBAC requirements

2019-02-06 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761667#comment-16761667
 ] 

Rob Vesse commented on SPARK-26833:
---

Although I'm not sure the latter is doable.  With {{kubectl}} you can do {{--as system:serviceaccounts:namespace:account}}, but I can't see any obvious way to do that with Fabric 8 unless you have the service account token present locally.  We might be able to explicitly obtain the token for the relevant service account and then reconfigure a fresh client based on that, but it would be a significant change to the existing behaviour.
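
For example, something along the following lines might work as a manual workaround, assuming the service account's token is stored in a secret we can read (the namespace and account names are placeholders):

{noformat}
# Fetch the token associated with the service account
SECRET=$(kubectl -n <namespace> get serviceaccount <account> -o jsonpath='{.secrets[0].name}')
TOKEN=$(kubectl -n <namespace> get secret "$SECRET" -o jsonpath='{.data.token}' | base64 --decode)

# Hand it to the submission client explicitly
spark-submit \
  --conf spark.kubernetes.authenticate.submission.oauthToken=$TOKEN \
  <your existing arguments>
{noformat}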

> Kubernetes RBAC documentation is unclear on exact RBAC requirements
> ---
>
> Key: SPARK-26833
> URL: https://issues.apache.org/jira/browse/SPARK-26833
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
>Reporter: Rob Vesse
>Priority: Major
>
> I've seen a couple of users get bitten by this in informal discussions on 
> GitHub and Slack.  Basically the user sets up the service account and 
> configures Spark to use it as described in the documentation but then when 
> they try and run a job they encounter an error like the following:
> {quote}019-02-05 20:29:02 WARN  WatchConnectionManager:185 - Exec Failure: 
> HTTP 403, Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: 
> User "system:anonymous" cannot watch pods in the namespace "default"
> java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: pods 
> "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot 
> watch pods in the namespace "default"{quote}
> This error stems from the fact that the configured service account is only 
> used by the driver pod and not by the submission client.  The submission 
> client wants to do driver pod monitoring which it does with the users 
> submission credentials *NOT* the service account as the user might expect.
> It seems like there are two ways to resolve this issue:
> * Improve the documentation to clarify the current situation
> * Ensure that if a service account is configured we always use it even on the 
> submission client
> The former is the easy fix, the latter is more invasive and may have other 
> knock on effects so we should start with the former and discuss the 
> feasibility of the latter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26833) Kubernetes RBAC documentation is unclear on exact RBAC requirements

2019-02-06 Thread Rob Vesse (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse updated SPARK-26833:
--
Description: 
I've seen a couple of users get bitten by this in informal discussions on 
GitHub and Slack.  Basically the user sets up the service account and 
configures Spark to use it as described in the documentation but then when they 
try and run a job they encounter an error like the following:

{noformat}
019-02-05 20:29:02 WARN  WatchConnectionManager:185 - Exec Failure: HTTP 403, 
Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User 
"system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
...
Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: pods 
"spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot 
watch pods in the namespace "default"
{noformat}

This error stems from the fact that the configured service account is only used 
by the driver pod and not by the submission client.  The submission client 
wants to do driver pod monitoring which it does with the users submission 
credentials *NOT* the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the 
submission client

The former is the easy fix, the latter is more invasive and may have other 
knock on effects so we should start with the former and discuss the feasibility 
of the latter.

  was:
I've seen a couple of users get bitten by this in informal discussions on 
GitHub and Slack.  Basically the user sets up the service account and 
configures Spark to use it as described in the documentation but then when they 
try and run a job they encounter an error like the following:

{noformat}
019-02-05 20:29:02 WARN  WatchConnectionManager:185 - Exec Failure: HTTP 403, 
Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User 
"system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
...
Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: pods 
"spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot 
watch pods in the namespace "default"
{noformat}

This error stems from the fact that the configured service account is only used 
by the driver pod and not by the submission client.  The submission client 
wants to do driver pod monitoring which it does with the users submission 
credentials **NOT** the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the 
submission client

The former is the easy fix, the latter is more invasive and may have other 
knock on effects so we should start with the former and discuss the feasibility 
of the latter.


> Kubernetes RBAC documentation is unclear on exact RBAC requirements
> ---
>
> Key: SPARK-26833
> URL: https://issues.apache.org/jira/browse/SPARK-26833
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
>Reporter: Rob Vesse
>Priority: Major
>
> I've seen a couple of users get bitten by this in informal discussions on 
> GitHub and Slack.  Basically the user sets up the service account and 
> configures Spark to use it as described in the documentation but then when 
> they try and run a job they encounter an error like the following:
> {noformat}
> 019-02-05 20:29:02 WARN  WatchConnectionManager:185 - Exec Failure: HTTP 403, 
> Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User 
> "system:anonymous" cannot watch pods in the namespace "default"
> java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
> ...
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: pods 
> "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot 
> watch pods in the namespace "default"
> {noformat}
> This error stems from the fact that the configured service account is only 
> used by the driver pod and not by the submission client.  The submission 
> client wants to do driver pod monitoring which it does with the users 
> submission credentials *NOT* the service account as the user might expect.
> It seems like there are two ways to resolve this issue:
> * Improve the documentation to clarify the current situation
> * Ensure that if a service account is 

[jira] [Updated] (SPARK-26833) Kubernetes RBAC documentation is unclear on exact RBAC requirements

2019-02-06 Thread Rob Vesse (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse updated SPARK-26833:
--
Description: 
I've seen a couple of users get bitten by this in informal discussions on 
GitHub and Slack.  Basically the user sets up the service account and 
configures Spark to use it as described in the documentation but then when they 
try and run a job they encounter an error like the following:

{quote}019-02-05 20:29:02 WARN  WatchConnectionManager:185 - Exec Failure: HTTP 
403, Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User 
"system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'

Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: pods 
"spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot 
watch pods in the namespace "default"{quote}

This error stems from the fact that the configured service account is only used 
by the driver pod and not by the submission client.  The submission client 
wants to do driver pod monitoring which it does with the users submission 
credentials *NOT* the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the 
submission client

The former is the easy fix, the latter is more invasive and may have other 
knock on effects so we should start with the former and discuss the feasibility 
of the latter.

  was:
I've seen a couple of users get bitten by this in informal discussions on 
GitHub and Slack.  Basically the user sets up the service account and 
configures Spark to use it as described in the documentation but then when they 
try and run a job they encounter an error like the following:

{noformat}
019-02-05 20:29:02 WARN  WatchConnectionManager:185 - Exec Failure: HTTP 403, 
Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User 
"system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'

Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: pods 
"spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot 
watch pods in the namespace "default"
{noformat}

This error stems from the fact that the configured service account is only used 
by the driver pod and not by the submission client.  The submission client 
wants to do driver pod monitoring which it does with the users submission 
credentials *NOT* the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the 
submission client

The former is the easy fix, the latter is more invasive and may have other 
knock on effects so we should start with the former and discuss the feasibility 
of the latter.


> Kubernetes RBAC documentation is unclear on exact RBAC requirements
> ---
>
> Key: SPARK-26833
> URL: https://issues.apache.org/jira/browse/SPARK-26833
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
>Reporter: Rob Vesse
>Priority: Major
>
> I've seen a couple of users get bitten by this in informal discussions on 
> GitHub and Slack.  Basically the user sets up the service account and 
> configures Spark to use it as described in the documentation but then when 
> they try and run a job they encounter an error like the following:
> {quote}019-02-05 20:29:02 WARN  WatchConnectionManager:185 - Exec Failure: 
> HTTP 403, Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: 
> User "system:anonymous" cannot watch pods in the namespace "default"
> java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: pods 
> "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot 
> watch pods in the namespace "default"{quote}
> This error stems from the fact that the configured service account is only 
> used by the driver pod and not by the submission client.  The submission 
> client wants to do driver pod monitoring which it does with the users 
> submission credentials *NOT* the service account as the user might expect.
> It seems like there are two ways to resolve this issue:
> * Improve the documentation to clarify the current situation
> * Ensure that if a service account is configured we always use it even on 

[jira] [Updated] (SPARK-26833) Kubernetes RBAC documentation is unclear on exact RBAC requirements

2019-02-06 Thread Rob Vesse (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse updated SPARK-26833:
--
Description: 
I've seen a couple of users get bitten by this in informal discussions on 
GitHub and Slack.  Basically the user sets up the service account and 
configures Spark to use it as described in the documentation but then when they 
try and run a job they encounter an error like the following:

{noformat}
019-02-05 20:29:02 WARN  WatchConnectionManager:185 - Exec Failure: HTTP 403, 
Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User 
"system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'

Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: pods 
"spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot 
watch pods in the namespace "default"
{noformat}

This error stems from the fact that the configured service account is only used 
by the driver pod and not by the submission client.  The submission client 
wants to do driver pod monitoring which it does with the users submission 
credentials *NOT* the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the 
submission client

The former is the easy fix, the latter is more invasive and may have other 
knock on effects so we should start with the former and discuss the feasibility 
of the latter.

  was:
I've seen a couple of users get bitten by this in informal discussions on 
GitHub and Slack.  Basically the user sets up the service account and 
configures Spark to use it as described in the documentation but then when they 
try and run a job they encounter an error like the following:

{noformat}
019-02-05 20:29:02 WARN  WatchConnectionManager:185 - Exec Failure: HTTP 403, 
Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User 
"system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
...
Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: pods 
"spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot 
watch pods in the namespace "default"
{noformat}

This error stems from the fact that the configured service account is only used 
by the driver pod and not by the submission client.  The submission client 
wants to do driver pod monitoring which it does with the users submission 
credentials *NOT* the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the 
submission client

The former is the easy fix, the latter is more invasive and may have other 
knock on effects so we should start with the former and discuss the feasibility 
of the latter.


> Kubernetes RBAC documentation is unclear on exact RBAC requirements
> ---
>
> Key: SPARK-26833
> URL: https://issues.apache.org/jira/browse/SPARK-26833
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
>Reporter: Rob Vesse
>Priority: Major
>
> I've seen a couple of users get bitten by this in informal discussions on 
> GitHub and Slack.  Basically the user sets up the service account and 
> configures Spark to use it as described in the documentation but then when 
> they try and run a job they encounter an error like the following:
> {noformat}
> 019-02-05 20:29:02 WARN  WatchConnectionManager:185 - Exec Failure: HTTP 403, 
> Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User 
> "system:anonymous" cannot watch pods in the namespace "default"
> java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
> Exception in thread "main" 
> io.fabric8.kubernetes.client.KubernetesClientException: pods 
> "spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot 
> watch pods in the namespace "default"
> {noformat}
> This error stems from the fact that the configured service account is only 
> used by the driver pod and not by the submission client.  The submission 
> client wants to do driver pod monitoring which it does with the users 
> submission credentials *NOT* the service account as the user might expect.
> It seems like there are two ways to resolve this issue:
> * Improve the documentation to clarify the current situation
> * Ensure that if a service account is configured we 

[jira] [Created] (SPARK-26833) Kubernetes RBAC documentation is unclear on exact RBAC requirements

2019-02-06 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-26833:
-

 Summary: Kubernetes RBAC documentation is unclear on exact RBAC 
requirements
 Key: SPARK-26833
 URL: https://issues.apache.org/jira/browse/SPARK-26833
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0, 2.3.2, 2.3.1, 2.3.0
Reporter: Rob Vesse


I've seen a couple of users get bitten by this in informal discussions on 
GitHub and Slack.  Basically the user sets up the service account and 
configures Spark to use it as described in the documentation but then when they 
try and run a job they encounter an error like the following:

{noformat}
019-02-05 20:29:02 WARN  WatchConnectionManager:185 - Exec Failure: HTTP 403, 
Status: 403 - pods "spark-pi-1549416541302-driver" is forbidden: User 
"system:anonymous" cannot watch pods in the namespace "default"
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
...
Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: pods 
"spark-pi-1549416541302-driver" is forbidden: User "system:anonymous" cannot 
watch pods in the namespace "default"
{noformat}

This error stems from the fact that the configured service account is only used by the driver pod and not by the submission client.  The submission client wants to do driver pod monitoring, which it does with the user's submission credentials **NOT** the service account as the user might expect.

It seems like there are two ways to resolve this issue:

* Improve the documentation to clarify the current situation
* Ensure that if a service account is configured we always use it even on the 
submission client

The former is the easy fix, the latter is more invasive and may have other 
knock on effects so we should start with the former and discuss the feasibility 
of the latter.
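
For context, the setup the documentation currently describes looks roughly like the following (names are illustrative); the surprise for users is that the service account only takes effect inside the driver pod, not in the submission client:

{noformat}
# Create a service account and grant it permission to manage pods
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit \
  --serviceaccount=default:spark --namespace=default

# Tell Spark to run the driver under that account
spark-submit \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  <your existing arguments>
{noformat}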



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26729) Spark on Kubernetes tooling hardcodes default image names

2019-01-25 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-26729:
-

 Summary: Spark on Kubernetes tooling hardcodes default image names
 Key: SPARK-26729
 URL: https://issues.apache.org/jira/browse/SPARK-26729
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Rob Vesse


Both when creating images with {{bin/docker-image-tool.sh}} and when running 
the Kubernetes integration tests the image names are hardcoded to {{spark}}, 
{{spark-py}} and {{spark-r}}.

If you are producing custom images in some other way (e.g. a CI/CD process that doesn't use the script), or are required to use a different naming convention due to company policy, e.g. prefixing with a vendor name (e.g. {{apache-spark}}), then you can't directly create/test your images with the desired names.

You can of course simply re-tag the images with the desired names, but this might not be possible in some CI/CD pipelines, especially if naming conventions are being enforced at the registry level.

It would be nice if the default image names were customisable.
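
For illustration, today the names are fixed and the only workaround is re-tagging afterwards (the repository and tag below are placeholders):

{noformat}
# Always produces <repo>/spark, <repo>/spark-py and <repo>/spark-r
./bin/docker-image-tool.sh -r myrepo -t v2.4.0 build

# Workaround: re-tag to satisfy an enforced naming convention
docker tag myrepo/spark:v2.4.0 myrepo/apache-spark:v2.4.0
{noformat}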



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images

2019-01-23 Thread Rob Vesse (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse resolved SPARK-26704.
---
Resolution: Not A Problem

> docker-image-tool.sh should copy custom Dockerfiles into the build context 
> for inclusion in images
> --
>
> Key: SPARK-26704
> URL: https://issues.apache.org/jira/browse/SPARK-26704
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Rob Vesse
>Priority: Major
>
> As surfaced in the discussion on the PR for SPARK-26687 
> (https://github.com/apache/spark/pull/23613) when using custom Dockerfiles 
> these are not copied into the build context.  Rather the build context 
> includes the default Dockerfiles from Spark regardless of what Dockerfiles 
> the end user actually used to build the images.
> The suggestion in the PR was that the script should copy in the custom 
> Dockerfiles over the stock  Dockerfiles.  This potentially aids in 
> reproducing the images later because someone with an image can get the exact 
> Dockerfile used to build that image.
> A related issue is that the script allows for and even in some cases 
> implicitly uses Docker build arguments as part of building the images.  In 
> the case where build arguments are used these should probably also be 
> captured in the image to aid reproducibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images

2019-01-23 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750366#comment-16750366
 ] 

Rob Vesse commented on SPARK-26704:
---

Yes, sorry, I was conflating the build context with the image contents.  So you're correct, there isn't anything to do here.  Will close as Not a Problem.

> docker-image-tool.sh should copy custom Dockerfiles into the build context 
> for inclusion in images
> --
>
> Key: SPARK-26704
> URL: https://issues.apache.org/jira/browse/SPARK-26704
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Rob Vesse
>Priority: Major
>
> As surfaced in the discussion on the PR for SPARK-26687 
> (https://github.com/apache/spark/pull/23613) when using custom Dockerfiles 
> these are not copied into the build context.  Rather the build context 
> includes the default Dockerfiles from Spark regardless of what Dockerfiles 
> the end user actually used to build the images.
> The suggestion in the PR was that the script should copy in the custom 
> Dockerfiles over the stock  Dockerfiles.  This potentially aids in 
> reproducing the images later because someone with an image can get the exact 
> Dockerfile used to build that image.
> A related issue is that the script allows for and even in some cases 
> implicitly uses Docker build arguments as part of building the images.  In 
> the case where build arguments are used these should probably also be 
> captured in the image to aid reproducibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images

2019-01-23 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750323#comment-16750323
 ] 

Rob Vesse commented on SPARK-26704:
---

For me it's a question of build reproducibility (I've been following an 
interesting discussion around this on legal-discuss - 
https://lists.apache.org/thread.html/d578819f1afa6b8fb697ea72083e0fb05e43938a23d6e7bb804069b8@%3Clegal-discuss.apache.org%3E).
If I crack open the image and start poking around and find a Dockerfile present, do I have a reasonable expectation that the Dockerfile I find there is the one used to build the image?

If Yes, then we should ensure we include the correct Dockerfiles in the build context and thus the image.

If No, then we should probably not bother including the Dockerfiles at all.  However since, as you point out, when building from a Spark release distribution they will be present and thus packaged into the image, I would suspect we want to continue doing this even for developer builds.

> docker-image-tool.sh should copy custom Dockerfiles into the build context 
> for inclusion in images
> --
>
> Key: SPARK-26704
> URL: https://issues.apache.org/jira/browse/SPARK-26704
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Rob Vesse
>Priority: Major
>
> As surfaced in the discussion on the PR for SPARK-26687 
> (https://github.com/apache/spark/pull/23613) when using custom Dockerfiles 
> these are not copied into the build context.  Rather the build context 
> includes the default Dockerfiles from Spark regardless of what Dockerfiles 
> the end user actually used to build the images.
> The suggestion in the PR was that the script should copy in the custom 
> Dockerfiles over the stock  Dockerfiles.  This potentially aids in 
> reproducing the images later because someone with an image can get the exact 
> Dockerfile used to build that image.
> A related issue is that the script allows for and even in some cases 
> implicitly uses Docker build arguments as part of building the images.  In 
> the case where build arguments are used these should probably also be 
> captured in the image to aid reproducibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26704) docker-image-tool.sh should copy custom Dockerfiles into the build context for inclusion in images

2019-01-23 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-26704:
-

 Summary: docker-image-tool.sh should copy custom Dockerfiles into 
the build context for inclusion in images
 Key: SPARK-26704
 URL: https://issues.apache.org/jira/browse/SPARK-26704
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Rob Vesse


As surfaced in the discussion on the PR for SPARK-26687 
(https://github.com/apache/spark/pull/23613) when using custom Dockerfiles 
these are not copied into the build context.  Rather the build context includes 
the default Dockerfiles from Spark regardless of what Dockerfiles the end user 
actually used to build the images.

The suggestion in the PR was that the script should copy in the custom 
Dockerfiles over the stock  Dockerfiles.  This potentially aids in reproducing 
the images later because someone with an image can get the exact Dockerfile 
used to build that image.

A related issue is that the script allows for and even in some cases implicitly 
uses Docker build arguments as part of building the images.  In the case where 
build arguments are used these should probably also be captured in the image to 
aid reproducibility.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26687) Building Spark Images has non-intuitive behaviour with paths to custom Dockerfiles

2019-01-22 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-26687:
-

 Summary: Building Spark Images has non-intuitive behaviour with 
paths to custom Dockerfiles
 Key: SPARK-26687
 URL: https://issues.apache.org/jira/browse/SPARK-26687
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Rob Vesse


With the changes from SPARK-26025 (https://github.com/apache/spark/pull/23019) we use a pared-down Docker build context, which significantly improves build times.  However, the way this is implemented leads to non-intuitive behaviour when supplying custom Dockerfile paths.  This is because of the following code snippet:

{code}
(cd $(img_ctx_dir base) && docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
-t $(image_ref spark) \
-f "$BASEDOCKERFILE" .)
{code}

Since the script changes to the temporary build context directory and then runs {{docker build}} there, any path given for the Dockerfile is taken as relative to the temporary build context directory rather than to the directory where the user invoked the script.  This produces somewhat unhelpful errors, e.g.

{noformat}
> ./bin/docker-image-tool.sh -r rvesse -t badpath -p 
> resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile
>  build
Sending build context to Docker daemon  218.4MB
Step 1/15 : FROM openjdk:8-alpine
 ---> 5801f7d008e5
Step 2/15 : ARG spark_uid=185
 ---> Using cache
 ---> 5fd63df1ca39
...
Successfully tagged rvesse/spark:badpath
unable to prepare context: unable to evaluate symlinks in Dockerfile path: 
lstat 
/Users/rvesse/Documents/Work/Code/spark/target/tmp/docker/pyspark/resource-managers:
 no such file or directory
Failed to build PySpark Docker image, please refer to Docker build output for 
details.
{noformat}

Here we can see that the relative path that was valid where the user typed the command was not valid inside the build context directory.

To resolve this we need to ensure that relative paths to Dockerfiles are resolved appropriately.
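
One possible shape for the fix, purely as a sketch, is to absolutise any user-supplied Dockerfile path before the script changes into the build context directory:

{code}
# Sketch only: make a relative Dockerfile path absolute, relative to where
# the user invoked the script, before the cd into the build context happens
resolve_file() {
  local file="$1"
  if [ -n "$file" ] && [ "${file:0:1}" != "/" ]; then
    file="$(pwd)/$file"
  fi
  echo "$file"
}
BASEDOCKERFILE=$(resolve_file "$BASEDOCKERFILE")
{code}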



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26685) Building Spark Images with latest Docker does not honour spark_uid build argument

2019-01-22 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748605#comment-16748605
 ] 

Rob Vesse commented on SPARK-26685:
---

Opened a PR to fix this

> Building Spark Images with latest Docker does not honour spark_uid build 
> argument
> -
>
> Key: SPARK-26685
> URL: https://issues.apache.org/jira/browse/SPARK-26685
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Rob Vesse
>Priority: Major
>
> Latest Docker releases are stricter in their interpretation of the scope of 
> build arguments meaning the location of the {{ARG spark_uid}} declaration 
> puts it out of scope by the time the variable is consumed resulting in the 
> Python and R images still running as {{root}} regardless of what the user may 
> have specified as the desired UID.
> e.g. Images built with {{-u 456}} provided to {{bin/docker-image-tool.sh}}
> {noformat}
> > docker run -it --entrypoint /bin/bash rvesse/spark-py:uid456
> bash-4.4# whoami
> root
> bash-4.4# id -u
> 0
> bash-4.4# exit
> > docker run -it --entrypoint /bin/bash rvesse/spark:uid456
> bash-4.4$ id -u
> 456
> bash-4.4$ exit
> {noformat}
> Note that for the Python image the build argument was out of scope and 
> ignored.  For the base image the {{ARG}} declaration is in an in-scope 
> location and so is honoured correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26685) Building Spark Images with latest Docker does not honour spark_uid build argument

2019-01-22 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-26685:
-

 Summary: Building Spark Images with latest Docker does not honour 
spark_uid build argument
 Key: SPARK-26685
 URL: https://issues.apache.org/jira/browse/SPARK-26685
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Rob Vesse


Latest Docker releases are stricter in their interpretation of the scope of build arguments, meaning the location of the {{ARG spark_uid}} declaration puts it out of scope by the time the variable is consumed.  As a result the Python and R images still run as {{root}} regardless of what the user may have specified as the desired UID.

e.g. Images built with {{-u 456}} provided to {{bin/docker-image-tool.sh}}

{noformat}
> docker run -it --entrypoint /bin/bash rvesse/spark-py:uid456
bash-4.4# whoami
root
bash-4.4# id -u
0
bash-4.4# exit
> docker run -it --entrypoint /bin/bash rvesse/spark:uid456
bash-4.4$ id -u
456
bash-4.4$ exit
{noformat}

Note that for the Python image the build argument was out of scope and ignored. 
 For the base image the {{ARG}} declaration is in an in-scope location and so 
is honoured correctly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26028) Design sketch for SPIP: Property Graphs, Cypher Queries, and Algorithms

2019-01-18 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746134#comment-16746134
 ] 

Rob Vesse commented on SPARK-26028:
---

One initial comment - why include the query engine directly in the data representation? i.e. PropertyGraph should not have the Cypher-related properties and methods.

Ideally these should be on a separate trait (e.g. CypherCapablePropertyGraph) so you cleanly separate the data representation from the query engine.  Then the design would also allow for other query engines in the future, e.g. future versions of Cypher, GQL, GraphQL etc.

> Design sketch for SPIP: Property Graphs, Cypher Queries, and Algorithms
> ---
>
> Key: SPARK-26028
> URL: https://issues.apache.org/jira/browse/SPARK-26028
> Project: Spark
>  Issue Type: New Feature
>  Components: GraphX
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Assignee: Martin Junghanns
>Priority: Major
>
> Placeholder for the design discussion of SPARK-25994. The scope here is to 
> help SPIP vote instead of the final design.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26015) Include a USER directive in project provided Spark Dockerfiles

2018-11-12 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-26015:
-

 Summary: Include a USER directive in project provided Spark 
Dockerfiles
 Key: SPARK-26015
 URL: https://issues.apache.org/jira/browse/SPARK-26015
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0, 2.3.2, 2.3.1, 2.3.0
Reporter: Rob Vesse


The current Dockerfiles provided by the project for running on Kubernetes do 
not include a [USER 
directive|https://docs.docker.com/engine/reference/builder/#user] which means 
that they default to running as {{root}}.  This may lead to unsuspecting users 
running their Spark jobs with unexpected levels of privilege.

The project should follow Docker/K8S best practices by including {{USER}} directives in the Dockerfiles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24833) Allow specifying Kubernetes host name aliases in the pod specs

2018-10-31 Thread Rob Vesse (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse resolved SPARK-24833.
---
Resolution: Won't Fix

As discussed in the PR, the pod template feature provides the ability to do this without needing new configuration properties.

> Allow specifying Kubernetes host name aliases in the pod specs
> --
>
> Key: SPARK-24833
> URL: https://issues.apache.org/jira/browse/SPARK-24833
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.1
>Reporter: Rob Vesse
>Priority: Major
>
> For some workloads you would like to allow Spark executors to access external 
> services using host name aliases.  Currently there is no way to specify Host 
> name aliases 
> (https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/)
>  to the pods that Spark generates and pod presets cannot be used to add these 
> at admission time currently (plus the fact that pod presets are still an 
> Alpha feature so not guaranteed to be usable on any given cluster).
> Since Spark on K8S already allows adding secrets and volumes to mount via 
> Spark configuration it should be fairly easy to use the same approach to 
> include host name aliases.
> I will look at opening a PR for this in the next couple of days.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25887) Allow specifying Kubernetes context to use

2018-10-30 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-25887:
-

 Summary: Allow specifying Kubernetes context to use
 Key: SPARK-25887
 URL: https://issues.apache.org/jira/browse/SPARK-25887
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.2, 2.3.1, 2.3.0, 2.4.0
Reporter: Rob Vesse


In working on SPARK-25809 support was added to the integration testing machinery for Spark on K8S to use an arbitrary context from the user's K8S config file.  However this can fail/cause false positives because, regardless of what the integration test harness does, the K8S submission client uses the Fabric 8 client library in such a way that it only ever configures itself from the current context.

For users who work with multiple K8S clusters, or who have multiple K8S "users" for interacting with their cluster, being able to support arbitrary contexts without forcing the user to first run {{kubectl config use-context <context>}} is an important improvement.

This would be a fairly small fix to {{SparkKubernetesClientFactory}}, with an associated configuration key, likely {{spark.kubernetes.context}}, to go along with it.
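
To make the intent concrete, the difference would be roughly as follows (the context name is a placeholder and the configuration key shown is the proposed one, not something that exists today):

{noformat}
# Today: must switch the current context before submitting
kubectl config use-context my-other-cluster
spark-submit --master k8s://https://<k8s-apiserver>:6443 <your existing arguments>

# Proposed: pick the context explicitly at submission time
spark-submit \
  --conf spark.kubernetes.context=my-other-cluster \
  --master k8s://https://<k8s-apiserver>:6443 <your existing arguments>
{noformat}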



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25809) Support additional K8S cluster types for integration tests

2018-10-30 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669048#comment-16669048
 ] 

Rob Vesse commented on SPARK-25809:
---

Fairly close to this being ready to merge

> Support additional K8S cluster types for integration tests
> --
>
> Key: SPARK-25809
> URL: https://issues.apache.org/jira/browse/SPARK-25809
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Rob Vesse
>Priority: Major
>
> Currently the Spark on K8S integration tests are hardcoded to use a 
> {{minikube}} based backend.  It would be nice if developers had more 
> flexibility in the choice of K8S cluster they wish to use for integration 
> testing.  More specifically it would be useful to be able to use the built-in 
> Kubernetes support in recent Docker releases and to just use a generic K8S 
> cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25809) Support additional K8S cluster types for integration tests

2018-10-23 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-25809:
-

 Summary: Support additional K8S cluster types for integration tests
 Key: SPARK-25809
 URL: https://issues.apache.org/jira/browse/SPARK-25809
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.2, 2.4.0
Reporter: Rob Vesse


Currently the Spark on K8S integration tests are hardcoded to use a 
{{minikube}} based backend.  It would be nice if developers had more 
flexibility in the choice of K8S cluster they wish to use for integration 
testing.  More specifically it would be useful to be able to use the built-in 
Kubernetes support in recent Docker releases and to just use a generic K8S 
cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25745) docker-image-tool.sh ignores errors from Docker

2018-10-16 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-25745:
-

 Summary: docker-image-tool.sh ignores errors from Docker
 Key: SPARK-25745
 URL: https://issues.apache.org/jira/browse/SPARK-25745
 Project: Spark
  Issue Type: Bug
  Components: Deploy, Kubernetes
Affects Versions: 2.3.2, 2.3.1, 2.3.0
Reporter: Rob Vesse


In attempting to use the {{docker-image-tool.sh}} script to build some custom 
Dockerfiles I ran into issues with the script's interaction with Docker.  Most 
notably, if the Docker build/push fails the script continues blindly, ignoring 
the errors.  This can either result in complete failure to build or lead to 
subtle bugs where images are built against different base images than expected.

Additionally, while the Dockerfiles assume that Spark has first been built 
locally, the script fails to validate this, which it could easily do by checking 
the expected JARs location.  This can also lead to failed Docker builds which 
could easily be avoided.
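
Not the actual fix, just a minimal sketch of the fail-fast behaviour described 
above; the variable names and JAR location check are assumptions for 
illustration:

{noformat}
# Hypothetical sketch: validate the local Spark build exists, then abort on
# any Docker failure instead of blindly continuing
if [ ! -d "assembly/target/scala-2.11/jars" ]; then
  echo "Spark JARs not found - build Spark before building images" >&2
  exit 1
fi

docker build -t "$IMAGE_TAG" -f "$DOCKERFILE" . || { echo "Docker build failed" >&2; exit 1; }
docker push "$IMAGE_TAG" || { echo "Docker push failed" >&2; exit 1; }
{noformat}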



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23153) Support application dependencies in submission client's local file system

2018-10-03 Thread Rob Vesse (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Vesse updated SPARK-23153:
--
Description: Currently local dependencies are not supported with Spark on 
K8S, i.e. if the user has code or dependencies only on the client where they run 
{{spark-submit}} then the current implementation has no way to make those 
visible to the Spark application running inside the K8S pods that get launched. 
This limits users to running only applications where the code and dependencies 
are either baked into the Docker images used or are available via some external 
and globally accessible file system, e.g. HDFS, which are not viable options for 
many users and environments.

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> Currently local dependencies are not supported with Spark on K8S i.e. if the 
> user has code or dependencies only on the client where they run 
> {{spark-submit}} then the current implementation has no way to make those 
> visible to the Spark application running inside the K8S pods that get 
> launched.  This limits users to only running applications where the code and 
> dependencies are either baked into the Docker images used or where those are 
> available via some external and globally accessible file system e.g. HDFS 
> which are not viable options for many users and environments



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23153) Support application dependencies in submission client's local file system

2018-10-02 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635534#comment-16635534
 ] 

Rob Vesse commented on SPARK-23153:
---

[~cloud_fan][~liyinan926][~mcheah][~eje] Has there been any discussion of how 
to go about addressing this limitation?

In the original downstream fork there was the Resource Staging Server, but that 
got removed to simplify upstreaming and because Spark core folks had objections 
to that approach.  Also, in our usage of it we encountered a number of 
performance, scalability and security issues that made it not a particularly 
stable approach.

There was a long dev list thread on this - 
https://lists.apache.org/thread.html/82b4ae9a2eb5ddeb3f7240ebf154f06f19b830f8b3120038e5d687a1@%3Cdev.spark.apache.org%3E
 - but no real conclusion seemed to be reached.

There are a few workarounds open to users that I can think of:

* Use the PVC support to mount a pre-created PVC that has somehow been 
populated with the user code
* Use the incoming pod template feature to mount arbitrary volumes that have 
somehow been populated with the user code
* Build custom images
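
As a concrete illustration of the first option, a hedged sketch using the 
volume mounting configuration already available in Spark on K8S; the volume 
name, claim name and mount path are hypothetical:

{noformat}
# Mount a pre-created PVC (already populated with the user code) into both
# driver and executors - names and paths below are hypothetical
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.deps.options.claimName=my-deps-pvc
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.deps.mount.path=/opt/spark/deps
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.deps.options.claimName=my-deps-pvc
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.deps.mount.path=/opt/spark/deps
{noformat}

The application would then reference its dependencies via paths under the mount 
point, e.g. {{local:///opt/spark/deps/my-app.jar}}.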

All these options put the onus on users to do prep work prior to launch; I 
think Option 3 is currently the "recommended" workaround.  Unfortunately for us 
that is not a viable option as our customers tend to be very security conscious 
and often only allow a pre-approved list of images to be run.  (Ignoring the 
obvious fallacy of disallowing custom images while permitting the running of 
images that allow custom user code to execute...)

This is a blocker for me currently and I would like to contribute here but 
don't want to reinvent the wheel or waste effort on approaches that have 
already been discussed/discounted.

> Support application dependencies in submission client's local file system
> -
>
> Key: SPARK-23153
> URL: https://issues.apache.org/jira/browse/SPARK-23153
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-19 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620809#comment-16620809
 ] 

Rob Vesse commented on SPARK-24434:
---

Started a mailing list thread re: the limitations of this as currently 
implemented - 
https://lists.apache.org/thread.html/8a0ac1cada800d10ec1fe7f9552257af1dfc6719b404bdc3696b5c1f@%3Cdev.spark.apache.org%3E

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes

2018-09-07 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606797#comment-16606797
 ] 

Rob Vesse commented on SPARK-25262:
---

[~mcheah] I would like to keep this open so we can have the larger discussion 
that the original PR implied about how pod templates and feature steps should 
interact and how best to enable power user customisation.  I am busy today at a 
conference but will try and kick off the discussion of this on the dev list 
next week.

> Make Spark local dir volumes configurable with Spark on Kubernetes
> --
>
> Key: SPARK-25262
> URL: https://issues.apache.org/jira/browse/SPARK-25262
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Rob Vesse
>Priority: Major
>
> As discussed during review of the design document for SPARK-24434 while 
> providing pod templates will provide more in-depth customisation for Spark on 
> Kubernetes there are some things that cannot be modified because Spark code 
> generates pod specs in very specific ways.
> The particular issue identified relates to handling of {{spark.local.dirs}} 
> which is done by {{LocalDirsFeatureStep.scala}}.  For each directory 
> specified, or a single default if no explicit specification, it creates a 
> Kubernetes {{emptyDir}} volume.  As noted in the Kubernetes documentation 
> this will be backed by the node storage 
> (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).  In some 
> compute environments this may be extremely undesirable.  For example with 
> diskless compute resources the node storage will likely be a non-performant 
> remote mounted disk, often with limited capacity.  For such environments it 
> would likely be better to set {{medium: Memory}} on the volume per the K8S 
> documentation to use a {{tmpfs}} volume instead.
> Another closely related issue is that users might want to use a different 
> volume type to back the local directories and there is no possibility to do 
> that.
> Pod templates will not really solve either of these issues because Spark is 
> always going to attempt to generate a new volume for each local directory and 
> always going to set these as {{emptyDir}}.
> Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:
> * Provide a new config setting to enable using {{tmpfs}} backed {{emptyDir}} 
> volumes
> * Modify the logic to check if there is a volume already defined with the 
> name and if so skip generating a volume definition for it



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598402#comment-16598402
 ] 

Rob Vesse commented on SPARK-24434:
---

{quote}
I think the miscommunication here was because of the discrepancy between this 
Jira and k8s-sig-big-data weekly meeting notes
{quote}

As an Apache member this comment raises red flags for me.  All Spark 
development discussions should either be happening on Apache resources (JIRA, 
mailing lists, GitHub repos) or being captured and posted to Apache resources.  
If people are having to follow external resources, particularly live meetings 
which naturally exclude portions of the community due to timezone/availability 
constraints, to participate in an Apache community then that community is not 
operating as a proper Apache community.  

This doesn't mean that such discussions and meetings can't happen but they 
should be summarised back on Apache resources so the wider community has the 
opportunity to participate.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes

2018-08-28 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595178#comment-16595178
 ] 

Rob Vesse commented on SPARK-25262:
---

I have changes for this almost ready and plan to open a PR tomorrow

> Make Spark local dir volumes configurable with Spark on Kubernetes
> --
>
> Key: SPARK-25262
> URL: https://issues.apache.org/jira/browse/SPARK-25262
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Rob Vesse
>Priority: Major
>
> As discussed during review of the design document for SPARK-24434 while 
> providing pod templates will provide more in-depth customisation for Spark on 
> Kubernetes there are some things that cannot be modified because Spark code 
> generates pod specs in very specific ways.
> The particular issue identified relates to handling of {{spark.local.dirs}} 
> which is done by {{LocalDirsFeatureStep.scala}}.  For each directory 
> specified, or a single default if no explicit specification, it creates a 
> Kubernetes {{emptyDir}} volume.  As noted in the Kubernetes documentation 
> this will be backed by the node storage 
> (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).  In some 
> compute environments this may be extremely undesirable.  For example with 
> diskless compute resources the node storage will likely be a non-performant 
> remote mounted disk, often with limited capacity.  For such environments it 
> would likely be better to set {{medium: Memory}} on the volume per the K8S 
> documentation to use a {{tmpfs}} volume instead.
> Another closely related issue is that users might want to use a different 
> volume type to back the local directories and there is no possibility to do 
> that.
> Pod templates will not really solve either of these issues because Spark is 
> always going to attempt to generate a new volume for each local directory and 
> always going to set these as {{emptyDir}}.
> Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:
> * Provide a new config setting to enable using {{tmpfs}} backed {{emptyDir}} 
> volumes
> * Modify the logic to check if there is a volume already defined with the 
> name and if so skip generating a volume definition for it



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes

2018-08-28 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-25262:
-

 Summary: Make Spark local dir volumes configurable with Spark on 
Kubernetes
 Key: SPARK-25262
 URL: https://issues.apache.org/jira/browse/SPARK-25262
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.1, 2.3.0
Reporter: Rob Vesse


As discussed during review of the design document for SPARK-24434 while 
providing pod templates will provide more in-depth customisation for Spark on 
Kubernetes there are some things that cannot be modified because Spark code 
generates pod specs in very specific ways.

The particular issue identified relates to handling of {{spark.local.dirs}} 
which is done by {{LocalDirsFeatureStep.scala}}.  For each directory specified, 
or a single default if no explicit specification, it creates a Kubernetes 
{{emptyDir}} volume.  As noted in the Kubernetes documentation this will be 
backed by the node storage 
(https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).  In some 
compute environments this may be extremely undesirable.  For example with 
diskless compute resources the node storage will likely be a non-performant 
remote mounted disk, often with limited capacity.  For such environments it 
would likely be better to set {{medium: Memory}} on the volume per the K8S 
documentation to use a {{tmpfs}} volume instead.

Another closely related issue is that users might want to use a different 
volume type to back the local directories and there is no possibility to do 
that.

Pod templates will not really solve either of these issues because Spark is 
always going to attempt to generate a new volume for each local directory and 
always going to set these as {{emptyDir}}.

Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:

* Provide a new config setting to enable using {{tmpfs}} backed {{emptyDir}} 
volumes
* Modify the logic to check if there is a volume already defined with that name 
and, if so, skip generating a volume definition for it
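
For illustration only, a sketch of how the first change might surface to users, 
assuming it lands under a key along the lines of 
{{spark.kubernetes.local.dirs.tmpfs}} (the exact key name is an assumption at 
this stage):

{noformat}
# Assumed key name - back the generated emptyDir volumes for the local
# directories with tmpfs (medium: Memory) rather than node storage
--conf spark.kubernetes.local.dirs.tmpfs=true
--conf spark.local.dir=/tmp/spark-local
{noformat}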



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25024) Update mesos documentation to be clear about security supported

2018-08-28 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594933#comment-16594933
 ] 

Rob Vesse commented on SPARK-25024:
---

[~tgraves] Attempting to answer your questions:

* We never used cluster mode so can't comment
* Yes and no
** Similar to YARN, it does the login locally in the client and then uses HDFS 
delegation tokens, so it doesn't ship the keytabs AFAIK, but it does ship the 
delegation tokens
* We never used Spark Shuffle Service either so can't comment
* Yes
** Mesos does authentication at the framework level rather than the user level 
so it depends on your setup.  You might have setups where there is a single 
principal and secret used by all Spark users or you might have setups where you 
create a principal and secret for each user.  You can optionally do ACLs within 
Mesos for each framework principal including configuring things like which 
users a framework is allowed to launch jobs as.
* Again, we have not used this feature; we think these are similar to K8S 
secrets in that they are created separately, you just pass identifiers for them 
to Spark, and Mesos takes care of providing them securely to your jobs.

Generally we have dropped use of Spark on Mesos in favour of Spark on K8S 
because the security story for Mesos was poor and we had to do a lot of extra 
work to provide multi-tenancy, whereas with K8S a lot more was available out of 
the box (even if secure HDFS support has yet to land in mainline Spark).

> Update mesos documentation to be clear about security supported
> ---
>
> Key: SPARK-25024
> URL: https://issues.apache.org/jira/browse/SPARK-25024
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.2
>Reporter: Thomas Graves
>Priority: Major
>
> I was reading through our mesos deployment docs and security docs and its not 
> clear at all what type of security and how to set it up for mesos.  I think 
> we should clarify this and have something about exactly what is supported and 
> what is not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25222) Spark on Kubernetes Pod Watcher dumps raw container status

2018-08-24 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591386#comment-16591386
 ] 

Rob Vesse commented on SPARK-25222:
---

There is also a similar issue with task failure:

{noformat}
2018-08-24 09:11:57 WARN  TaskSetManager:66 - Lost task 2.3 in stage 0.0 (TID 
13, 10.244.3.199, executor 8): ExecutorLostFailure (executor 8 exited caused by 
one of the running tasks) Reason:
The executor with id 8 exited with exit code 52.
The API gave the following brief reason: null
The API gave the following message: null
The API gave the following container statuses:

ContainerStatus(containerID=docker://353f78fd634d312ec8115032c32da56748fb5d8da2c5ae54b1d0a9f112fb4d1d,
 image=rvesse/spark:latest, 
imageID=docker-pullable://rvesse/spark@sha256:92abf0b718743d0f5a26068fc94ec42233db0493c55a8570dc8c851c62a4bc0a,
 lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=executor, ready=false, restartCount=0, 
state=ContainerState(running=null, 
terminated=ContainerStateTerminated(containerID=docker://353f78fd634d312ec8115032c32da56748fb5d8da2c5ae54b1d0a9f112fb4d1d,
 exitCode=52, finishedAt=Time(time=2018-08-24T09:11:56Z, 
additionalProperties={}), message=null, reason=Error, signal=null, 
startedAt=Time(time=2018-08-24T09:11:48Z, additionalProperties={}), 
additionalProperties={}), waiting=null, additionalProperties={}), 
additionalProperties={})
{noformat}

> Spark on Kubernetes Pod Watcher dumps raw container status
> --
>
> Key: SPARK-25222
> URL: https://issues.apache.org/jira/browse/SPARK-25222
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Rob Vesse
>Priority: Minor
>
> Spark on Kubernetes provides logging of the pod/container status as a monitor 
> of the job progress.  However the logger just dumps the raw container status 
> object leading to fairly unreadable output like so:
> {noformat}
> 18/08/24 09:03:27 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-groupby-1535101393784-driver
>namespace: default
>labels: spark-app-selector -> spark-47f7248122b9444b8d5fd3701028a1e8, 
> spark-role -> driver
>pod uid: 88de6467-a77c-11e8-b9da-a4bf0128b75b
>creation time: 2018-08-24T09:03:14Z
>service account name: spark
>volumes: spark-local-dir-1, spark-conf-volume, spark-token-kjxkv
>node name: tab-cmp4
>start time: 2018-08-24T09:03:14Z
>container images: rvesse/spark:latest
>phase: Running
>status: 
> [ContainerStatus(containerID=docker://23ae58571f59505e837dca40455d0347fb90e9b88e2a2b145a38e2919fceb447,
>  image=rvesse/spark:latest, 
> imageID=docker-pullable://rvesse/spark@sha256:92abf0b718743d0f5a26068fc94ec42233db0493c55a8570dc8c851c62a4bc0a,
>  lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=true, 
> restartCount=0, 
> state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2018-08-24T09:03:26Z,
>  additionalProperties={}), additionalProperties={}), terminated=null, 
> waiting=null, additionalProperties={}), additionalProperties={})]
> {noformat}
> The {{LoggingPodStatusWatcher}} actually already includes code to nicely 
> format this information but only invokes it at the end of the job:
> {noformat}
> 18/08/24 09:04:07 INFO LoggingPodStatusWatcherImpl: Container final statuses:
>  Container name: spark-kubernetes-driver
>  Container image: rvesse/spark:latest
>  Container state: Terminated
>  Exit code: 0
> {noformat}
> It would be nice if we continually used the nice formatting throughout the 
> logging.
> We already have patched this on our internal fork and will upstream a fix 
> shortly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25222) Spark on Kubernetes Pod Watcher dumps raw container status

2018-08-24 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-25222:
-

 Summary: Spark on Kubernetes Pod Watcher dumps raw container status
 Key: SPARK-25222
 URL: https://issues.apache.org/jira/browse/SPARK-25222
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.1, 2.3.0
Reporter: Rob Vesse


Spark on Kubernetes provides logging of the pod/container status as a monitor 
of the job progress.  However the logger just dumps the raw container status 
object leading to fairly unreadable output like so:

{noformat}
18/08/24 09:03:27 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
 pod name: spark-groupby-1535101393784-driver
 namespace: default
 labels: spark-app-selector -> spark-47f7248122b9444b8d5fd3701028a1e8, 
spark-role -> driver
 pod uid: 88de6467-a77c-11e8-b9da-a4bf0128b75b
 creation time: 2018-08-24T09:03:14Z
 service account name: spark
 volumes: spark-local-dir-1, spark-conf-volume, spark-token-kjxkv
 node name: tab-cmp4
 start time: 2018-08-24T09:03:14Z
 container images: rvesse/spark:latest
 phase: Running
 status: 
[ContainerStatus(containerID=docker://23ae58571f59505e837dca40455d0347fb90e9b88e2a2b145a38e2919fceb447,
 image=rvesse/spark:latest, 
imageID=docker-pullable://rvesse/spark@sha256:92abf0b718743d0f5a26068fc94ec42233db0493c55a8570dc8c851c62a4bc0a,
 lastState=ContainerState(running=null, terminated=null, waiting=null, 
additionalProperties={}), name=spark-kubernetes-driver, ready=true, 
restartCount=0, 
state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2018-08-24T09:03:26Z,
 additionalProperties={}), additionalProperties={}), terminated=null, 
waiting=null, additionalProperties={}), additionalProperties={})]
{noformat}

The {{LoggingPodStatusWatcher}} actually already includes code to nicely format 
this information but only invokes it at the end of the job:

{noformat}
18/08/24 09:04:07 INFO LoggingPodStatusWatcherImpl: Container final statuses:


 Container name: spark-kubernetes-driver
 Container image: rvesse/spark:latest
 Container state: Terminated
 Exit code: 0
{noformat}

It would be nice if this nicer formatting were used consistently throughout the 
logging.

We already have patched this on our internal fork and will upstream a fix 
shortly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-02 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567017#comment-16567017
 ] 

Rob Vesse commented on SPARK-24434:
---

[~skonto] Added a couple more comments based on some issues I've run into 
during ongoing development

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24833) Allow specifying Kubernetes host name aliases in the pod specs

2018-07-17 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-24833:
-

 Summary: Allow specifying Kubernetes host name aliases in the pod 
specs
 Key: SPARK-24833
 URL: https://issues.apache.org/jira/browse/SPARK-24833
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 2.3.1
Reporter: Rob Vesse


For some workloads you would like to allow Spark executors to access external 
services using host name aliases.  Currently there is no way to specify host 
name aliases 
(https://kubernetes.io/docs/concepts/services-networking/add-entries-to-pod-etc-hosts-with-host-aliases/)
 to the pods that Spark generates, and pod presets cannot currently be used to 
add these at admission time (plus pod presets are still an Alpha feature so are 
not guaranteed to be usable on any given cluster).

Since Spark on K8S already allows adding secrets and volumes to mount via Spark 
configuration, it should be fairly easy to use the same approach to include host 
name aliases.
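
Purely as an illustration of the configuration-driven approach being suggested, 
with entirely hypothetical key names mirroring the existing secrets/volumes 
style (no such keys exist today):

{noformat}
# Hypothetical keys - map an IP address to one or more host name aliases in
# the driver and executor pod specs
--conf spark.kubernetes.driver.hostAliases.10.1.2.3=foo.example.com,bar.example.com
--conf spark.kubernetes.executor.hostAliases.10.1.2.3=foo.example.com,bar.example.com
{noformat}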

I will look at opening a PR for this in the next couple of days.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23257) Implement Kerberos Support in Kubernetes resource manager

2018-05-31 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496284#comment-16496284
 ] 

Rob Vesse commented on SPARK-23257:
---

[~ifilonenko] Any updates on this?

We're currently using the fork as Kerberos support is a must-have for our 
customers and would love to get this into upstream and get ourselves back onto 
an official Spark release.

We can likely help out with testing, review and/or implementation as needed

> Implement Kerberos Support in Kubernetes resource manager
> -
>
> Key: SPARK-23257
> URL: https://issues.apache.org/jira/browse/SPARK-23257
> Project: Spark
>  Issue Type: Wish
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Rob Keevil
>Priority: Major
>
> On the forked k8s branch of Spark at 
> [https://github.com/apache-spark-on-k8s/spark/pull/540] , Kerberos support 
> has been added to the Kubernetes resource manager.  The Kubernetes code 
> between these two repositories appears to have diverged, so this commit 
> cannot be merged in easily.  Are there any plans to re-implement this work on 
> the main Spark repository?
>  
> [ifilonenko|https://github.com/ifilonenko] [~liyinan926] I am happy to help 
> with the development and testing of this, but i wanted to confirm that this 
> isn't already in progress -  I could not find any discussion about this 
> specific topic online.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23374) Checkstyle/Scalastyle only work from top level build

2018-02-09 Thread Rob Vesse (JIRA)
Rob Vesse created SPARK-23374:
-

 Summary: Checkstyle/Scalastyle only work from top level build
 Key: SPARK-23374
 URL: https://issues.apache.org/jira/browse/SPARK-23374
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 2.2.1
Reporter: Rob Vesse


The current Maven plugin definitions for Checkstyle/Scalastyle use fixed XML 
configs for the style rule locations that are only valid relative to the top 
level POM.  Therefore if you try and do a {{mvn verify}} in an individual 
module you get the following error:

{noformat}
[ERROR] Failed to execute goal 
org.scalastyle:scalastyle-maven-plugin:1.0.0:check (default) on project 
spark-mesos_2.11: Failed during scalastyle execution: Unable to find 
configuration file at location scalastyle-config.xml
{noformat}

As the paths are hardcoded in XML and don't use Maven properties you can't 
override these settings, so you can't style-check a single module; this means 
style checking requires a full project {{mvn verify}}, which is not ideal.

By introducing Maven properties for these two paths it would become possible to 
run checks on a single module like so:

{noformat}
mvn verify -Dscalastyle.location=../scalastyle-config.xml
{noformat}

Obviously the override would need to vary depending on the specific module you 
are trying to run it against, but this would be a relatively simple change that 
would streamline dev workflows.
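
A sketch of the resulting workflow, assuming properties named 
{{scalastyle.location}} and {{checkstyle.location}} were introduced (the 
checkstyle property name and config path are assumptions):

{noformat}
# Run style checks for just the Mesos module from within its directory,
# overriding the assumed location properties to point back at the top level
cd resource-managers/mesos
mvn verify \
  -Dscalastyle.location=../../scalastyle-config.xml \
  -Dcheckstyle.location=../../dev/checkstyle.xml
{noformat}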



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-20 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212462#comment-16212462
 ] 

Rob Vesse commented on SPARK-22229:
---

[~yuvaldeg], thanks for providing additional clarifications

If a library is considered to be a standard part of the platform software then 
it should fall under the foundation's standard 
[platform|http://www.apache.org/legal/resolved.html#platform] resolution that 
licensing of the platform generally does not affect the software running upon 
it. And if there are other Apache projects already depending on this, that 
provides a precedent that Spark can rely on.

> SPIP: RDMA Accelerated Shuffle Engine
> -
>
> Key: SPARK-22229
> URL: https://issues.apache.org/jira/browse/SPARK-22229
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Yuval Degani
> Attachments: 
> SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf
>
>
> An RDMA-accelerated shuffle engine can provide enormous performance benefits 
> to shuffle-intensive Spark jobs, as demonstrated in the “SparkRDMA” plugin 
> open-source project ([https://github.com/Mellanox/SparkRDMA]).
> Using RDMA for shuffle improves CPU utilization significantly and reduces I/O 
> processing overhead by bypassing the kernel and networking stack as well as 
> avoiding memory copies entirely. Those valuable CPU cycles are then consumed 
> directly by the actual Spark workloads, and help reducing the job runtime 
> significantly. 
> This performance gain is demonstrated with both industry standard HiBench 
> TeraSort (shows 1.5x speedup in sorting) as well as shuffle intensive 
> customer applications. 
> SparkRDMA will be presented at Spark Summit 2017 in Dublin 
> ([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]).
> Please see attached proposal document for more information.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-18 Thread Rob Vesse (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209396#comment-16209396
 ] 

Rob Vesse commented on SPARK-22229:
---

I'd be interested to know more about the performance testing you carried out.

In our experience of running Spark over various high-performance interconnects 
you can get a fairly good performance boost by simply setting 
{{spark.shuffle.compress=false}} and relying on TCP/IP performance over the 
interconnect.  It is pretty difficult to write a Spark job that saturates the 
available bandwidth of such an interconnect, so disabling compression means you 
don't waste CPU cycles compressing data and instead simply pump it across the 
network ASAP.
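
For reference, that is just standard Spark configuration and can be supplied at 
submission time, e.g.:

{noformat}
# Disable shuffle compression when the interconnect bandwidth is not the bottleneck
spark-submit \
  --conf spark.shuffle.compress=false \
  ...
{noformat}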

I also wonder if you could comment on the choice of the underlying RDMA 
libraries and their licensing?

It looks like {{libdisni}} is ASLv2, which is fine, but some of its dependencies 
appear to be at least in part GPL (e.g. {{librdmacm}}), which would mean they 
could not be depended on by an Apache project even as optional dependencies due 
to foundation-level policies around dependency licensing. Due to the nature of 
the licensing of most of the libraries in this space it may be legally 
impossible for RDMA support to make it into Spark proper, in which case you 
would likely have to stick with the external plug-in approach as you do 
currently.

> SPIP: RDMA Accelerated Shuffle Engine
> -
>
> Key: SPARK-22229
> URL: https://issues.apache.org/jira/browse/SPARK-22229
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Yuval Degani
> Attachments: 
> SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf
>
>
> An RDMA-accelerated shuffle engine can provide enormous performance benefits 
> to shuffle-intensive Spark jobs, as demonstrated in the “SparkRDMA” plugin 
> open-source project ([https://github.com/Mellanox/SparkRDMA]).
> Using RDMA for shuffle improves CPU utilization significantly and reduces I/O 
> processing overhead by bypassing the kernel and networking stack as well as 
> avoiding memory copies entirely. Those valuable CPU cycles are then consumed 
> directly by the actual Spark workloads, and help reducing the job runtime 
> significantly. 
> This performance gain is demonstrated with both industry standard HiBench 
> TeraSort (shows 1.5x speedup in sorting) as well as shuffle intensive 
> customer applications. 
> SparkRDMA will be presented at Spark Summit 2017 in Dublin 
> ([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]).
> Please see attached proposal document for more information.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org