[jira] [Updated] (SPARK-1359) SGD implementation is not efficient

2018-03-26 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-1359:
---
Remaining Estimate: (was: 168h)
 Original Estimate: (was: 168h)

> SGD implementation is not efficient
> ---
>
> Key: SPARK-1359
> URL: https://issues.apache.org/jira/browse/SPARK-1359
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 0.9.0, 1.0.0
>Reporter: Xiangrui Meng
>Priority: Major
>
> The SGD implementation samples a mini-batch to compute the stochastic 
> gradient. This is not efficient because examples are provided via an iterator 
> interface. We have to scan all of them to obtain a sample.
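
For illustration (an editor's sketch, not MLlib's code): Bernoulli sampling over an iterator must visit every element to decide membership, so even a tiny mini-batch costs a full scan of the partition.

{code}
import scala.util.Random

// Sketch only, not Spark's implementation: drawing a mini-batch from an
// iterator-backed partition. Every element is touched once, even when
// `fraction` is small, which is the inefficiency described above.
def sampleMiniBatch[T](examples: Iterator[T], fraction: Double, seed: Long): Iterator[T] = {
  val rng = new Random(seed)
  examples.filter(_ => rng.nextDouble() < fraction)
}
{code}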






[jira] [Updated] (SPARK-1359) SGD implementation is not efficient

2018-03-26 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-1359:
---
Remaining Estimate: 168h
 Original Estimate: 168h

> SGD implementation is not efficient
> ---
>
> Key: SPARK-1359
> URL: https://issues.apache.org/jira/browse/SPARK-1359
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 0.9.0, 1.0.0
>Reporter: Xiangrui Meng
>Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The SGD implementation samples a mini-batch to compute the stochastic 
> gradient. This is not efficient because examples are provided via an iterator 
> interface. We have to scan all of them to obtain a sample.






[jira] [Updated] (SPARK-1359) SGD implementation is not efficient

2018-03-26 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-1359:
---
Remaining Estimate: (was: 504h)
 Original Estimate: (was: 504h)

> SGD implementation is not efficient
> ---
>
> Key: SPARK-1359
> URL: https://issues.apache.org/jira/browse/SPARK-1359
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 0.9.0, 1.0.0
>Reporter: Xiangrui Meng
>Priority: Major
>
> The SGD implementation samples a mini-batch to compute the stochastic 
> gradient. This is not efficient because examples are provided via an iterator 
> interface. We have to scan all of them to obtain a sample.






[jira] [Updated] (SPARK-1359) SGD implementation is not efficient

2018-03-26 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-1359:
---
Remaining Estimate: 504h
 Original Estimate: 504h

> SGD implementation is not efficient
> ---
>
> Key: SPARK-1359
> URL: https://issues.apache.org/jira/browse/SPARK-1359
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 0.9.0, 1.0.0
>Reporter: Xiangrui Meng
>Priority: Major
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> The SGD implementation samples a mini-batch to compute the stochastic 
> gradient. This is not efficient because examples are provided via an iterator 
> interface. We have to scan all of them to obtain a sample.






[jira] [Commented] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher

2017-06-14 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049380#comment-16049380
 ] 

Michael Gummelt commented on SPARK-20812:
-

The dispatcher won't know about the secrets.  It will only know about the name 
of the secret that the user provides.  The Mesos plugin that handles secrets 
(which in DC/OS is the secret store) will actually start the task with the 
secret mounted.

As for the secret store specifically, that's a DC/OS feature, not an Apache 
feature, so let's talk about it elsewhere.
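
As a hedged illustration of that flow (an editor's sketch against the Mesos protobuf API; the Secret proto appeared around Mesos 1.3/1.4, and the secret name below is made up):

{code}
import org.apache.mesos.Protos.Secret

// The dispatcher carries only a reference by name; the agent-side secrets
// plugin (the secret store, on DC/OS) resolves the value at task launch and
// exposes it as an environment variable or a mounted file.
val secretRef: Secret = Secret.newBuilder()
  .setType(Secret.Type.REFERENCE)
  .setReference(Secret.Reference.newBuilder().setName("spark/my-keytab"))
  .build()
{code}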


> Add Mesos Secrets support to the spark dispatcher
> -
>
> Key: SPARK-20812
> URL: https://issues.apache.org/jira/browse/SPARK-20812
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.3.0
>Reporter: Michael Gummelt
>
> Mesos 1.4 will support secrets.  In order to support sending keytabs, or any 
> other secret, through the Spark Dispatcher, we need to integrate this feature 
> with the dispatcher.
> The integration should include support for both file-based and env-based 
> secrets.






[jira] [Updated] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher

2017-06-13 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-20812:

Description: 
Mesos 1.4 will support secrets.  In order to support sending keytabs, or any 
other secret, through the Spark Dispatcher, we need to integrate this feature 
with the dispatcher.

The integration should include support for both file-based and env-based 
secrets.

  was:
Mesos 1.3 supports secrets.  In order to support sending keytabs, or any other 
secret, through the Spark Dispatcher, we need to integrate this feature with 
the dispatcher.

The integration should include support for both file-based and env-based 
secrets.


> Add Mesos Secrets support to the spark dispatcher
> -
>
> Key: SPARK-20812
> URL: https://issues.apache.org/jira/browse/SPARK-20812
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.3.0
>Reporter: Michael Gummelt
>
> Mesos 1.4 will support secrets.  In order to support sending keytabs, or any 
> other secret, through the Spark Dispatcher, we need to integrate this feature 
> with the dispatcher.
> The integration should include support for both file-based and env-based 
> secrets.






[jira] [Updated] (SPARK-20434) Move Hadoop delegation token code from yarn to core

2017-06-12 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-20434:

Summary: Move Hadoop delegation token code from yarn to core  (was: Move 
Kerberos delegation token code from yarn to core)

> Move Hadoop delegation token code from yarn to core
> ---
>
> Key: SPARK-20434
> URL: https://issues.apache.org/jira/browse/SPARK-20434
> Project: Spark
>  Issue Type: Task
>  Components: Mesos, Spark Core, YARN
>Affects Versions: 2.1.0
>Reporter: Michael Gummelt
>
> This is to enable kerberos support for other schedulers, such as Mesos.






[jira] [Created] (SPARK-21000) Add labels support to the Spark Dispatcher

2017-06-06 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-21000:
---

 Summary: Add labels support to the Spark Dispatcher
 Key: SPARK-21000
 URL: https://issues.apache.org/jira/browse/SPARK-21000
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Affects Versions: 2.2.1
Reporter: Michael Gummelt


Labels can be used for tagging drivers with arbitrary data, which can then be 
used by an organization's tooling.
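
A rough sketch of what this could look like with the Mesos protobuf API (the keys and values below are hypothetical):

{code}
import org.apache.mesos.Protos.{Label, Labels}

// Arbitrary key/value pairs attached to the driver's TaskInfo; downstream
// tooling can read them back from the Mesos state endpoints.
val labels: Labels = Labels.newBuilder()
  .addLabels(Label.newBuilder().setKey("team").setValue("data-eng"))
  .addLabels(Label.newBuilder().setKey("env").setValue("staging"))
  .build()
// taskInfo.setLabels(labels) when the dispatcher builds the driver task
{code}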






[jira] [Commented] (SPARK-4899) Support Mesos features: roles and checkpoints

2017-05-24 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023383#comment-16023383
 ] 

Michael Gummelt commented on SPARK-4899:


Thanks Kamal.  I responded to the thread, which I'll copy here:

bq. Restarting the agent without checkpointing enabled will kill the executor, 
but that still shouldn't cause the Spark job to fail, since Spark jobs should 
tolerate executor failures.

So I'm fine with adding checkpointing support, but I'm not sure it actually 
solves any problem.
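
For reference, enabling checkpointing is a one-field change on FrameworkInfo (an editor's sketch against the Mesos protobuf API):

{code}
import org.apache.mesos.Protos.FrameworkInfo

// With checkpoint = true, the agent persists task/executor state so that a
// restarted agent process can recover running executors instead of killing them.
val framework: FrameworkInfo = FrameworkInfo.newBuilder()
  .setUser("")   // empty user lets Mesos fill in the current user
  .setName("Spark")
  .setCheckpoint(true)
  .build()
{code}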

> Support Mesos features: roles and checkpoints
> -
>
> Key: SPARK-4899
> URL: https://issues.apache.org/jira/browse/SPARK-4899
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 1.2.0
>Reporter: Andrew Ash
>
> Inspired by https://github.com/apache/spark/pull/60
> Mesos has two features that would be nice for Spark to take advantage of:
> 1. Roles -- a way to specify ACLs and priorities for users
> 2. Checkpoints -- a way to restart a failed Mesos slave without losing all 
> the work that was happening on the box
> Some of these may require a Mesos upgrade past our current 0.18.1






[jira] [Commented] (SPARK-4899) Support Mesos features: roles and checkpoints

2017-05-23 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021766#comment-16021766
 ] 

Michael Gummelt commented on SPARK-4899:


[~drcrallen] Can you link me to the conversation you had with Tim?  I can't 
find it on the mailing list.

> Support Mesos features: roles and checkpoints
> -
>
> Key: SPARK-4899
> URL: https://issues.apache.org/jira/browse/SPARK-4899
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 1.2.0
>Reporter: Andrew Ash
>
> Inspired by https://github.com/apache/spark/pull/60
> Mesos has two features that would be nice for Spark to take advantage of:
> 1. Roles -- a way to specify ACLs and priorities for users
> 2. Checkpoints -- a way to restart a failed Mesos slave without losing all 
> the work that was happening on the box
> Some of these may require a Mesos upgrade past our current 0.18.1






[jira] [Commented] (SPARK-4899) Support Mesos features: roles and checkpoints

2017-05-23 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16021750#comment-16021750
 ] 

Michael Gummelt commented on SPARK-4899:


These are two separate features, which need two separate JIRAs.  Roles are 
already supported, though, so this should either be renamed or closed in favor 
of a JIRA just for checkpointing.


> Support Mesos features: roles and checkpoints
> -
>
> Key: SPARK-4899
> URL: https://issues.apache.org/jira/browse/SPARK-4899
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 1.2.0
>Reporter: Andrew Ash
>
> Inspired by https://github.com/apache/spark/pull/60
> Mesos has two features that would be nice for Spark to take advantage of:
> 1. Roles -- a way to specify ACLs and priorities for users
> 2. Checkpoints -- a way to restart a failed Mesos slave without losing all 
> the work that was happening on the box
> Some of these may require a Mesos upgrade past our current 0.18.1






[jira] [Updated] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher

2017-05-19 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-20812:

Description: 
Mesos 1.3 supports secrets.  In order to support sending keytabs, or any other 
secret, through the Spark Dispatcher, we need to integrate this feature with 
the dispatcher.

The integration should include support for both file-based and env-based 
secrets.

  was:Mesos 1.3 supports secrets.  In order to support sending keytabs, or any 
other secret, through the Spark Dispatcher, we need to integrate this feature 
with the dispatcher.


> Add Mesos Secrets support to the spark dispatcher
> -
>
> Key: SPARK-20812
> URL: https://issues.apache.org/jira/browse/SPARK-20812
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.3.0
>Reporter: Michael Gummelt
>
> Mesos 1.3 supports secrets.  In order to support sending keytabs, or any 
> other secret, through the Spark Dispatcher, we need to integrate this feature 
> with the dispatcher.
> The integration should include support for both file-based and env-based 
> secrets.






[jira] [Created] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher

2017-05-19 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-20812:
---

 Summary: Add Mesos Secrets support to the spark dispatcher
 Key: SPARK-20812
 URL: https://issues.apache.org/jira/browse/SPARK-20812
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Affects Versions: 2.3.0
Reporter: Michael Gummelt


Mesos 1.3 supports secrets.  In order to support sending keytabs, or any other 
secret, through the Spark Dispatcher, we need to integrate this feature with 
the dispatcher.






[jira] [Commented] (SPARK-16627) --jars doesn't work in Mesos mode

2017-05-18 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016253#comment-16016253
 ] 

Michael Gummelt commented on SPARK-16627:
-

I'm not completely sure, but I believe the dispatcher is correctly setting 
{{spark.jars}}; due to SPARK-10643, however, the driver does not recognize the 
remote jar URL.

> --jars doesn't work in Mesos mode
> -
>
> Key: SPARK-16627
> URL: https://issues.apache.org/jira/browse/SPARK-16627
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Reporter: Michael Gummelt
>
> Definitely doesn't work in cluster mode.  Might not work in client mode 
> either.






[jira] [Commented] (SPARK-12559) Cluster mode doesn't work with --packages

2017-05-18 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016243#comment-16016243
 ] 

Michael Gummelt commented on SPARK-12559:
-

I changed the title from "Standalone cluster mode" to "cluster mode", since 
--packages doesn't work with any cluster mode.

> Cluster mode doesn't work with --packages
> -
>
> Key: SPARK-12559
> URL: https://issues.apache.org/jira/browse/SPARK-12559
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>
> From the mailing list:
> {quote}
> Another problem I ran into, and you might too, is that --packages doesn't
> work with --deploy-mode cluster.  It downloads the packages to a temporary
> location on the node running spark-submit, then passes those paths to the
> node that is running the Driver, but since that isn't the same machine, it
> can't find anything and fails.  The driver process *should* be the one
> doing the downloading, but it isn't. I ended up having to create a fat JAR
> with all of the dependencies to get around that one.
> {quote}
> The problem is that we currently don't upload jars to the cluster. It seems 
> to fix this we either (1) do upload jars, or (2) just run the packages code 
> on the driver side. I slightly prefer (2) because it's simpler.
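
To make the failure mode concrete (an editor's illustration; the paths are hypothetical): spark-submit resolves --packages into its local ivy cache, and those absolute paths are then handed to a driver on a different machine.

{code}
import java.nio.file.{Files, Paths}

// Jars resolved on the submitting node's local disk...
val resolvedOnSubmitNode = Seq("/home/alice/.ivy2/jars/some-dep-1.0.jar")
// ...do not exist on the driver's host in cluster mode, so the lookup fails.
val visibleToDriver = resolvedOnSubmitNode.filter(p => Files.exists(Paths.get(p)))
// On the driver host, visibleToDriver is empty and the job can't find its deps.
{code}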






[jira] [Updated] (SPARK-12559) Cluster mode doesn't work with --packages

2017-05-18 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-12559:

Summary: Cluster mode doesn't work with --packages  (was: Standalone 
cluster mode doesn't work with --packages)

> Cluster mode doesn't work with --packages
> -
>
> Key: SPARK-12559
> URL: https://issues.apache.org/jira/browse/SPARK-12559
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.3.0
>Reporter: Andrew Or
>
> From the mailing list:
> {quote}
> Another problem I ran into, and you might too, is that --packages doesn't
> work with --deploy-mode cluster.  It downloads the packages to a temporary
> location on the node running spark-submit, then passes those paths to the
> node that is running the Driver, but since that isn't the same machine, it
> can't find anything and fails.  The driver process *should* be the one
> doing the downloading, but it isn't. I ended up having to create a fat JAR
> with all of the dependencies to get around that one.
> {quote}
> The problem is that we currently don't upload jars to the cluster. It seems 
> to fix this we either (1) do upload jars, or (2) just run the packages code 
> on the driver side. I slightly prefer (2) because it's simpler.






[jira] [Commented] (SPARK-20447) spark mesos scheduler suppress call

2017-04-25 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983314#comment-15983314
 ] 

Michael Gummelt commented on SPARK-20447:
-

The scheduler doesn't support suppression, no, but it does reject offers for 
120s: 
https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L375,
 and this is configurable.

With Mesos' 1s offer cycle, this should allow offers to be sent to 119 other 
frameworks before being re-offered to Spark.

Is this not sufficient? 
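
Concretely, the linked code amounts to something like this (an editor's sketch using the Mesos Java API):

{code}
import org.apache.mesos.SchedulerDriver
import org.apache.mesos.Protos.{Filters, OfferID}

// Declining with refuse_seconds keeps the offer away from this framework for
// the given window, which approximates suppression without the suppress call.
def declineFor(driver: SchedulerDriver, offerId: OfferID, seconds: Double): Unit = {
  val filters = Filters.newBuilder().setRefuseSeconds(seconds).build()
  driver.declineOffer(offerId, filters)
}
{code}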

> spark mesos scheduler suppress call
> ---
>
> Key: SPARK-20447
> URL: https://issues.apache.org/jira/browse/SPARK-20447
> Project: Spark
>  Issue Type: Wish
>  Components: Mesos
>Affects Versions: 2.1.0
>Reporter: Pavel Plotnikov
>Priority: Minor
>
>  The Spark Mesos scheduler never sends the suppress call to Mesos to exclude 
> the application from the Mesos batch allocation process (HierarchicalDRF 
> allocator) when the Spark application is idle and there are no tasks in the 
> queue.  As a result, under dynamic resource allocation the idle application 
> holds a 0% cluster share, while other applications that need additional 
> resources can't receive an offer because their cluster share is significantly 
> higher than 0%.
> About suppress call: 
> http://mesos.apache.org/documentation/latest/app-framework-development-guide/






[jira] [Created] (SPARK-20434) Move Kerberos delegation token code from yarn to core

2017-04-21 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-20434:
---

 Summary: Move Kerberos delegation token code from yarn to core
 Key: SPARK-20434
 URL: https://issues.apache.org/jira/browse/SPARK-20434
 Project: Spark
  Issue Type: Task
  Components: Mesos, Spark Core, YARN
Affects Versions: 2.1.0
Reporter: Michael Gummelt


This is to enable kerberos support for other schedulers, such as Mesos.






[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-14 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969379#comment-15969379
 ] 

Michael Gummelt commented on SPARK-20328:
-

Ah, yes, of course.  Thanks.

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured (e.g. via 
> {{yarn.resourcemanager.*}}).  If it isn't, an exception is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
> principal for use as renewer
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
> {code}
> I have a workaround where I set a YARN-specific configuration variable to 
> trick {{TokenCache}} into thinking YARN is configured, but this is obviously 
> suboptimal.
> The proper fix to this would likely require significant {{hadoop}} 
> refactoring to make split information available without going through 
> {{JobConf}}, so I'm not yet sure what the best course of action is.
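
For context, the workaround mentioned above presumably boils down to something like this (an editor's sketch, not the exact code):

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Setting a stand-in ResourceManager principal satisfies TokenCache's renewer
// lookup even though no YARN master exists, so getSplits() stops throwing.
def fakeYarnPrincipal(conf: Configuration): Unit = {
  val user = UserGroupInformation.getCurrentUser.getShortUserName
  conf.set("yarn.resourcemanager.principal", user)
}
{code}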






[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-14 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969341#comment-15969341
 ] 

Michael Gummelt commented on SPARK-16742:
-

[~jerryshao] No, but you can look at our solution here: 
https://github.com/mesosphere/spark/commit/0a2cc4248039ca989e177e96e92a594a025661fe#diff-79391110e9f26657e415aa169a004998R129

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa






[jira] [Comment Edited] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-14 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969341#comment-15969341
 ] 

Michael Gummelt edited comment on SPARK-16742 at 4/14/17 6:01 PM:
--

[~jerryshao] No, but you can look at our solution here: 
https://github.com/mesosphere/spark/commit/0a2cc4248039ca989e177e96e92a594a025661fe#diff-79391110e9f26657e415aa169a004998R129

The code we upstream will be quite different, but the delegation token handling 
will be similar.


was (Author: mgummelt):
[~jerryshao] No, but you can look at our solution here: 
https://github.com/mesosphere/spark/commit/0a2cc4248039ca989e177e96e92a594a025661fe#diff-79391110e9f26657e415aa169a004998R129

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa






[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968469#comment-15968469
 ] 

Michael Gummelt commented on SPARK-20328:
-

bq. I have no idea what that means.

I'm pretty sure a delegation token is just another way for a subject to 
authenticate.  So the driver uses the delegation token provided to it by 
{{spark-submit}} to authenticate.  This is what I mean by "driver is already 
logged in via the delegation token".  Since the driver is authenticated, it can 
request further delegation tokens.  But my point is that it shouldn't need to, 
because that code is not "delegating" the tokens to any other process, which is 
the only thing delegation tokens are needed for.

But this is neither here nor there.  I think I know what I have to do.
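
The mechanism being described looks roughly like this (an editor's sketch with the Hadoop UGI API):

{code}
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// A process with no TGT can still act as the user by attaching previously
// obtained delegation tokens to its UGI; HDFS calls under doAs() then
// authenticate with those tokens instead of Kerberos.
def loginWithTokens(user: String, creds: Credentials): UserGroupInformation = {
  val ugi = UserGroupInformation.createRemoteUser(user)
  ugi.addCredentials(creds)
  ugi
}
{code}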

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured (e.g. via 
> {{yarn.resourcemanager.*}}).  If it isn't, an exception is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
> principal for use as renewer
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
> {code}
> I have a workaround where I set a YARN-specific configuration variable to 
> trick {{TokenCache}} into thinking YARN is configured, but this is obviously 
> suboptimal.
> The proper fix to this would likely require significant {{hadoop}} 
> refactoring to make split information available without going through 
> {{JobConf}}, so I'm not yet sure what the best course of action is.






[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968432#comment-15968432
 ] 

Michael Gummelt commented on SPARK-20328:
-

bq. It depends. e.g. on YARN, when you submit in cluster mode, the driver is 
running in the cluster and all it has are delegation tokens. (The TGT is only 
available to the launcher process.)

Right, but my understanding is that the driver is already logged in via the 
delegation token provided to it by the {{spark-submit}} process (via 
{{amContainer.setTokens}}), so it wouldn't need to then fetch further 
delegation tokens.

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured (e.g. via 
> {{yarn.resourcemanager.*}}).  If it isn't, an exception is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
> principal for use as renewer
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
> {code}
> I have a workaround where I set a YARN-specific configuration variable to 
> trick {{TokenCache}} into thinking YARN is configured, but this is obviously 
> suboptimal.
> The proper fix to this would likely require significant {{hadoop}} 
> refactoring to make split information available without going through 
> {{JobConf}}, so I'm not yet sure what the best course of action is.






[jira] [Comment Edited] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968416#comment-15968416
 ] 

Michael Gummelt edited comment on SPARK-20328 at 4/14/17 12:02 AM:
---

bq. It shouldn't need to do it, not for the reasons you mention, but because 
Spark already has the necessary credentials available (either a TGT, or a valid 
delegation token for HDFS).

But it shouldn't need delegation tokens at all, right?  The authentication of 
the currently logged in user, whether it be through the OS or through Kerberos, 
should be sufficient.


was (Author: mgummelt):
bq. It shouldn't need to do it, not for the reasons you mention, but because 
Spark already has the necessary credentials available (either a TGT, or a valid 
delegation token for HDFS).

But it shouldn't need delegation tokens at all, right?

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured (e.g. via 
> {{yarn.resourcemanager.*}}).  If it isn't, an exception is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
> principal for use as renewer
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
> {code}
> I have a workaround where I set a YARN-specific configuration variable to 
> trick {{TokenCache}} into thinking YARN is configured, but this is obviously 
> suboptimal.
> The proper fix to this would likely require significant {{hadoop}} 
> refactoring to make split information available without going through 
> {{JobConf}}, so I'm not yet sure what the best course of action is.






[jira] [Comment Edited] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968411#comment-15968411
 ] 

Michael Gummelt edited comment on SPARK-20328 at 4/13/17 11:59 PM:
---

bq. The Mesos backend (I mean the code in Spark, not the Mesos service) can set 
the configs in the SparkContext's "hadoopConfiguration" object, can't it?

I suppose this would work.  It would rely on the assumption that the Mesos 
scheduler backend is started before the HadoopRDD is created, which happens to 
be true, but ideally we wouldn't have to rely on that ordering.  Right now I'm 
just setting it in {{SparkSubmit}}, but that's not great either.

I filed a Hadoop ticket for the {{FileInputFormat}} issue and linked it here.


was (Author: mgummelt):
> The Mesos backend (I mean the code in Spark, not the Mesos service) can set 
> the configs in the SparkContext's "hadoopConfiguration" object, can't it?

I suppose this would work.  It would rely on the assumption that the Mesos 
scheduler backend is started before the HadoopRDD is created, which happens to 
be true, but ideally we wouldn't have to rely on that ordering.  Right now I'm 
just setting it in {{SparkSubmit}}, but that's not great either.

I filed a Hadoop ticket for the {{FileInputFormat}} issue and linked it here.
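
A minimal sketch of the suggestion (assumes a SparkContext named {{sc}}; not the actual backend code):

{code}
// Publish the renewer principal through the SparkContext's Hadoop
// configuration before any HadoopRDD computes its splits.
sc.hadoopConfiguration.set(
  "yarn.resourcemanager.principal",
  org.apache.hadoop.security.UserGroupInformation.getCurrentUser.getShortUserName)
{code}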

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured (e.g. via 
> {{yarn.resourcemanager.*}}).  If it isn't, an exception is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
> principal for use as renewer
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
> {code}
> I have a workaround where I set a YARN-specific configuration variable to 
> trick {{TokenCache}} into thinking YARN is configured, but this is obviously 
> suboptimal.
> The proper fix to this would likely require significant {{hadoop}} 
> refactoring to make split information available without going through 
> {{JobConf}}, so I'm not yet sure what the best course of action is.






[jira] [Comment Edited] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968416#comment-15968416
 ] 

Michael Gummelt edited comment on SPARK-20328 at 4/13/17 11:59 PM:
---

bq. It shouldn't need to do it, not for the reasons you mention, but because 
Spark already has the necessary credentials available (either a TGT, or a valid 
delegation token for HDFS).

But it shouldn't need delegation tokens at all, right?


was (Author: mgummelt):
> It shouldn't need to do it, not for the reasons you mention, but because Spark 
> already has the necessary credentials available (either a TGT, or a valid 
> delegation token for HDFS).

But it shouldn't need delegation tokens at all, right?

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured (e.g. via 
> {{yarn.resourcemanager.*}}).  If it isn't, an exception is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
> principal for use as renewer
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
> {code}
> I have a workaround where I set a YARN-specific configuration variable to 
> trick {{TokenCache}} into thinking YARN is configured, but this is obviously 
> suboptimal.
> The proper fix to this would likely require significant {{hadoop}} 
> refactoring to make split information available without going through 
> {{JobConf}}, so I'm not yet sure what the best course of action is.






[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968416#comment-15968416
 ] 

Michael Gummelt commented on SPARK-20328:
-

> It shouldn't need to do it, not for the reasons you mention, but because Spark 
> already has the necessary credentials available (either a TGT, or a valid 
> delegation token for HDFS).

But it shouldn't need delegation tokens at all, right?

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured (e.g. via 
> {{yarn.resourcemanager.*}}).  If it isn't, an exception is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
> principal for use as renewer
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
> {code}
> I have a workaround where I set a YARN-specific configuration variable to 
> trick {{TokenCache}} into thinking YARN is configured, but this is obviously 
> suboptimal.
> The proper fix to this would likely require significant {{hadoop}} 
> refactoring to make split information available without going through 
> {{JobConf}}, so I'm not yet sure what the best course of action is.






[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968411#comment-15968411
 ] 

Michael Gummelt commented on SPARK-20328:
-

> The Mesos backend (I mean the code in Spark, not the Mesos service) can set 
> the configs in the SparkContext's "hadoopConfiguration" object, can't it?

I suppose this would work.  It would rely on the assumption that the Mesos 
scheduler backend is started before the HadoopRDD is created, which happens to 
be true, but ideally we wouldn't have to rely on that ordering.  Right now I'm 
just setting it in {{SparkSubmit}}, but that's not great either.

I filed a Hadoop ticket for the {{FileInputFormat}} issue and linked it here.

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured (e.g. via 
> {{yarn.resourcemanager.*}}).  If it isn't, an exception is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
> principal for use as renewer
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
> {code}
> I have a workaround where I set a YARN-specific configuration variable to 
> trick {{TokenCache}} into thinking YARN is configured, but this is obviously 
> suboptimal.
> The proper fix to this would likely require significant {{hadoop}} 
> refactoring to make split information available without going through 
> {{JobConf}}, so I'm not yet sure what the best course of action is.






[jira] [Comment Edited] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968388#comment-15968388
 ] 

Michael Gummelt edited comment on SPARK-20328 at 4/13/17 11:27 PM:
---

Hey [~vanzin], thanks for the response.

Everything you said is correct, but I want to clarify one thing:

> You just need to make the Mesos backend in Spark do that automatically for 
> the submitting user.

The problem can't be solved in the Mesos backend.  When I fetch delegation 
tokens for transmission to Executors in the Mesos backend, there's no problem.  
I can set whatever renewer I want.

The problem is that there's a second location where delegation tokens are 
fetched: {{HadoopRDD}}.  This is entirely separate from the fetching that the 
scheduler backends do (either Mesos or YARN).  {{HadoopRDD}} tries to fetch 
split data, and ultimately calls into {{TokenCache}} in the hadoop library, 
which fetches delegation tokens with the renewer set to the YARN 
ResourceManager's principal: 
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213.
  I'm currently solving this by setting that config var in {{SparkSubmit}}.

The big question I have, which I suppose is more for the {{hadoop}} team, is 
why in the world is {{FileInputFormat}} fetching delegation tokens?  AFAICT, 
they're not sending those tokens to any other process.  They're just fetching 
split data directly from the Name Nodes, and there should be no delegation 
required.


was (Author: mgummelt):
Hey [~vanzin], thanks for the response.

Everything you said is correct, but I want to clarify one thing:

> You just need to make the Mesos backend in Spark do that automatically for 
> the submitting user.

The problem can't be solved in the Mesos backend.  When I fetch delegation 
tokens for transmission to Executors in the Mesos backend, there's no problem.  
I can set whatever renewer I want.

The problem is that there's a second location where delegation tokens are 
fetched: {{HadoopRDD}}.  This is entirely separate from the fetching that the 
scheduler backends do (either Mesos or YARN).  {{HadoopRDD}} tries to fetch 
split data, and ultimately calls into {{TokenCache}} in the hadoop library, 
which fetches delegation tokens with the renewer set to the YARN 
ResourceManager's principal: 
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213.
  I'm currently solving this by setting that config var in {{SparkSubmit}}.

The big question I have, which I suppose is more for the {{hadoop}} team, is 
why in the world is {{FileInputFormat}} fetching delegation tokens.  AFAICT, 
they're not sending those tokens to any other process.  They're just fetching 
split data directly from the Name Nodes, and there should be no delegation 
required.

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured (e.g. via 
> {{yarn.resourcemanager.*}}).  If it isn't, an exception is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
> principal for use as renewer
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>  

[jira] [Comment Edited] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968388#comment-15968388
 ] 

Michael Gummelt edited comment on SPARK-20328 at 4/13/17 11:27 PM:
---

Hey [~vanzin], thanks for the response.

Everything you said is correct, but I want to clarify one thing:

> You just need to make the Mesos backend in Spark do that automatically for 
> the submitting user.

The problem can't be solved in the Mesos backend.  When I fetch delegation 
tokens for transmission to Executors in the Mesos backend, there's no problem.  
I can set whatever renewer I want.

The problem is that there's a second location where delegation tokens are 
fetched: {{HadoopRDD}}.  This is entirely separate from the fetching that the 
scheduler backends do (either Mesos or YARN).  {{HadoopRDD}} tries to fetch 
split data, and ultimately calls into {{TokenCache}} in the Hadoop library, 
which fetches delegation tokens with the renewer set to the YARN 
ResourceManager's principal: 
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213.
  I'm currently working around this by setting that configuration variable in 
{{SparkSubmit}}.

The big question I have, which I suppose is more for the {{hadoop}} team, is 
why {{FileInputFormat}} is fetching delegation tokens at all.  AFAICT, it 
doesn't send those tokens to any other process; it just fetches split data 
directly from the NameNodes, so no delegation should be required.


was (Author: mgummelt):
Hey [~vanzin], thanks for the response.

Everything you said is correct, but I want to clarify one thing:

> You just need to make the Mesos backend in Spark do that automatically for 
> the submitting user.

The problem can't be solved in the Mesos backend.  When I fetch delegation 
tokens for transmission to Executors in the Mesos backend, there's no problem.  
I can set whatever renewer I want.

The problem is that there's a second location where delegation tokens are 
fetched: {{HadoopRDD}}.  This is entirely separate from the fetching that the 
scheduler backends do (either Mesos or YARN).  {{HadoopRDD}} tries to fetch 
split data, and ultimately calls into {{TokenCache}} in the Hadoop library, 
which fetches delegation tokens with the renewer set to the YARN 
ResourceManager's principal: 
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213

The big question I have, which I suppose is more for the {{hadoop}} team, is 
why {{FileInputFormat}} is fetching delegation tokens at all.  AFAICT, it 
doesn't send those tokens to any other process; it just fetches split data 
directly from the NameNodes, so no delegation should be required.

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured (e.g. via 
> {{yarn.resourcemanager.*}}).  If it isn't, an exception is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
> principal for use as renewer
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>   at 
> 

[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968388#comment-15968388
 ] 

Michael Gummelt commented on SPARK-20328:
-

Hey [~vanzin], thanks for the response.

Everything you said is correct, but I want to clarify one thing:

> You just need to make the Mesos backend in Spark do that automatically for 
> the submitting user.

The problem can't be solved in the Mesos backend.  When I fetch delegation 
tokens for transmission to Executors in the Mesos backend, there's no problem.  
I can set whatever renewer I want.

The problem is that there's a second location where delegation tokens are 
fetched: {{HadoopRDD}}.  This is entirely separate from the fetching that the 
scheduler backends do (either Mesos or YARN).  {{HadoopRDD}} tries to fetch 
split data, and ultimately calls into {{TokenCache}} in the Hadoop library, 
which fetches delegation tokens with the renewer set to the YARN 
ResourceManager's principal: 
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213

The big question I have, which I suppose is more for the {{hadoop}} team, is 
why {{FileInputFormat}} is fetching delegation tokens at all.  AFAICT, it 
doesn't send those tokens to any other process; it just fetches split data 
directly from the NameNodes, so no delegation should be required.

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured (e.g. via 
> {{yarn.resourcemanager.*}}).  If it isn't, an exception is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
> principal for use as renewer
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>   at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
> {code}
> I have a workaround where I set a YARN-specific configuration variable to 
> trick {{TokenCache}} into thinking YARN is configured, but this is obviously 
> suboptimal.
> The proper fix to this would likely require significant {{hadoop}} 
> refactoring to make split information available without going through 
> {{JobConf}}, so I'm not yet sure what the best course of action is.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-20328:

Description: 
In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138

Semantically, this is a problem because a HadoopRDD does not represent a Hadoop 
MapReduce job.  Practically, this is a problem because this line: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
 results in this MapReduce-specific security code being called: 
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
 which assumes the MapReduce master is configured (e.g. via 
{{yarn.resourcemanager.*}}).  If it isn't, an exception is thrown.
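
Paraphrasing the linked {{HadoopRDD}} code, the problematic call path boils 
down to roughly this sketch (the input path and input format are illustrative):

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}

// Sketch of the HadoopRDD pattern: wrap the Hadoop Configuration in a
// MapReduce JobConf purely to ask an InputFormat for its splits.
val hadoopConf = new Configuration()
val jobConf = new JobConf(hadoopConf)
FileInputFormat.setInputPaths(jobConf, "/data/input")
val inputFormat = new TextInputFormat
inputFormat.configure(jobConf)
// getSplits() -> listStatus() -> TokenCache.obtainTokensForNamenodes(),
// which is where "Can't get Master Kerberos principal" is thrown.
val splits = inputFormat.getSplits(jobConf, 2)
{code}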

So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
the Spark Mesos scheduler:

{code}
Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
principal for use as renewer
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
{code}

I have a workaround where I set a YARN-specific configuration variable to trick 
{{TokenCache}} into thinking YARN is configured, but this is obviously 
suboptimal.

The proper fix to this would likely require significant {{hadoop}} refactoring 
to make split information available without going through {{JobConf}}, so I'm 
not yet sure what the best course of action is.

  was:
In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138

Semantically, this is a problem because a HadoopRDD does not represent a Hadoop 
MapReduce job.  Practically, this is a problem because this line: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
 results in this MapReduce-specific security code being called: 
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
 which assumes the MapReduce master is configured.  If it isn't, an exception 
is thrown.

So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
the Spark Mesos scheduler:

{code}
Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
principal for use as renewer
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
{code}

I have a workaround where I set a YARN-specific configuration variable to trick 
{{TokenCache}} into thinking YARN is configured, but this is obviously 
suboptimal.

The proper fix to this would likely require significant {{hadoop}} refactoring 
to make split information available without going through {{JobConf}}, so I'm 
not yet sure what the best course of action is.


> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> 

[jira] [Updated] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-20328:

Description: 
In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138

Semantically, this is a problem because a HadoopRDD does not represent a Hadoop 
MapReduce job.  Practically, this is a problem because this line: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
 results in this MapReduce-specific security code being called: 
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
 which assumes the MapReduce master is configured.  If it isn't, an exception 
is thrown.

So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
the Spark Mesos scheduler:

{code}
Exception in thread "main" java.io.IOException: Can't get Master Kerberos 
principal for use as renewer
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
{code}

I have a workaround where I set a YARN-specific configuration variable to trick 
{{TokenCache}} into thinking YARN is configured, but this is obviously 
suboptimal.

The proper fix to this would likely require significant {{hadoop}} refactoring 
to make split information available without going through {{JobConf}}, so I'm 
not yet sure what the best course of action is.

  was:
In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138

Semantically, this is a problem because a HadoopRDD does not represent a Hadoop 
MapReduce job.  Practically, this is a problem because this line: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
 results in this MapReduce-specific security code being called: 
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
 which assumes the MapReduce master is configured.  If it isn't, an exception 
is thrown.

So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
the Spark Mesos scheduler.  I have a workaround where I set a YARN-specific 
configuration variable to trick {{TokenCache}} into thinking YARN is 
configured, but this is obviously suboptimal.

The proper fix to this would likely require significant {{hadoop}} refactoring 
to make split information available without going through {{JobConf}}, so I'm 
not yet sure what the best course of action is.


> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured.  If it isn't, an exception 
> is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler:
> {code}
> 

[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968232#comment-15968232
 ] 

Michael Gummelt commented on SPARK-20328:
-

cc [~colorant] [~hfeng] [~vanzin]

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -
>
> Key: SPARK-20328
> URL: https://issues.apache.org/jira/browse/SPARK-20328
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2
>Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
> Semantically, this is a problem because a HadoopRDD does not represent a 
> Hadoop MapReduce job.  Practically, this is a problem because this line: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
>  results in this MapReduce-specific security code being called: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
>  which assumes the MapReduce master is configured.  If it isn't, an exception 
> is thrown.
> So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
> the Spark Mesos scheduler.  I have a workaround where I set a YARN-specific 
> configuration variable to trick {{TokenCache}} into thinking YARN is 
> configured, but this is obviously suboptimal.
> The proper fix to this would likely require significant {{hadoop}} 
> refactoring to make split information available without going through 
> {{JobConf}}, so I'm not yet sure what the best course of action is.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs

2017-04-13 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-20328:
---

 Summary: HadoopRDDs create a MapReduce JobConf, but are not 
MapReduce jobs
 Key: SPARK-20328
 URL: https://issues.apache.org/jira/browse/SPARK-20328
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0, 2.1.1, 2.1.2
Reporter: Michael Gummelt


In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a 
MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138

Semantically, this is a problem because a HadoopRDD does not represent a Hadoop 
MapReduce job.  Practically, this is a problem because this line: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
 results in this MapReduce-specific security code being called: 
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
 which assumes the MapReduce master is configured.  If it isn't, an exception 
is thrown.

So I'm seeing this exception thrown as I'm trying to add Kerberos support for 
the Spark Mesos scheduler.  I have a workaround where I set a YARN-specific 
configuration variable to trick {{TokenCache}} into thinking YARN is 
configured, but this is obviously suboptimal.

The proper fix to this would likely require significant {{hadoop}} refactoring 
to make split information available without going through {{JobConf}}, so I'm 
not yet sure what the best course of action is.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-10 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963601#comment-15963601
 ] 

Michael Gummelt commented on SPARK-16742:
-

bq. So, assuming that Mesos is configured properly, then it should be OK for 
Spark code to distribute user credentials.

Right.  It's just a matter of the cluster admin syncing Mesos credentials and 
Kerberos credentials properly.  In summary, it's simpler in YARN because YARN 
is Kerberos-aware, whereas Mesos isn't.

bq. That sounds like you might need the current code that distributes keytabs 
and logs in the cluster to make even client mode work in this setup.

Since client mode requires network access to the Mesos master, we generally 
assume that the user is on the same network as their datacenter, and can thus 
kinit against the KDC.


> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-10 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963583#comment-15963583
 ] 

Michael Gummelt commented on SPARK-16742:
-

bq. That sounds problematic. The way YARN works is that it actually 
authenticates the user. Are you saying that Mesos doesn't do user 
authentication?

AFAICT, YARN doesn't authenticate the Linux user.  The KDC authenticates the 
Kerberos principal, and YARN maps this principal to a Linux user via 
{{hadoop.security.auth_to_local}}.  So if a user authenticated to the KDC via a 
principal "Joe", and the {{auth_to_local}} rule maps "Joe" to "root", then 
"Joe" can launch processes as "root", even though he never provided "root" 
credentials.  It's up to the cluster administrator to properly set up this 
Kerberos -> Linux mapping.
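
For example, the mapping described above could be expressed with a rule like 
this (purely illustrative, shown here set programmatically):

{code}
// Illustrative auth_to_local rule: map Kerberos principal "Joe@EXAMPLE.COM"
// to Linux user "root"; anything else falls through to the DEFAULT rule.
val hadoopConf = new org.apache.hadoop.conf.Configuration()
hadoopConf.set("hadoop.security.auth_to_local",
  """RULE:[1:$1@$0](Joe@EXAMPLE.COM)s/.*/root/
    |DEFAULT""".stripMargin)
{code}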

It's a similar story with Mesos.  Mesos doesn't authenticate the Linux user.  
It authenticates the Mesos principal, and this principal is allowed to launch 
processes only as certain Linux users.  It's up to the cluster admin to set up 
this mapping appropriately.

The big difference is that, by default, YARN will map the Kerberos principal to 
the Linux user with the same name, so there's no problem.  Mesos, by contrast, 
will allow the driver to launch executors as any Linux user its Mesos principal 
is permitted to launch processes as.  So it's up to the admin to only provide 
users with consistent Mesos and Kerberos credentials.

bq. Are you saying that for YARN or Mesos? When YARN runs in Kerberos mode, 
Kerberos dictates the user.

I'm talking about YARN.  See the above comment.  If {{auth_to_local}} is used 
like I think it is, then that's what ultimately determines the Linux user, not 
just Kerberos.

bq.  The use case you mention ("user starting an application in cluster mode 
with no kerberos credentials") sounds actually worrying

I actually said a "user might not be kinit'd".  They may, however, have access 
to the keytab.  Since they're not on the same network as the KDC, they can't 
authenticate directly, but they do have the credentials.


> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-10 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963469#comment-15963469
 ] 

Michael Gummelt commented on SPARK-16742:
-

[~jerryshao] Great! The current RPC used in Mesos is very simple.  The executor 
just periodically requests the latest credentials from the driver, which uses 
the keytab to renew them.  We can swap in a different mechanism once that 
exists.
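
A minimal sketch of that polling pattern (the stubs below are hypothetical 
stand-ins for the actual driver RPC endpoint and executor-side helper):

{code}
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical stand-ins for the actual driver RPC endpoint and the
// executor-side credential installer.
def fetchLatestTokensFromDriver(): Array[Byte] = ???
def updateCurrentUserCredentials(tokenBytes: Array[Byte]): Unit = ???

// Sketch: executor-side loop that periodically pulls refreshed delegation
// tokens from the driver and installs them for the current user.
val poller = Executors.newSingleThreadScheduledExecutor()
poller.scheduleAtFixedRate(new Runnable {
  def run(): Unit = updateCurrentUserCredentials(fetchLatestTokensFromDriver())
}, 0, 1, TimeUnit.HOURS)
{code}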

I left a comment on your design doc.

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-10 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963446#comment-15963446
 ] 

Michael Gummelt edited comment on SPARK-16742 at 4/10/17 8:35 PM:
--

[~vanzin]

bq. The most basic feature needed for any kerberos-related work is user 
isolation (different users cannot mess with each others' processes). I was 
under the impression that Mesos supported that.

Mesos of course supports configuring the Linux user that a process runs as.  But 
in Spark, this isn't currently derived from the Kerberos principal.  It's 
configured by the user.  The scheduler's *Mesos* principal, along with ACLs 
configured in Mesos, is what determines which Linux users are allowed.  That's 
why I was asking about {{hadoop.security.auth_to_local}}, to understand how 
YARN determines what Linux user to run executors as.  It would be a 
vulnerability, for example, if the Linux user for the executors is simply 
derived from that of the driver, because two human users running as the same 
Linux user, but logged in via different Kerberos principals, would be able to 
see each others' tokens.

bq. I don't know where this notion that cluster mode requires you to distribute 
keytabs comes from

As you said, it's mostly the renewal use case that requires distributing the 
keytab, but that's not all.  In many Mesos setups, and certainly in DC/OS, the 
submitting user might not already be kinit'd.  They may be running from outside 
the datacenter entirely, without network access to the KDC.

You're right that we could implement cluster mode in some form, but I'd rather 
keep the initial PR small.  I hope that's acceptable.


was (Author: mgummelt):
[~vanzin]

bq. The most basic feature needed for any kerberos-related work is user 
isolation (different users cannot mess with each others' processes). I was 
under the impression that Mesos supported that.

Mesos of course supports configuring the Linux user that a process runs as.  But 
in Spark, this isn't currently derived from the Kerberos principal.  It's 
configured by the user.  The scheduler's *Mesos* principal, along with ACLs 
configured in Mesos, is what determines which Linux users are allowed.  That's 
why I was asking about {{hadoop.security.auth_to_local}}, to understand how 
YARN determines what Linux user to run executors as.  It would be a 
vulnerability, for example, if the Linux user for the executors is simply 
derived from that of the driver, because two human users running as the same 
Linux user, but logged in via different Kerberos principals, would be able to 
see each others' tokens.

bq. I don't know where this notion that cluster mode requires you to distribute 
keytabs comes from

As you said, it's mostly the renewal use case that requires distributing the 
keytab, but that's not all.  In many Mesos setups, and certainly in DC/OS, the 
submitting user might not already be kinit'd.  They may be running from outside 
the datacenter entirely, without network access to the KDC.

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-10 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963446#comment-15963446
 ] 

Michael Gummelt edited comment on SPARK-16742 at 4/10/17 8:20 PM:
--

[~vanzin]

bq. The most basic feature needed for any kerberos-related work is user 
isolation (different users cannot mess with each others' processes). I was 
under the impression that Mesos supported that.

Mesos of course supports configuring the Linux user that a process runs as.  But 
in Spark, this isn't currently derived from the Kerberos principal.  It's 
configured by the user.  The scheduler's *Mesos* principal, along with ACLs 
configured in Mesos, is what determines which Linux users are allowed.  That's 
why I was asking about {{hadoop.security.auth_to_local}}, to understand how 
YARN determines what Linux user to run executors as.  It would be a 
vulnerability, for example, if the Linux user for the executors is simply 
derived from that of the driver, because two human users running as the same 
Linux user, but logged in via different Kerberos principals, would be able to 
see each others' tokens.

bq. I don't know where this notion that cluster mode requires you to distribute 
keytabs comes from

As you said, it's mostly the renewal use case that requires distributing the 
keytab, but that's not all.  In many Mesos setups, and certainly in DC/OS, the 
submitting user might not already be kinit'd.  They may be running from outside 
the datacenter entirely, without network access to the KDC.


was (Author: mgummelt):
[~vanzin]

bq. The most basic feature needed for any kerberos-related work is user 
isolation (different users cannot mess with each others' processes). I was 
under the impression that Mesos supported that.

Mesos of course supports configuring the Linux user that a process runs as.  But 
in Spark, this isn't currently derived from the Kerberos principal.  It's 
configured by the user, and the *Mesos* principal of the scheduler, along with 
ACLs configured in Mesos, is what determines which Linux users are allowed.  
That's why I was asking about {{hadoop.security.auth_to_local}}, to understand 
how YARN determines what Linux user to run executors as.  It would be a 
vulnerability, for example, if the Linux user for the executors is simply 
derived from that of the driver, because two human users running as the same 
Linux user, but logged in via different Kerberos principals, would be able to 
see each others' tokens.

bq. I don't know where this notion that cluster mode requires you to distribute 
keytabs comes from

As you said, it's mostly the renewal use case that requires distributing the 
keytab, but that's not all.  In many Mesos setups, and certainly in DC/OS, the 
submitting user might not already be kinit'd.  They may be running from outside 
the datacenter entirely, without network access to the KDC.

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-10 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963446#comment-15963446
 ] 

Michael Gummelt commented on SPARK-16742:
-

bq. The most basic feature needed for any kerberos-related work is user 
isolation (different users cannot mess with each others' processes). I was 
under the impression that Mesos supported that.

Mesos of course supports configuring the Linux user that a process runs as.  But 
in Spark, this isn't currently derived from the Kerberos principal.  It's 
configured by the user, and the *Mesos* principal of the scheduler, along with 
ACLs configured in Mesos, is what determines which Linux users are allowed.  
That's why I was asking about {{hadoop.security.auth_to_local}}, to understand 
how YARN determines what Linux user to run executors as.  It would be a 
vulnerability, for example, if the Linux user for the executors is simply 
derived from that of the driver, because two human users running as the same 
Linux user, but logged in via different Kerberos principals, would be able to 
see each others' tokens.

bq. I don't know where this notion that cluster mode requires you to distribute 
keytabs comes from

As you said, it's mostly the renewal use case that requires distributing the 
keytab, but that's not all.  In many Mesos setups, and certainly in DC/OS, the 
submitting user might not already be kinit'd.  They may be running from outside 
the datacenter entirely, without network access to the KDC.

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-10 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963446#comment-15963446
 ] 

Michael Gummelt edited comment on SPARK-16742 at 4/10/17 8:18 PM:
--

[~vanzin]

bq. The most basic feature needed for any kerberos-related work is user 
isolation (different users cannot mess with each others' processes). I was 
under the impression that Mesos supported that.

Mesos of course supports configuring the Linux user that a process runs as.  But 
in Spark, this isn't currently derived from the Kerberos principal.  It's 
configured by the user, and the *Mesos* principal of the scheduler, along with 
ACLs configured in Mesos, is what determines which Linux users are allowed.  
That's why I was asking about {{hadoop.security.auth_to_local}}, to understand 
how YARN determines what Linux user to run executors as.  It would be a 
vulnerability, for example, if the Linux user for the executors is simply 
derived from that of the driver, because two human users running as the same 
Linux user, but logged in via different Kerberos principals, would be able to 
see each others' tokens.

bq. I don't know where this notion that cluster mode requires you to distribute 
keytabs comes from

As you said, it's mostly the renewal use case that requires distributing the 
keytab, but that's not all.  In many Mesos setups, and certainly in DC/OS, the 
submitting user might not already be kinit'd.  They may be running from outside 
the datacenter entirely, without network access to the KDC.


was (Author: mgummelt):
bq. The most basic feature needed for any kerberos-related work is user 
isolation (different users cannot mess with each others' processes). I was 
under the impression that Mesos supported that.

Mesos of course supports configuring the Linux user that a process runs as.  But 
in Spark, this isn't currently derived from the Kerberos principal.  It's 
configured by the user, and the *Mesos* principal of the scheduler, along with 
ACLs configured in Mesos, is what determines which Linux users are allowed.  
That's why I was asking about {{hadoop.security.auth_to_local}}, to understand 
how YARN determines what Linux user to run executors as.  It would be a 
vulnerability, for example, if the Linux user for the executors is simply 
derived from that of the driver, because two human users running as the same 
Linux user, but logged in via different Kerberos principals, would be able to 
see each others' tokens.

bq. I don't know where this notion that cluster mode requires you to distribute 
keytabs comes from

As you said, it's mostly the renewal use case that requires distributing the 
keytab, but that's not all.  In many Mesos setups, and certainly in DC/OS, the 
submitting user might not already be kinit'd.  They may be running from outside 
the datacenter entirely, without network access to the KDC.

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-09 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962450#comment-15962450
 ] 

Michael Gummelt commented on SPARK-16742:
-

Also, note that the above Mesos Kerberos implementation doesn't depend on Mesos 
itself in any way.  It just uses Spark's existing RPC mechanisms to transmit delegation 
tokens.  I see that there's a related effort here to standardize this RPC 
mechanism: https://issues.apache.org/jira/browse/SPARK-19143.  We'd be more 
than happy to adopt that standard once it exists.  But hopefully our one-off 
RPC that we're currently using is acceptable in the interim.

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-09 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962440#comment-15962440
 ] 

Michael Gummelt edited comment on SPARK-16742 at 4/10/17 5:28 AM:
--

Hi [~vanzin],

[~ganger85] and Strat.io are pulling back their Mesos Kerberos implementation 
for now, and we at Mesosphere are about to submit a PR to upstream our 
implementation.  I have a few questions I'd like to run by you to make sure 
that PR goes smoothly.

1) I've been following your comments on this Spark Standalone Kerberos PR: 
https://github.com/apache/spark/pull/17530.  It looks like your concern is that 
in *cluster mode*, the keytab is written to a file on the host running the 
driver, and is owned by the user of the Spark Worker, which will be the same 
for each job.  So jobs submitted by multiple users will be able to read each 
other's keytabs.  In *client mode*, it looks like the delegation tokens are 
written to a file (HADOOP_TOKEN_FILE_LOCATION) on the host running the 
executor, which suffers from the same problem as the keytab in cluster mode.

The problem is then that a Kerberos-authenticated user submitting their job 
would be unaware that their credentials are being leaked to other users.  Is 
this an accurate description of the issue?  

2) I understand that YARN writes delegation tokens via 
{{amContainer.setTokens()}}, which ultimately results in the delegation token 
being written to a file owned by the submitting user.  However, since the 
"submitting user" is a Kerberos user, not a Unix user, I'm assuming that 
{{hadoop.security.auth_to_local}} is what maps the Kerberos user to the Unix 
user who runs the ApplicationMaster and owns that file.  Is that correct?

To avoid the shared-file problem for delegation tokens, our Mesos 
implementation currently has the Executor issue an RPC call to fetch the 
delegation token from the driver.  There therefore isn't any need for at-rest 
access control, and if in-motion interception is in the user's threat model, 
then they can be sure to run Spark with SSL.
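
In Spark terms, that might look something like this (a sketch; the exact 
combination of security settings depends on the deployment):

{code}
// Sketch: enable RPC authentication and SSL so tokens fetched over the
// driver<->executor channel aren't exposed in transit.
val conf = new org.apache.spark.SparkConf()
  .set("spark.authenticate", "true")  // shared-secret RPC authentication
  .set("spark.ssl.enabled", "true")   // SSL for Spark's HTTP endpoints
{code}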

We avoid the shared-file problem for keytabs entirely, because there's no need 
to distribute the keytab, at least in client mode.  Unlike YARN, the driver and 
the equivalent of the "ApplicationMaster" in Mesos are one and the same.  They 
both exist in the same process, the {{spark-submit}} process.

We're probably going to punt on cluster mode for now, just for simplicity, but 
we should be able to solve this in cluster mode as well, because unlike 
standalone, and much like YARN, Mesos controls what user the driver runs as.

What do you think of the above approach?  If you see any blockers, I would very 
much appreciate teasing those out now rather than during the PR.  Thanks!


was (Author: mgummelt):
Hi [~vanzin],

[~ganger85] and Strat.io are pulling back their Mesos Kerberos implementation 
for now, and we at Mesosphere are about to submit a PR to upstream our 
implementation.  I have a few questions I'd like to run by you to make sure 
that PR goes smoothly.

1) I've been following your comments on this Spark Standalone Kerberos PR: 
https://github.com/apache/spark/pull/17530.  It looks like your concern is that 
in *cluster mode*, the keytab is written to a file on the host running the 
driver, and is owned by the user of the Spark Worker, which will be the same 
for each job.  So jobs submitted by multiple users will be able to read each 
other's keytabs.  In *client mode*, it looks like the delegation tokens are 
written to a file (HADOOP_TOKEN_FILE_LOCATION) on the host running the 
executor, which suffers from the same problem as the keytab in cluster mode.

The problem is then that a Kerberos-authenticated user submitting their job 
would be unaware that their credentials are being leaked to other users.  Is 
this an accurate description of the issue?  

2) I understand that YARN writes delegation tokens via 
{{amContainer.setTokens()}}, which ultimately results in the delegation token 
being written to a file owned by the submitting user.  However, since the 
"submitting user" is a Kerberos user, not a Unix user, I'm assuming that 
{{hadoop.security.auth_to_local}} is what maps the Kerberos user to the Unix 
user who runs the ApplicationMaster and owns that file.  Is that correct?

To avoid the shared-file problem for delegation tokens, our Mesos 
implementation currently has the Executor issue an RPC call to fetch the 
delegation token from the driver.  There therefore isn't any need for at-rest 
encryption, and if in-motion encryption is in the user's threat model, then 
they can be sure to run Spark with SSL.

We avoid the shared-file problem for keytabs entirely, because there's no need 
to distribute the keytab, at least in client mode.  Unlike YARN, the driver and 
the equivalent of the "ApplicationMaster" in Mesos are one and the same.  They 
both exist in the same process, the {{spark-submit}} process.

[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos

2017-04-09 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962440#comment-15962440
 ] 

Michael Gummelt commented on SPARK-16742:
-

Hi [~vanzin],

[~ganger85] and Strat.io are pulling back their Mesos Kerberos implementation 
for now, and we at Mesosphere are about to submit a PR to upstream our 
implementation.  I have a few questions I'd like to run by you to make sure 
that PR goes smoothly.

1) I've been following your comments on this Spark Standalone Kerberos PR: 
https://github.com/apache/spark/pull/17530.  It looks like your concern is that 
in *cluster mode*, the keytab is written to a file on the host running the 
driver, and is owned by the user of the Spark Worker, which will be the same 
for each job.  So jobs submitted by multiple users will be able to read each 
other's keytabs.  In *client mode*, it looks like the delegation tokens are 
written to a file (HADOOP_TOKEN_FILE_LOCATION) on the host running the 
executor, which suffers from the same problem as the keytab in cluster mode.

The problem is then that a Kerberos-authenticated user submitting their job 
would be unaware that their credentials are being leaked to other users.  Is 
this an accurate description of the issue?  

2) I understand that YARN writes delegation tokens via 
{{amContainer.setTokens()}}, which ultimately results in the delegation token 
being written to a file owned by the submitting user.  However, since the 
"submitting user" is a Kerberos user, not a Unix user, I'm assuming that 
{{hadoop.security.auth_to_local}} is what maps the Kerberos user to the Unix 
user who runs the ApplicationMaster and owns that file.  Is that correct?

To avoid the shared-file problem for delegation tokens, our Mesos 
implementation currently has the Executor issue an RPC call to fetch the 
delegation token from the driver.  There therefore isn't any need for at-rest 
encryption, and if in-motion encryption is in the user's threat model, then 
they can be sure to run Spark with SSL.

We avoid the shared-file problem for keytabs entirely, because there's no need 
to distribute the keytab, at least in client mode.  Unlike YARN, the driver and 
the equivalent of the "ApplicationMaster" in Mesos are one and the same.  They 
both exist in the same process, the {{spark-submit}} process.

We're probably going to punt on cluster mode for now, just for simplicity, but 
we should be able to solve this in cluster mode as well, because unlike 
standalone, and much like YARN, Mesos controls what user the driver runs as.

What do you think of the above approach?  If you see any blockers, I would very 
much appreciate teasing those out now rather than during the PR.  Thanks!

> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20054) [Mesos] Detectability for resource starvation

2017-03-21 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935605#comment-15935605
 ] 

Michael Gummelt commented on SPARK-20054:
-

Sounds like this could be solved just by having some better logging?  Something 
that indicates the driver is waiting for more registered executors?
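
For instance, something along these lines in the scheduler backend (a sketch; 
the names are illustrative stand-ins for the backend's existing bookkeeping):

{code}
// Sketch: log why the driver is holding off on scheduling instead of
// stalling silently.
if (!sufficientResourcesRegistered()) {
  logInfo(s"Waiting to schedule tasks: $totalRegisteredExecutors executors " +
    s"registered; need registered-resources ratio >= $minRegisteredRatio")
}
{code}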

> [Mesos] Detectability for resource starvation
> -
>
> Key: SPARK-20054
> URL: https://issues.apache.org/jira/browse/SPARK-20054
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Scheduler
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
>Reporter: Kamal Gurala
>Priority: Minor
>
> We currently use Mesos 1.1.0 for our Spark cluster in coarse-grained mode. We 
> had a production issue recently wherein our Spark frameworks accepted 
> resources from the Mesos master, so executors were started and the Spark 
> driver was aware of them, but the driver didn't schedule any tasks and 
> nothing happened for a long time because the minimum registered resources 
> threshold was never met. These held resources were never offered back to the 
> master for re-allocation, bringing the entire cluster to a halt until we 
> manually intervened. 
> We use DRF for Mesos and FIFO for Spark, and the cluster is usually 
> under-provisioned because not all the jobs need to run at the same time. At 
> any point in time there could be 10-15 Spark frameworks running on Mesos on 
> the under-provisioned cluster. 
> The ask is to provide better recoverability or detectability for a scenario 
> where individual Spark frameworks hold onto resources but never launch any 
> tasks, or to have these frameworks release those resources after a fixed 
> amount of time.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19373) Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at acquired cores rather than registered cores

2017-03-01 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891332#comment-15891332
 ] 

Michael Gummelt commented on SPARK-19373:
-

[~skonto] Either decline or hoard.

> Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at 
> acquired cores rather than registered cores
> ---
>
> Key: SPARK-19373
> URL: https://issues.apache.org/jira/browse/SPARK-19373
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.3, 2.0.2, 2.1.0
>Reporter: Michael Gummelt
>Assignee: Michael Gummelt
> Fix For: 2.1.1, 2.2.0
>
>
> We're currently using `totalCoresAcquired` to account for registered 
> resources, which is incorrect.  That variable measures the number of cores 
> the scheduler has accepted.  We should be using `totalCoreCount` like the 
> other schedulers do.
> Fixing this is important for locality, since users often want to wait for all 
> executors to come up before scheduling tasks to ensure they get a node-local 
> placement. 
> original PR to add support: https://github.com/apache/spark/pull/8672/files



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19373) Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at acquired cores rather than registered cores

2017-02-28 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-19373:

Affects Version/s: 1.6.3
   2.0.2

> Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at 
> acquired cores rather than registered cores
> ---
>
> Key: SPARK-19373
> URL: https://issues.apache.org/jira/browse/SPARK-19373
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.3, 2.0.2, 2.1.0
>Reporter: Michael Gummelt
>
> We're currently using `totalCoresAcquired` to account for registered 
> resources, which is incorrect.  That variable measures the number of cores 
> the scheduler has accepted.  We should be using `totalCoreCount` like the 
> other schedulers do.
> Fixing this is important for locality, since users often want to wait for all 
> executors to come up before scheduling tasks to ensure they get a node-local 
> placement. 
> original PR to add support: https://github.com/apache/spark/pull/8672/files



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19373) Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at acquired cores rather than registered cores

2017-02-28 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888692#comment-15888692
 ] 

Michael Gummelt commented on SPARK-19373:
-

This change makes it so that the user can instruct the driver to wait for all 
executors to register before scheduling tasks.  The TaskSchedulerImpl 
understands locality, so it can then make the optimal placement.  Otherwise, 
tasks are scheduled as soon as the first executor is registered, which of 
course might not be node-local for the first task.

However, this is still assuming that executors will be scheduled on the correct 
nodes, which isn't guaranteed unless you're launching executors on every node 
in your cluster.  For the best locality functionality, we need to integrate 
task locality information with dynamic allocation, so that the driver can 
dynamically spin up executors on the needed nodes.  That is outside the scope 
of this JIRA, though.
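
To make the acquired-vs-registered distinction concrete, here is a small, 
self-contained Scala sketch (hypothetical names modeled on the scheduler's 
counters, not the actual Spark code) of why checking accepted cores passes too 
early:

{code}
import java.util.concurrent.atomic.AtomicInteger

object MinRegisteredRatioDemo {
  // Cores of executors that have registered with the driver (what the fix checks).
  val totalCoreCount = new AtomicInteger(0)
  // Cores of offers the scheduler has accepted (what the buggy check used).
  var totalCoresAcquired = 0

  val maxCores = 16
  val minRegisteredRatio = 1.0 // spark.scheduler.minRegisteredResourcesRatio

  def buggyCheck: Boolean = totalCoresAcquired >= maxCores * minRegisteredRatio
  def fixedCheck: Boolean = totalCoreCount.get() >= maxCores * minRegisteredRatio

  def main(args: Array[String]): Unit = {
    totalCoresAcquired = 16        // offers accepted; executors still starting up
    println(s"buggy: $buggyCheck") // true: tasks would be scheduled too early
    println(s"fixed: $fixedCheck") // false: driver keeps waiting
    totalCoreCount.set(16)         // all executors have now registered
    println(s"fixed: $fixedCheck") // true
  }
}
{code}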

> Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at 
> acquired cores rather than registered cores
> ---
>
> Key: SPARK-19373
> URL: https://issues.apache.org/jira/browse/SPARK-19373
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.1.0
>Reporter: Michael Gummelt
>
> We're currently using `totalCoresAcquired` to account for registered 
> resources, which is incorrect.  That variable measures the number of cores 
> the scheduler has accepted.  We should be using `totalCoreCount` like the 
> other schedulers do.
> Fixing this is important for locality, since users often want to wait for all 
> executors to come up before scheduling tasks to ensure they get a node-local 
> placement. 
> original PR to add support: https://github.com/apache/spark/pull/8672/files



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19702) Add Suppress/Revive support to the Mesos Spark Dispatcher

2017-02-27 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-19702:

Description: Due to the problem described here: 
https://issues.apache.org/jira/browse/MESOS-6112, Running > 5 Mesos frameworks 
concurrently can result in starvation.  For example, running 10 dispatchers 
could result in 5 of them getting all the offers, even if they have no jobs to 
launch.  We must increase the refuse_seconds timeout to solve this 
problem.  Another option would have been to implement suppress/revive, but that 
can cause starvation due to the unreliability of mesos RPC calls.  (was: Due to 
the problem described here: https://issues.apache.org/jira/browse/MESOS-6112, 
Running > 5 Mesos frameworks concurrently can result in starvation.  For 
example, running 10 dispatchers could result in 5 of them getting all the 
offers, even if they have no jobs to launch.  We must implement explicit 
SUPPRESS and REVIVE calls in the Spark Dispatcher to solve this problem.)
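
For context, a sketch of what raising refuse_seconds looks like against the 
Mesos Java API (org.apache.mesos; the 120-second value is illustrative, not 
necessarily what Spark ships with):

{code}
import org.apache.mesos.{Protos, SchedulerDriver}

object OfferHandling {
  // Decline an unwanted offer with a long refuse_seconds filter, so the master
  // won't re-offer these resources to this framework for a while. This frees
  // the offers for the other frameworks described above.
  def declineForAWhile(driver: SchedulerDriver, offer: Protos.Offer): Unit = {
    val filters = Protos.Filters.newBuilder()
      .setRefuseSeconds(120.0) // illustrative; Mesos' default is only 5 seconds
      .build()
    driver.declineOffer(offer.getId, filters)
  }
}
{code}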

> Add Suppress/Revive support to the Mesos Spark Dispatcher
> -
>
> Key: SPARK-19702
> URL: https://issues.apache.org/jira/browse/SPARK-19702
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.1.0
>Reporter: Michael Gummelt
>
> Due to the problem described here: 
> https://issues.apache.org/jira/browse/MESOS-6112, Running > 5 Mesos 
> frameworks concurrently can result in starvation.  For example, running 10 
> dispatchers could result in 5 of them getting all the offers, even if they 
> have no jobs to launch.  We must increase the refuse_seconds 
> timeout to solve this problem.  Another option would have been to implement 
> suppress/revive, but that can cause starvation due to the unreliability of 
> mesos RPC calls.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19702) Increase refuse_seconds timeout in the Mesos Spark Dispatcher

2017-02-27 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-19702:

Summary: Increase refuse_seconds timeout in the Mesos Spark Dispatcher  
(was: Add Suppress/Revive support to the Mesos Spark Dispatcher)

> Increase refuse_seconds timeout in the Mesos Spark Dispatcher
> --
>
> Key: SPARK-19702
> URL: https://issues.apache.org/jira/browse/SPARK-19702
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.1.0
>Reporter: Michael Gummelt
>
> Due to the problem described here: 
> https://issues.apache.org/jira/browse/MESOS-6112, Running > 5 Mesos 
> frameworks concurrently can result in starvation.  For example, running 10 
> dispatchers could result in 5 of them getting all the offers, even if they 
> have no jobs to launch.  We must increase the refuse_seconds 
> timeout to solve this problem.  Another option would have been to implement 
> suppress/revive, but that can cause starvation due to the unreliability of 
> mesos RPC calls.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19702) Add Suppress/Revive support to the Mesos Spark Dispatcher

2017-02-22 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-19702:
---

 Summary: Add Suppress/Revive support to the Mesos Spark Dispatcher
 Key: SPARK-19702
 URL: https://issues.apache.org/jira/browse/SPARK-19702
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Affects Versions: 2.1.0
Reporter: Michael Gummelt


Due to the problem described here: 
https://issues.apache.org/jira/browse/MESOS-6112, Running > 5 Mesos frameworks 
concurrently can result in starvation.  For example, running 10 dispatchers 
could result in 5 of them getting all the offers, even if they have no jobs to 
launch.  We must implement explicit SUPPRESS and REVIVE calls in the Spark 
Dispatcher to solve this problem.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19703) Add Suppress/Revive support to the Mesos Spark Driver

2017-02-22 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-19703:
---

 Summary: Add Suppress/Revive support to the Mesos Spark Driver
 Key: SPARK-19703
 URL: https://issues.apache.org/jira/browse/SPARK-19703
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Affects Versions: 2.1.0
Reporter: Michael Gummelt


Due to the problem described here: 
https://issues.apache.org/jira/browse/MESOS-6112, Running > 5 Mesos frameworks 
concurrently can result in starvation.  For example, running 10 jobs could 
result in 5 of them getting all the offers, even after they've launched all 
their executors.  This leads to starvation of the other jobs.  We must 
implement explicit SUPPRESS and REVIVE calls in the Spark Dispatcher to solve 
this problem.
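
For reference, a sketch of the SUPPRESS/REVIVE idea against the Mesos Java 
SchedulerDriver API (the method names are from org.apache.mesos.SchedulerDriver; 
the trigger points are illustrative):

{code}
import org.apache.mesos.SchedulerDriver

object SuppressRevive {
  // Once all executors are up, stop receiving offers entirely so other
  // frameworks can have them.
  def onAllExecutorsLaunched(driver: SchedulerDriver): Unit =
    driver.suppressOffers()

  // When resources are needed again (e.g. an executor was lost), clear the
  // suppression and any outstanding decline filters.
  def onExecutorLost(driver: SchedulerDriver): Unit =
    driver.reviveOffers()
}
{code}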



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19479) Spark Mesos artifact split causes spark-core dependency to not pull in mesos impl

2017-02-06 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855091#comment-15855091
 ] 

Michael Gummelt commented on SPARK-19479:
-

Yea, sorry for the inconvenience, but I announced this on the dev list.  Search 
for "Mesos is now a maven module".  If I were you, I would create an email 
filter for "Mesos" on the user/dev lists.  This is what I do.

> Spark Mesos artifact split causes spark-core dependency to not pull in mesos 
> impl
> -
>
> Key: SPARK-19479
> URL: https://issues.apache.org/jira/browse/SPARK-19479
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos, Spark Core
>Affects Versions: 2.1.0
>Reporter: Charles Allen
>
> https://github.com/apache/spark/pull/14637 ( 
> https://issues.apache.org/jira/browse/SPARK-16967 ) forked off the mesos impl 
> into its own artifact, but the release notes do not call this out. This broke 
> our deployments because we depend on packaging with spark-core, which no 
> longer had any mesos awareness. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-16742) Kerberos support for Spark on Mesos

2017-01-30 Thread Michael Gummelt (JIRA)
Michael Gummelt commented on SPARK-16742

Re: Kerberos support for Spark on Mesos

Thomas Graves: Yea, I'm pretty sure we're going to change that to use delegation 
tokens like the existing solutions.

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)


[jira] (SPARK-16742) Kerberos support for Spark on Mesos

2017-01-30 Thread Michael Gummelt (JIRA)
Michael Gummelt commented on SPARK-16742

Re: Kerberos support for Spark on Mesos

As an update, we (Mesosphere) are working with Stratio on a joint solution. 
Stratio will submit a WIP PR soon, and we'll have a design discussion in this 
JIRA issue.

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (SPARK-16784) Configurable log4j settings

2017-01-29 Thread Michael Gummelt (JIRA)
Michael Gummelt updated an issue

Spark / SPARK-16784
Configurable log4j settings

Change By: Michael Gummelt
Affects Version/s: 2.1.0

--
This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] (SPARK-16784) Configurable log4j settings

2017-01-29 Thread Michael Gummelt (JIRA)
Michael Gummelt commented on SPARK-16784

Re: Configurable log4j settings
This actually doesn't seem to work for executors. I have a file 
log4j.properties.debug with the following content:

{code}
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=INFO

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
{code}

And I've run my job as follows:

{code}
root@ip-10-0-6-74:/opt/spark/dist# ./bin/spark-shell --keytab $(pwd)/hadoop-install/keytabs/nn.10.0.2.keytab --principal nn/10.0.2.103@LOCAL --master mesos://leader.mesos:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:1.0.7-2.1.0-hadoop-2.7 --conf spark.mesos.executor.home=/opt/spark/dist --conf spark.mesos.uris=http://mgummelt-mesos.s3.amazonaws.com/log4j.properties.debug --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=/mnt/mesos/sandbox/log4j.properties.debug"
{code}

I've verified that /mnt/mesos/sandbox/log4j.properties.debug exists in the 
executor's file system, and that the executor process is run with 
-Dlog4j.configuration=/mnt/mesos/sandbox/log4j.properties.debug
But debug logging is not enabled, and the executors print:

{code}
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/01/30 02:43:34 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 9@ip-10-0-2-159.us-west-2.compute.internal
17/01/30 02:43:34 INFO SignalUtils: Registered signal handler for TERM
17/01/30 02:43:34 INFO SignalUtils: Registered signal handler for HUP
17/01/30 02:43:34 INFO SignalUtils: Registered signal handler for INT
{code}
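
One likely cause (an assumption, not something confirmed in this thread): log4j 
1.x resolves -Dlog4j.configuration as a URL, so a bare absolute path can be 
ignored, with log4j silently falling back to its defaults. Passing a file: URL 
is the usual workaround, sketched here:

{code}
// Assumption, not confirmed above: prefixing the path with file: lets log4j 1.x
// parse the value as a URL instead of ignoring it.
val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.extraJavaOptions",
    "-Dlog4j.configuration=file:/mnt/mesos/sandbox/log4j.properties.debug")
{code}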
 

 
 
 
 
 
 
 
 
 
 

[jira] (SPARK-16784) Configurable log4j settings

2017-01-29 Thread Michael Gummelt (JIRA)
Michael Gummelt edited a comment on SPARK-16784

Re: Configurable log4j settings

This actually doesn't seem to work for executors. I have a file 
{{log4j.properties.debug}} with the following content:

{code}
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=INFO

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
{code}

And I've run my job as follows:

{code}
root@ip-10-0-6-74:/opt/spark/dist# ./bin/spark-shell --keytab $(pwd)/hadoop-install/keytabs/nn.10.0.2.keytab --principal nn/10.0.2.103@LOCAL --master mesos://leader.mesos:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:1.0.7-2.1.0-hadoop-2.7 --conf spark.mesos.executor.home=/opt/spark/dist --conf spark.mesos.uris=http://mgummelt-mesos.s3.amazonaws.com/log4j.properties.debug --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=/mnt/mesos/sandbox/log4j.properties.debug"
{code}

I've verified that {{/mnt/mesos/sandbox/log4j.properties.debug}} exists in the 
executor's file system, and that the executor process is run with 
{{-Dlog4j.configuration=/mnt/mesos/sandbox/log4j.properties.debug}}
But debug logging is not enabled, and the executors print:

{code}
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/01/30 02:43:34 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 9@ip-10-0-2-159.us-west-2.compute.internal
17/01/30 02:43:34 INFO SignalUtils: Registered signal handler for TERM
17/01/30 02:43:34 INFO SignalUtils: Registered signal handler for HUP
17/01/30 02:43:34 INFO SignalUtils: Registered signal handler for INT
{code}

[jira] [Created] (SPARK-19373) Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at acquired cores rather than registered cores

2017-01-26 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-19373:
---

 Summary: Mesos implementation of 
spark.scheduler.minRegisteredResourcesRatio looks at acquired cores rather than 
registered cores
 Key: SPARK-19373
 URL: https://issues.apache.org/jira/browse/SPARK-19373
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 2.1.0
Reporter: Michael Gummelt


We're currently using `totalCoresAcquired` to account for registered resources, 
which is incorrect.  That variable measures the number of cores the scheduler 
has accepted.  We should be using `totalCoreCount` like the other schedulers do.

Fixing this is important for locality, since users often want to wait for all 
executors to come up before scheduling tasks to ensure they get a node-local 
placement. 

original PR to add support: https://github.com/apache/spark/pull/8672/files



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10643) Support remote application download in client mode spark submit

2016-11-18 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-10643:

Summary: Support remote application download in client mode spark submit  
(was: Support HDFS application download in client mode spark submit)

> Support remote application download in client mode spark submit
> ---
>
> Key: SPARK-10643
> URL: https://issues.apache.org/jira/browse/SPARK-10643
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Submit
>Reporter: Alan Braithwaite
>Priority: Minor
>
> When using mesos with docker and marathon, it would be nice to be able to 
> make spark-submit deployable on marathon and have that download a jar from 
> HDFS instead of having to package the jar with the docker.
> {code}
> $ docker run -it docker.example.com/spark:latest 
> /usr/local/spark/bin/spark-submit  --class 
> com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar 
> Warning: Skip remote jar hdfs://hdfs/tmp/application.jar.
> java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> Although I'm aware that we can run in cluster mode with mesos, we've already 
> built some nice tools surrounding marathon for logging and monitoring.
> Code in question:
> https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit

2016-11-18 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15677467#comment-15677467
 ] 

Michael Gummelt commented on SPARK-10643:
-

It's not just HDFS.  HTTP URLs fail as well:

{code}
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local 
http://mgummelt-mesos.s3.amazonaws.com/spark-examples_2.11-2.0.0.jar
Warning: Skip remote jar 
http://mgummelt-mesos.s3.amazonaws.com/spark-examples_2.11-2.0.0.jar.
java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}

But the docs say this is supported:

"hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as 
expected"

> Support HDFS application download in client mode spark submit
> -
>
> Key: SPARK-10643
> URL: https://issues.apache.org/jira/browse/SPARK-10643
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Submit
>Reporter: Alan Braithwaite
>Priority: Minor
>
> When using mesos with docker and marathon, it would be nice to be able to 
> make spark-submit deployable on marathon and have that download a jar from 
> HDFS instead of having to package the jar with the docker.
> {code}
> $ docker run -it docker.example.com/spark:latest 
> /usr/local/spark/bin/spark-submit  --class 
> com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar 
> Warning: Skip remote jar hdfs://hdfs/tmp/application.jar.
> java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> Although I'm aware that we can run in cluster mode with mesos, we've already 
> built some nice tools surrounding marathon for logging and monitoring.
> Code in question:
> https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18232) Support Mesos CNI

2016-11-15 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15667791#comment-15667791
 ] 

Michael Gummelt commented on SPARK-18232:
-

[~rxin] Fix Version should be 2.1.0, right?

> Support Mesos CNI
> -
>
> Key: SPARK-18232
> URL: https://issues.apache.org/jira/browse/SPARK-18232
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Michael Gummelt
>Assignee: Michael Gummelt
> Fix For: 2.2.0
>
>
> Add the ability to launch containers attached to a CNI network: 
> http://mesos.apache.org/documentation/latest/cni/
> This allows for user-pluggable network isolation, including IP-per-container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18232) Support Mesos CNI

2016-11-02 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-18232:
---

 Summary: Support Mesos CNI
 Key: SPARK-18232
 URL: https://issues.apache.org/jira/browse/SPARK-18232
 Project: Spark
  Issue Type: Improvement
  Components: Mesos
Reporter: Michael Gummelt


Add the ability to launch containers attached to a CNI network: 
http://mesos.apache.org/documentation/latest/cni/

This allows for user-pluggable network isolation, including IP-per-container.
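
A sketch of how this could surface to users (an assumption: the 
spark.mesos.network.name config key; the network name itself is hypothetical):

{code}
// Executors' containers would be attached to the named CNI network.
val conf = new org.apache.spark.SparkConf()
  .set("spark.mesos.network.name", "my-cni-network")
{code}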



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit

2016-10-31 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622992#comment-15622992
 ] 

Michael Gummelt commented on SPARK-16522:
-

This JIRA was for a bug in Mesos.  If you're getting this error w/ Standalone, 
it's likely a different bug, and you should submit a separate JIRA.

> [MESOS] Spark application throws exception on exit
> --
>
> Key: SPARK-16522
> URL: https://issues.apache.org/jira/browse/SPARK-16522
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>Assignee: Sun Rui
> Fix For: 2.0.1, 2.1.0
>
>
> Spark applications running on Mesos throw exception upon exit as follows:
> {noformat}
> 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
> org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>   ... 4 more
> Exception in thread "Thread-47" org.apache.spark.SparkException: Error 
> notifying standalone scheduler's driver endpoint
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)]
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   ... 2 more
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   ... 4 more
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> 

[jira] [Updated] (SPARK-17454) Use Mesos disk resources

2016-09-26 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-17454:

Summary: Use Mesos disk resources  (was: Add option to specify Mesos 
resource offer constraints)

> Use Mesos disk resources
> 
>
> Key: SPARK-17454
> URL: https://issues.apache.org/jira/browse/SPARK-17454
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Chris Bannister
>
> Currently the driver will accept offers from Mesos which have enough ram for 
> the executor and until its max cores is reached. There is no way to control 
> the required CPU's or disk for each executor, it would be very useful to be 
> able to apply something similar to spark.mesos.constraints to resource offers 
> instead of attributes on the offer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17454) Add option to specify Mesos resource offer constraints

2016-09-23 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15517007#comment-15517007
 ] 

Michael Gummelt commented on SPARK-17454:
-

So you're trying to only launch executors on nodes with a sufficient amount of 
disk space?

> Add option to specify Mesos resource offer constraints
> --
>
> Key: SPARK-17454
> URL: https://issues.apache.org/jira/browse/SPARK-17454
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Chris Bannister
>
> Currently the driver will accept offers from Mesos which have enough ram for 
> the executor and until its max cores is reached. There is no way to control 
> the required CPU's or disk for each executor, it would be very useful to be 
> able to apply something similar to spark.mesos.constraints to resource offers 
> instead of attributes on the offer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16742) Kerberos support for Spark on Mesos

2016-09-12 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-16742:

Description: 
We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
contributing it to Apache Spark soon.

Mesosphere design doc: 
https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
Mesosphere code: 
https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa

  was:
We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
contributing it to Apache Spark soon.


https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa


> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> Mesosphere design doc: 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
> Mesosphere code: 
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16742) Kerberos support for Spark on Mesos

2016-09-12 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-16742:

Description: 
We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
contributing it to Apache Spark soon.


https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa

  was:
We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
contributing it to Apache Spark soon.

https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa


> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16742) Kerberos support for Spark on Mesos

2016-09-12 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-16742:

Description: 
We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
contributing it to Apache Spark soon.

https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa

  was:We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll 
be contributing it to Apache Spark soon.


> Kerberos support for Spark on Mesos
> ---
>
> Key: SPARK-16742
> URL: https://issues.apache.org/jira/browse/SPARK-16742
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos.  We'll be 
> contributing it to Apache Spark soon.
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17454) Add option to specify Mesos resource offer constraints

2016-09-11 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15482682#comment-15482682
 ] 

Michael Gummelt commented on SPARK-17454:
-

As of Spark 2.0, Mesos mode supports spark.executor.cores

And the scheduler doesn't reserve any disk.  It just writes to the local 
workspace.  Do you have a need for disk reservation?
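
A quick sketch of the sizing knobs mentioned above (values illustrative):

{code}
// spark.executor.cores fixes each executor's share of an offer;
// spark.cores.max caps the total cores the job acquires across the cluster.
val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.cores", "2")
  .set("spark.cores.max", "8")
{code}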

> Add option to specify Mesos resource offer constraints
> --
>
> Key: SPARK-17454
> URL: https://issues.apache.org/jira/browse/SPARK-17454
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Chris Bannister
>
> Currently the driver will accept offers from Mesos which have enough ram for 
> the executor and until its max cores is reached. There is no way to control 
> the required CPU's or disk for each executor, it would be very useful to be 
> able to apply something similar to spark.mesos.constraints to resource offers 
> instead of attributes on the offer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17419) Mesos virtual network support

2016-09-06 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-17419:
---

 Summary: Mesos virtual network support
 Key: SPARK-17419
 URL: https://issues.apache.org/jira/browse/SPARK-17419
 Project: Spark
  Issue Type: Task
  Components: Mesos
Reporter: Michael Gummelt


http://mesos.apache.org/documentation/latest/cni/

This will enable launching executors into virtual networks for isolation and 
security. It will also enable IP-per-container networking.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17067) Revocable resource support

2016-09-01 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-17067:

Description: Blocked by https://issues.apache.org/jira/browse/MESOS-4392

> Revocable resource support
> --
>
> Key: SPARK-17067
> URL: https://issues.apache.org/jira/browse/SPARK-17067
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Michael Gummelt
>
> Blocked by https://issues.apache.org/jira/browse/MESOS-4392



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-11183) enable support for mesos 0.24+

2016-09-01 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt closed SPARK-11183.
---
Resolution: Done

> enable support for mesos 0.24+
> --
>
> Key: SPARK-11183
> URL: https://issues.apache.org/jira/browse/SPARK-11183
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Ioannis Polyzos
>
> In Mesos 0.24, the Mesos leader info in ZK changed to JSON; this results in 
> Spark failing to run on 0.24+.
> References : 
>   https://issues.apache.org/jira/browse/MESOS-2340 
>   
> http://mail-archives.apache.org/mod_mbox/mesos-commits/201506.mbox/%3ced4698dc56444bcdac3bdf19134db...@git.apache.org%3E
>   https://github.com/mesos/elasticsearch/issues/338
>   https://github.com/spark-jobserver/spark-jobserver/issues/267



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-6679) java.lang.ClassNotFoundException on Mesos fine grained mode and input replication

2016-09-01 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt closed SPARK-6679.
--
Resolution: Won't Fix

fine-grained is deprecated 

> java.lang.ClassNotFoundException on Mesos fine grained mode and input 
> replication
> -
>
> Key: SPARK-6679
> URL: https://issues.apache.org/jira/browse/SPARK-6679
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos, Streaming
>Affects Versions: 1.3.0
>Reporter: Ondřej Smola
>
> Spark Streaming 1.3.0, Mesos 0.21.1 - Only when using fine grained mode and 
> receiver input replication (StorageLevel.MEMORY_ONLY_2, 
> StorageLevel.MEMORY_AND_DISK_2). When using coarse grained mode it works. 
> When not using replication (StorageLevel.MEMORY_ONLY ...) it works. Error:
> {code}
> 15/03/26 14:50:00 ERROR TransportRequestHandler: Error while invoking 
> RpcHandler#receive() on RPC id 7178767328921933569
> java.lang.ClassNotFoundException: org/apache/spark/storage/StorageLevel
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:344)
>   at 
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>   at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:88)
>   at 
> org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:65)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster

2016-09-01 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt closed SPARK-5197.
--
Resolution: Won't Fix

fine-grained is deprecated 

> Support external shuffle service in fine-grained mode on mesos cluster
> --
>
> Key: SPARK-5197
> URL: https://issues.apache.org/jira/browse/SPARK-5197
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Mesos, Shuffle
>Reporter: Jongyoul Lee
>
> I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, 
> which already offers resources dynamically and returns them automatically 
> when a task is finished. It doesn't, however, have a mechanism to support an 
> external shuffle service the way YARN does with its AuxiliaryService. Because 
> Mesos doesn't support AuxiliaryService, we need a different way to do this.
> - Launching a shuffle service like a Spark job on the same cluster
> -- Pros
> --- Supports multi-tenant environments
> --- Almost the same approach as YARN's
> -- Cons
> --- Must manage a long-running 'background' job (the service) on Mesos
> --- Requires every slave - or host - to run one shuffle service at all times
> - Launching jobs within the shuffle service
> -- Pros
> --- Easy to implement
> --- Jobs needn't check whether a shuffle service exists
> -- Cons
> --- Multiple shuffle services exist under a multi-tenant environment
> --- Shuffle service ports must be managed dynamically in a multi-user 
> environment
> In my opinion, the first one is the better way to support an external shuffle 
> service. Please leave comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17320) Spark Mesos module not building on PRs

2016-08-30 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-17320:
---

 Summary: Spark Mesos module not building on PRs
 Key: SPARK-17320
 URL: https://issues.apache.org/jira/browse/SPARK-17320
 Project: Spark
  Issue Type: Task
  Components: Mesos
Affects Versions: 2.0.0
Reporter: Michael Gummelt
 Fix For: 2.0.1






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-16627) --jars doesn't work in Mesos mode

2016-08-25 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt closed SPARK-16627.
---
Resolution: Won't Fix

> --jars doesn't work in Mesos mode
> -
>
> Key: SPARK-16627
> URL: https://issues.apache.org/jira/browse/SPARK-16627
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Reporter: Michael Gummelt
>
> Definitely doesn't work in cluster mode.  Might not work in client mode 
> either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17240) SparkConf is Serializable but contains a non-serializable field

2016-08-25 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437436#comment-15437436
 ] 

Michael Gummelt commented on SPARK-17240:
-

cc [~vanzin]

> SparkConf is Serializable but contains a non-serializable field
> ---
>
> Key: SPARK-17240
> URL: https://issues.apache.org/jira/browse/SPARK-17240
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.1
>Reporter: Michael Gummelt
> Fix For: 2.0.1
>
>
> This commit: 
> https://github.com/apache/spark/commit/5da6c4b24f512b63cd4e6ba7dd8968066a9396f5
> Added ConfigReader to SparkConf.  SparkConf is Serializable, but ConfigReader 
> is not, which results in the following exception:
> {code}
> java.io.NotSerializableException: 
> org.apache.spark.internal.config.ConfigReader
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>   at org.apache.spark.util.Utils$.serialize(Utils.scala:134)
>   at 
> org.apache.spark.scheduler.cluster.mesos.ZookeeperMesosClusterPersistenceEngine.persist(MesosClusterPersistenceEngine.scala:111)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.submitDriver(MesosClusterScheduler.scala:170)
>   at 
> org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet.handleSubmit(MesosRestServer.scala:126)
>   at 
> org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:265)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>   at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at org.spark_project.jetty.server.Server.handle(Server.java:499)
>   at 
> org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
>   at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>   at 
> org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
>   at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>   at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17240) SparkConf is Serializable but contains a non-serializable field

2016-08-25 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-17240:
---

 Summary: SparkConf is Serializable but contains a non-serializable 
field
 Key: SPARK-17240
 URL: https://issues.apache.org/jira/browse/SPARK-17240
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.1
Reporter: Michael Gummelt
 Fix For: 2.0.1


This commit: 
https://github.com/apache/spark/commit/5da6c4b24f512b63cd4e6ba7dd8968066a9396f5

Added ConfigReader to SparkConf.  SparkConf is Serializable, but ConfigReader 
is not, which results in the following exception:

{code}
java.io.NotSerializableException: org.apache.spark.internal.config.ConfigReader
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.util.Utils$.serialize(Utils.scala:134)
at 
org.apache.spark.scheduler.cluster.mesos.ZookeeperMesosClusterPersistenceEngine.persist(MesosClusterPersistenceEngine.scala:111)
at 
org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.submitDriver(MesosClusterScheduler.scala:170)
at 
org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet.handleSubmit(MesosRestServer.scala:126)
at 
org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:265)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
at 
org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587)
at 
org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at 
org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at 
org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at 
org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.spark_project.jetty.server.Server.handle(Server.java:499)
at 
org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311)
at 
org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at 
org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at 
org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at 
org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
{code}
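
For illustration, a minimal sketch (hypothetical, not the actual Spark patch) 
of the usual remedy: mark the non-serializable field {{@transient}} and 
rebuild it lazily after deserialization:

{code}
// Hypothetical Scala sketch, assuming a non-serializable helper class.
class ConfigReader(settings: java.util.Map[String, String])  // not Serializable

class MyConf extends Serializable {
  private val settings =
    new java.util.concurrent.ConcurrentHashMap[String, String]()

  // @transient skips this field during Java serialization; lazy val
  // re-creates it on first access after deserialization.
  @transient private lazy val reader = new ConfigReader(settings)
}
{code}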



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10401) spark-submit --unsupervise

2016-08-19 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15428572#comment-15428572
 ] 

Michael Gummelt commented on SPARK-10401:
-

This should probably be a separate JIRA, but I'm just adding a note here that 
{{--kill}} doesn't seem to kill the job immediately.  It invokes Mesos' 
{{killTask}} function, which runs {{docker stop}} for Docker containers.  That 
sends a SIGTERM, which seems to be ignored, then sends a SIGKILL after 10s, 
which ultimately kills the job.  I'd like to find out why the SIGTERM is 
ignored.
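
One thing worth checking (an assumption on my part, not confirmed here): if 
the container entrypoint launches the JVM from a shell without {{exec}}, the 
shell receives the SIGTERM and never forwards it.  A hypothetical 
shutdown-hook probe to test whether the SIGTERM actually reaches the driver 
JVM:

{code}
// Hypothetical diagnostic sketch (not Spark code).  If this never prints
// on `docker stop`, the SIGTERM is being swallowed before it reaches the
// JVM, e.g. by a non-exec'ing entrypoint shell.
object SigtermProbe {
  def main(args: Array[String]): Unit = {
    Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
      def run(): Unit =
        System.err.println("shutdown hook ran: SIGTERM reached the JVM")
    }))
    Thread.sleep(Long.MaxValue)  // keep the process alive until signaled
  }
}
{code}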

> spark-submit --unsupervise 
> ---
>
> Key: SPARK-10401
> URL: https://issues.apache.org/jira/browse/SPARK-10401
> Project: Spark
>  Issue Type: New Feature
>  Components: Deploy, Mesos
>Affects Versions: 1.5.0
>Reporter: Alberto Miorin
>
> When I submit a streaming job with the option --supervise to the new Mesos 
> Spark dispatcher, I cannot decommission the job.
> I tried spark-submit --kill, but the dispatcher always restarts it.
> Driver and executors are both Docker containers.
> I think there should be a subcommand: spark-submit --unsupervise



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17067) Revocable resource support

2016-08-15 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421698#comment-15421698
 ] 

Michael Gummelt commented on SPARK-17067:
-

Add revocable resource support: 
http://mesos.apache.org/documentation/latest/oversubscription/

This will allow higher-priority jobs (or other, non-Spark services) to preempt 
lower-priority jobs.
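
A rough sketch of what opting into revocable resources could look like in the 
offer handler (assumed shape, not a design; {{hasRevocable}} marks 
oversubscribed resources in the Mesos protobuf API):

{code}
// Hypothetical Scala sketch: count offered CPUs, optionally including
// revocable (oversubscribed) ones if the framework opts in.
import org.apache.mesos.Protos.Offer
import scala.collection.JavaConverters._

def usableCpus(offer: Offer, acceptRevocable: Boolean): Double =
  offer.getResourcesList.asScala
    .filter(_.getName == "cpus")
    .filter(r => acceptRevocable || !r.hasRevocable)
    .map(_.getScalar.getValue)
    .sum
{code}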

> Revocable resource support
> --
>
> Key: SPARK-17067
> URL: https://issues.apache.org/jira/browse/SPARK-17067
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Michael Gummelt
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17067) Revocable resource support

2016-08-15 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-17067:
---

 Summary: Revocable resource support
 Key: SPARK-17067
 URL: https://issues.apache.org/jira/browse/SPARK-17067
 Project: Spark
  Issue Type: Improvement
  Components: Mesos
Reporter: Michael Gummelt






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16784) Configurable log4j settings

2016-08-11 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417857#comment-15417857
 ] 

Michael Gummelt edited comment on SPARK-16784 at 8/11/16 8:11 PM:
--

{{log4j.debug=true}} only results in log4j printing its internal debugging 
messages (e.g. config file location, appenders, etc.).  It doesn't turn on 
debug logging for the application.


was (Author: mgummelt):
{{log4j.debug=true}} only results in log4j printing its debugging messages.  It 
doesn't turn on debug logging for the application.

> Configurable log4j settings
> ---
>
> Key: SPARK-16784
> URL: https://issues.apache.org/jira/browse/SPARK-16784
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>
> I often want to change the logging configuration on a single Spark job.  This 
> is easy in client mode: I just modify log4j.properties.  It's difficult in 
> cluster mode, because I need to modify the log4j.properties in the 
> distribution in which the driver runs.  I'd like a way of setting this 
> dynamically, such as via a Java system property.  Some brief searching showed 
> that log4j doesn't seem to accept such a property, but I'd like to open up 
> this idea for further comment.  Maybe we can find a solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16784) Configurable log4j settings

2016-08-11 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417856#comment-15417856
 ] 

Michael Gummelt commented on SPARK-16784:
-

`log4j.debug=true` only results in log4j printing its debugging messages.  It 
doesn't turn on debug logging for the application.
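
A possible per-job workaround (an assumption on my part, not something 
verified in this ticket): log4j 1.x does read the {{log4j.configuration}} 
system property as a config-file URL, so a custom log4j.properties can be 
injected at submit time:

{code}
# Hypothetical spark-submit invocation; the properties-file path is a
# placeholder and must be reachable from wherever the driver runs.
spark-submit \
  --deploy-mode cluster \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" \
  ...
{code}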

> Configurable log4j settings
> ---
>
> Key: SPARK-16784
> URL: https://issues.apache.org/jira/browse/SPARK-16784
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>
> I often want to change the logging configuration on a single Spark job.  This 
> is easy in client mode: I just modify log4j.properties.  It's difficult in 
> cluster mode, because I need to modify the log4j.properties in the 
> distribution in which the driver runs.  I'd like a way of setting this 
> dynamically, such as via a Java system property.  Some brief searching showed 
> that log4j doesn't seem to accept such a property, but I'd like to open up 
> this idea for further comment.  Maybe we can find a solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-16784) Configurable log4j settings

2016-08-11 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt reopened SPARK-16784:
-

`log4j.debug=true` only results in log4j printing its debugging messages.  It 
doesn't turn on debug logging for the application.

> Configurable log4j settings
> ---
>
> Key: SPARK-16784
> URL: https://issues.apache.org/jira/browse/SPARK-16784
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>
> I often want to change the logging configuration on a single Spark job.  This 
> is easy in client mode: I just modify log4j.properties.  It's difficult in 
> cluster mode, because I need to modify the log4j.properties in the 
> distribution in which the driver runs.  I'd like a way of setting this 
> dynamically, such as via a Java system property.  Some brief searching showed 
> that log4j doesn't seem to accept such a property, but I'd like to open up 
> this idea for further comment.  Maybe we can find a solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16784) Configurable log4j settings

2016-08-11 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417857#comment-15417857
 ] 

Michael Gummelt edited comment on SPARK-16784 at 8/11/16 8:10 PM:
--

{{log4j.debug=true}} only results in log4j printing its debugging messages.  It 
doesn't turn on debug logging for the application.


was (Author: mgummelt):
`log4j.debug=true` only results in log4j printing its debugging messages.  It 
doesn't turn on debug logging for the application.

> Configurable log4j settings
> ---
>
> Key: SPARK-16784
> URL: https://issues.apache.org/jira/browse/SPARK-16784
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>
> I often want to change the logging configuration on a single Spark job.  This 
> is easy in client mode: I just modify log4j.properties.  It's difficult in 
> cluster mode, because I need to modify the log4j.properties in the 
> distribution in which the driver runs.  I'd like a way of setting this 
> dynamically, such as via a Java system property.  Some brief searching showed 
> that log4j doesn't seem to accept such a property, but I'd like to open up 
> this idea for further comment.  Maybe we can find a solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-16784) Configurable log4j settings

2016-08-11 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-16784:

Comment: was deleted

(was: `log4j.debug=true` only results in log4j printing its debugging messages. 
 It doesn't turn on debug logging for the application.)

> Configurable log4j settings
> ---
>
> Key: SPARK-16784
> URL: https://issues.apache.org/jira/browse/SPARK-16784
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Michael Gummelt
>
> I often want to change the logging configuration on a single Spark job.  This 
> is easy in client mode: I just modify log4j.properties.  It's difficult in 
> cluster mode, because I need to modify the log4j.properties in the 
> distribution in which the driver runs.  I'd like a way of setting this 
> dynamically, such as via a Java system property.  Some brief searching showed 
> that log4j doesn't seem to accept such a property, but I'd like to open up 
> this idea for further comment.  Maybe we can find a solution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17002) Document that spark.ssl.protocol. is required for SSL

2016-08-10 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-17002:
---

 Summary: Document that spark.ssl.protocol. is required for SSL
 Key: SPARK-17002
 URL: https://issues.apache.org/jira/browse/SPARK-17002
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.0.0, 1.6.2
Reporter: Michael Gummelt


cc [~jlewandowski]

I was trying to start the Spark master.  When I set 
{{spark.ssl.enabled=true}} but fail to set {{spark.ssl.protocol}}, I get 
this none-too-helpful error message:

{code}
16/08/10 15:17:50 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(mgummelt); users 
with modify permissions: Set(mgummelt)
16/08/10 15:17:50 WARN SecurityManager: Using 'accept-all' trust manager for 
SSL connections.
Exception in thread "main" java.security.KeyManagementException: Default 
SSLContext is initialized automatically
at 
sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749)
at javax.net.ssl.SSLContext.init(SSLContext.java:282)
at org.apache.spark.SecurityManager.(SecurityManager.scala:284)
at 
org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1121)
at org.apache.spark.deploy.master.Master$.main(Master.scala:1106)
at org.apache.spark.deploy.master.Master.main(Master.scala)
{code}

We should document that {{spark.ssl.protocol}} is required, and throw a more 
helpful error message when it isn't set.  In fact, we should remove the 
{{getOrElse}} here: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L285,
 since the following line fails when the protocol is set to "Default".
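
A sketch of the suggested fail-fast behavior (assumed shape, not a patch):

{code}
// Hypothetical Scala sketch: require an explicit protocol instead of
// falling back to "Default", whose SSLContext rejects init() with the
// KeyManagementException shown above.
import javax.net.ssl.SSLContext

def initSslContext(protocolOpt: Option[String]): SSLContext = {
  val protocol = protocolOpt.getOrElse(throw new IllegalArgumentException(
    "spark.ssl.protocol is required when spark.ssl.enabled=true (e.g. TLSv1.2)"))
  SSLContext.getInstance(protocol)  // unknown names fail here with a clear error
}
{code}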



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-16522) [MESOS] Spark application throws exception on exit

2016-08-09 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt reopened SPARK-16522:
-

Reopening so we can track this until it's merged into the 2.0 branch.

Also changed the fix version to 2.0.1.


> [MESOS] Spark application throws exception on exit
> --
>
> Key: SPARK-16522
> URL: https://issues.apache.org/jira/browse/SPARK-16522
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>Assignee: Sun Rui
> Fix For: 2.0.1
>
>
> Spark applications running on Mesos throw an exception upon exit as follows:
> {noformat}
> 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
> org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>   ... 4 more
> Exception in thread "Thread-47" org.apache.spark.SparkException: Error 
> notifying standalone scheduler's driver endpoint
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)]
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   ... 2 more
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   ... 4 more
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at 

[jira] [Updated] (SPARK-16522) [MESOS] Spark application throws exception on exit

2016-08-09 Thread Michael Gummelt (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Gummelt updated SPARK-16522:

Fix Version/s: (was: 2.1.0)
   2.0.1

> [MESOS] Spark application throws exception on exit
> --
>
> Key: SPARK-16522
> URL: https://issues.apache.org/jira/browse/SPARK-16522
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>Assignee: Sun Rui
> Fix For: 2.0.1
>
>
> Spark applications running on Mesos throw an exception upon exit as follows:
> {noformat}
> 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
> org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>   ... 4 more
> Exception in thread "Thread-47" org.apache.spark.SparkException: Error 
> notifying standalone scheduler's driver endpoint
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>   at 
> org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Error sending message [message = 
> RemoveExecutor(1,Executor finished with state FINISHED)]
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>   at 
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>   ... 2 more
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>   at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>   at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>   at 
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>   ... 4 more
> Caused by: org.apache.spark.SparkException: Could not find 
> CoarseGrainedScheduler.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
> 

[jira] [Commented] (SPARK-16967) Collect Mesos support code into a module/profile

2016-08-09 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414004#comment-15414004
 ] 

Michael Gummelt commented on SPARK-16967:
-

Will do

> Collect Mesos support code into a module/profile
> 
>
> Key: SPARK-16967
> URL: https://issues.apache.org/jira/browse/SPARK-16967
> Project: Spark
>  Issue Type: Task
>  Components: Mesos, Spark Core
>Affects Versions: 2.0.0
>Reporter: Sean Owen
>Priority: Critical
>
> CC [~mgummelt] [~tnachen] [~skonto] 
> I think this is fairly easy and would be beneficial as more work goes into 
> Mesos.  It should be separated into a module the way YARN is, partly on 
> principle, and partly because anyone who doesn't need Mesos support could 
> then build without it.
> I'm entirely willing to take a shot at this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12909) Spark on Mesos accessing Secured HDFS w/Kerberos

2016-08-09 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15413999#comment-15413999
 ] 

Michael Gummelt commented on SPARK-12909:
-

I agree.  I just spoke with Reynold about this.  I'll create the module before 
the next big feature.

> Spark on Mesos accessing Secured HDFS w/Kerberos
> 
>
> Key: SPARK-12909
> URL: https://issues.apache.org/jira/browse/SPARK-12909
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Greg Senia
>
> Ability for Spark on Mesos to use a Kerberized HDFS FileSystem for data.  It 
> seems like this is not possible based on email chains and forum articles.  If 
> that's true, how hard would it be to get this implemented?  I'm willing to 
> try to help.
> https://community.hortonworks.com/questions/5415/spark-on-yarn-vs-mesos.html
> https://www.mail-archive.com/user@spark.apache.org/msg31326.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12909) Spark on Mesos accessing Secured HDFS w/Kerberos

2016-08-08 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412624#comment-15412624
 ] 

Michael Gummelt commented on SPARK-12909:
-

DC/OS Spark has this functionality, and we'll be upstreaming it to Apache Spark 
soon.

> Spark on Mesos accessing Secured HDFS w/Kerberos
> 
>
> Key: SPARK-12909
> URL: https://issues.apache.org/jira/browse/SPARK-12909
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Reporter: Greg Senia
>
> Ability for Spark on Mesos to use a Kerberized HDFS FileSystem for data.  It 
> seems like this is not possible based on email chains and forum articles.  If 
> that's true, how hard would it be to get this implemented?  I'm willing to 
> try to help.
> https://community.hortonworks.com/questions/5415/spark-on-yarn-vs-mesos.html
> https://www.mail-archive.com/user@spark.apache.org/msg31326.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking

2016-08-08 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15412461#comment-15412461
 ] 

Michael Gummelt commented on SPARK-11638:
-

[~radekg]

> The only advantage we had was using the same configuration inside of the 
> docker container.

You mean you want to run the Spark driver in a Docker container?  Which 
configuration did you have to change?  I can look into this more, but I need a 
clear "it's easier/better to do X in bridge mode than in host mode".

> So with the HTTP API, Spark would still require the heavy libmesos in order 
> to work with Mesos?

No.  The HTTP API will remove the libmesos dependency, which is nice, but it's 
not an urgent priority.
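
For reference, the libprocess advertise settings mentioned in the description 
look like this (values are placeholders):

{code}
# Hypothetical agent-side environment for a bridged container
# (Mesos >= 0.24.0): register with the master using the externally
# reachable host address rather than the container's 172.x.x.x IP.
export LIBPROCESS_ADVERTISE_IP=10.0.0.5    # the agent host's IP
export LIBPROCESS_ADVERTISE_PORT=31900     # host port mapped into the container
{code}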

> Run Spark on Mesos with bridge networking
> -
>
> Key: SPARK-11638
> URL: https://issues.apache.org/jira/browse/SPARK-11638
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Spark Core
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>Reporter: Radoslaw Gruchalski
> Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, 
> 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch
>
>
> h4. Summary
> Provides {{spark.driver.advertisedPort}}, 
> {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and 
> {{spark.replClassServer.advertisedPort}} settings to enable running Spark in 
> Mesos on Docker with Bridge networking. Provides patches for Akka Remote to 
> enable Spark driver advertisement using alternative host and port.
> With these settings, it is possible to run Spark Master in a Docker container 
> and have the executors running on Mesos talk back correctly to such Master.
> The problem is discussed on the Mesos mailing list here: 
> https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E
> h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door
> In order for the framework to receive orders in the bridged container, Mesos 
> in the container has to register for offers using the IP address of the 
> Agent. Offers are sent by Mesos Master to the Docker container running on a 
> different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} 
> would advertise itself using the IP address of the container, something like 
> {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a 
> different host, it's a different machine. Mesos 0.24.0 introduced two new 
> properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and 
> {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's 
> address to register for offers. This was provided mainly for running Mesos in 
> Docker on Mesos.
> h4. Spark - how does the above relate and what is being addressed here?
> Similar to Mesos, out of the box, Spark does not allow advertising its 
> services on ports different from its bind ports. Consider the following 
> scenario:
> Spark is running inside a Docker container on Mesos in bridge networking 
> mode. Assuming port {{}} for the {{spark.driver.port}}, {{6677}} for 
> the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and 
> {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to 
> Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the 
> container ports. Starting the executors from such container results in 
> executors not being able to communicate back to the Spark Master.
> This happens because of 2 things:
> Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} 
> transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port 
> different to what it bound to. The settings discussed are here: 
> https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376.
>  These do not exist in Akka {{2.3.x}}. Spark driver will always advertise 
> port {{}} as this is the one {{akka-remote}} is bound to.
> Any URIs the executors contact the Spark Master on, are prepared by Spark 
> Master and handed over to executors. These always contain the port number 
> used by the Master to find the service on. The services are:
> - {{spark.broadcast.port}}
> - {{spark.fileserver.port}}
> - {{spark.replClassServer.port}}
> all above ports are by default {{0}} (random assignment) but can be specified 
> using Spark configuration ( {{-Dspark...port}} ). However, they are limited 
> in the same way as the {{spark.driver.port}}; in the above example, an 
> executor should not contact the file server on port {{6677}} but rather on 
> the respective 31xxx assigned by Mesos.
> Spark currently does not allow any of that.
> h4. Taking on the problem, step 1: Spark Driver
> As mentioned above, Spark 

[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411315#comment-15411315
 ] 

Michael Gummelt commented on SPARK-16944:
-

Yeah, we typically call this "delay scheduling".  It was first described by 
the Spark/Mesos researchers:  
http://elmeleegy.com/khaled/papers/delay_scheduling.pdf

Spark already has {{spark.locality.wait}}, but that controls how long the task 
scheduler waits for an executor with the preferred locality to become 
available.  We need a similar concept for waiting for offers to come in, so we 
can place the executor correctly in the first place.
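
A toy sketch of delay scheduling applied to offers (all names here are 
assumptions, not Spark code): hold out for an offer on a preferred host until 
a wait budget expires, then take any offer:

{code}
// Hypothetical Scala sketch of offer-side delay scheduling.
case class PendingExecutor(preferredHosts: Set[String], createdAtMs: Long)

def chooseOffer(offeredHosts: Seq[String],  // hostnames from Mesos offers
                pending: PendingExecutor,
                nowMs: Long,
                localityWaitMs: Long): Option[String] = {
  val preferred = offeredHosts.find(pending.preferredHosts.contains)
  val waitExpired = nowMs - pending.createdAtMs >= localityWaitMs
  // Prefer a local offer; fall back to any offer once the wait expires.
  preferred.orElse(if (waitExpired) offeredHosts.headOption else None)
}
{code}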

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on YARN supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352.  It would be good 
> if Mesos also supported this feature.
> I guess that some logic existing in YARN could be reused by Mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411309#comment-15411309
 ] 

Michael Gummelt commented on SPARK-16944:
-

Since Mesos is offer-based, it's up to the Spark scheduler itself to choose 
the offers with the best locality.  In YARN, I think the application tells the 
resource manager about its placement preferences.


> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on YARN supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352.  It would be good 
> if Mesos also supported this feature.
> I guess that some logic existing in YARN could be reused by Mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled

2016-08-07 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411307#comment-15411307
 ] 

Michael Gummelt commented on SPARK-16944:
-

I think we can improve this both with and without dynamic allocation.  In both 
modes, the Mesos scheduler only considers locality after it has already placed 
the executors.

> [MESOS] Improve data locality when launching new executors when dynamic 
> allocation is enabled
> -
>
> Key: SPARK-16944
> URL: https://issues.apache.org/jira/browse/SPARK-16944
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Sun Rui
>
> Currently Spark on YARN supports better data locality by considering the 
> preferred locations of the pending tasks when dynamic allocation is enabled. 
> Refer to https://issues.apache.org/jira/browse/SPARK-4352.  It would be good 
> if Mesos also supported this feature.
> I guess that some logic existing in YARN could be reused by Mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking

2016-08-07 Thread Michael Gummelt (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411277#comment-15411277
 ] 

Michael Gummelt commented on SPARK-11638:
-

This JIRA is complex and a lot of it is out of date.  Can someone briefly 
explain to me what the problem is?  Why do you want bridge networking?



> Run Spark on Mesos with bridge networking
> -
>
> Key: SPARK-11638
> URL: https://issues.apache.org/jira/browse/SPARK-11638
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Spark Core
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
>Reporter: Radoslaw Gruchalski
> Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, 
> 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch
>
>
> h4. Summary
> Provides {{spark.driver.advertisedPort}}, 
> {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and 
> {{spark.replClassServer.advertisedPort}} settings to enable running Spark in 
> Mesos on Docker with Bridge networking. Provides patches for Akka Remote to 
> enable Spark driver advertisement using alternative host and port.
> With these settings, it is possible to run Spark Master in a Docker container 
> and have the executors running on Mesos talk back correctly to such Master.
> The problem is discussed on the Mesos mailing list here: 
> https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E
> h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door
> In order for the framework to receive orders in the bridged container, Mesos 
> in the container has to register for offers using the IP address of the 
> Agent. Offers are sent by Mesos Master to the Docker container running on a 
> different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} 
> would advertise itself using the IP address of the container, something like 
> {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a 
> different host, it's a different machine. Mesos 0.24.0 introduced two new 
> properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and 
> {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's 
> address to register for offers. This was provided mainly for running Mesos in 
> Docker on Mesos.
> h4. Spark - how does the above relate and what is being addressed here?
> Similar to Mesos, out of the box, Spark does not allow advertising its 
> services on ports different from its bind ports. Consider the following 
> scenario:
> Spark is running inside a Docker container on Mesos in bridge networking 
> mode. Assuming port {{}} for the {{spark.driver.port}}, {{6677}} for 
> the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and 
> {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to 
> Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the 
> container ports. Starting the executors from such container results in 
> executors not being able to communicate back to the Spark Master.
> This happens because of 2 things:
> Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} 
> transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port 
> different to what it bound to. The settings discussed are here: 
> https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376.
>  These do not exist in Akka {{2.3.x}}. Spark driver will always advertise 
> port {{}} as this is the one {{akka-remote}} is bound to.
> Any URIs the executors contact the Spark Master on, are prepared by Spark 
> Master and handed over to executors. These always contain the port number 
> used by the Master to find the service on. The services are:
> - {{spark.broadcast.port}}
> - {{spark.fileserver.port}}
> - {{spark.replClassServer.port}}
> all above ports are by default {{0}} (random assignment) but can be specified 
> using Spark configuration ( {{-Dspark...port}} ). However, they are limited 
> in the same way as the {{spark.driver.port}}; in the above example, an 
> executor should not contact the file server on port {{6677}} but rather on 
> the respective 31xxx assigned by Mesos.
> Spark currently does not allow any of that.
> h4. Taking on the problem, step 1: Spark Driver
> As mentioned above, Spark Driver is based on {{akka-remote}}. In order to 
> take on the problem, the {{akka.remote.net.tcp.bind-hostname}} and 
> {{akka.remote.net.tcp.bind-port}} settings are a must. Spark does not compile 
> with Akka 2.4.x yet.
> What we want is the back port of mentioned {{akka-remote}} settings to 
> {{2.3.x}} versions. These patches are attached to this ticket - 
> {{2.3.4.patch}} and 

[jira] [Created] (SPARK-16927) Mesos Cluster Dispatcher default properties

2016-08-05 Thread Michael Gummelt (JIRA)
Michael Gummelt created SPARK-16927:
---

 Summary: Mesos Cluster Dispatcher default properties
 Key: SPARK-16927
 URL: https://issues.apache.org/jira/browse/SPARK-16927
 Project: Spark
  Issue Type: New Feature
  Components: Mesos
Affects Versions: 2.0.0
Reporter: Michael Gummelt


Add the capability to set default driver properties for all jobs submitted 
through the dispatcher.
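
One possible shape (the property naming below is an assumption, not settled in 
this ticket): a dispatcher-side prefix that is stripped and applied to every 
submitted driver unless the submission overrides it.

{code}
# Hypothetical dispatcher configuration:
spark.mesos.dispatcher.driverDefault.spark.executor.memory   4g
spark.mesos.dispatcher.driverDefault.spark.eventLog.enabled  true
{code}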



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


