[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-04-27 Thread ash211
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-96850330 @mccheah is this work still active? If not should maybe close this PR --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-04-27 Thread mccheah
Github user mccheah closed the pull request at: https://github.com/apache/spark/pull/4106

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-04-27 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-96850574 No longer in progress. Closing.

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-25 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-76042475 Your views make sense! Thanks a lot =) the discussion was helpful and clarified the pitfalls here. I have learned a lot. I'm going to defer to @pwendell for the

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-25 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-76032123 The security model I want to support is: if the client application wants to execute a job that reads and writes from HDFS that has been secured with kerberos, they

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-25 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-76033180 if the client application wants to execute a job that reads and writes from HDFS that has been secured with kerberos, they should be allowed to do so if they have the

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-25 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-76028480 Hi @mccheah, just to clarify my comments, I'm thinking about someone looking at this feature and thinking that hey, Spark Standalone now supports kerberos!, while that's

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-25 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-76037757 Come to think of it, with my current approach, since the keytab is specified in the driver's SparkConf, theoretically different Spark applications can specify different

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-25 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-76038575 Come to think of it, with my current approach, since the keytab is specified in the driver's SparkConf, theoretically different Spark applications can specify different

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread harishreedharan
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75848850 From what I can see, the approach is to log in on every machine using the same principal and keytab. This has a couple of

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75852773 When @pwendell and I originally discussed the feature, we wanted to design it to be simple and usable for small dedicated Spark clusters. We also explicitly wanted to

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r25313853 --- Diff: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala --- @@ -125,39 +125,39 @@ private[spark] object

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75894650 [Test build #27931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27931/consoleFull) for PR 4106 at commit

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75863314 [Test build #27915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27915/consoleFull) for PR 4106 at commit

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75894662 Okay, I started from scratch from master and cherry-picked my changes over - this PR was in a terrible state when I made an incorrect assumption when merging.

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r25313964 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -205,7 +208,7 @@ class SparkHadoopUtil extends Logging { object

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75856584 Spark only needs to read the specific keytab for the HDFS Namenode. It does not need to read any arbitrary keytab. I'm pushing a commit that makes this explicit in the

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75871551 One way around this could be to require the driver to also have read-access to the keytab. In this model, any user that wishes to run a Spark job must also be able to

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75876540 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75876528 [Test build #27913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27913/consoleFull) for PR 4106 at commit

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75857289 [Test build #27913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27913/consoleFull) for PR 4106 at commit

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75861282 The model I had in mind for this patch was to support dedicated clusters/appliances based on Spark where the Spark cluster itself is fully trusted and not multi-tenant.

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75862787 Wow I really screwed this up, let me fix it.

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75864651 The model I had in mind for this patch was to support dedicated clusters/appliances based on Spark where the Spark cluster itself is fully trusted and not multi-tenant.

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75881745 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75881742 [Test build #27915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27915/consoleFull) for PR 4106 at commit

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75859364 I'm confused as to how this case is different from anywhere else we use Kerberos authentication to HDFS in Spark. You're opening up the possibility that user

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75865634 I think this raises a core disconnect - Spark jobs themselves do not have an authentication mechanism in place. It would be more secure if we had a way to also

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75878253 One other model that would make sense is to require the driver to log in using the principal and the keytab specified in the configuration. If the driver is able to

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75857213 "Spark only needs to read the specific keytab for the HDFS Namenode." That sounds even worse. Why would you run the Spark job with HDFS super user privileges?

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75862672 Sorry about the merge commit - I'm pretty sure I did it wrong, as I'm fairly certain I wasn't supposed to create the storm of commits listed above. What's the

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75869488 And come to think of it - in Standalone mode the Spark driver also needs to have the keytab file in the first place. In standalone mode, no matter what the driver will

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75899526 [Test build #27931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27931/consoleFull) for PR 4106 at commit

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75899533 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread vanzin
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75854994 (3) I'm confused here - Spark can't read a keytab if the permissions on the keytab file deny access. But in standalone mode all executors run as the spark user,

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-24 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75858920 "I'm confused as to how this case is different from anywhere else we use Kerberos authentication to HDFS in Spark." If HDFS is configured with Kerberos authentication,

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-23 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r25227412 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -193,17 +193,21 @@ class HadoopRDD[K, V]( override def getPartitions:

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-20 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75323707 Ah, I mixed up the history server with the event log directory. Let me try getting the history server up and see.

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-20 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75325126 I get the same exception when I try to start the history server without running kinit first. My settings are: spark.history.kerberos.enabled true

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-20 Thread tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75323149 What does running spark-shell have to do with launching the history server without doing kinit? history server is a standalone process that simply reads from

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-20 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-75311934 Okay, so I actually ran the history server on Master with Kerberos authentication, and *without* running kinit, I got the following error when launching spark shell:

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-74762686 Able to come back to this now!

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-17 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r24860380 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -193,17 +193,21 @@ class HadoopRDD[K, V]( override def getPartitions:

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-03 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-72763704 Hey @mccheah - if you are too busy I think it's fine to let it slip past 1.3, given that there are still several unknowns.

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-02 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-72534474 [Test build #26533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26533/consoleFull) for PR 4106 at commit

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-02 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-72557914 @pwendell do we need this for Spark 1.3.0? Is the feature merge deadline already past? I'm uncertain of what my bandwidth will be like but if it needs to be sped up I

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-02 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-72435710 Suggestions make sense. I'm currently on a business trip so it might be a bit of time before I can get back to this.

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-02 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-72546696 [Test build #26533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26533/consoleFull) for PR 4106 at commit

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-72546706 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-29 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23816550 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-29 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23816373 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23326963 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70891705 That's correct. Definitely a work-in-progress so if there's another security model you'd recommend I'm all ears! -Matt Cheah From: Tom Graves

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23317321 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23326701 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread mccheah
Github user mccheah commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23320905 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23312771 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -193,17 +193,21 @@ class HadoopRDD[K, V]( override def getPartitions:

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23313017 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/4106#discussion_r23313220 --- Diff: core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala --- @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-21 Thread tgravescs
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70872378 So are you trying to add security such that spark cluster would run as one superuser who would have to be configured as proxy user on the hadoop cluster and then each

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70563821 [Test build #25768 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25768/consoleFull) for PR 4106 at commit

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70563829 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-19 Thread mccheah
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/4106 [SPARK-5158] [core] [security] Spark standalone mode can authenticate against a Kerberos-secured Hadoop cluster Previously, Kerberos secured Hadoop clusters could only be accessed by Spark running

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-19 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70553364 Suggestions to unit test are welcome. This should not be merged until it is unit-tested.

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70553954 [Test build #25768 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25768/consoleFull) for PR 4106 at commit

[GitHub] spark pull request: [SPARK-5158] [core] [security] Spark standalon...

2015-01-19 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4106#issuecomment-70553855 One other caveat I forgot to mention, and the commit message should be updated and this reflected in the docs: User proxying needs to be enabled. Basically, the user