Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-96850574
No longer in progress. Closing.
Github user mccheah closed the pull request at:
https://github.com/apache/spark/pull/4106
Github user ash211 commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-96850330
@mccheah is this work still active? If not, we should maybe close this PR
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-76042475
Your views make sense! Thanks a lot =) The discussion was helpful and
clarified the pitfalls here. I have learned a lot.
I'm going to defer to @pwendell for the d
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-76038575
> Come to think of it, with my current approach, since the keytab is
specified in the driver's SparkConf, theoretically different Spark applications
can specify different
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-76037757
Come to think of it, with my current approach, since the keytab is
specified in the driver's SparkConf, theoretically different Spark applications
can specify different k
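For illustration, a per-application keytab along these lines might be supplied
through the driver's SparkConf roughly as follows; the configuration key names
below are hypothetical placeholders, not necessarily the ones introduced by
this PR:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: each application points at its own principal and keytab.
// "spark.hadoop.kerberos.principal"/".keytab" are illustrative key names.
val conf = new SparkConf()
  .setAppName("kerberized-standalone-app")
  .setMaster("spark://master:7077") // placeholder standalone master URL
  .set("spark.hadoop.kerberos.principal", "alice@EXAMPLE.COM")
  .set("spark.hadoop.kerberos.keytab", "/etc/security/keytabs/alice.keytab")

val sc = new SparkContext(conf)
// A second application could point at a different principal/keytab,
// which is the per-application flexibility being discussed here.
```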
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-76033180
> if the client application wants to execute a job that reads from and writes
to HDFS that has been secured with Kerberos, they should be allowed to do so
if they have the k
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-76032123
The security model I want to support is: if the client application wants to
execute a job that reads from and writes to HDFS that has been secured with
Kerberos, they should
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-76028480
Hi @mccheah, just to clarify my comments, I'm thinking about someone
looking at this feature and thinking that "hey, Spark Standalone now supports
kerberos!", while that's
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75899526
[Test build #27931 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27931/consoleFull)
for PR 4106 at commit
[`626318d`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75899533
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27
Github user mccheah commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r25313964
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
---
@@ -205,7 +208,7 @@ class SparkHadoopUtil extends Logging {
object
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75894662
Okay, I started fresh from master and cherry-picked my changes over
- this PR was in a terrible state after I made an incorrect assumption while
merging.
Th
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75894650
[Test build #27931 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27931/consoleFull)
for PR 4106 at commit
[`626318d`](https://githu
Github user mccheah commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r25313853
--- Diff:
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
---
@@ -125,39 +125,39 @@ private[spark] object CoarseGrainedExecu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75881742
[Test build #27915 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27915/consoleFull)
for PR 4106 at commit
[`c23254d`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75881745
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75878253
One other model that would make sense is to require the driver to log in
using the principal and the keytab specified in the configuration. If the
driver is able to authe
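A minimal sketch of what that driver-side login could look like, using the
standard Hadoop UserGroupInformation API; the configuration key names here are
assumptions for illustration only:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.SparkConf

// Sketch only: the driver authenticates against the KDC up front and fails
// fast if the keytab is unreadable or the principal is wrong.
def loginDriver(sparkConf: SparkConf): Unit = {
  val principal = sparkConf.get("spark.hadoop.kerberos.principal") // assumed key name
  val keytab    = sparkConf.get("spark.hadoop.kerberos.keytab")    // assumed key name

  val hadoopConf = new Configuration()
  hadoopConf.set("hadoop.security.authentication", "kerberos")
  UserGroupInformation.setConfiguration(hadoopConf)

  // Logs the current process in from the keytab; throws IOException on failure.
  UserGroupInformation.loginUserFromKeytab(principal, keytab)
}
```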
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75876540
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75876528
[Test build #27913 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27913/consoleFull)
for PR 4106 at commit
[`d18fbe7`](https://gith
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75871551
One way around this could be to require the driver to also have read-access
to the keytab. In this model, any user that wishes to run a Spark job must also
be able to aut
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75869488
And come to think of it - in Standalone mode the Spark driver also needs to
have the keytab file in the first place. In standalone mode, no matter what, the
driver will ne
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75865634
I think this raises a core disconnect - Spark jobs themselves do not have
an authentication mechanism in place. It would be more secure if we had a way
to also authentica
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75864651
> The model I had in mind for this patch was to support dedicated
clusters/appliances based on Spark where the Spark cluster itself is fully
trusted and not multi-tenant.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75863314
[Test build #27915 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27915/consoleFull)
for PR 4106 at commit
[`c23254d`](https://githu
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75862787
Wow I really screwed this up, let me fix it.
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75862672
Sorry about the merge commit - I'm pretty sure I did it wrong, as I'm
fairly certain I wasn't supposed to create the storm of commits listed above.
What's the rig
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75861282
The model I had in mind for this patch was to support dedicated
clusters/appliances based on Spark where the Spark cluster itself is fully
trusted and not multi-tenant.
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75859364
> I'm confused as to how this case is different from anywhere else we use
Kerberos authentication to HDFS in Spark.
You're opening up the possibility that user cod
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75858920
I'm confused as to how this case is different from anywhere else we use
Kerberos authentication to HDFS in Spark. If HDFS is configured with Kerberos
authentication, natu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75857289
[Test build #27913 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27913/consoleFull)
for PR 4106 at commit
[`d18fbe7`](https://githu
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75857213
> Spark only needs to read the specific keytab for the HDFS Namenode.
That sounds even worse. Why would you run the Spark job with HDFS super
user privileges?
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75856584
Spark only needs to read the specific keytab for the HDFS Namenode. It does
not need to read any arbitrary keytab. I'm pushing a commit that makes this
explicit in the Sp
Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75854994
> (3) I'm confused here - Spark can't read a keytab if the permissions on
the keytab file deny access.
But in standalone mode all executors run as the "spark" use
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75852773
When @pwendell and I originally discussed the feature, we wanted to design
it to be simple and usable for small dedicated Spark clusters. We also
explicitly wanted to avo
Github user harishreedharan commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75848850
From what I can see, the approach is to log in on every machine using the same
principal and keytab. This has a couple of issu
Github user mccheah commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r25227412
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -193,17 +193,21 @@ class HadoopRDD[K, V](
override def getPartitions: Array[Pa
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75325126
I get the same exception when I try to start the history server without
running kinit first. My settings are:
spark.history.kerberos.enabled true
spark.histor
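For reference, the History Server's standard Kerberos settings look roughly
like the following in spark-defaults.conf (the principal and keytab values are
placeholders, not the settings actually used above):

```
spark.history.kerberos.enabled    true
spark.history.kerberos.principal  spark/history-host@EXAMPLE.COM
spark.history.kerberos.keytab     /etc/security/keytabs/spark-history.keytab
```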
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75323707
Ah, I mixed up the history server with the event log directory. Let me try
getting the history server up and see.
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75323149
What does running spark-shell have to do with launching the history server
without doing kinit?
The history server is a standalone process that simply reads from
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-75311934
Okay, so I actually ran the history server on Master with Kerberos
authentication, and *without* running kinit, I got the following error when
launching spark shell:
Github user mccheah commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r24860380
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -193,17 +193,21 @@ class HadoopRDD[K, V](
override def getPartitions: Array[Pa
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-74762686
Able to come back to this now!
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-72763704
Hey @mccheah - if you are too busy I think it's fine to let it slip past
1.3, given that there are still several unknowns.
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-72557914
@pwendell do we need this for Spark 1.3.0? Is the feature merge deadline
already past? I'm uncertain of what my bandwidth will be like but if it needs
to be sped up I can
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-72546706
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-72546696
[Test build #26533 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26533/consoleFull)
for PR 4106 at commit
[`5a7bd66`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-72534474
[Test build #26533 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26533/consoleFull)
for PR 4106 at commit
[`5a7bd66`](https://githu
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-72435710
Suggestions make sense. I'm currently on a business trip so it might be a
bit of time before I can get back to this.
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r23816550
--- Diff:
core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Fo
Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r23816373
--- Diff:
core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Fo
Github user mccheah commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r23326963
--- Diff:
core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Fou
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r23326701
--- Diff:
core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software F
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-70891705
That's correct. Definitely a work-in-progress so if there's another security
model you'd recommend I'm all ears!
-Matt Cheah
From: Tom Graves
Github user mccheah commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r23317321
--- Diff:
core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Fou
Github user mccheah commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r23320905
--- Diff:
core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Fou
Github user tgravescs commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-70872378
So are you trying to add security such that the Spark cluster would run as one
superuser who would have to be configured as a proxy user on the Hadoop cluster
and then each j
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r23313220
--- Diff:
core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software F
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r23313017
--- Diff:
core/src/main/scala/org/apache/spark/deploy/StandaloneSparkHadoopUtil.scala ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software F
Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/4106#discussion_r23312771
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -193,17 +193,21 @@ class HadoopRDD[K, V](
override def getPartitions: Array[
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-70563821
[Test build #25768 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25768/consoleFull)
for PR 4106 at commit
[`5a7bd66`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-70563829
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-70553954
[Test build #25768 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25768/consoleFull)
for PR 4106 at commit
[`5a7bd66`](https://githu
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-70553855
One other caveat I forgot to mention, and the commit message should be
updated and this should be reflected in the docs: User proxying needs to be enabled.
Basically, the user prin
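The proxy-user flow being referred to works roughly like this: a long-lived
superuser identity logs in from its keytab, and per-job work runs inside a
proxy UGI for the submitting user (the Hadoop side must whitelist the superuser
via hadoop.proxyuser.<name>.hosts/groups). A hedged sketch using the Hadoop
API, where the principal, keytab path, and user name are made up:

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

val hadoopConf = new Configuration()
hadoopConf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(hadoopConf)

// The cluster's own identity, logged in from a keytab (placeholder values).
val superUser = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
  "spark/cluster@EXAMPLE.COM", "/etc/security/keytabs/spark.keytab")

// Impersonate the submitting user; HDFS sees the access as "alice".
val proxyUgi = UserGroupInformation.createProxyUser("alice", superUser)
proxyUgi.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    val fs = FileSystem.get(hadoopConf)
    fs.listStatus(new Path("/user/alice")).foreach(status => println(status.getPath))
  }
})
```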
Github user mccheah commented on the pull request:
https://github.com/apache/spark/pull/4106#issuecomment-70553364
Suggestions on how to unit test this are welcome. This should not be merged
until it is unit-tested.
GitHub user mccheah opened a pull request:
https://github.com/apache/spark/pull/4106
[SPARK-5158] [core] [security] Spark standalone mode can authenticate
against a Kerberos-secured Hadoop cluster
Previously, Kerberos-secured Hadoop clusters could only be accessed by
Spark running