[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-07-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17530 gentle ping @themodernlife on ^. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enable

2017-06-20 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/17530 Is this addressing a similar issue to #17387?

2017-04-14 Thread mgummelt
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17530 > Right now the PR doesn't set that, so it needs to be set under the user's HADOOP_CONF even though it had no real effect. That probably should be changed. Yep, same problem I'm seeing. Th

2017-04-14 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 There is a spot in HadoopFSCredentialProvider where it looks for a Hadoop config key related to yarn to set the token renewer. In getTokenRenewer it calls Master.getMasterPrincipal(co

2017-04-13 Thread mgummelt
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17530 @themodernlife I'm trying to add Kerberos support for Mesos, and creating HadoopRDDs fails for me because YARN isn't configured: https://issues.apache.org/jira/browse/SPARK-20328 Did you ru

2017-04-05 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17530 > That would work for cluster mode but in client mode the driver on the submitting nodes still needs the keytab unfortunately. You're setting up a special cluster for a single user. I'm prett

2017-04-05 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17530 I'm just not sold on the idea that this is necessary in the first place. Personally I don't use standalone nor do I play with it at all, so my concerns are purely from a security standpoint. As in, i

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 BTW not trying to give you the hard sell and appreciate the help rounding out the requirements from the core committers' POV.

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 That would work for cluster mode but in client mode the driver on the submitting nodes still needs the keytab unfortunately. Standalone clusters are best viewed as distributed single

2017-04-05 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17530 And BTW, if you really want to pursue this, please write a detailed spec explaining everything that is being done, and describe all the security issues people need to be aware of. It might even be pr

2017-04-05 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17530 I'm sorry but you won't convince me that it's a useful feature to have Spark be a big security hole when inserted into a kerberos environment. As I said, if you want to change your approach t

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 Said another way, people need another layer to use spark standalone in secured environments anyway.

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 To me it's basically the same as users including S3 credentials when submitting to spark standalone. Kerberos just requires more machinery. It might be a little harder to get at the spark conf

2017-04-05 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17530 That is not what your change does, though. If you want to change the master / worker scripts to refresh kerberos credentials, that would be a lot more acceptable. This change is just not acce

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 That's right, but you still need a separate out-of-band process refreshing with the KDC. My thinking is why not have spark do that on your behalf?

2017-04-05 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17530 Then in your setup you can configure things so that the cluster already has the user's keytab; Spark doesn't need to distribute it for you.

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 In our setup each user gets their own standalone cluster. Users cannot submit jobs to each other's clusters. By providing a keytab on cluster creation and having Spark manage renewal on behalf

2017-04-05 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17530 > That shouldn't preclude standalone clusters from communicating with secured resources. Of course it should. You're inserting a service into your cluster that allows people to steal each ot

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 Hi @vanzin, spark standalone isn't really multi user in any sense since the executors for all jobs run as whatever user the worker daemon was started as. That shouldn't preclude standalone clu

2017-04-04 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17530 How does this patch handle the security issues raised in other attempts at such feature, such as #4106? Specifically the following comment, if you don't want to go through all the discussion: https:

2017-04-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17530 Can one of the admins verify this patch?