[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-14 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 There is a spot in `HadoopFSCredentialProvider` where it looks for a Hadoop config key related to YARN to set the token renewer: in `getTokenRenewer` it calls `Master.getMasterPrincipal`.
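For context, a rough sketch of the code path being described, assuming the current YARN-oriented behavior; names mirror the Hadoop APIs, and the surrounding provider plumbing is simplified:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.mapred.Master
import org.apache.hadoop.security.Credentials

// Simplified sketch of what HadoopFSCredentialProvider does: derive a
// renewer principal from the Hadoop config, then ask HDFS for delegation
// tokens renewable by that principal.
def obtainHdfsTokens(hadoopConf: Configuration, creds: Credentials): Unit = {
  // Master.getMasterPrincipal resolves the renewer from YARN/MapReduce
  // principal keys, which is why a standalone cluster with no YARN
  // configuration has nothing to resolve.
  val renewer = Master.getMasterPrincipal(hadoopConf)
  val fs = FileSystem.get(hadoopConf)
  fs.addDelegationTokens(renewer, creds) // stores the tokens in `creds`
}
```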

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 BTW, not trying to give you the hard sell, and I appreciate the help rounding out the requirements from the core committers' POV.

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 That would work for cluster mode, but in client mode the driver on the submitting node still needs the keytab, unfortunately. Standalone clusters are best viewed as distributed…

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 Said another way, people need another layer to use Spark standalone in secured environments anyway.

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 To me it's basically the same as users including S3 credentials when submitting to Spark standalone; Kerberos just requires more machinery. It might be a little harder to get at the spark…
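The S3 analogy, as a minimal sketch: with standalone today, users can pass object-store credentials straight through to Hadoop via `spark.hadoop.*` properties. The key names are the standard s3a ones; reading them from environment variables is just one possible choice:

```scala
import org.apache.spark.sql.SparkSession

// S3 credentials ride along with the job via spark.hadoop.* config;
// the Kerberos equivalent needs delegation tokens plus renewal,
// i.e. more machinery.
val spark = SparkSession.builder()
  .master("spark://master:7077")
  .config("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
  .config("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
  .getOrCreate()
```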

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 That's right, but you still need a separate out-of-band process refreshing with the KDC. My thinking is: why not have Spark do that on your behalf?
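What "Spark doing that on your behalf" might look like, as a minimal sketch assuming the driver holds the keytab; the one-hour interval is an arbitrary placeholder, and a real implementation would derive it from the ticket lifetime:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.hadoop.security.UserGroupInformation

// Log in from the keytab once, then periodically re-login so the TGT
// never lapses; this replaces an out-of-band kinit/cron loop.
def startRenewal(principal: String, keytabPath: String): Unit = {
  UserGroupInformation.loginUserFromKeytab(principal, keytabPath)
  val scheduler = Executors.newSingleThreadScheduledExecutor()
  scheduler.scheduleAtFixedRate(new Runnable {
    override def run(): Unit =
      UserGroupInformation.getLoginUser.checkTGTAndReloginFromKeytab()
  }, 1, 1, TimeUnit.HOURS)
}
```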

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 In our setup each user gets their own standalone cluster. Users cannot submit jobs to each other's clusters. By providing a keytab on cluster creation and having Spark manage renewal…

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 Hi @vanzin, Spark standalone isn't really multi-user in any sense, since the executors for all jobs run as whatever user the worker daemon was started as. That shouldn't preclude standalone…

[GitHub] spark pull request #17530: [SPARK-5158] Access kerberized HDFS from Spark st...

2017-04-04 Thread themodernlife
GitHub user themodernlife opened a pull request: https://github.com/apache/spark/pull/17530 [SPARK-5158] Access kerberized HDFS from Spark standalone

## What changes were proposed in this pull request?

- Refactor `ConfigurableCredentialManager` and related…
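For readers following along, the provider abstraction at the center of the refactor looks roughly like Spark's existing `ServiceCredentialProvider`; this sketch mirrors that trait and may not match the PR's final shape exactly:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials
import org.apache.spark.SparkConf

// Each provider knows how to fetch delegation tokens for one service
// (HDFS, Hive, HBase, ...) and reports when the tokens need renewal.
trait ServiceCredentialProvider {
  def serviceName: String
  def credentialsRequired(hadoopConf: Configuration): Boolean
  // Obtain tokens into `creds`; returns the next renewal time, if any.
  def obtainCredentials(
      hadoopConf: Configuration,
      sparkConf: SparkConf,
      creds: Credentials): Option[Long]
}
```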

[GitHub] spark issue #16563: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...

2017-01-13 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/16563 Ok, thanks.

[GitHub] spark pull request #16563: [SPARK-17568][CORE][DEPLOY] Add spark-submit opti...

2017-01-13 Thread themodernlife
Github user themodernlife closed the pull request at: https://github.com/apache/spark/pull/16563

[GitHub] spark pull request #16563: [SPARK-17568][CORE][DEPLOY] Add spark-submit opti...

2017-01-12 Thread themodernlife
GitHub user themodernlife opened a pull request: https://github.com/apache/spark/pull/16563 [SPARK-17568][CORE][DEPLOY] Add spark-submit option to override ivy settings used to resolve packages/artifacts. Backports #15119 to the 2.1 branch. Is it possible to include this in Spark…

[GitHub] spark issue #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...

2017-01-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/15119 @BryanCutler - #1 I think you're right... the naming of that key is unfortunate; `spark.jars.ivyUserDir` or something would have been better... it affects `defaultIvyUserDir`…

[GitHub] spark pull request #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit opti...

2016-12-16 Thread themodernlife
Github user themodernlife commented on a diff in the pull request: https://github.com/apache/spark/pull/15119#discussion_r92817182
Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala @@ -291,8 +292,12 @@ object SparkSubmit…

[GitHub] spark issue #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...

2016-11-08 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/15119 FYI I tried this out in our environment:
- Firewall (no access to Maven Central)
- Custom ivysettings.xml to point to our internal Artifactory

Everything worked just as I'd…
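For anyone reproducing this setup: the option from #15119 points `spark.jars.ivySettings` at a custom ivysettings.xml, and a minimal file for a firewalled environment might look like the following sketch (the Artifactory URL and resolver name are placeholders):

```xml
<ivysettings>
  <!-- Route all resolution through an internal mirror instead of Maven Central -->
  <settings defaultResolver="internal"/>
  <resolvers>
    <ibiblio name="internal" m2compatible="true"
             root="https://artifactory.example.com/artifactory/maven-remote"/>
  </resolvers>
</ivysettings>
```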

[GitHub] spark pull request: [SPARK-3595] Respect configured OutputCommitte...

2014-09-19 Thread themodernlife
Github user themodernlife commented on a diff in the pull request: https://github.com/apache/spark/pull/2450#discussion_r17808274
Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala @@ -478,6 +482,15 @@ class PairRDDFunctionsSuite extends FunSuite…

[GitHub] spark pull request: [SPARK-3595] Respect configured OutputCommitte...

2014-09-18 Thread themodernlife
GitHub user themodernlife opened a pull request: https://github.com/apache/spark/pull/2450 [SPARK-3595] Respect configured OutputCommitters when calling saveAsHadoopFile. Addresses the issue in https://issues.apache.org/jira/browse/SPARK-3595, namely `saveAsHadoopFile` hardcoding…
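What the fix enables, as a minimal sketch: set a custom committer on the `JobConf` and have `saveAsHadoopFile` honor it rather than the hardcoded default. `DirectOutputCommitter` here is a hypothetical stand-in for whatever committer a user supplies:

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapred.{FileOutputCommitter, JobConf, TextOutputFormat}
import org.apache.spark.rdd.RDD

// Stand-in for a user-supplied committer, e.g. one that writes directly
// to the final location instead of committing from a temp directory.
class DirectOutputCommitter extends FileOutputCommitter

def saveWithCustomCommitter(rdd: RDD[(Text, Text)], path: String): Unit = {
  val conf = new JobConf(rdd.sparkContext.hadoopConfiguration)
  // The point of SPARK-3595: this setting should be respected.
  conf.setOutputCommitter(classOf[DirectOutputCommitter])
  rdd.saveAsHadoopFile(path, classOf[Text], classOf[Text],
    classOf[TextOutputFormat[Text, Text]], conf)
}
```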