[jira] [Commented] (SPARK-19739) SparkHadoopUtil.appendS3AndSparkHadoopConfigurations to propagate full set of AWS env vars
[ https://issues.apache.org/jira/browse/SPARK-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721258#comment-16721258 ] Steve Loughran commented on SPARK-19739:

BTW: I'm not backporting HADOOP-14556 to any non-trunk version of Hadoop, as it's fairly dramatic. If you want to add a patch to Hadoop 2.8 through 3.2 that puts the temporary credentials provider at the start of the list, I'll review and inevitably commit it. I don't think there was any deliberate intention not to add it as an option (unlike the special "no credentials" switch, or the assumed-role one). Do read the [S3 test process doc|https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/testing.html#Policy_for_submitting_patches_which_affect_the_hadoop-aws_module.] before submitting, though: you're required to run the whole s3a test suite and tell us which endpoint you ran against. Talk to some colleagues to get set up here if you haven't done it before.

> SparkHadoopUtil.appendS3AndSparkHadoopConfigurations to propagate full set of AWS env vars
> --
>
> Key: SPARK-19739
> URL: https://issues.apache.org/jira/browse/SPARK-19739
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Steve Loughran
> Assignee: Genmao Yu
> Priority: Minor
> Fix For: 2.2.0
>
> {{SparkHadoopUtil.appendS3AndSparkHadoopConfigurations()}} propagates the AWS user and secret key to the s3n and s3a config options, getting secrets from the user to the cluster, if set.
> AWS also supports session authentication (env var {{AWS_SESSION_TOKEN}}) and region endpoints ({{AWS_DEFAULT_REGION}}), the latter being critical if you want to address V4-auth-only endpoints like Frankfurt and Seoul.
> These env vars should be picked up and passed down to s3a too. It's 4+ lines of code, though impossible to test unless the existing code is refactored to take the env var {{Map[String, String]}}, allowing a test suite to set the values in its own map.
> Side issue: what if only half the env vars are set and users are trying to understand why auth is failing? It may be good to build up a string identifying which env vars had their value propagated, and log that at debug level, while obviously not logging the values.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
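The refactor the description asks for, plus the debug-logging side issue, can be sketched together. This is a hypothetical illustration, not the actual Spark code: the environment is passed in as an explicit dict so a test suite can inject values, and the function reports which variable *names* propagated (never the values). The {{fs.s3a.*}} option names mirror the well-known s3a keys, but treat the exact mapping (especially the region key) as illustrative.

```python
# Hypothetical sketch of the proposed refactor: take the env as a parameter
# instead of reading os.environ, so tests can supply their own map.
ENV_TO_S3A = {
    "AWS_ACCESS_KEY_ID": "fs.s3a.access.key",
    "AWS_SECRET_ACCESS_KEY": "fs.s3a.secret.key",
    "AWS_SESSION_TOKEN": "fs.s3a.session.token",
    "AWS_DEFAULT_REGION": "fs.s3a.endpoint.region",  # illustrative key name
}

def append_s3a_env_vars(env, hadoop_conf):
    """Copy any set AWS env vars into the Hadoop conf; return the names
    (never the values) of the propagated vars, suitable for debug logging."""
    propagated = []
    for env_var, s3a_key in ENV_TO_S3A.items():
        value = env.get(env_var)
        if value:  # skip unset or empty variables
            hadoop_conf[s3a_key] = value
            propagated.append(env_var)
    return propagated
```

A test can then call `append_s3a_env_vars({"AWS_SESSION_TOKEN": "tok", ...}, conf)` directly, with no process-level environment manipulation.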
[jira] [Commented] (SPARK-19739) SparkHadoopUtil.appendS3AndSparkHadoopConfigurations to propagate full set of AWS env vars
[ https://issues.apache.org/jira/browse/SPARK-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720664#comment-16720664 ] Imran Rashid commented on SPARK-19739:

OK, sounds good, and thanks for the quick response! Sounds like a nice improvement :)
[jira] [Commented] (SPARK-19739) SparkHadoopUtil.appendS3AndSparkHadoopConfigurations to propagate full set of AWS env vars
[ https://issues.apache.org/jira/browse/SPARK-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720656#comment-16720656 ] Steve Loughran commented on SPARK-19739:

No, leave it alone.
* In HADOOP-14556 I'm actually adding the temporary-credentials provider ahead of the normal credentials in the list, so it will be picked up by default.
* That patch actually adds something far more profound: the ability to create delegation tokens off S3A endpoints. These will be automatic session credentials or, with a bit more config, shorter-lived role credentials with access restricted to only the resources you need (the specific s3a bucket, any matching DDB table). With that, you don't need to propagate S3 credentials at all, so your secrets stay on your local system. Yes, Spark works with this. No, I can't do a demo right now, but I'll put a video up as soon as I can make one (next week).
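The provider-ordering point in Steve's comment (putting the temporary-credentials provider ahead of the normal one) matters because an S3A-style credential chain tries providers in list order and the first one that yields credentials wins. A minimal sketch, with hypothetical provider functions standing in for the real Hadoop classes:

```python
# Illustrative credential-chain resolution: providers are tried in order and
# the first non-None result wins. Putting the temporary provider first means
# session tokens are preferred by default when all three env vars are set.

def resolve_credentials(providers, env):
    """Return credentials from the first provider able to supply them."""
    for provider in providers:
        creds = provider(env)
        if creds is not None:
            return creds
    raise RuntimeError("No AWS credentials found in any provider")

def temporary_provider(env):
    # Needs all three of key, secret, and session token.
    keys = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")
    if all(env.get(k) for k in keys):
        return {k: env[k] for k in keys}
    return None

def simple_provider(env):
    # Long-lived key/secret only.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return {k: env[k] for k in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")}
    return None

# Temporary provider first: session credentials win whenever they are present.
CHAIN = [temporary_provider, simple_provider]
```

With the order reversed, the long-lived provider would match first whenever a key/secret pair is set, and the session token would be silently ignored.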
[jira] [Commented] (SPARK-19739) SparkHadoopUtil.appendS3AndSparkHadoopConfigurations to propagate full set of AWS env vars
[ https://issues.apache.org/jira/browse/SPARK-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720650#comment-16720650 ] Imran Rashid commented on SPARK-19739:

[~ste...@apache.org] I didn't realize when using this at first that I also needed to add the conf {{--conf "spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"}} to have {{AWS_SESSION_TOKEN}} take any effect. You don't get any useful error message when that happens -- just access forbidden. Do you think it's useful to set that automatically as well when {{AWS_SESSION_TOKEN}} is set?
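Imran's suggestion above can be sketched as a small guard: if {{AWS_SESSION_TOKEN}} is set and the user has not already chosen a credentials provider, point {{fs.s3a.aws.credentials.provider}} at {{TemporaryAWSCredentialsProvider}}. This is a hypothetical sketch of the proposal, not code that was adopted:

```python
# Sketch of the suggested auto-switch: without this, a session token is
# silently ignored and the user just sees "access forbidden".
TEMP_PROVIDER = "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"

def maybe_enable_temporary_provider(env, hadoop_conf):
    """Select the temporary-credentials provider when a session token is
    present, unless the user has configured a provider explicitly."""
    if env.get("AWS_SESSION_TOKEN") and "fs.s3a.aws.credentials.provider" not in hadoop_conf:
        hadoop_conf["fs.s3a.aws.credentials.provider"] = TEMP_PROVIDER
```

Respecting an explicitly configured provider keeps the change backward compatible for users who already set one.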
[jira] [Commented] (SPARK-19739) SparkHadoopUtil.appendS3AndSparkHadoopConfigurations to propagate full set of AWS env vars
[ https://issues.apache.org/jira/browse/SPARK-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885336#comment-15885336 ] Apache Spark commented on SPARK-19739:

User 'uncleGen' has created a pull request for this issue: https://github.com/apache/spark/pull/17080