[jira] [Commented] (SPARK-19739) SparkHadoopUtil.appendS3AndSparkHadoopConfigurations to propagate full set of AWS env vars

2018-12-14 Thread Steve Loughran (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721258#comment-16721258 ]

Steve Loughran commented on SPARK-19739:


BTW: I'm not backporting HADOOP-14556 to any non-trunk version of Hadoop, as it's 
fairly dramatic. 

If you want to add a patch to all of Hadoop 2.8 to 3.2 that adds the temp creds at 
the start of the list, I'll review and inevitably commit. I don't think there 
was any deliberate intention not to add it as an option (unlike the special 
"no credentials" switch, or the assumed-role one). 
 Do read the [S3 test process 
doc|https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/testing.html#Policy_for_submitting_patches_which_affect_the_hadoop-aws_module.]
 before submitting, though: you're required to run the whole s3a test suite and 
tell us which endpoint you ran against. Talk to some colleagues to get set up 
here if you haven't done it before.
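
To make "at the start of the list" concrete, here is a hedged sketch of what the resulting default chain would look like if written out explicitly. The provider class names are the real s3a ones, but the ordering shown is the proposal under discussion, not a default that any released Hadoop ships:

{code:scala}
import org.apache.hadoop.conf.Configuration

val hadoopConf = new Configuration()
// Proposed ordering: the temporary-credentials provider goes first, so
// session credentials (access key + secret + session token) win whenever
// they are configured; the rest is the usual documented chain.
hadoopConf.set("fs.s3a.aws.credentials.provider",
  Seq(
    "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
    "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
    "com.amazonaws.auth.EnvironmentVariableCredentialsProvider",
    "com.amazonaws.auth.InstanceProfileCredentialsProvider"
  ).mkString(","))
{code}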

> SparkHadoopUtil.appendS3AndSparkHadoopConfigurations to propagate full set of 
> AWS env vars
> --
>
> Key: SPARK-19739
> URL: https://issues.apache.org/jira/browse/SPARK-19739
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Steve Loughran
> Assignee: Genmao Yu
> Priority: Minor
> Fix For: 2.2.0
>
>
> {{SparkHadoopUtil.appendS3AndSparkHadoopConfigurations()}} propagates the AWS 
> user and secret key to the s3n and s3a config options, so that secrets set by 
> the user reach the cluster.
> AWS also supports session authentication (env var {{AWS_SESSION_TOKEN}}) and 
> region endpoints ({{AWS_DEFAULT_REGION}}), the latter being critical if you 
> want to address V4-auth-only endpoints like Frankfurt and Seoul. 
> These env vars should be picked up and passed down to s3a too. That's 4+ lines 
> of code, though impossible to test unless the existing code is refactored to 
> take the env vars as a {{Map[String, String]}}, allowing a test suite to set 
> the values in its own map.
> Side issue: what if only half the env vars are set and users are trying to 
> understand why auth is failing? It may be good to build up a string 
> identifying which env vars had their values propagated and log that at debug 
> level, while obviously not logging the values themselves.
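
A minimal sketch of the refactoring the description asks for, in Scala. The object and method names here are illustrative assumptions, not the code Spark merged, though the {{fs.s3a.*}} option names are real:

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.slf4j.LoggerFactory

object AwsEnvPropagation {
  private val log = LoggerFactory.getLogger(getClass)

  // Hypothetical helper: the env vars arrive as an explicit map, so a test
  // suite can pass its own values instead of mutating the real environment.
  def appendAwsEnvVars(env: Map[String, String], hadoopConf: Configuration): Unit = {
    // env var -> the s3a option it populates. AWS_DEFAULT_REGION is omitted
    // here because it maps to an endpoint URL (fs.s3a.endpoint), not a
    // simple copy of the value.
    val mappings = Seq(
      "AWS_ACCESS_KEY_ID"     -> "fs.s3a.access.key",
      "AWS_SECRET_ACCESS_KEY" -> "fs.s3a.secret.key",
      "AWS_SESSION_TOKEN"     -> "fs.s3a.session.token")

    val propagated = for {
      (envVar, confKey) <- mappings
      value <- env.get(envVar).toSeq if value.nonEmpty
    } yield {
      hadoopConf.set(confKey, value)
      envVar // record only the name, never the secret value
    }

    // Names only, at debug level, to help diagnose half-set environments.
    if (propagated.nonEmpty) {
      log.debug(s"Propagated AWS env vars: ${propagated.mkString(", ")}")
    }
  }
}
{code}

With that shape, a test can call {{appendAwsEnvVars(Map("AWS_SESSION_TOKEN" -> "testtoken"), conf)}} and assert on the resulting configuration without touching real credentials.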






[jira] [Commented] (SPARK-19739) SparkHadoopUtil.appendS3AndSparkHadoopConfigurations to propagate full set of AWS env vars

2018-12-13 Thread Imran Rashid (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720664#comment-16720664 ]

Imran Rashid commented on SPARK-19739:
--

OK, sounds good, and thanks for the quick response!  Sounds like a nice 
improvement :)







[jira] [Commented] (SPARK-19739) SparkHadoopUtil.appendS3AndSparkHadoopConfigurations to propagate full set of AWS env vars

2018-12-13 Thread Steve Loughran (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720656#comment-16720656 ]

Steve Loughran commented on SPARK-19739:


No, leave it alone.

* In HADOOP-14556 I'm actually adding that temp-credentials provider ahead of the 
normal creds in the list, so it will be picked up by default.
* That patch actually adds something far more profound: the ability to create 
delegation tokens off S3A endpoints, which will be automatic session 
credentials or, with a bit more config, shorter-lived role credentials with 
access restricted to only the resources you need (the specific s3a bucket, any 
matching DDB table). With that, you don't need to propagate S3 credentials at 
all, so your secrets stay on your local system.

Yes, Spark works with this. No, I can't do a demo right now, but I'll stick a 
video up as soon as I can make one (next week).







[jira] [Commented] (SPARK-19739) SparkHadoopUtil.appendS3AndSparkHadoopConfigurations to propagate full set of AWS env vars

2018-12-13 Thread Imran Rashid (JIRA)


[ https://issues.apache.org/jira/browse/SPARK-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720650#comment-16720650 ]

Imran Rashid commented on SPARK-19739:
--

[~ste...@apache.org] I didn't realize at first that, to have 
{{AWS_SESSION_TOKEN}} take any effect, I also needed to add {{--conf 
"spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider"}}.
 You don't get any useful error message when that happens -- just access 
forbidden.  Do you think it's useful to do that automatically as well when 
{{AWS_SESSION_TOKEN}} is set?
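
A hedged sketch of what doing that automatically could look like, continuing the hypothetical helper style above. {{fs.s3a.aws.credentials.provider}} and {{TemporaryAWSCredentialsProvider}} are the real s3a names; the wrapper function itself is an assumption:

{code:scala}
import org.apache.hadoop.conf.Configuration

// Hypothetical: when a session token is present, point s3a at the
// temporary-credentials provider, but only if the user hasn't already
// configured a provider chain of their own.
def maybeEnableSessionCredentials(env: Map[String, String],
    hadoopConf: Configuration): Unit = {
  val providerKey = "fs.s3a.aws.credentials.provider"
  val hasToken = env.get("AWS_SESSION_TOKEN").exists(_.nonEmpty)
  if (hasToken && hadoopConf.get(providerKey) == null) {
    hadoopConf.set(providerKey,
      "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
  }
}
{code}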







[jira] [Commented] (SPARK-19739) SparkHadoopUtil.appendS3AndSparkHadoopConfigurations to propagate full set of AWS env vars

2017-02-27 Thread Apache Spark (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-19739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885336#comment-15885336 ]

Apache Spark commented on SPARK-19739:
--

User 'uncleGen' has created a pull request for this issue:
https://github.com/apache/spark/pull/17080



