[ 
https://issues.apache.org/jira/browse/SPARK-19405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz resolved SPARK-19405.
---------------------------------
       Resolution: Fixed
         Assignee: Adam Budde
    Fix Version/s: 2.2.0

Resolved with: https://github.com/apache/spark/pull/16744

> Add support to KinesisUtils for cross-account Kinesis reads via STS
> -------------------------------------------------------------------
>
>                 Key: SPARK-19405
>                 URL: https://issues.apache.org/jira/browse/SPARK-19405
>             Project: Spark
>          Issue Type: Improvement
>          Components: DStreams
>            Reporter: Adam Budde
>            Assignee: Adam Budde
>            Priority: Minor
>             Fix For: 2.2.0
>
>
> h1. Summary
> Enable KinesisReceiver to utilize STSAssumeRoleSessionCredentialsProvider 
> when setting up the Kinesis Client Library in order to enable secure 
> cross-account Kinesis stream reads managed by AWS Security Token Service (STS).
> h1. Details
> Spark's KinesisReceiver implementation utilizes the Kinesis Client Library in 
> order to allow users to write Spark Streaming jobs that operate on Kinesis 
> data. The KCL uses a few AWS services under the hood in order to provide 
> checkpointed, load-balanced processing of the underlying data in a Kinesis 
> stream. Running the KCL requires permissions to be set up for the following 
> AWS resources:
> * AWS Kinesis for reading stream data
> * AWS DynamoDB for storing KCL shared state in tables
> * AWS CloudWatch for logging KCL metrics
> The KinesisUtils.createStream() API allows users to authenticate to these 
> services either by specifying an explicit AWS access key/secret key 
> credential pair or by using the default credential provider chain. This 
> supports authorizing to the three AWS services using an AWS keypair 
> (provided explicitly or parsed from environment variables, etc.):
> !https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/KeypairOnly.png!
> Or the IAM instance profile (when running on EC2):
> !https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/InstanceProfileOnly.png!
> AWS users often need to access resources across separate accounts. This could 
> be done in order to consume data produced by another organization or from a 
> service running in another account for resource isolation purposes. AWS 
> Security Token Service (STS) provides a secure way to authorize cross-account 
> resource access by using temporary sessions to assume an IAM role in the 
> AWS account with the resources being accessed.
> The [IAM 
> documentation|http://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html]
> covers the specifics of how cross-account IAM role assumption works in much 
> greater detail, but if an actor in account A wanted to read from a Kinesis 
> stream in account B the general steps required would look something like this:
> * An IAM role is added to account B with read permissions for the Kinesis 
> stream
> ** Trust policy is configured to allow account A to assume the role 
> * Actor in account A uses its own long-lived credentials to tell STS to 
> assume the role in account B
> * STS returns temporary credentials with permission to read from the stream 
> in account B
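The steps above can be modeled in a few lines. This is a toy model only: `ToyRole` and its `assume` method are hypothetical names, not part of the AWS SDK or the STS API, and the returned string merely stands in for temporary credentials. It assumes, as the description states further down, that an external ID is only checked when the trust policy specifies one.

```java
import java.util.Optional;
import java.util.Set;

/** Toy model of an IAM role in account B; not the AWS SDK or the real STS API. */
final class ToyRole {
    private final String arn;
    private final Set<String> trustedAccounts;         // accounts the trust policy allows
    private final Optional<String> requiredExternalId; // empty: policy sets no external ID

    ToyRole(String arn, Set<String> trustedAccounts, Optional<String> requiredExternalId) {
        this.arn = arn;
        this.trustedAccounts = trustedAccounts;
        this.requiredExternalId = requiredExternalId;
    }

    /** Returns a label standing in for temporary credentials, or empty if denied. */
    Optional<String> assume(String callerAccount, String sessionName, Optional<String> externalId) {
        if (!trustedAccounts.contains(callerAccount)) {
            return Optional.empty(); // caller's account is not in the trust policy
        }
        if (requiredExternalId.isPresent() && !requiredExternalId.equals(externalId)) {
            return Optional.empty(); // trust policy demands an external ID that was not supplied
        }
        return Optional.of(arn + "/" + sessionName);
    }
}
```

An actor in a trusted account receives a session label; any other account is refused.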
> Applied to KinesisReceiver and the KCL, we could use a keypair as our 
> long-lived credentials to authenticate to STS and assume an external role 
> with the necessary KCL permissions:
> !https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/STSKeypair.png!
> Or the instance profile as long-lived credentials:
> !https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/STSInstanceProfile.png!
> The STSAssumeRoleSessionCredentialsProvider implementation of the 
> AWSCredentialsProvider interface from the AWS SDK abstracts all of the 
> management of the temporary session credentials away from the user. 
> STSAssumeRoleSessionCredentialsProvider simply needs the ARN of the AWS role 
> to be assumed, a session name for STS labeling purposes, an optional 
> external ID and long-lived credentials to use for authenticating with the STS 
> service itself.
> Supporting cross-account Kinesis access via STS requires supplying the 
> following additional configuration parameters:
> * ARN of IAM role to assume in external account
> * A name to apply to the STS session
> * (optional) An IAM external ID to validate the assumed role against
> The STSAssumeRoleSessionCredentialsProvider implementation of the 
> AWSCredentialsProvider interface takes these parameters as input and 
> abstracts away all of the lifecycle management for the temporary session 
> credentials. Ideally, users could simply supply an AWSCredentialsProvider 
> instance as an argument when creating the stream that would be distributed to 
> the executors for use when setting up the KCL. Since these classes aren't 
> serializable this will require an approach similar to 
> SerializableAWSCredentials where the config parameters are passed via a 
> serializable object and the correct AWSCredentialsProvider implementation is 
> created on the executor from the params.
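A minimal sketch of the serializable-params approach described above. Everything here is hypothetical: `SerializableStsConfig` is an illustrative name, and `CredentialsProvider` is a stand-in for the SDK's AWSCredentialsProvider so that the sketch compiles without the AWS SDK. A real implementation would presumably construct an STSAssumeRoleSessionCredentialsProvider in `buildProvider()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Stand-in for the SDK's AWSCredentialsProvider (assumption, not the real interface). */
interface CredentialsProvider {
    String describe();
}

/** Hypothetical carrier for the STS params; only plain strings are serialized. */
final class SerializableStsConfig implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String roleArn;
    private final String sessionName;
    private final String externalId; // null when no external ID is supplied

    SerializableStsConfig(String roleArn, String sessionName, String externalId) {
        this.roleArn = roleArn;
        this.sessionName = sessionName;
        this.externalId = externalId;
    }

    /** Called on the executor: creates the non-serializable provider from the params. */
    CredentialsProvider buildProvider() {
        final String label = "sts:" + roleArn + ":" + sessionName
                + (externalId == null ? "" : ":" + externalId);
        return () -> label;
    }

    /** Round-trips through Java serialization, as shipping to an executor would. */
    SerializableStsConfig roundTrip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(this);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableStsConfig) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

The provider itself never crosses the wire; only the three strings do, and the provider is rebuilt from them on the receiving side.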
> Following the current conventions, adding optional arguments will mean having 
> to double the number of overloaded implementations of 
> KinesisUtils.createStream(). For this reason, we can make stsAssumeRoleArn, 
> stsSessionName and stsExternalId all required parameters for STS 
> authentication (the external ID is ignored if none is specified in the trust 
> policy < link > of the assumed role).
> Here are the providers that should be used for authentication depending on 
> the combination of AWS parameters provided:
> ||Input params||Kinesis credentials||Long-lived credentials||
> |(none)|Use long-lived|DefaultAWSCredentialsProviderChain|
> |awsAccessKeyId, awsSecretKey|Use long-lived|AWSCredentialsProvider w/keypair|
> |stsRoleArn, stsSessionName, stsExternalId (optional)|STSAssumeRoleSessionCredentialsProvider|DefaultAWSCredentialsProviderChain|
> |awsAccessKeyId, awsSecretKey, stsRoleArn, stsSessionName, stsExternalId (optional)|STSAssumeRoleSessionCredentialsProvider|AWSCredentialsProvider w/keypair|
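The table above reduces to two independent checks, sketched below. `ProviderPlan` and `ProviderTable` are hypothetical names used for illustration, the providers are represented by their names as strings rather than real SDK objects, and a null argument is assumed to mean "not provided".

```java
/** Which provider serves Kinesis and which supplies the long-lived credentials. */
final class ProviderPlan {
    final String kinesisProvider;
    final String longLivedProvider;

    ProviderPlan(String kinesisProvider, String longLivedProvider) {
        this.kinesisProvider = kinesisProvider;
        this.longLivedProvider = longLivedProvider;
    }
}

/** Hypothetical helper mirroring the rows of the table; null means "not provided". */
final class ProviderTable {
    static ProviderPlan select(String awsAccessKeyId, String awsSecretKey,
                               String stsRoleArn, String stsSessionName) {
        boolean hasKeypair = awsAccessKeyId != null && awsSecretKey != null;
        boolean hasSts = stsRoleArn != null && stsSessionName != null;

        // The keypair decides where the long-lived credentials come from.
        String longLived = hasKeypair
                ? "AWSCredentialsProvider w/keypair"
                : "DefaultAWSCredentialsProviderChain";

        // Without STS params, Kinesis uses the long-lived credentials directly.
        String kinesis = hasSts ? "STSAssumeRoleSessionCredentialsProvider" : longLived;
        return new ProviderPlan(kinesis, longLived);
    }
}
```

The external ID does not affect which provider is chosen, so it is left out of the selection logic.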
> Since there's now a wide variety of combinations of optional parameters for 
> KinesisUtils, I think a builder pattern may provide a more manageable 
> interface for creating streams in both Scala and Java. This would also make 
> it feasible to set separate AWS config params for DynamoDB and 
> CloudWatch, which is supported by the KCL. I may look into submitting an 
> issue/PR for this as well.
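One possible shape for such a builder is sketched below. This is purely illustrative: `KinesisStreamSpec` and its methods are hypothetical and are not the API proposed in this issue or in any Spark release. The point is only that each optional group of parameters becomes one chained call instead of another createStream() overload.

```java
/** Hypothetical immutable description of a stream; not an actual Spark class. */
final class KinesisStreamSpec {
    final String streamName;
    final String awsAccessKeyId;  // null: use the default provider chain
    final String awsSecretKey;
    final String stsRoleArn;      // null: no STS role assumption
    final String stsSessionName;
    final String stsExternalId;   // null: no external ID

    private KinesisStreamSpec(Builder b) {
        this.streamName = b.streamName;
        this.awsAccessKeyId = b.awsAccessKeyId;
        this.awsSecretKey = b.awsSecretKey;
        this.stsRoleArn = b.stsRoleArn;
        this.stsSessionName = b.stsSessionName;
        this.stsExternalId = b.stsExternalId;
    }

    static Builder builder(String streamName) {
        return new Builder(streamName);
    }

    boolean usesSts() {
        return stsRoleArn != null;
    }

    static final class Builder {
        private final String streamName;
        private String awsAccessKeyId;
        private String awsSecretKey;
        private String stsRoleArn;
        private String stsSessionName;
        private String stsExternalId;

        private Builder(String streamName) {
            this.streamName = streamName;
        }

        Builder keypair(String accessKeyId, String secretKey) {
            this.awsAccessKeyId = accessKeyId;
            this.awsSecretKey = secretKey;
            return this;
        }

        Builder stsRole(String roleArn, String sessionName) {
            this.stsRoleArn = roleArn;
            this.stsSessionName = sessionName;
            return this;
        }

        Builder stsExternalId(String externalId) {
            this.stsExternalId = externalId;
            return this;
        }

        KinesisStreamSpec build() {
            // An external ID only makes sense together with a role to assume.
            if (stsExternalId != null && stsRoleArn == null) {
                throw new IllegalStateException("stsExternalId requires stsRole");
            }
            return new KinesisStreamSpec(this);
        }
    }
}
```

A call such as `KinesisStreamSpec.builder("events").stsRole(arn, "spark").build()` then covers the STS case without touching the keypair arguments.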



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org