[ https://issues.apache.org/jira/browse/SPARK-19405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Burak Yavuz resolved SPARK-19405. --------------------------------- Resolution: Fixed Assignee: Adam Budde Fix Version/s: 2.2.0 Resolved with: https://github.com/apache/spark/pull/16744 > Add support to KinesisUtils for cross-account Kinesis reads via STS > ------------------------------------------------------------------- > > Key: SPARK-19405 > URL: https://issues.apache.org/jira/browse/SPARK-19405 > Project: Spark > Issue Type: Improvement > Components: DStreams > Reporter: Adam Budde > Assignee: Adam Budde > Priority: Minor > Fix For: 2.2.0 > > > h1. Summary > Enable KinesisReceiver to utilize STSAssumeRoleSessionCredentialsProvider > when setting up the Kinesis Client Library in order to enable secure > cross-account Kinesis stream reads managed by AWS Simple Token Service (STS) > h1. Details > Spark's KinesisReceiver implementation utilizes the Kinesis Client Library in > order to allow users to write Spark Streaming jobs that operate on Kinesis > data. The KCL uses a few AWS services under the hood in order to provide > checkpointed, load-balanced processing of the underlying data in a Kinesis > stream. Running the KCL requires permissions to be set up for the following > AWS resources. > * AWS Kinesis for reading stream data > * AWS DynamoDB for storing KCL shared state in tables > * AWS CloudWatch for logging KCL metrics > The KinesisUtils.createStream() API allows users to authenticate to these > services either by specifying an explicit AWS access key/secret key > credential pair or by using the default credential provider chain. This > supports authorizing to the three AWS services using either an AWS keypair > (either provided explicitly or parsed from environment variables, etc.): > !https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/KeypairOnly.png! > Or the IAM instance profile (when running on EC2): > !https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/InstanceProfileOnly.png! > AWS users often need to access resources across separate accounts. This could > be done in order to consume data produced by another organization or from a > service running in another account for resource isolation purposes. AWS > Simple Token Service (STS) provides a secure way to authorize cross-account > resource access by using temporary sessions to assuming an IAM role in the > AWS account with the resources being accessed. > The [IAM > documentation|http://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html] > covers the specifics of how cross account IAM role assumption works in much > greater detail, but if an actor in account A wanted to read from a Kinesis > stream in account B the general steps required would look something like this: > * An IAM role is added to account B with read permissions for the Kinesis > stream > ** Trust policy is configured to allow account A to assume the role > * Actor in account A uses its own long-lived credentials to tell STS to > assume the role in account B > * STS returns temporary credentials with permission to read from the stream > in account B > Applied to KinesisReceiver and the KCL, we could use a keypair as our > long-lived credentials to authenticate to STS and assume an external role > with the necessary KCL permissions: > !https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/STSKeypair.png! > Or the instance profile as long-lived credentials: > !https://raw.githubusercontent.com/budde/budde_asf_jira_images/master/spark/kinesis_sts_support/STSInstanceProfile.png! > The STSAssumeRoleSessionCredentialsProvider implementation of the > AWSCredentialsProviderChain interface from the AWS SDK abstracts all of the > management of the temporary session credentials away from the user. > STSAssumeRoleSessionCredentialsProvider simply needs the ARN of the AWS role > to be assumed, a session name for STS labeling purposes, an optional session > external ID and long-lived credentials to use for authenticating with the STS > service itself. > Supporting cross-account Kinesis access via STS requires supplying the > following additional configuration parameters: > * ARN of IAM role to assume in external account > * A name to apply to the STS session > * (optional) An IAM external ID to validate the assumed role against > The STSAssumeRoleSessionCredentialsProvider implementation of the > AWSCredentialsProvider interface takes these parameters as input and > abstracts away all of the lifecycle management for the temporary session > credentials. Ideally, users could simply supply an AWSCredentialsProvider > instance as an argument when creating the stream that would be distributed to > the executors for use when setting up the KCL. Since these classes aren't > serializable this will require an approach similar to > SerializableAWSCredentials where the config parameters are passed via a > serializable object and the correct AWSCredentialsProvider implementation is > created on the executor from the params. > Following the current conventions, adding optional arguments will mean having > to double the number of overloaded implementations of > KinesisUtils.createStream(). For this reason, we can make stsAssumeRoleArn, > stsSessionName and stsExternalId each required parameters for STS > authentication (external id is ignored if none is specified in the trust > policy < link > of the assumed role). > Here are the providers that should be used for authentication depending on > the combination of AWS parameters provided: > ||Input params||Kinesis credentials||Long-lived credentials|| > |(none)|Use long-lived|DefaultAWSCredentialsProviderChain| > |awsAccessKeyId, awsSecretKey|Use long-lived|AWSCredentialsProvider w/keypair| > |stsRoleArn, stsSessionName, stsExternalId > (optional)|STSAssumeRoleSessionCredentialsProvider|DefaultAWSCredentialsProviderChain| > |awsAccessKeyId, awsSecretKey, stsRoleArn, stsSessionName, stsExternalId > (optional)|STSAssumeRoleSessionCredentialsProvider|AWSCredentialsProvider > w/keypair| > Since there's now a wide variety of combinations of optional parameters for > KinesisUtils, I think a builder pattern may provide a more manageable > interface for creating streams in both Scala and Java. This would also make > it feasible to specify specific AWS config params for DynamoDB and > CloudWatch, which is supported by the KCL. I may look into submitting an > issue/PR for this as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org