I was able to solve the issue by providing a custom version of the presto
jar. I will create a ticket and raise a pull request so that others can
benefit from it. I will share the details here shortly.
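
For anyone following the thread, here is a minimal sketch (not the actual patch) of the version check behind the custom jar. It compares the AWS SDK versions quoted further down against the 1.11.704 minimum cited for service-account authentication. The function names and labels are hypothetical and purely illustrative; only the version numbers come from the thread.

```python
# Minimum aws-java-sdk version for EKS service-account credentials,
# as cited in the quoted messages below (assumption: plain dotted versions).
MIN_SDK_VERSION = "1.11.704"


def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '1.11.165' into (1, 11, 165)."""
    return tuple(int(part) for part in version.split("."))


def supports_service_account_auth(sdk_version: str) -> bool:
    """Return True if sdk_version is at or above the cited minimum."""
    return parse_version(sdk_version) >= parse_version(MIN_SDK_VERSION)


if __name__ == "__main__":
    # Versions mentioned in the thread; the labels are descriptive only.
    for label, version in [
        ("SDK bundled with Flink 1.6", "1.11.165"),
        ("SDK on Presto master at the time", "1.11.697"),
        ("cited minimum", "1.11.704"),
    ]:
        print(label, version, supports_service_account_auth(version))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "1.11.1000" sorting before "1.11.704".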

Thanks everyone for your help and support, especially Austin, who stands out
for his interest in the issue and his help in finding ways to resolve it.

Regards,
Swagat

On Tue, Apr 6, 2021 at 2:35 AM Austin Cawley-Edwards <
austin.caw...@gmail.com> wrote:

> And actually, I've found that the correct version of the AWS SDK *is*
> included in Flink 1.12, which was reported and fixed in FLINK-18676
> (see [1]). Since you said you saw this also occur in 1.12, can you share
> more details about what you saw there?
>
> Best,
> Austin
>
> [1]: https://issues.apache.org/jira/browse/FLINK-18676
>
> On Mon, Apr 5, 2021 at 4:53 PM Austin Cawley-Edwards <
> austin.caw...@gmail.com> wrote:
>
>> That looks interesting! I've also found the full list of S3 properties[1]
>> for the version of presto-hive bundled with Flink 1.12 (see [2]), which
>> includes an option for a KMS key (hive.s3.kms-key-id).
>>
>> (also, adding back the user list)
>>
>> [1]:
>> https://prestodb.io/docs/0.187/connector/hive.html#amazon-s3-configuration
>> [2]:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/filesystems/s3.html#hadooppresto-s3-file-systems-plugins
>>
>> On Mon, Apr 5, 2021 at 4:21 PM Swagat Mishra <swaga...@gmail.com> wrote:
>>
>>> Btw, there is also an option to provide a custom credential provider.
>>> What are your thoughts on this?
>>>
>>> presto.s3.credentials-provider
>>>
>>>
>>> On Tue, Apr 6, 2021 at 12:43 AM Austin Cawley-Edwards <
>>> austin.caw...@gmail.com> wrote:
>>>
>>>> I've confirmed that for the bundled + shaded aws dependency, the only
>>>> way to upgrade it is to build a flink-s3-fs-presto jar with the updated
>>>> dependency. Let me know if this is feasible for you, if the KMS key
>>>> solution doesn't work.
>>>>
>>>> Best,
>>>> Austin
>>>>
>>>> On Mon, Apr 5, 2021 at 2:18 PM Austin Cawley-Edwards <
>>>> austin.caw...@gmail.com> wrote:
>>>>
>>>>> Hi Swagat,
>>>>>
>>>>> I don't believe there is an explicit configuration option for the KMS
>>>>> key – please let me know if you're able to make that work!
>>>>>
>>>>> Best,
>>>>> Austin
>>>>>
>>>>> On Mon, Apr 5, 2021 at 1:45 PM Swagat Mishra <swaga...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Austin,
>>>>>>
>>>>>> Let me know what you think of my latest email: whether the approach
>>>>>> might work, or whether it is already supported and I am not using the
>>>>>> configurations properly.
>>>>>>
>>>>>> Thanks for your interest and support.
>>>>>>
>>>>>> Regards,
>>>>>> Swagat
>>>>>>
>>>>>> On Mon, Apr 5, 2021 at 10:39 PM Austin Cawley-Edwards <
>>>>>> austin.caw...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Swagat,
>>>>>>>
>>>>>>> It looks like Flink 1.6 bundles version 1.11.165 of
>>>>>>> aws-java-sdk-core with the Presto implementation (transitively from
>>>>>>> Presto 0.185[1]).
>>>>>>> The minimum supported version for the ServiceAccount authentication
>>>>>>> approach is 1.11.704 (see [2]), which was released on Jan 9th, 2020[3],
>>>>>>> long after Flink 1.6 was released. It looks like even the most recent
>>>>>>> Presto is on a version below that, concretely 1.11.697 in the master
>>>>>>> branch[4], so I don't think even upgrading Flink to 1.6+ will solve
>>>>>>> this, though it looks to me like the AWS dependency is managed better
>>>>>>> in more recent Flink versions.
>>>>>>> I'll have more for you on that front tomorrow, after the Easter break.
>>>>>>>
>>>>>>> I think what you would have to do to make this authentication
>>>>>>> approach work for Flink 1.6 is building a custom version of the
>>>>>>> flink-s3-fs-presto jar, replacing the bundled AWS dependency with the
>>>>>>> 1.11.704 version, and then shading it the same way.
>>>>>>>
>>>>>>> In the meantime, would you mind creating a JIRA ticket with this use
>>>>>>> case? That'll give you the best insight into the status of fixing
>>>>>>> this :)
>>>>>>>
>>>>>>> Let me know if that makes sense,
>>>>>>> Austin
>>>>>>>
>>>>>>> [1]:
>>>>>>> https://github.com/prestodb/presto/blob/1d4ee196df4327568c0982811d8459a44f1792b9/pom.xml#L53
>>>>>>> [2]:
>>>>>>> https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-minimum-sdk.html
>>>>>>> [3]: https://github.com/aws/aws-sdk-java/releases/tag/1.11.704
>>>>>>> [4]: https://github.com/prestodb/presto/blob/master/pom.xml#L52
>>>>>>>
>>>>>>> On Sun, Apr 4, 2021 at 3:32 AM Swagat Mishra <swaga...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Austin -
>>>>>>>>
>>>>>>>> In my case the setup is such that services are deployed on
>>>>>>>> Kubernetes with Docker, running on EKS. There is also an Istio service
>>>>>>>> mesh. All the services communicate and access AWS resources like S3
>>>>>>>> using the service account, which is associated with IAM roles. I
>>>>>>>> have verified that the service account has access to S3 by running a
>>>>>>>> program that connects to S3 to read a file. The AWS client, when
>>>>>>>> packaged into the pod, is also able to access S3. So the roles and
>>>>>>>> policies are good.
>>>>>>>>
>>>>>>>> When running Flink, I am following the same configuration for the
>>>>>>>> job manager and task manager as provided here:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-stable/deployment/resource-providers/standalone/kubernetes.html
>>>>>>>>
>>>>>>>> The exception we are getting is -
>>>>>>>> org.apache.flink.fs.s3presto.shaded.com.amazonaws.SDKClientException:
>>>>>>>> Unable to load credentials from service end point.
>>>>>>>>
>>>>>>>> This happens in the EC2CredentialFetcher class method
>>>>>>>> fetchCredentials - line number 66, when it tries to read resource,
>>>>>>>> effectively executing
>>>>>>>> CURL 169.254.170.2/AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
>>>>>>>>
>>>>>>>> I am not setting the variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
>>>>>>>> because it's not the right way to do it for us; we are on EKS.
>>>>>>>> Similarly, the ~/.aws/credentials file approach will not work for us
>>>>>>>> either.
>>>>>>>>
>>>>>>>>
>>>>>>>> At the moment, I haven't tried the Kubernetes service account property
>>>>>>>> you mentioned above. I will try it and let you know how it goes.
>>>>>>>>
>>>>>>>> Question - do I need to provide any parameters while building the
>>>>>>>> Docker image, or any configuration in the Flink config, to tell Flink
>>>>>>>> that for all purposes it should use the service account and not go
>>>>>>>> into the EC2CredentialFetcher class?
>>>>>>>>
>>>>>>>> One more thing - we were trying this on the 1.6 version of Flink
>>>>>>>> and not the 1.12 version.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Swagat
>>>>>>>>
>>>>>>>> On Sun, Apr 4, 2021 at 8:56 AM Sameer Wadkar <sam...@axiomine.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Kube2Iam needs to modify iptables to proxy EC2 metadata calls to a
>>>>>>>>> daemonset that runs privileged pods. It maps the IP address of a pod
>>>>>>>>> and its associated service account in order to make STS calls and
>>>>>>>>> return temporary AWS credentials. Your pod “thinks” the EC2 metadata
>>>>>>>>> URL works locally, as on an EC2 instance.
>>>>>>>>>
>>>>>>>>> I have found that mutating webhooks are easier to deploy (when you
>>>>>>>>> have no control over the Kubernetes environment - say you cannot
>>>>>>>>> change iptables or run privileged pods). These can configure the
>>>>>>>>> ~/.aws/credentials file. The webhook can make the STS call for the
>>>>>>>>> service account to role mapping. A sidecar container to which the
>>>>>>>>> main container has no access can even renew credentials, because STS
>>>>>>>>> returns temporary credentials.
>>>>>>>>>
>>>>>>>>> Sent from my iPhone
>>>>>>>>>
>>>>>>>>> On Apr 3, 2021, at 10:29 PM, Austin Cawley-Edwards <
>>>>>>>>> austin.caw...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> 
>>>>>>>>> If you’re just looking to attach a service account to a pod using
>>>>>>>>> the native AWS EKS IAM mapping[1], you should be able to attach the
>>>>>>>>> service account to the pod via the `kubernetes.service-account`
>>>>>>>>> configuration option[2].
>>>>>>>>>
>>>>>>>>> Let me know if that works for you!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Austin
>>>>>>>>>
>>>>>>>>> [1]:
>>>>>>>>> https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
>>>>>>>>> [2]:
>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/config.html#kubernetes-service-account
>>>>>>>>>
>>>>>>>>> On Sat, Apr 3, 2021 at 10:18 PM Austin Cawley-Edwards <
>>>>>>>>> austin.caw...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Can you describe your setup a little bit more? And perhaps how
>>>>>>>>>> you use this setup to grant access to other non-Flink pods?
>>>>>>>>>>
>>>>>>>>>> On Sat, Apr 3, 2021 at 2:29 PM Swagat Mishra <swaga...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, I looked at kube2iam, but I haven't experimented with it.
>>>>>>>>>>>
>>>>>>>>>>> Given that the service account has access to S3, shouldn't we
>>>>>>>>>>> have a simpler mechanism to connect to underlying resources based
>>>>>>>>>>> on the service account authorization?
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Apr 3, 2021, 10:10 PM Austin Cawley-Edwards <
>>>>>>>>>>> austin.caw...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Swagat,
>>>>>>>>>>>>
>>>>>>>>>>>> I’ve used kube2iam[1] for granting AWS access to Flink pods in
>>>>>>>>>>>> the past with good results. It’s all based on mapping pod
>>>>>>>>>>>> annotations to AWS IAM roles. Is this something that might work
>>>>>>>>>>>> for you?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Austin
>>>>>>>>>>>>
>>>>>>>>>>>> [1]: https://github.com/jtblin/kube2iam
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Apr 3, 2021 at 10:40 AM Swagat Mishra <
>>>>>>>>>>>> swaga...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> No, we are running on AWS. The mechanisms supported by Flink to
>>>>>>>>>>>>> connect to resources like S3 need us to make changes that will
>>>>>>>>>>>>> impact all services, something that we don't want to do. So
>>>>>>>>>>>>> providing the AWS secret key ID and passcode upfront, or IAM rules
>>>>>>>>>>>>> where it connects by executing curl/HTTP calls to connect to S3,
>>>>>>>>>>>>> doesn't work for me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I want to be able to connect to S3 using AWS APIs, and if
>>>>>>>>>>>>> that connection can be leveraged by the Presto library, that is
>>>>>>>>>>>>> what I am looking for.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Swagat
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Apr 3, 2021, 7:37 PM Israel Ekpo <israele...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are you running on Azure Kubernetes Service?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You should be able to do it because the identity can be
>>>>>>>>>>>>>> mapped to the labels of the pods, not necessarily Flink.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Apr 3, 2021 at 6:31 AM Swagat Mishra <
>>>>>>>>>>>>>> swaga...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think Flink doesn't support pod identity. Are there any plans
>>>>>>>>>>>>>>> to achieve it in a subsequent release?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Swagat
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
