All,
In case others are in the same situation, here is how I solved this. After a
lot of digging through source code, I discovered the following facts:
• Drill uses Hadoop’s FileSystem API to support S3 queries, so any
configuration item that works for Hadoop’s S3A file system also works if you
place it in Drill’s core-site.xml file.
• The hadoop-aws jar uses these classes to obtain credentials:
o S3AFileSystem
o S3AUtils
o [Default]S3ClientFactory
• If you configure nothing, credentials are searched for in this order:
o BasicAWSCredentialsProvider – looks for the access key and secret key in the
core-site.xml file.
o EnvironmentVariableCredentialsProvider – looks for the access key and secret
key in environment variables.
o SharedInstanceProfileCredentialsProvider – tries to get credentials from the
EC2 instance metadata. This is the one that can find IAM role credentials!
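If you would rather not depend on the default search order, newer hadoop-aws
releases (2.8 and later, as far as I know) also read a
fs.s3a.aws.credentials.provider property that names the provider class
explicitly. The fragment below is only a sketch under that assumption and is
not part of the setup described in this message; check that the hadoop-aws
version bundled with your Drill supports the property before relying on it.

```xml
<!-- Sketch only: assumes a hadoop-aws version that honors
     fs.s3a.aws.credentials.provider (believed to be 2.8+). -->
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <!-- AWS SDK provider that reads credentials from the EC2 instance metadata -->
  <value>com.amazonaws.auth.InstanceProfileCredentialsProvider</value>
</property>
```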
So to solve this problem I had to take these steps:
1. Make sure that core-site.xml does NOT set the access key and secret key.
2. Make sure that your S3 storage configuration (Storage tab of the Apache Drill
web UI) does NOT set the access key and secret key either.
3. In my case I also needed server-side encryption; there is a property you can
add to core-site.xml for that.
Here is what my core-site.xml file eventually looked like:
<configuration>
  <property>
    <name>fs.s3a.server-side-encryption-algorithm</name>
    <value>YOUR_VALUE_HERE</value>
  </property>
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>100</value>
  </property>
</configuration>
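For reference, the entries that step 1 refers to usually look like the sketch
below. fs.s3a.access.key and fs.s3a.secret.key are the standard S3A property
names; the values shown are placeholders, not real keys. If either entry is
present in core-site.xml, remove it so the lookup can fall through to the
instance profile.

```xml
<!-- Sketch of entries to REMOVE when relying on an IAM role.
     The values are placeholders for illustration. -->
<property>
  <name>fs.s3a.access.key</name>
  <value>PLACEHOLDER_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>PLACEHOLDER_SECRET_KEY</value>
</property>
```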
When you query from Drill, the query should look like this:
SELECT * FROM s3.`s3a://my-bucket/drill/nation.parquet` LIMIT 3;
Also, if somebody needs to troubleshoot this, add these loggers to logback.xml:
<logger name="com.amazonaws.services.s3" additivity="false">
  <level value="trace"/>
  <appender-ref ref="FILE" />
</logger>
<logger name="org.apache.drill.exec.store.dfs" additivity="false">
  <level value="trace"/>
  <appender-ref ref="FILE" />
</logger>
Then you can see the log entries from these packages in drillbit.log.
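If the credential lookup itself is the part that fails, it may also help to
trace the packages where that lookup happens. The loggers below are a sketch
along the same lines: org.apache.hadoop.fs.s3a is the hadoop-aws S3A package
and com.amazonaws.auth is the AWS SDK credentials package. They were not part
of my original setup, and they assume the same FILE appender used above.

```xml
<!-- Sketch: optional extra loggers, assuming the FILE appender defined in logback.xml -->
<logger name="org.apache.hadoop.fs.s3a" additivity="false">
  <level value="debug"/>
  <appender-ref ref="FILE" />
</logger>
<logger name="com.amazonaws.auth" additivity="false">
  <level value="debug"/>
  <appender-ref ref="FILE" />
</logger>
```

Debug level is usually enough here; trace can be very verbose, so switch these back off once the problem is found.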
I hope this helps other people who need to use IAM roles and/or server-side
encryption with Drill.
I also hope that somebody will update the Drill documentation to explain how to
do this; it could have saved me a day of work!
Michael Knapp
On 4/3/17, 1:13 PM, "Knapp, Michael" <[email protected]> wrote:
Drill Developers,
I am using IAM roles on EC2 instances. Your documentation here:
https://drill.apache.org/docs/s3-storage-plugin/
instructs me to provide an access key and secret key, which I do not have
since I am using IAM roles.
I have been reviewing the source code for a few hours now and still have not
found the point in the code where you connect to S3. I was surprised to find
that you do not use the AWS SDK.
Can you please tell me:
1. Does Drill support using IAM roles to provide credentials for S3
access?
2. Where in the code does Drill establish a connection with S3?
Michael Knapp
________________________________________________________
The information contained in this e-mail is confidential and/or proprietary
to Capital One and/or its affiliates and may only be used solely in performance
of work or services for Capital One. The information transmitted herewith is
intended only for use by the individual or entity to which it is addressed. If
the reader of this message is not the intended recipient, you are hereby
notified that any review, retransmission, dissemination, distribution, copying
or other use of, or taking of any action in reliance upon this information is
strictly prohibited. If you have received this communication in error, please
contact the sender and delete the material from your computer.