Drill on EMR hive with S3 backed storage

Vincent Meng Fri, 18 Mar 2016 22:40:08 -0700

Hi all,

I have an EMR Hive cluster, and all the tables are external tables whose
files are stored on S3.


Following the Drill tutorials, I've set up Drill in embedded mode on my local
machine, and I can successfully connect to the remote Hive cluster. I can list
all the tables in the Hive cluster.

However, when I run `select .. from` type queries on those tables, Drill
complains with "Error: SYSTEM ERROR: IOException:
/path/to/hive/table/folder doesn't exist" (assume the actual S3 path is
"s3://my-bucket-name/path/to/hive/table/folder"). As far as I can see,
/path/to/hive/table/folder is the correct path, but without the
s3://my-bucket-name prefix.
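
To make the symptom concrete, here is a minimal Python sketch (not Drill's
actual code; I don't know where the prefix is really lost). It only shows that
the path in the error message equals the path component of the S3 URI once the
scheme and bucket are dropped. The location string is the placeholder from
above, and the variable names are made up for illustration.

```python
from urllib.parse import urlparse

# Placeholder table location from the description above (not a real bucket).
table_location = "s3://my-bucket-name/path/to/hive/table/folder"

parts = urlparse(table_location)
scheme = parts.scheme    # file system scheme
bucket = parts.netloc    # bucket name
path_only = parts.path   # what remains once scheme and bucket are dropped

# path_only has the same shape as the path reported in the IOException.
print(scheme, bucket, path_only)
```

So it looks as if only the path component reaches the file system lookup,
which would then be resolved against some other default file system.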

My hive storage configuration is like this:

{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://ip-*.ec2.internal:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true",
    "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
    "fs.default.name": "s3://<my-bucket-name>",
    "hive.metastore.sasl.enabled": "false"
  }
}

(For fs.default.name I also tried s3a and s3n; neither of them works.)

I'm using Drill 1.5.0 with jets3t-0.9.2 as a third-party library. I tried
enabling s3 storage and it works fine, so my AWS credentials are valid and
configured correctly.

Any help would be appreciated! I've been stuck on this for two days and have
no idea how to debug it further.

Thank you very much

-Vincent
