Hi all,
I have an EMR Hive cluster, and all the tables are external tables whose
files are stored on S3.
Following the Drill tutorials, I've set up Drill in embedded mode on my
local machine and I can successfully connect to the remote Hive cluster. I
can list all the tables in the Hive cluster.
However, when I run `select .. from` queries on those tables, Drill
fails with "Error: SYSTEM ERROR: IOException:
/path/to/hive/table/folder doesn't exist" (assume the actual S3 path is
"s3://my-bucket-name/path/to/hive/table/folder"). The path in the error,
/path/to/hive/table/folder, is the correct one, but it is missing the
s3://my-bucket-name prefix.
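To make the mismatch concrete, here is a minimal Python sketch (illustrative only, not Drill code; the bucket and path are the same placeholders used above). It shows that the path in the error is exactly the path component of the table's S3 URI, with the scheme and bucket dropped:

```python
from urllib.parse import urlparse

# Placeholder table location from this post, not a real bucket.
table_location = "s3://my-bucket-name/path/to/hive/table/folder"

parsed = urlparse(table_location)

# The path component alone: this matches what the error message reports.
reported_path = parsed.path

# The scheme and bucket: this is the part missing from the error message.
dropped_prefix = f"{parsed.scheme}://{parsed.netloc}"

print(reported_path)
print(dropped_prefix)
```

So it looks as if the scheme and bucket are lost somewhere before the path is resolved, though I don't know where that happens.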
My Hive storage plugin configuration is:
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://ip-*.ec2.internal:9083",
    "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=../sample-data/drill_hive_db;create=true",
    "hive.metastore.warehouse.dir": "/tmp/drill_hive_wh",
    "fs.default.name": "s3://<my-bucket-name>",
    "hive.metastore.sasl.enabled": "false"
  }
}
(For fs.default.name I also tried s3a and s3n; none of them work.)
I'm using Drill 1.5.0 with jets3t-0.9.2 as a third-party library. I tried
enabling the s3 storage plugin and it works fine, so my AWS credentials are
correct and properly configured.
Any help would be appreciated! I've been stuck on this for two days and
have run out of ideas for debugging it.
Thank you very much
-Vincent