I have scoured the Drill website and mailing list, and Google, and have come up with no advice. Can you help?
I started up an EMR cluster with AWS Hive 0.13.1 installed and started the metastore service:

    hive/bin/hive --service metastore

I created a table:

    CREATE TABLE apache_log (
      host STRING,
      IDENTITY STRING,
      USER STRING,
      TIME STRING,
      request STRING,
      STATUS STRING,
      SIZE STRING,
      referrer STRING,
      agent STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") ([0-9]*) ([0-9]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\")"
    )
    STORED AS TEXTFILE;

And loaded a small amount of data:

    LOAD DATA LOCAL INPATH 'access_log_1' OVERWRITE INTO TABLE apache_log;

(source: http://elasticmapreduce.s3.amazonaws.com/samples/pig-apache/input/access_log_1)

I can query this data from the Hive console, or from SquirrelSQL using the AWS Hive JDBC4 driver from http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/HiveJDBCDriver.html

I configured a Drill storage plugin:

    {
      "type": "hive",
      "enabled": true,
      "configProps": {
        "hive.metastore.uris": "thrift://172.24.7.81:10000",
        "hive.metastore.sasl.enabled": "false"
      }
    }

But all I get from Drill is socket timeouts reading from the Hive metastore, whether I try to query the apache_log table or Drill's INFORMATION_SCHEMA.

My guess is that I need to swap in some AWS-provided Hive-related jar files for the ones that were included with Drill. I am looking for suggestions on that approach, or on something else I might be overlooking.

Thanks,
Paul
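P.S. Since the symptom is a socket timeout, a basic reachability check from the Drill node against the address in hive.metastore.uris may help narrow things down. The following is only a minimal sketch: the helper name can_connect and the 5-second default timeout are made up for illustration, and it tests nothing beyond whether a TCP connection opens.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration only. It does not speak Thrift, so a
    True result shows that something is listening on the port, not that the
    listener is the Hive metastore.
    """
    try:
        # create_connection resolves the host and applies the timeout to the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and connect timeouts.
        return False
```

Calling can_connect("172.24.7.81", 10000) from the machine running Drill would exercise the host and port from the plugin config above. A False result would point at networking or at the service not listening there; a True result would leave the protocol or jar mismatch theory open.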
