Connecting to Hive provided by AWS EMR

Paul Mogren Fri, 26 Jun 2015 15:13:43 -0700

I have scoured the Drill website and mailing list, and Google, and have
come up with no advice. Can you help?


I started up an EMR cluster with AWS Hive 0.13.1 installed,

started the metastore service: hive/bin/hive --service metastore,

created a table:
CREATE TABLE apache_log (
  host STRING,
  IDENTITY STRING,
  USER STRING,
  TIME STRING,
  request STRING,
  STATUS STRING,
  SIZE STRING,
  referrer STRING,
  agent STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") ([0-9]*) ([0-9]*) ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\")"
)
STORED AS TEXTFILE;
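
To illustrate how the input.regex above is meant to split one log line into the nine columns, here is a minimal Python sketch. It assumes Hive's string-literal escaping reduces \\ to \ and \" to ", so the pattern below is the unescaped form. The sample line is made up for illustration and is not taken from access_log_1, and parse_line is a hypothetical helper, not part of Hive.

```python
import re

# Unescaped form of the input.regex from the table definition
# (assumption: Hive turns \\ into \ and \" into " before compiling it).
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") '
    r'([0-9]*) ([0-9]*) ([^ "]*|"[^"]*") ([^ "]*|"[^"]*")'
)

# Column names in the same order as the CREATE TABLE statement.
COLUMNS = ["host", "identity", "user", "time", "request",
           "status", "size", "referrer", "agent"]


def parse_line(line):
    """Return a dict of column -> captured text, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


if __name__ == "__main__":
    # Hypothetical sample line in Apache combined log format.
    sample = ('127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
              '"GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/4.08"')
    for column, value in parse_line(sample).items():
        print(f"{column}: {value}")
```

Each capture group maps to one column in declaration order, and quoted fields keep their surrounding double quotes.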

And loaded a small amount of data:
LOAD DATA LOCAL INPATH 'access_log_1' OVERWRITE INTO TABLE apache_log;
 (source: http://elasticmapreduce.s3.amazonaws.com/samples/pig-apache/input/access_log_1)



I can query this data from the Hive console or from SquirrelSQL using the
AWS Hive JDBC4 driver from
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/HiveJDBCDriver.html

I configured a Drill storage plugin:
{
  "type": "hive",
  "enabled": true,
  "configProps": {
    "hive.metastore.uris": "thrift://172.24.7.81:10000",
    "hive.metastore.sasl.enabled": "false"
  }
}


But all I get from Drill is socket timeouts reading from the Hive
metastore, whether I try to query the apache_log table or Drill's
INFORMATION_SCHEMA.
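
One way to narrow down a timeout like this is to check, from the Drill host, whether anything accepts TCP connections at the address given in hive.metastore.uris. The sketch below is only that: a plain socket probe using the Python standard library. The function name endpoint_reachable is hypothetical, and the URI is the one from the plugin configuration above.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(uri, timeout=5.0):
    """Return True if a TCP connection to the host:port in `uri` succeeds."""
    parsed = urlparse(uri)  # handles thrift://host:port like any scheme://netloc
    try:
        with socket.create_connection((parsed.hostname, parsed.port),
                                      timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # URI taken from the storage plugin configuration.
    uri = "thrift://172.24.7.81:10000"
    print(uri, "reachable:", endpoint_reachable(uri))
```

A successful connection only shows that some service is listening; it does not show that the service is the metastore. As far as I know, stock Hive defaults are port 9083 for the metastore and port 10000 for HiveServer2, so probing both may be informative, though whether this EMR cluster uses those defaults is an assumption.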

I have a guess that I need to swap in some AWS-provided Hive-related jar
files for others that were included with Drill. Looking for suggestions on
that approach, or something else I might be overlooking.

Thanks,
Paul
