Hi Scott,

I ran into similar issues before.


I think you need to do two things to enable Hive access to S3:


1. Add the following jars to your Hive classpath, for instance by using
symlinks:

    ln -s $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar $HIVE_HOME/lib/.
    ln -s $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-2.7.3.jar $HIVE_HOME/lib/.
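
(The jar versions above are the ones that shipped with my Hadoop build; on
Hadoop 2.9.0 the file names will differ. A quick way to find the right
ones, assuming the standard Hadoop layout, is:

    # List the AWS-related jars shipped with your Hadoop distribution
    ls $HADOOP_HOME/share/hadoop/tools/lib/ | grep -i aws

and symlink whatever that turns up.)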


2. Configure your $HADOOP_HOME/conf/hdfs-site.xml with the following
parameters (replace the $AWS_ values with your actual credentials;
environment variable substitution doesn't work until Hadoop 3, I believe):


    <configuration>
      <property>
        <name>fs.s3a.access.key</name>
        <value>$AWS_ACCESS_KEY_ID</value>
      </property>
      <property>
        <name>fs.s3a.secret.key</name>
        <value>$AWS_SECRET_ACCESS_KEY</value>
      </property>
    </configuration>
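
Once both steps are done, you can sanity-check the s3a connector from the
command line before involving Hive at all ("your-bucket" is a placeholder):

    # This should list the bucket if the jars and credentials are picked up
    hadoop fs -ls s3a://your-bucket/

If this fails with the same ClassNotFoundException, the jars are still not
on the classpath.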



You can also check out the Docker image I made with Hive and Spark
pre-installed and pre-configured for Amazon S3:
https://github.com/FurcyPin/docker-hive-spark

You can either use it directly or just look at how I did it.

Regards,

Furcy




2017-12-09 20:01 GMT+01:00 Scott Halgrim <scott.halg...@zapier.com>:

> Thanks, Elliott.
>
> I’m using Ubuntu 14.04.5 LTS, Hadoop 2.9.0, and Hive 2.3.2.
>
> I didn’t have a CLASSPATH environment variable set, but I did just set it
> to include that jar and I get the same error.
>
> I don’t really have the installation in “/path/to/…” but everything else
> in the path is correct. You’re right, I’m just obfuscating the actual value.
>
> A new odd behavior that didn’t happen yesterday is that in Hive, ‘SELECT
> * FROM TBLS;’ returns an error: ‘Table not found: TBLS’. Yet I can connect
> to the MySQL backend and query it fine. In Hive I can also query tables I
> know are there…just not TBLS.
>
> Hope that makes sense,
>
> Thanks again!
>
> Scott
>
> On Dec 9, 2017, 9:46 AM -0800, Elliot West <tea...@gmail.com>, wrote:
>
> Which distribution are you using? Do you have hadoop-aws on the class
> path? Is ‘/path/to/hadoop/install’ a literal value or a placeholder that
> you’re using for the actual location?
>
> Cheers,
>
> Elliot.
>
> On Sat, 9 Dec 2017 at 00:08, Scott Halgrim <scott.halg...@zapier.com>
> wrote:
>
>> Hi,
>>
>> I’ve been struggling with this for a few hours; hopefully somebody here
>> can help me out.
>>
>> We have a lot of data in Parquet format on S3 and we want to use Hive to
>> query it. I’m running on Ubuntu and we have a MySQL metadata store on AWS
>> RDS.
>>
>> The command in the Hive client I’m trying to run is:
>>
>> CREATE EXTERNAL TABLE
>> my_schema.my_table
>> (account_id INT,
>> action VARCHAR(282),
>> another_id INT,
>> yaid INT,
>> `date` TIMESTAMP,
>> deleted_at TIMESTAMP,
>> id INT,
>> lastchanged TIMESTAMP,
>> thing_index DOUBLE,
>> old_id INT,
>> parent_id INT,
>> running INT,
>> other_id INT,
>> another_thing VARCHAR(282),
>> title VARCHAR(282),
>> type_of VARCHAR(282))
>> PARTITIONED BY (snapshot_date DATE)
>> STORED AS parquet
>> LOCATION 's3a://bucket/folder/foo_my_schema.my_table';
>>
>>
>> The error I get is:
>>
>> FAILED: SemanticException java.lang.RuntimeException: 
>> java.lang.ClassNotFoundException:
>> Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>
>>
>> I have this in my hive-site.xml file:
>>
>> <property>
>> <name>hive.aux.jars.path</name>
>> <value>/path/to/hadoop-install/hadoop-2.9.0/share/hadoop/tools/lib</value>
>> </property>
>>
>>
>>
>> Another thing I tried was to create the external table without a location
>> and then alter it to have the location:
>>
>> alter table my_schema.my_table set location "s3a://bucket/folder/foo_my_schema.my_table";
>>
>>
>> And then I get a different error:
>>
>> FAILED: SemanticException Cannot connect to namenode, please check if
>> host/port pair for s3a://bucket/folder/foo_my_schema.my_table is valid
>>
>>
>> What could I be missing?
>>
>> Thanks!
>>
>> Scott
>>
>
