Re: Cannot create external table on S3; class S3AFileSystem not found

2017-12-10 Thread Scott Halgrim
Furcy: Thank you. So. Much.

On Dec 10, 2017, 4:34 AM -0800, Furcy Pin wrote:
> Hi Scott,
>
> I ran into similar issues before.
>
>
> I think you need to do two things to enable Hive access to S3:
>
>
> 1. Add the following jars to your Hive classpath, for instance by using
> symlinks:
>
> ln -s $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar $HIVE_HOME/lib/.
> ln -s $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-2.7.3.jar $HIVE_HOME/lib/.
>
>
> 2. Configure your $HADOOP_HOME/conf/hdfs-site.xml with the following
> parameters (replace the $AWS_ values with your correct values; environment
> variable substitution doesn't work until Hadoop 3, I believe):
>
>     <configuration>
>       <property>
>         <name>fs.s3a.access.key</name>
>         <value>$AWS_ACCESS_KEY_ID</value>
>       </property>
>       <property>
>         <name>fs.s3a.secret.key</name>
>         <value>$AWS_SECRET_ACCESS_KEY</value>
>       </property>
>     </configuration>
>
>
>
> You can also check out the Docker image I made with Hive and Spark
> pre-installed and pre-configured for Amazon:
> https://github.com/FurcyPin/docker-hive-spark
>
> You can either use it directly or just look at how I did it.
>
> Regards,
>
> Furcy
>
>
>
>
> > 2017-12-09 20:01 GMT+01:00 Scott Halgrim:
> > > Thanks, Elliot.
> > >
> > > I’m using Ubuntu 14.04.5 LTS, Hadoop 2.9.0, and Hive 2.3.2.
> > >
> > > I didn’t have a CLASSPATH environment variable set, but I just set it to 
> > > include that jar, and I get the same error.
> > >
> > > I don’t really have the installation in “/path/to/…” but everything else 
> > > in the path is correct. You’re right, I’m just obfuscating the actual 
> > > value.
> > >
> > > A new odd behavior that didn’t happen yesterday: in Hive, ‘SELECT 
> > > * FROM TBLS;’ returns the error ‘Table not found: TBLS’. Yet I can 
> > > connect to the MySQL backend and query it fine. In Hive I can also 
> > > query tables I know are there…just not TBLS.
> > >
> > > Hope that makes sense,
> > >
> > > Thanks again!
> > >
> > > Scott
> > >
> > > > On Dec 9, 2017, 9:46 AM -0800, Elliot West wrote:
> > > > Which distribution are you using? Do you have hadoop-aws on the class 
> > > > path? Is ‘/path/to/hadoop/install’ a literal value or a placeholder 
> > > > that you’re using for the actual location?
> > > >
> > > > Cheers,
> > > >
> > > > Elliot.
> > > >
> > > > > On Sat, 9 Dec 2017 at 00:08, Scott Halgrim wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I’ve been struggling with this for a few hours, hopefully somebody 
> > > > > > here can help me out.
> > > > > >
> > > > > > We have a lot of data in Parquet format on S3 and we want to use 
> > > > > > Hive to query it. I’m running on Ubuntu and we have a MySQL 
> > > > > > metastore on AWS RDS.
> > > > > >
> > > > > > The command I’m trying to run in the Hive client is:
> > > > > > > CREATE EXTERNAL TABLE
> > > > > > > my_schema.my_table
> > > > > > > (account_id INT,
> > > > > > > action VARCHAR(282),
> > > > > > > another_id INT,
> > > > > > > yaid INT,
> > > > > > > `date` TIMESTAMP,
> > > > > > > deleted_at TIMESTAMP,
> > > > > > > id INT,
> > > > > > > lastchanged TIMESTAMP,
> > > > > > > thing_index DOUBLE,
> > > > > > > old_id INT,
> > > > > > > parent_id INT,
> > > > > > > running INT,
> > > > > > > other_id INT,
> > > > > > > another_thing VARCHAR(282),
> > > > > > > title VARCHAR(282),
> > > > > > > type_of VARCHAR(282))
> > > > > > > PARTITIONED BY (snapshot_date DATE)
> > > > > > > STORED AS parquet
> > > > > > > LOCATION 's3a://bucket/folder/foo_my_schema.my_table';
> > > > > >
> > > > > > The error I get is:
> > > > > > > FAILED: SemanticException java.lang.RuntimeException: 
> > > > > > > java.lang.ClassNotFoundException: Class 
> > > > > > > org.apache.hadoop.fs.s3a.S3AFileSystem not found
> > > > > >
> > > > > > I have this in my hive-site.xml file:
> > > > > > > <property>
> > > > > > >   <name>hive.aux.jars.path</name>
> > > > > > >   <value>/path/to/hadoop-install/hadoop-2.9.0/share/hadoop/tools/lib</value>
> > > > > > > </property>
> > > > > >
> > > > > >
> > > > > > Another thing I tried was to create the external table without a 
> > > > > > location and then alter it to have the location:
> > > > > >
> > > > > > > alter table my_schema.my_table set location 
> > > > > > > "s3a://bucket/folder/foo_my_schema.my_table";
> > > > > >
> > > > > > And then I get a different error:
> > > > > >
> > > > > > > FAILED: SemanticException Cannot connect to namenode, please 
> > > > > > > check if host/port pair for 
> > > > > > > s3a://bucket/folder/foo_my_schema.my_table is valid
> > > > > >
> > > > > > What could I be missing?
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Scott
>


Re: Cannot create external table on S3; class S3AFileSystem not found

2017-12-10 Thread Furcy Pin
Hi Scott,

I ran into similar issues before.


I think you need to do two things to enable Hive access to S3:


1. Add the following jars to your Hive classpath, for instance by using
symlinks:

ln -s $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar $HIVE_HOME/lib/.
ln -s $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-2.7.3.jar $HIVE_HOME/lib/.
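
As a quick sanity check (a minimal sketch, assuming the symlinks above were
created and that your Hadoop distribution ships the jars under
share/hadoop/tools/lib), you can verify that both jars are now visible in
Hive's lib directory:

ls -l $HIVE_HOME/lib/ | grep -E 'aws-java-sdk|hadoop-aws'

Note that the hadoop-aws jar version must match your Hadoop version (for
Hadoop 2.9.0 the jar is hadoop-aws-2.9.0.jar); the 2.7.3 filenames above are
just from my own setup.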


2. Configure your $HADOOP_HOME/conf/hdfs-site.xml with the following
parameters (replace the $AWS_ values with your correct values; environment
variable substitution doesn't work until Hadoop 3, I believe):

<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>$AWS_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>$AWS_SECRET_ACCESS_KEY</value>
  </property>
</configuration>
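
Once both steps are done (restart the Hive CLI so that it picks up the new
jars), a quick end-to-end check, a hedged example with a placeholder bucket
name to replace with your own, is to list the bucket with the Hadoop CLI
before going back to Hive:

hadoop fs -ls s3a://your-bucket/

If that lists the bucket contents without a ClassNotFoundException or a
credentials error, a CREATE EXTERNAL TABLE ... LOCATION 's3a://...' statement
should go through as well.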



You can also check out the Docker image I made with Hive and Spark
pre-installed and pre-configured for Amazon:
https://github.com/FurcyPin/docker-hive-spark

You can either use it directly or just look at how I did it.
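
For instance (assuming you have git installed), you can fetch it and look at
how the jars and credentials are wired up there:

git clone https://github.com/FurcyPin/docker-hive-spark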

Regards,

Furcy




2017-12-09 20:01 GMT+01:00 Scott Halgrim:

> Thanks, Elliot.
>
> I’m using Ubuntu 14.04.5 LTS, Hadoop 2.9.0, and Hive 2.3.2.
>
> I didn’t have a CLASSPATH environment variable set, but I just set it to
> include that jar, and I get the same error.
>
> I don’t really have the installation in “/path/to/…” but everything else
> in the path is correct. You’re right, I’m just obfuscating the actual value.
>
> A new odd behavior that didn’t happen yesterday: in Hive, ‘SELECT
> * FROM TBLS;’ returns the error ‘Table not found: TBLS’. Yet I can connect
> to the MySQL backend and query it fine. In Hive I can also query tables I
> know are there…just not TBLS.
>
> Hope that makes sense,
>
> Thanks again!
>
> Scott
>
> On Dec 9, 2017, 9:46 AM -0800, Elliot West wrote:
>
> Which distribution are you using? Do you have hadoop-aws on the class
> path? Is ‘/path/to/hadoop/install’ a literal value or a placeholder that
> you’re using for the actual location?
>
> Cheers,
>
> Elliot.
>
> On Sat, 9 Dec 2017 at 00:08, Scott Halgrim wrote:
>
>> Hi,
>>
>> I’ve been struggling with this for a few hours, hopefully somebody here
>> can help me out.
>>
>> We have a lot of data in Parquet format on S3 and we want to use Hive to
>> query it. I’m running on Ubuntu and we have a MySQL metastore on AWS
>> RDS.
>>
>> The command I’m trying to run in the Hive client is:
>>
>> CREATE EXTERNAL TABLE
>> my_schema.my_table
>> (account_id INT,
>> action VARCHAR(282),
>> another_id INT,
>> yaid INT,
>> `date` TIMESTAMP,
>> deleted_at TIMESTAMP,
>> id INT,
>> lastchanged TIMESTAMP,
>> thing_index DOUBLE,
>> old_id INT,
>> parent_id INT,
>> running INT,
>> other_id INT,
>> another_thing VARCHAR(282),
>> title VARCHAR(282),
>> type_of VARCHAR(282))
>> PARTITIONED BY (snapshot_date DATE)
>> STORED AS parquet
>> LOCATION 's3a://bucket/folder/foo_my_schema.my_table';
>>
>>
>> The error I get is:
>>
>> FAILED: SemanticException java.lang.RuntimeException: 
>> java.lang.ClassNotFoundException:
>> Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
>>
>>
>> I have this in my hive-site.xml file:
>>
>> <property>
>>   <name>hive.aux.jars.path</name>
>>   <value>/path/to/hadoop-install/hadoop-2.9.0/share/hadoop/tools/lib</value>
>> </property>
>>
>>
>>
>> Another thing I tried was to create the external table without a location
>> and then alter it to have the location:
>>
>> alter table my_schema.my_table set location
>> "s3a://bucket/folder/foo_my_schema.my_table";
>>
>>
>> And then I get a different error:
>>
>> FAILED: SemanticException Cannot connect to namenode, please check if
>> host/port pair for s3a://bucket/folder/foo_my_schema.my_table is valid
>>
>>
>> What could I be missing?
>>
>> Thanks!
>>
>> Scott
>>
>