Re: Cannot create external table on S3; class S3AFileSystem not found
Furcy: Thank you. So. Much.

On Dec 10, 2017, 4:34 AM -0800, Furcy Pin wrote:
> Hi Scott,
>
> I ran into similar issues before.
>
> I think that you need to do two things to enable Hive access to S3:
>
> 1. Add the following jars to your Hive classpath, for instance by using
> symlinks:
>
>     ln -s $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar $HIVE_HOME/lib/
>     ln -s $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-2.7.3.jar $HIVE_HOME/lib/
>
> 2. Configure your $HADOOP_HOME/conf/hdfs-site.xml with the following
> parameters (replace the $AWS_ values with your real values; environment
> variable substitution doesn't work until Hadoop 3, I believe):
>
>     <property>
>       <name>fs.s3a.access.key</name>
>       <value>$AWS_ACCESS_KEY_ID</value>
>     </property>
>     <property>
>       <name>fs.s3a.secret.key</name>
>       <value>$AWS_SECRET_ACCESS_KEY</value>
>     </property>
>
> You can also check out the Docker image I made with Hive and Spark
> pre-installed and pre-configured for Amazon:
> https://github.com/FurcyPin/docker-hive-spark
>
> You can either use it directly or just look at how I did it.
>
> Regards,
>
> Furcy
Re: Cannot create external table on S3; class S3AFileSystem not found
Hi Scott,

I ran into similar issues before.

I think that you need to do two things to enable Hive access to S3:

1. Add the following jars to your Hive classpath, for instance by using
symlinks:

    ln -s $HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar $HIVE_HOME/lib/
    ln -s $HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-2.7.3.jar $HIVE_HOME/lib/

2. Configure your $HADOOP_HOME/conf/hdfs-site.xml with the following
parameters (replace the $AWS_ values with your real values; environment
variable substitution doesn't work until Hadoop 3, I believe):

    <property>
      <name>fs.s3a.access.key</name>
      <value>$AWS_ACCESS_KEY_ID</value>
    </property>
    <property>
      <name>fs.s3a.secret.key</name>
      <value>$AWS_SECRET_ACCESS_KEY</value>
    </property>

You can also check out the Docker image I made with Hive and Spark
pre-installed and pre-configured for Amazon:
https://github.com/FurcyPin/docker-hive-spark

You can either use it directly or just look at how I did it.

Regards,

Furcy
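For anyone replaying these two steps, here is a minimal shell sketch of both
together. The jar versions are the ones named above for Hadoop 2.7.3; on
other releases the file names differ (Hadoop 2.8+ ships an aws-java-sdk-bundle
jar instead), and the template file used for the credential substitution is an
assumption, not something from this thread:

    #!/usr/bin/env bash
    # Step 1: expose the S3A jars to Hive via symlinks (version numbers are
    # illustrative; list $HADOOP_HOME/share/hadoop/tools/lib for the real names).
    ln -s "$HADOOP_HOME/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar" "$HIVE_HOME/lib/"
    ln -s "$HADOOP_HOME/share/hadoop/tools/lib/hadoop-aws-2.7.3.jar"  "$HIVE_HOME/lib/"

    # Step 2: Hadoop 2.x does not expand environment variables in *-site.xml,
    # so the credentials must be written out literally. hdfs-site.xml.template
    # is a hypothetical file containing @ACCESS@/@SECRET@ placeholders.
    sed -e "s|@ACCESS@|$AWS_ACCESS_KEY_ID|" \
        -e "s|@SECRET@|$AWS_SECRET_ACCESS_KEY|" \
        hdfs-site.xml.template > "$HADOOP_HOME/etc/hadoop/hdfs-site.xml"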
Re: Cannot create external table on S3; class S3AFileSystem not found
Thanks, Elliot.

I’m using Ubuntu 14.04.5 LTS, Hadoop 2.9.0, and Hive 2.3.2.

I didn’t have a CLASSPATH environment variable set, but I just set it to
include that jar, and I get the same answer.

I don’t really have the installation in “/path/to/…”, but everything else in
the path is correct. You’re right, I’m just obfuscating the actual value.

A new odd behavior that didn’t happen yesterday: in Hive, ‘SELECT * FROM
TBLS;’ returns an error, ‘Table not found: TBLS’. Yet I can connect to the
MySQL backend and query it fine. In Hive I can also query tables I know are
there…just not TBLS.

Hope that makes sense,

Thanks again!

Scott

On Dec 9, 2017, 9:46 AM -0800, Elliot West wrote:
> Which distribution are you using? Do you have hadoop-aws on the class
> path? Is ‘/path/to/hadoop/install’ a literal value or a placeholder that
> you’re using for the actual location?
>
> Cheers,
>
> Elliot.
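On the TBLS puzzle: TBLS is a table in the metastore's backing MySQL
database, not a Hive table, so ‘Table not found: TBLS’ from the Hive CLI is
expected even though the same query works over a MySQL connection. A quick
way to inspect it directly; the host, user, and database name "metastore"
are placeholders for whatever the RDS instance actually uses:

    # TBL_ID, TBL_NAME, and TBL_TYPE are real columns of the Hive metastore
    # schema; connection details below are placeholders, not from the thread.
    mysql -h my-metastore.rds.amazonaws.com -u hive -p metastore \
          -e 'SELECT TBL_ID, TBL_NAME, TBL_TYPE FROM TBLS LIMIT 10;'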
Re: Cannot create external table on S3; class S3AFileSystem not found
Which distribution are you using? Do you have hadoop-aws on the class path?
Is ‘/path/to/hadoop/install’ a literal value or a placeholder that you’re
using for the actual location?

Cheers,

Elliot.
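One way to answer the class-path question empirically, sketched under the
assumption of a plain tarball install with hive on the PATH:

    # Check whether an S3A-capable jar is anywhere Hive will pick it up.
    ls "$HIVE_HOME/lib" | grep -Ei 'hadoop-aws|aws-java-sdk' \
      || echo "no AWS jars in $HIVE_HOME/lib"

    # Hive exposes JVM system properties through SET, so this shows the
    # classpath the CLI is actually running with.
    hive -e 'SET system:java.class.path;' | tr ':' '\n' | grep -i aws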
Cannot create external table on S3; class S3AFileSystem not found
Hi,

I’ve been struggling with this for a few hours, hopefully somebody here can
help me out.

We have a lot of data in Parquet format on S3 and we want to use Hive to
query it. I’m running on Ubuntu and we have a MySQL metadata store on AWS
RDS.

The command in the Hive client I’m trying to run is:

> CREATE EXTERNAL TABLE
> my_schema.my_table
> (account_id INT,
> action VARCHAR(282),
> another_id INT,
> yaid INT,
> `date` TIMESTAMP,
> deleted_at TIMESTAMP,
> id INT,
> lastchanged TIMESTAMP,
> thing_index DOUBLE,
> old_id INT,
> parent_id INT,
> running INT,
> other_id INT,
> another_thing VARCHAR(282),
> title VARCHAR(282),
> type_of VARCHAR(282))
> PARTITIONED BY (snapshot_date DATE)
> STORED AS parquet
> LOCATION 's3a://bucket/folder/foo_my_schema.my_table';

The error I get is:

> FAILED: SemanticException java.lang.RuntimeException:
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.s3a.S3AFileSystem not found

I have this in my hive-site.xml file:

> <property>
>   <name>hive.aux.jars.path</name>
>   <value>/path/to/hadoop-install/hadoop-2.9.0/share/hadoop/tools/lib</value>
> </property>

Another thing I tried was to create the external table without a location
and then alter it to have the location:

> alter table my_schema.my_table set location
> "s3a://bucket/folder/foo_my_schema.my_table";

And then I get a different error:

> FAILED: SemanticException Cannot connect to namenode, please check if
> host/port pair for s3a://bucket/folder/foo_my_schema.my_table is valid

What could I be missing?

Thanks!

Scott
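One configuration detail worth checking here, offered as a possibility
rather than a confirmed diagnosis: some Hive versions treat
hive.aux.jars.path as a comma-separated list of jar files and silently
ignore a bare directory, which would fit the ClassNotFoundException above. A
sketch of passing the jars explicitly through the HIVE_AUX_JARS_PATH
environment variable instead (the glob patterns are assumptions about what
Hadoop 2.9.0 ships under share/hadoop/tools/lib):

    # Build a comma-separated list of the AWS jars and hand it to Hive;
    # HIVE_AUX_JARS_PATH is read by the hive launcher scripts.
    HIVE_AUX_JARS_PATH=$(ls "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-aws-*.jar \
                            "$HADOOP_HOME"/share/hadoop/tools/lib/aws-java-sdk*.jar \
                          2>/dev/null | paste -sd, -)
    export HIVE_AUX_JARS_PATH
    hive   # start the CLI with the jars on its aux path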