Has anyone managed to get Sqoop with HCatalog integration working on AWS EMR when Hive is configured to use the AWS Glue Data Catalog?
I'm attempting to import from a MySQL database into Hive on an AWS EMR cluster. Hive is configured to use the AWS Glue Data Catalog as its metastore. This is the import command:

```
sqoop import \
  -Dmapred.output.direct.NativeS3FileSystem=false \
  -Dmapred.output.direct.EmrFileSystem=false \
  --connect jdbc:mysql://ec2-18-221-214-250.us-east-2.compute.amazonaws.com:3306/test1 \
  --username XXX -P \
  -m 1 \
  --table sampledata1 \
  --hcatalog-database greg5 \
  --hcatalog-table sampledata1_orc1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orc'
```

It appears that the EMR setup wizard properly configures Hive to use the Glue Data Catalog, but not Sqoop. I had to add the Glue client jar to Sqoop's lib directory myself:

```
sudo ln -s /usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive2-client.jar /usr/lib/sqoop/lib/aws-glue-datacatalog-hive2-client.jar
```

When I run the Sqoop command above, the table gets created, but the import then fails with an exception saying it can't find the table. I've checked in Glue (and Hive) and the table is created correctly. Here is the output leading up to the exception:

```
18/02/21 20:17:41 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.dist/hive-site.xml
18/02/21 20:17:42 INFO common.HiveClientCache: Initializing cache: eviction-timeout=120 initial-capacity=50 maximum-capacity=50
18/02/21 20:17:42 INFO hive.metastore: Trying to connect to metastore with URI thrift://ip-172-31-27-114.us-east-2.compute.internal:9083
18/02/21 20:17:42 INFO hive.metastore: Opened a connection to metastore, current connections: 1
18/02/21 20:17:42 INFO hive.metastore: Connected to metastore.
```
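For anyone trying to reproduce this, the table's existence can be double-checked from the EMR master node with something like the following (a sketch; it assumes the instance profile has `glue:GetTable` permission and a default region configured):

```shell
# Confirm the table is registered in the Glue Data Catalog
aws glue get-table --database-name greg5 --name sampledata1_orc1

# Confirm Hive (which reads the Glue factory from hive-site.xml) can see it too
hive -e 'DESCRIBE FORMATTED greg5.sampledata1_orc1;'
```

Both of these succeed for me, which is why the Sqoop failure below is so confusing.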
And here is the exception:

```
18/02/21 20:17:43 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: NoSuchObjectException(message:greg5.sampledata1_orc1 table not found)
	at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
	at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:343)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:783)
	at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
	at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: NoSuchObjectException(message:greg5.sampledata1_orc1 table not found)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:55064)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:55032)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result.read(ThriftHiveMetastore.java:54963)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1344)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
	at com.sun.proxy.$Proxy5.getTable(Unknown Source)
	at org.apache.hive.hcatalog.common.HCatUtil.getTable(HCatUtil.java:180)
	at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:105)
	at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:88)
	at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
	... 15 more
```

The Hive config file, /etc/hive/conf.dist/hive-site.xml, has this property:

```
<property>
  <name>hive.metastore.client.factory.class</name>
  <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
</property>
```

Does anyone have any suggestions?

/Greg
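Update: one detail I notice in the log above is that HCatalog connects to the local thrift metastore URI (thrift://ip-172-31-27-114...) rather than going through the Glue client, so perhaps Sqoop's HCatalog code isn't honoring the factory property from hive-site.xml. An experiment I'm considering (untested; the property name and value are simply copied from hive-site.xml above) is to force it on the Sqoop command line:

```shell
# Untested idea: pass the Glue client factory property directly to the job,
# in case Sqoop's HCatalog integration isn't reading hive-site.xml.
sqoop import \
  -D hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory \
  --connect jdbc:mysql://ec2-18-221-214-250.us-east-2.compute.amazonaws.com:3306/test1 \
  --username XXX -P \
  -m 1 \
  --table sampledata1 \
  --hcatalog-database greg5 \
  --hcatalog-table sampledata1_orc1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orc'
```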