Re: Sqoop with Hcat integration on AWS EMR with AWS Glue Data Catalog

2018-02-28 Thread Mario Amatucci
Hi,

No idea offhand, but as a test, try adding some data to the table manually,
then try a sqoop export. If that fails too, it is likely a read/write
permission issue on the database or table.
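
To make that suggestion concrete, a sketch of the test (the host, database, and table names are taken from the thread below; the inserted column values are assumed, and both commands need a live cluster):

```shell
# Sketch of the suggested test (names from the thread; column values assumed)
# 1) write a row into the Hive/HCatalog table by hand
hive -e "INSERT INTO greg5.sampledata1_orc1 VALUES (1, 'test')"

# 2) then try to push it back out with sqoop export
sqoop export \
  --connect jdbc:mysql://ec2-18-221-214-250.us-east-2.compute.amazonaws.com:3306/test1 \
  --username XXX -P \
  --table sampledata1 \
  --hcatalog-database greg5 \
  --hcatalog-table sampledata1_orc1
```

If step 1 fails, the problem is on the Hive/permissions side rather than in Sqoop itself.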

sent from honor8

On 22 Feb 2018 22:24, "Greg Lindholm"  wrote:

> Has anyone managed to get Sqoop with hcatalog integration working on AWS
> EMR when Hive is configured to use AWS Glue Data Catalog?
>
> I'm attempting to import from a MySQL db into Hive on an AWS EMR cluster.
> Hive is configured to use AWS Glue Data Catalog as the metadata catalog.
>
> sqoop import \
>   -Dmapred.output.direct.NativeS3FileSystem=false \
>   -Dmapred.output.direct.EmrFileSystem=false \
>   --connect jdbc:mysql://ec2-18-221-214-250.us-east-2.compute.amazonaws.com:3306/test1 \
>   --username XXX -P \
>   -m 1 \
>   --table sampledata1 \
>   --hcatalog-database greg5 \
>   --hcatalog-table sampledata1_orc1 \
>   --create-hcatalog-table \
>   --hcatalog-storage-stanza 'stored as orc'
>
> It appears that the EMR setup wizard properly configures Hive to use the
> Glue Data Catalog but not Sqoop.
>
> I had to add the Glue jar to Sqoop:
> sudo ln -s /usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive2-client.jar
> /usr/lib/sqoop/lib/aws-glue-datacatalog-hive2-client.jar
>
> When I run the above Sqoop command the table gets created, but the import
> then fails with an exception saying it can't find the table.
>
> I've checked in Glue (and Hive) and the table is created correctly.
>

Sqoop with Hcat integration on AWS EMR with AWS Glue Data Catalog

2018-02-22 Thread Greg Lindholm
Has anyone managed to get Sqoop with hcatalog integration working on AWS
EMR when Hive is configured to use AWS Glue Data Catalog?

I'm attempting to import from a MySQL db into Hive on an AWS EMR cluster.
Hive is configured to use AWS Glue Data Catalog as the metadata catalog.

sqoop import \
  -Dmapred.output.direct.NativeS3FileSystem=false \
  -Dmapred.output.direct.EmrFileSystem=false \
  --connect jdbc:mysql://ec2-18-221-214-250.us-east-2.compute.amazonaws.com:3306/test1 \
  --username XXX -P \
  -m 1 \
  --table sampledata1 \
  --hcatalog-database greg5 \
  --hcatalog-table sampledata1_orc1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orc'

It appears that the EMR setup wizard properly configures Hive to use the
Glue Data Catalog but not Sqoop.
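
One thing worth checking (an assumption on my part, not verified on EMR): Hive on EMR switches to Glue through the hive.metastore.client.factory.class property in hive-site.xml, yet the log below shows the HCatalog job connecting to the local Thrift metastore on port 9083. If Sqoop's HCatalog path picks up a different hive-site.xml, making sure that file carries the same property might help:

```xml
<!-- hive-site.xml: EMR's Glue integration property (assumed relevant here) -->
<property>
  <name>hive.metastore.client.factory.class</name>
  <value>com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory</value>
</property>
```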

I had to add the Glue jar to Sqoop:
sudo ln -s /usr/share/aws/hmclient/lib/aws-glue-datacatalog-hive2-client.jar /usr/lib/sqoop/lib/aws-glue-datacatalog-hive2-client.jar

When I run the above Sqoop command the table gets created, but the import
then fails with an exception saying it can't find the table.

I've checked in Glue (and Hive) and the table is created correctly.

Here is the exception:
18/02/21 20:17:41 INFO conf.HiveConf: Found configuration file
file:/etc/hive/conf.dist/hive-site.xml
18/02/21 20:17:42 INFO common.HiveClientCache: Initializing cache:
eviction-timeout=120 initial-capacity=50 maximum-capacity=50
18/02/21 20:17:42 INFO hive.metastore: Trying to connect to metastore with
URI thrift://ip-172-31-27-114.us-east-2.compute.internal:9083
18/02/21 20:17:42 INFO hive.metastore: Opened a connection to metastore,
current connections: 1
18/02/21 20:17:42 INFO hive.metastore: Connected to metastore.
18/02/21 20:17:43 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: NoSuchObjectException(message:greg5.sampledata1_orc1 table not found)
        at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:97)
        at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:51)
        at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:343)
        at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:783)
        at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:259)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: NoSuchObjectException(message:greg5.sampledata1_orc1 table not found)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:55064)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:55032)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result.read(ThriftHiveMetastore.java:54963)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:1563)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:1550)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1344)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
        at com.sun.proxy.$Proxy5.getTable(Unknown Source)
        at org.apache.hive.hcatalog.common.HCatUtil.getTable(HCatUtil.java:180)
        at org.apache.hive.hcatalog.mapreduce.InitializeInput.getInputJobInfo(InitializeInput.java:105)
        at org.apache.hive.hcatalog.mapreduce.InitializeInput.setInput(InitializeInput.java:88)
        at org.apache.hive.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
        ... 15 more

The Hive