Hi Sqoop Users, I was attempting a Sqoop import with HCatalog on an AWS EMR cluster, importing from a MySQL database and writing to an S3 location.
  sudo sqoop import \
    --connect jdbc:mysql://xxx.us-east-2.compute.amazonaws.com:3306/test1 \
    --username xxx -P \
    --table sampledata1 \
    --hcatalog-database greg3 \
    --hcatalog-table sampledata1_orc1 \
    --create-hcatalog-table \
    --hcatalog-storage-stanza 'stored as orc'

The database (greg3) was created in Hive with its location set to an S3 bucket. The Sqoop job would run and succeed, and the table was created correctly in the Hive HCatalog with its folders on S3, but no data file was ever written.

I found the solution buried in a page on HCatalog under EMR ("Disable Direct Write When Using HCatalog HStorer"): you have to set these mapred config values:

  -Dmapred.output.direct.NativeS3FileSystem=false \
  -Dmapred.output.direct.EmrFileSystem=false \

Here is the link: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hcatalog-using.html

Hopefully this will save someone else a lot of trouble.

/Greg
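For anyone hitting the same issue, the two pieces above combine like this. Note that `-D` properties are Hadoop generic options and must come immediately after the tool name (`import`), before any `--` arguments; the connection string, database, and table names below are the example values from this thread, so substitute your own:

```shell
sudo sqoop import \
  -Dmapred.output.direct.NativeS3FileSystem=false \
  -Dmapred.output.direct.EmrFileSystem=false \
  --connect jdbc:mysql://xxx.us-east-2.compute.amazonaws.com:3306/test1 \
  --username xxx -P \
  --table sampledata1 \
  --hcatalog-database greg3 \
  --hcatalog-table sampledata1_orc1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orc'
```

With direct write disabled, the map tasks write to a temporary location first and the output is then committed to S3, so the data files actually appear under the table's S3 folder.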