Hi, I am running the Cloudera CDH3 Hive distribution in pseudo-distributed mode on my local Mac OS X Lion laptop. Hive generally works fine, except when I use it together with Sqoop. A command like
sqoop import --connect jdbc:mysql://localhost/db --username root --password foobar --table sometable --warehouse-dir /user/hive/warehouse

completes successfully and generates part files, a _logs directory and a _SUCCESS file in the Hive warehouse directory on HDFS. However, when I add the --hive-import option to the Sqoop command, the import itself still works, but Hive seems to get into an infinite loop. Looking at the logs I find entries like these:

2011-11-28 22:54:57,279 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/hive/warehouse/sometable/_SUCCESS to /user/hive/warehouse/sometable/_SUCCESS_copy_2 because source does not exist
2011-11-28 22:54:57,281 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/hive/warehouse/sometable/_SUCCESS to /user/hive/warehouse/sometable/_SUCCESS_copy_3 because source does not exist
2011-11-28 22:54:57,282 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /user/hive/warehouse/sometable/_SUCCESS to /user/hive/warehouse/sometable/_SUCCESS_copy_4 because source does not exist

I started digging into the source code and can trace this back to ql/metadata/Hive.java:checkPaths, which tries to find a non-conflicting name for the _SUCCESS file during the actual Hive load but somehow fails because the Sqoop import MR job has already created a _SUCCESS file in that directory. I already tried disabling the MR job's creation of _SUCCESS files, but Hive seems to wait for that file to kick off the Hive import, and hence fails as well.

Does anyone have any suggestions on where to search next?

Thanks! Jurgen
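
P.S. For reference, the failing run is simply the same command with --hive-import appended, and my attempt at disabling the _SUCCESS marker looked roughly like this (I am assuming mapreduce.fileoutputcommitter.marksuccessfuljobs is the right property for that, passed as a generic Hadoop option before the Sqoop-specific arguments):

sqoop import -D mapreduce.fileoutputcommitter.marksuccessfuljobs=false --connect jdbc:mysql://localhost/db --username root --password foobar --table sometable --warehouse-dir /user/hive/warehouse --hive-import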