Hi, thanks for the quick reply.
Regarding my incremental import: do I need to remove the directory every
time I want to re-import the updated data from PostgreSQL? I am working
with data several GB in size and want to confirm this. Thanks.

On Mon, Apr 16, 2012 at 9:48 AM, Krishnan K <[email protected]> wrote:

> Hi Roshan,
>
> If you have run the sqoop command once, it creates an output directory in
> HDFS even if the run fails.
> You can delete this folder (users) with *hadoop dfs -rmr users* and then
> run the sqoop command once again.
>
> Sqoop first imports the data from PostgreSQL into HDFS and then moves it
> into the Hive default directory (/user/hive/warehouse/<tablename>).
>
> For Sqoop to copy the data into HDFS first, you must ensure that the
> directory does not already exist.
>
> -Krishnan
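For concreteness, the delete-then-re-import cycle described above would look
roughly like this (a sketch only; it assumes the intermediate directory is
named after the table, as in the error in the original message below):

    # remove the intermediate directory left over from the previous run
    hadoop dfs -rmr users

    # then re-run the same import command
    /app/sqoop/bin/sqoop import --hive-table users \
        --connect jdbc:postgresql://<server_url>/<database> \
        --table users --username XXXXXXX --password YYYYYY \
        --hive-home /app/hive --hive-import \
        --incremental lastmodified --check-column lastmodifiedtime
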
> On Mon, Apr 16, 2012 at 4:49 AM, Roshan Pradeep <[email protected]> wrote:
>
>> Hi All
>>
>> I want to import the updated data from my source (PostgreSQL) into Hive
>> based on a column (lastmodifiedtime) in PostgreSQL.
>>
>> *The command I am using*
>>
>> /app/sqoop/bin/sqoop import --hive-table users --connect
>> jdbc:postgresql://<server_url>/<database> --table users --username XXXXXXX
>> --password YYYYYY --hive-home /app/hive --hive-import --incremental
>> lastmodified --check-column lastmodifiedtime
>>
>> With the above command, I am getting the below error:
>>
>> 12/04/13 16:31:21 INFO orm.CompilationManager: Writing jar file:
>> /tmp/sqoop-root/compile/11ce8600a5656ed49e631a260c387692/users.jar
>> 12/04/13 16:31:21 INFO tool.ImportTool: Incremental import based on
>> column "lastmodifiedtime"
>> 12/04/13 16:31:21 INFO tool.ImportTool: Upper bound value: '2012-04-13
>> 16:31:21.865429'
>> 12/04/13 16:31:21 WARN manager.PostgresqlManager: It looks like you are
>> importing from postgresql.
>> 12/04/13 16:31:21 WARN manager.PostgresqlManager: This transfer can be
>> faster! Use the --direct
>> 12/04/13 16:31:21 WARN manager.PostgresqlManager: option to exercise a
>> postgresql-specific fast path.
>> 12/04/13 16:31:21 INFO mapreduce.ImportJobBase: Beginning import of users
>> 12/04/13 16:31:23 ERROR tool.ImportTool: Encountered IOException running
>> import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output
>> directory users already exists
>>     at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
>>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
>>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
>>     at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
>>     at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:201)
>>     at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
>>     at org.apache.sqoop.manager.PostgresqlManager.importTable(PostgresqlManager.java:102)
>>     at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
>>     at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
>>     at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
>>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
>>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
>>     at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
>>     at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
>>
>> According to the above stack trace, Sqoop identifies the updated data in
>> PostgreSQL, but the job fails because the output directory already exists.
>> Could someone please help me correct this issue?
>>
>> *I am using*
>>
>> Hadoop - 0.20.2
>> Hive - 0.8.1
>> Sqoop - 1.4.1-incubating
>>
>> Thanks.
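One way to avoid both the manual delete and the manual bookkeeping of the
check column is Sqoop's saved-job facility, which records the last-seen
value of --check-column after each successful run. A rough sketch only; the
job name (users_incr), the target directory path, and the initial
--last-value here are hypothetical, and details vary by Sqoop version:

    # create a saved job once; Sqoop stores the job definition and,
    # after each run, the most recent value of the check column
    sqoop job --create users_incr -- import \
        --connect jdbc:postgresql://<server_url>/<database> \
        --table users --username XXXXXXX -P \
        --target-dir /user/hadoop/users_incr \
        --incremental lastmodified --check-column lastmodifiedtime \
        --last-value '2012-04-13 16:31:21'

    # each subsequent run picks up from the recorded last value
    sqoop job --exec users_incr

Here -P prompts for the password at run time instead of storing it in the
job definition. Whether --hive-import can be combined with --incremental
lastmodified depends on the Sqoop version, so this sketch imports into a
plain HDFS directory; loading that data into Hive would be a separate step.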
