Hi All,
I want to import updated rows from my source (PostgreSQL) into Hive, based
on a column (lastmodifiedtime) in PostgreSQL.
*The command I am using*
/app/sqoop/bin/sqoop import --hive-table users \
  --connect jdbc:postgresql://<server_url>/<database> \
  --table users --username XXXXXXX --password YYYYYY \
  --hive-home /app/hive --hive-import \
  --incremental lastmodified --check-column lastmodifiedtime
With the above command, I get the following error:
12/04/13 16:31:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/11ce8600a5656ed49e631a260c387692/users.jar
12/04/13 16:31:21 INFO tool.ImportTool: Incremental import based on column "lastmodifiedtime"
12/04/13 16:31:21 INFO tool.ImportTool: Upper bound value: '2012-04-13 16:31:21.865429'
12/04/13 16:31:21 WARN manager.PostgresqlManager: It looks like you are importing from postgresql.
12/04/13 16:31:21 WARN manager.PostgresqlManager: This transfer can be faster! Use the --direct
12/04/13 16:31:21 WARN manager.PostgresqlManager: option to exercise a postgresql-specific fast path.
12/04/13 16:31:22 INFO mapreduce.ImportJobBase: Beginning import of users
12/04/13 16:31:23 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory users already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:201)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
at org.apache.sqoop.manager.PostgresqlManager.importTable(PostgresqlManager.java:102)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
According to the above log, Sqoop does identify the updated data in
PostgreSQL (it computes an upper bound for lastmodifiedtime), but the job
then fails because the output directory "users" already exists. Could
someone please help me correct this issue?
*I am using*
Hadoop - 0.20.2
Hive - 0.8.1
Sqoop - 1.4.1-incubating
Thanks.