Hi

Thanks for the quick reply.

Regarding my incremental import: do I need to remove the output directory
every time I want to re-import the updated data from PostgreSQL?

I am working with data several GB in size, so I want to confirm this.
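
If so, I assume each run would boil down to something like this (just a
sketch, reusing the "users" directory name from your example; the
"|| true" only ignores the failure when the directory is already gone):

    hadoop dfs -rmr users || true     # clear the leftover staging directory
    /app/sqoop/bin/sqoop import ...   # then re-run the full import command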

Thanks.

On Mon, Apr 16, 2012 at 9:48 AM, Krishnan K <[email protected]> wrote:

> Hi Roshan,
>
> If you have run the sqoop command once, it creates an output directory in
> HDFS even if the import fails.
> You can delete this folder (users) with the command below and then run
> the sqoop import again:
>
> *hadoop dfs -rmr users*
>
> Sqoop first imports the data from PostgreSQL into HDFS and
> then moves it into the default Hive warehouse directory
> (/user/hive/warehouse/<tablename>).
>
> For Sqoop to stage the data in HDFS first, you must ensure that the
> staging directory does not already exist.
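>
> A quick check-and-clean sequence would look something like this (a
> sketch: it assumes the staging directory is "users" under your HDFS home
> directory, as your error message suggests, and that Hive uses the default
> warehouse layout):
>
> *hadoop dfs -ls users* - confirm the leftover staging directory exists
> *hadoop dfs -rmr users* - remove it before re-running the import
> *hadoop dfs -ls /user/hive/warehouse/users* - where the imported data ends up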
>
> -Krishnan
>
>
> On Mon, Apr 16, 2012 at 4:49 AM, Roshan Pradeep <[email protected]> wrote:
>
>> Hi All
>>
>> I want to import the updated data from my source (PostgreSQL) into Hive,
>> based on a column (lastmodifiedtime) in PostgreSQL.
>>
>> *The command I am using*
>>
>> /app/sqoop/bin/sqoop import --hive-table users --connect
>> jdbc:postgresql://<server_url>/<database> --table users --username XXXXXXX
>> --password YYYYYY --hive-home /app/hive --hive-import --incremental
>> lastmodified --check-column lastmodifiedtime
>>
>> With the above command, I am getting the error below:
>>
>> 12/04/13 16:31:21 INFO orm.CompilationManager: Writing jar file:
>> /tmp/sqoop-root/compile/11ce8600a5656ed49e631a260c387692/users.jar
>> 12/04/13 16:31:21 INFO tool.ImportTool: Incremental import based on
>> column "lastmodifiedtime"
>> 12/04/13 16:31:21 INFO tool.ImportTool: Upper bound value: '2012-04-13
>> 16:31:21.865429'
>> 12/04/13 16:31:21 WARN manager.PostgresqlManager: It looks like you are
>> importing from postgresql.
>> 12/04/13 16:31:21 WARN manager.PostgresqlManager: This transfer can be
>> faster! Use the --direct
>> 12/04/13 16:31:21 WARN manager.PostgresqlManager: option to exercise a
>> postgresql-specific fast path.
>> 12/04/13 16:31:21 INFO mapreduce.ImportJobBase: Beginning import of users
>> 12/04/13 16:31:23 ERROR tool.ImportTool: Encountered IOException running
>> import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output
>> directory users already exists
>>         at
>> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
>>         at
>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
>>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
>>         at
>> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
>>         at
>> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:201)
>>         at
>> org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
>>         at
>> org.apache.sqoop.manager.PostgresqlManager.importTable(PostgresqlManager.java:102)
>>         at
>> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
>>         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
>>         at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
>>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
>>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
>>         at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
>>         at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
>>
>> According to the above output, Sqoop correctly identifies the updated
>> data from PostgreSQL, but the job fails because the output directory
>> already exists. Could someone please help me correct this issue?
>>
>> *I am using*
>>
>> Hadoop - 0.20.2
>> Hive - 0.8.1
>> Sqoop - 1.4.1-incubating
>>
>>
>> Thanks.
>>
>
>