Hi all,

I am using Sqoop 1.4.2 with Cloudera Hadoop and doing some testing. We need
to export some tables from CSVs in HDFS. We rely on Sqoop's staging-table
mechanism, which writes data into the main table only if all map tasks
succeed.
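
For reference, the export is invoked roughly like this (the connection
string, directory, and table names are placeholders; the staging table
matches the table1_tmp you'll see in the log below):

  sqoop export \
    --connect jdbc:mysql://dbhost/mydb \
    --username user -P \
    --table table1 \
    --staging-table table1_tmp \
    --export-dir /user/hdfs/csv/table1 \
    --input-fields-terminated-by ','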

While running a Sqoop export job on Hadoop, if a map task fails and Hadoop
reattempts it (finishing, say, on the third attempt), the rows inserted by
the failed attempts remain in the staging table. The job then finishes with
duplicate records in the staging table, and the number of rows inserted into
the destination is higher than the number of rows in the CSVs. Below is the
output:

12/09/13 14:46:55 INFO mapreduce.ExportJobBase: Exported 4071315 records.
12/09/13 14:46:55 INFO mapreduce.ExportJobBase: Starting to migrate data
from staging table to destination.
12/09/13 14:47:29 INFO manager.SqlManager: Migrated 5391315 records from
table1_tmp to table
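
If anyone wants to reproduce the check, something along these lines lists the
duplicated rows in the staging table (id is a placeholder for our key
column):

  # list keys that appear more than once in the staging table
  sqoop eval \
    --connect jdbc:mysql://dbhost/mydb \
    --username user -P \
    --query "SELECT id, COUNT(*) FROM table1_tmp GROUP BY id HAVING COUNT(*) > 1"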

Is this a bug in Sqoop, and is there any fix or patch for it? Please let me
know.


Thanks
