Hi Adarsh,
you can achieve similar functionality with Sqoop in several ways, depending on
the connector you use:

1) You can always remove previously imported data manually (or with a script),
provided you can easily identify it before running Sqoop. For example, you
could create a script that removes the previously imported data (if present)
and then runs Sqoop.

2) You can use a staging table via the --staging-table and
--clear-staging-table parameters. Sqoop will first load your data in parallel
into the staging table and move it to the destination table only if all
parallel tasks succeed. Please note that the staging option is not available
in all connectors (direct connectors typically do not support it).

3) Lastly, you can use "upsert" functionality. Some connectors (MySQL,
Oracle) support --update-mode allowinsert, which either inserts a new row or
updates the existing one if it is already present in the table. Please note
that this solution has the worst performance of the three.
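
The three options above could be sketched roughly as follows. This is only an
illustration, not a tested setup: the JDBC URL, user, table names, the
load_date and id columns, and the export directory are all placeholders, and
the staging table is assumed to already exist with the same schema as the
target table. By default the script only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Placeholder values, none of them come from the thread.
CONNECT="jdbc:mysql://db.example.com/sales"
DB_USER="etl"
TABLE="daily_sales"
STAGING="daily_sales_stage"      # assumed: same schema as $TABLE
EXPORT_DIR="/data/sales/2012-09-09"
LOAD_DATE="2012-09-09"

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1) Delete the rows of the day first, then export them again.
export_after_cleanup() {
  run sqoop eval --connect "$CONNECT" --username "$DB_USER" -P \
    --query "DELETE FROM $TABLE WHERE load_date = '$LOAD_DATE'" &&
  run sqoop export --connect "$CONNECT" --username "$DB_USER" -P \
    --table "$TABLE" --export-dir "$EXPORT_DIR"
}

# 2) Export through a staging table; rows reach $TABLE only if all tasks succeed.
export_via_staging() {
  run sqoop export --connect "$CONNECT" --username "$DB_USER" -P \
    --table "$TABLE" --export-dir "$EXPORT_DIR" \
    --staging-table "$STAGING" --clear-staging-table
}

# 3) Upsert: update rows matching the key column, insert the rest.
export_upsert() {
  run sqoop export --connect "$CONNECT" --username "$DB_USER" -P \
    --table "$TABLE" --export-dir "$EXPORT_DIR" \
    --update-key id --update-mode allowinsert
}
```

Call one of the functions (for example export_via_staging) to see the command,
and set DRY_RUN=0 to actually run it. As far as I know, the staging options
cannot be combined with --update-key, so options 2 and 3 are alternatives
rather than something to use together.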

Jarcec

On Sun, Sep 09, 2012 at 12:42:45PM +0530, Adarsh Sharma wrote:
> Hi,
> 
> I have been using Sqoop 1.4.2 for the past few days on a Hadoop cluster of
> 10 nodes.
> As per section 9.4 (Export & Transactions) of the Sqoop documentation, the
> export operation is not atomic in the database because it creates separate
> transactions to insert records.
> 
> For example, if one map task fails to export its transaction while others
> succeed, it would lead to partial and incomplete results in database tables.
> 
> I created a bash script to load data from daily CSVs of 500 thousand
> records into the db. It deletes the records of that day's CSV before
> loading the CSV into the db, so that if there is an issue while loading a
> day's CSV, we get correct results by running the job again.
> 
> Can we achieve the same functionality in Sqoop, so that if some map tasks
> of a Sqoop job fail, we still achieve correct and complete (no duplicates)
> records in the db?
> 
> 
> Thanks
