Hi, I have been using Sqoop 1.4.2 for the past few days on a Hadoop cluster of 10 nodes. According to section 9.4 (Export & Transactions) of the Sqoop documentation, the export operation is not atomic in the database because it creates separate transactions to insert records.
For example, if one map task fails to export its records while the others succeed, the database tables are left with partial, incomplete results. I wrote a bash script that loads daily CSV files of 500 thousand records each into the database. Before loading a day's CSV, the script deletes that day's records from the database, so if a load fails partway, rerunning the job still gives correct results. Can the same be achieved in Sqoop, so that if some map tasks of a Sqoop job fail, we still end up with correct and complete records (no duplicates) in the database? Thanks
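
For reference, the delete-then-reload idea described above can be sketched roughly as follows. This is not the actual bash script: it is a minimal Python illustration against an in-memory SQLite database, and the table name `daily_records`, its columns, the sample date, and the helper `reload_day` are all made up for the example. It also assumes the delete and the insert run inside one transaction, which the original script may or may not do.

```python
import csv
import io
import sqlite3


def reload_day(conn, day, csv_text):
    """Replace every row stored for `day` with the rows from csv_text.

    Hypothetical helper: the delete and the inserts share one transaction,
    so a failure partway through rolls back to the previous state.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(day, r["id"], r["value"]) for r in reader]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM daily_records WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO daily_records (day, id, value) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


# Illustrative setup: an in-memory database and a two-row "daily CSV".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_records (day TEXT, id TEXT, value TEXT)")
sample = "id,value\n1,a\n2,b\n"

reload_day(conn, "2013-01-15", sample)
reload_day(conn, "2013-01-15", sample)  # rerun of the same day

count = conn.execute(
    "SELECT COUNT(*) FROM daily_records WHERE day = ?", ("2013-01-15",)
).fetchone()[0]
print(count)  # prints 2: the rerun did not create duplicates
```

The point of the sketch is only that rerunning the load for a given day is idempotent, which is the behaviour I would like to get from a Sqoop export as well.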
