I am using Apache Sqoop 1.4.6 (distributed with the Hortonworks HDP 2.3 package) to import and export data between RDBMS systems and HDFS. I have to deploy this in a production environment and am wondering how resilient Sqoop is to network failures.
Say an import/export job is about 90% complete and the network between the RDBMS and my Hadoop cluster fails. Since Sqoop internally executes a MapReduce job for this, I'm guessing the job will fail completely and require a manual restart. In this regard I have the following questions:

1. Does Sqoop clean up the data that was already imported/exported?
2. Does Sqoop automatically restart the job after a network failure?
3. If manual cleanup and restart are required, what other technology do people generally use alongside Sqoop to achieve network resilience?
4. Is there a different version of Sqoop that offers this feature?

Your answers and suggestions would be highly appreciated. Thanks!