I am using Apache Sqoop 1.4.6 (distributed with the Hortonworks HDP 2.3
package) to import and export data between RDBMS systems and HDFS. I have
to deploy this in a production environment and was wondering about the
network resilience of Sqoop.

Say I'm about 90% of the way through an import/export job and there is a
network failure between the RDBMS and my Hadoop cluster. Since Sqoop
internally executes a MapReduce job for this, I'm guessing the job will
fail completely and require a manual restart. In this regard I have the
following questions:

   1. Does Sqoop clean up the data that was already imported/exported?
   2. Does Sqoop automatically restart the job in the case of a network
   failure?
   3. If a manual cleanup and restart is required, what other technology
   alongside Sqoop do people generally use to achieve network resilience?
   4. Is there a different version of Sqoop that offers this feature?

Your answers and suggestions would be highly appreciated.

Thanks!
