Github user Fokko commented on the issue:
https://github.com/apache/spark/pull/20057
Hi @gatorsmile, thanks for putting it to the test. The main reasons why I
personally dislike Sqoop are:
- **Legacy.** The old MapReduce model should be buried in the coming years. As
a data engineering consultant, I see more and more people questioning the whole
Hadoop stack. With Sqoop you still need to run MapReduce jobs, and this isn't
easy on other platforms such as Kubernetes.
- **Stability.** I see Sqoop jobs fail quite often, and there is no clean way
of retrying them atomically. For example, when we run a Sqoop job from Airflow,
we cannot simply retry the operation: before importing data from an RDBMS to
HDFS, we have to make sure that the target directory of the previous run has
been deleted (see the sketch below for an idempotent alternative with Spark JDBC).
This is also where Spark JDBC comes in. For example, in the future we would
like to be able to delete single partitions, but this is still work in
progress. Maybe @danielvdende can elaborate a bit on their use case.
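
To illustrate the retry point, here is a minimal sketch (not from this PR; the connection details, table name, and output path are all made up) of how a Spark JDBC import can be made idempotent by overwriting the target directory, so a failed run can simply be retried without manual cleanup:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object JdbcImport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-import")
      .getOrCreate()

    // Hypothetical connection details; the JDBC driver (here PostgreSQL)
    // must be on the classpath.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/sales")
      .option("dbtable", "public.orders")
      .option("user", "etl")
      .option("password", sys.env("DB_PASSWORD"))
      .load()

    // SaveMode.Overwrite replaces the target directory on each run, so
    // retrying after a failure needs no manual deletion of stale output.
    df.write
      .mode(SaveMode.Overwrite)
      .parquet("hdfs:///data/raw/orders")

    spark.stop()
  }
}
```

Because the write is a full overwrite, an Airflow retry of this task is safe; the finer-grained case of replacing only single partitions is the part that is still work in progress.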