Re: Python Spark for full fledged ETL

2017-06-29 Thread Matt Deaver
While you could do this in Spark, it stinks of over-engineering. An ETL tool would be more appropriate, and if budget is an issue you could look at alternatives like Pentaho or Talend.

On Thu, Jun 29, 2017 at 8:48 PM, wrote:
> Hi,
>
> One more thing - I am talking about

Re: Python Spark for full fledged ETL

2017-06-29 Thread upkar . kohli
Hi,

One more thing - I am talking about Spark in cluster mode without Hadoop.

Regards,
Upkar

> On 30-Jun-2017, at 07:55, upkar.ko...@gmail.com wrote:
>
> Hi,
>
> This is my line of thinking - Spark offers a variety of transformations which
> would support most of the use

Re: Python Spark for full fledged ETL

2017-06-29 Thread upkar . kohli
Hi,

This is my line of thinking - Spark offers a variety of transformations which would support most of the use cases for replacing an ETL tool such as Informatica. The ET part of ETL is perfectly covered. Loading may generally require more functionality, though. Spinning up Informatica cluster

Re: Python Spark for full fledged ETL

2017-06-29 Thread Gourav Sengupta
Spark + JDBC. But why?

Regards,
Gourav Sengupta

On Thu, Jun 29, 2017 at 3:44 PM, upkar_kohli wrote:
> Hi,
>
> Has anyone tried mixing Spark with some of the other Python JDBC/ODBC
> packages to create an end-to-end ETL framework? Framework would enable
> making update,
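
For readers following the thread, "Spark + JDBC" could look roughly like the sketch below. It is only an illustration under assumptions: `jdbc_options` and `copy_table` are hypothetical helper names, the URL, table and driver values are placeholders, and `spark` is assumed to be an existing SparkSession with the matching JDBC driver jar on its classpath. Note that Spark's JDBC writer appends or overwrites rows; it does not issue UPDATE or DELETE statements.

```python
def jdbc_options(url, table, user, password, driver):
    """Collect the options that Spark's JDBC data source reads (hypothetical helper)."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": driver,
    }


def copy_table(spark, source, target):
    """Read a table over JDBC, transform it, and append it to another table.

    `spark` is assumed to be a SparkSession; `source` and `target` are
    dicts as returned by jdbc_options().
    """
    df = spark.read.format("jdbc").options(**source).load()
    cleaned = df.dropDuplicates()  # stand-in for the real transformations
    cleaned.write.format("jdbc").options(**target).mode("append").save()


# Placeholder connection details, not taken from the thread.
source = jdbc_options(
    url="jdbc:postgresql://db-host:5432/sales",
    table="orders",
    user="etl_user",
    password="secret",
    driver="org.postgresql.Driver",
)
```

Calling `copy_table(spark, source, target)` would then run the extract, transform and insert-only load in one job.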

Python Spark for full fledged ETL

2017-06-29 Thread upkar_kohli
Hi,

Has anyone tried mixing Spark with some of the other Python JDBC/ODBC packages to create an end-to-end ETL framework? The framework would enable making update, delete and other DML operations, along with stored procedure / function calls, across a variety of databases. Any setup that would be easy to use.