While you could do this in Spark, it stinks of over-engineering. An ETL tool would be more appropriate, and if budget is an issue you could look at open-source alternatives like Pentaho or Talend.
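That said, if you want to experiment anyway, the foreachPartition idea from your earlier mail is workable in principle: open one connection per partition and run your DML there. A minimal sketch, assuming pyodbc and the native ODBC driver are installed on every worker; the DSN, input path, table and column names below are placeholders of mine, not anything from this thread:

import pyodbc
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()
df = spark.read.parquet("/staging/updates")  # hypothetical staging data

CONN_STR = "DSN=my_dw;UID=etl_user;PWD=secret"  # placeholder credentials

def run_dml(rows):
    # One connection per partition, not per row.
    conn = pyodbc.connect(CONN_STR)
    cur = conn.cursor()
    try:
        for row in rows:
            # Hypothetical target table; parameters are bound, not interpolated.
            cur.execute("UPDATE target_table SET amount = ? WHERE id = ?",
                        row.amount, row.id)
        conn.commit()
    finally:
        cur.close()
        conn.close()

df.rdd.foreachPartition(run_dml)

Bear in mind that Spark will retry failed tasks, so the DML has to be idempotent, and there is no cross-partition transaction. Handling exactly that operational burden is what the dedicated ETL tools are for, which is why I'd still reach for one here.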
On Thu, Jun 29, 2017 at 8:48 PM, <upkar.ko...@gmail.com> wrote:

> Hi,
>
> One more thing - I am talking about Spark in cluster mode without Hadoop.
>
> Regards,
> Upkar
>
> Sent from my iPhone
>
> On 30-Jun-2017, at 07:55, upkar.ko...@gmail.com wrote:
>
> Hi,
>
> This is my line of thinking - Spark offers a variety of transformations
> which would support most of the use cases for replacing an ETL tool such
> as Informatica. The ET part of ETL is perfectly covered. Loading may
> generally require more functionality, though. Spinning up an Informatica
> cluster, which also has a master-slave architecture, would cost $$. I know
> Pentaho and other such tools are there to support the use case. But can we
> do the same with a Spark cluster?
>
> Regards,
> Upkar
>
> Sent from my iPhone
>
> On 29-Jun-2017, at 22:06, Gourav Sengupta <gourav.sengu...@gmail.com>
> wrote:
>
> SPARK + JDBC.
>
> But why?
>
> Regards,
> Gourav Sengupta
>
> On Thu, Jun 29, 2017 at 3:44 PM, upkar_kohli <upkar.ko...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Has anyone tried mixing Spark with some of the other Python JDBC/ODBC
>> packages to create an end-to-end ETL framework? The framework would
>> enable making UPDATE, DELETE and other DML operations along with stored
>> procedure / function calls across a variety of databases. Ideally a setup
>> that would be easy to use.
>>
>> I only know of a few ODBC/DB-API Python packages that are production
>> ready and widely used, such as pyodbc or SQLAlchemy.
>>
>> JayDeBeApi, which can interface with JDBC, is in beta stage.
>>
>> Would it be a bad use case if this is attempted with foreachPartition
>> through Spark? If not, what could be a good stack for such an
>> implementation using Python?
>>
>> Regards,
>> Upkar

--
Regards,
Matt
Data Engineer
https://www.linkedin.com/in/mdeaver
http://mattdeav.pythonanywhere.com/