While you could do this in Spark, it stinks of over-engineering. An ETL tool would be more appropriate, and if budget is an issue you could look at open-source alternatives like Pentaho or Talend.
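That said, if you want to experiment anyway, the foreachPartition idea from your earlier mail is workable in principle: open one connection per partition and run your DML there. A minimal sketch, assuming pyodbc and the native ODBC driver are installed on every worker; the DSN, input path, table and column names below are placeholders of mine, not anything from this thread:

import pyodbc
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()
df = spark.read.parquet("/staging/updates")  # hypothetical staging data

CONN_STR = "DSN=my_dw;UID=etl_user;PWD=secret"  # placeholder credentials

def run_dml(rows):
    # One connection per partition, not per row.
    conn = pyodbc.connect(CONN_STR)
    cur = conn.cursor()
    try:
        for row in rows:
            # Hypothetical target table; parameters are bound, not interpolated.
            cur.execute("UPDATE target_table SET amount = ? WHERE id = ?",
                        row.amount, row.id)
        conn.commit()
    finally:
        cur.close()
        conn.close()

df.rdd.foreachPartition(run_dml)

Bear in mind that Spark will retry failed tasks, so the DML has to be idempotent, and there is no cross-partition transaction. Handling exactly that operational burden is what the dedicated ETL tools are for, which is why I'd still reach for one here.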
On Thu, Jun 29, 2017 at 8:48 PM, <upkar.ko...@gmail.com> wrote:

> Hi,
>
> One more thing - I am talking about Spark in cluster mode without Hadoop.
>
> Regards,
> Upkar
>
> Sent from my iPhone
>
> On 30-Jun-2017, at 07:55, upkar.ko...@gmail.com wrote:
>
> Hi,
>
> This is my line of thinking - Spark offers a variety of transformations
> which would support most of the use cases for replacing an ETL tool such
> as Informatica. The ET part of ETL is perfectly covered. Loading may
> generally require more functionality, though. Spinning up an Informatica
> cluster, which also has a master-slave architecture, would cost $$. I know
> Pentaho and other such tools are there to support the use case. But can we
> do the same with a Spark cluster?
>
> Regards,
> Upkar
>
> Sent from my iPhone
>
> On 29-Jun-2017, at 22:06, Gourav Sengupta <gourav.sengu...@gmail.com>
> wrote:
>
> SPARK + JDBC.
>
> But why?
>
> Regards,
> Gourav Sengupta
>
> On Thu, Jun 29, 2017 at 3:44 PM, upkar_kohli <upkar.ko...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Has anyone tried mixing Spark with some of the other Python JDBC/ODBC
>> packages to create an end-to-end ETL framework? The framework would
>> enable making UPDATE, DELETE and other DML operations along with stored
>> procedure / function calls across a variety of databases. Ideally a setup
>> that would be easy to use.
>>
>> I only know of a few ODBC/DB-API Python packages that are production
>> ready and widely used, such as pyodbc or SQLAlchemy.
>>
>> JayDeBeApi, which can interface with JDBC, is in beta stage.
>>
>> Would it be a bad use case if this is attempted with foreachPartition
>> through Spark? If not, what could be a good stack for such an
>> implementation using Python?
>>
>> Regards,
>> Upkar

--
Regards,
Matt
Data Engineer
https://www.linkedin.com/in/mdeaver
http://mattdeav.pythonanywhere.com/