The PySpark vs. Scala debate is a big one. I have a little experience automating CI/CD for JVM-based languages, and I would like to take this opportunity to understand the end-to-end CI/CD flow for PySpark-based ETL pipelines.
Could someone please list the steps involved in automating PySpark-based pipelines in production?

//William

On Fri, Oct 23, 2020 at 11:24 AM Wim Van Leuven <wim.vanleu...@highestpoint.biz> wrote:

> I think Sean is right, but in your argumentation you mention that
> 'functionality is sacrificed in favour of the availability of resources'.
> That's where I disagree with you but agree with Sean. That is mostly not true.
>
> In your previous posts you also mentioned this. The only reason we
> sometimes have to bail out to Scala is for performance with certain UDFs.
>
> On Thu, 22 Oct 2020 at 23:11, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Thanks for the feedback Sean.
>>
>> Kind regards,
>>
>> Mich
>>
>> On Thu, 22 Oct 2020 at 20:34, Sean Owen <sro...@gmail.com> wrote:
>>
>>> I don't find this trolling; I agree with the observation that 'the
>>> skills you have' are a valid and important determiner of what tools you
>>> pick.
>>> I disagree that you just have to pick the optimal tool for everything.
>>> Sounds good until that comes in contact with the real world.
>>> For Spark, Python vs Scala just doesn't matter a lot, especially if
>>> you're doing DataFrame operations. By design. So I can't see there being
>>> one answer to this.
>>>
>>> On Thu, Oct 22, 2020 at 2:23 PM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>>
>>>> Hi Mich,
>>>>
>>>> this is turning into a troll now, can you please stop this?
>>>>
>>>> No one uses Scala where Python should be used, and no one uses Python
>>>> where Scala should be used - it all depends on requirements. Everyone
>>>> understands polyglot programming and how to use relevant technologies best
>>>> to their advantage.
>>>>
>>>> Regards,
>>>> Gourav Sengupta

--
Regards,
William R
+919037075164