Scala and Python each have their advantages and disadvantages with Spark. In my experience, when performance is super important you’ll end up needing to do some of your work in the JVM, but in many situations what matters more is what your team and company are familiar with and the ecosystem of tooling for your domain.
Since that can change so much between people and projects, I think arguing about the one true language is likely to be unproductive.

We’re all here because we want Spark, and more broadly open source data tooling, to succeed. Let’s keep that in mind. There is far too much stress in the world, and I know I’ve sometimes used word choices I regret, especially this year. Let’s all take the weekend to do something we enjoy away from Spark :)

On Sat, Oct 17, 2020 at 7:58 AM "Yuri Oleynikov (יורי אולייניקוב)" <yur...@gmail.com> wrote:

> It seems that the thread has turned into a holy war that has nothing to do
> with the original question. If it has, it’s super disappointing.
>
> Sent from iPhone
>
> > On Oct 17, 2020, at 15:53, Molotch <ma...@kth.se> wrote:
> >
> > I would say the pros and cons of Python vs Scala come down to Spark, the
> > languages themselves, and what kind of data engineer you will get when
> > you try to hire for the different solutions.
> >
> > With PySpark you get less functionality and increased complexity with the
> > py4j Java interop compared to vanilla Spark. Why would you want that?
> > Maybe you want the Python ML tools and have a clear use case, then go for
> > it. If not, avoid the increased complexity and reduced functionality of
> > PySpark.
> >
> > Python vs Scala? Idiomatic Python is a lesson in bad programming
> > habits/ideas, there's no other way to put it. Do you really want
> > programmers who enjoy coding in such a language hacking away at your
> > system?
> >
> > Scala might be far from perfect with the plethora of ways to express
> > yourself. But Python < 3.5 is not fit for anything except simple
> > scripting IMO.
> >
> > For exploratory data analysis in a Jupyter notebook, PySpark seems like a
> > fine idea. For coding an entire ETL library including state management,
> > the whole kitchen including the sink, Scala every day of the week.
> >
> > --
> > Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau