Thanks, all, for your replies.

Feature parity:

MLlib, RDD and DataFrame features are totally comparable. Streaming is now on par in functionality too, I believe. However, what really worries me is not having the Dataset API at all in Python. I think that's a deal breaker.

Performance:

I do get this bit when RDDs are involved, but not when a DataFrame is the only construct I am operating on. DataFrames are supposed to be language-agnostic in terms of performance. So why do people think Python is slower? Is it because of using UDFs? Any other reason?

*Is there any kind of benchmarking/stats around a Python UDF vs Scala UDF comparison, like the ones out there for RDDs?*

@Kant: I am not comparing ANY applications. I am comparing SPARK applications only. I would be glad to hear your opinion on why PySpark applications will not work; if you have any benchmarks, please share them if possible.

On Fri, Sep 2, 2016 at 12:57 AM, kant kodali <kanth...@gmail.com> wrote:

> c'mon man this is no Brainer..Dynamic Typed Languages for Large Code Bases
> or Large Scale Distributed Systems makes absolutely no sense. I can write a
> 10 page essay on why that wouldn't work so great. you might be wondering
> why would spark have it then? well probably because its ease of use for ML
> (that would be my best guess).
>
> On Wed, Aug 31, 2016 11:45 PM, AssafMendelson assaf.mendel...@rsa.com
> wrote:
>
>> I believe this would greatly depend on your use case and your familiarity
>> with the languages.
>>
>> In general, scala would have a much better performance than python and
>> not all interfaces are available in python.
>>
>> That said, if you are planning to use dataframes without any UDF then the
>> performance hit is practically nonexistent.
>>
>> Even if you need UDF, it is possible to write those in scala and wrap
>> them for python and still get away without the performance hit.
>>
>> Python does not have interfaces for UDAFs.
>>
>> I believe that if you have large structured data and do not generally
>> need UDF/UDAF you can certainly work in python without losing too much.
>>
>> *From:* ayan guha [mailto:[hidden email]]
>> *Sent:* Thursday, September 01, 2016 5:03 AM
>> *To:* user
>> *Subject:* Scala Vs Python
>>
>> Hi Users
>>
>> Thought to ask (again and again) the question: While I am building any
>> production application, should I use Scala or Python?
>>
>> I have read many if not most articles but all seems pre-Spark 2. Anything
>> changed with Spark 2? Either pro-scala way or pro-python way?
>>
>> I am thinking performance, feature parity and future direction, not so
>> much in terms of skillset or ease of use.
>>
>> Or, if you think it is a moot point, please say so as well.
>>
>> Any real life example, production experience, anecdotes, personal taste,
>> profanity all are welcome :)
>>
>> --
>> Best Regards,
>> Ayan Guha
>

--
Best Regards,
Ayan Guha
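
To make the UDF question above concrete, here is a minimal sketch of how one might time a built-in column expression against an equivalent Python UDF on the same DataFrame. It assumes PySpark 2.x is installed and a local SparkSession is available. The names `add_one`, `time_it` and `compare` are made up for illustration, and no timings are claimed here; results will depend on the data, the cluster and the Spark version.

```python
from timeit import default_timer


def add_one(x):
    """Plain Python function; wrapped as a UDF it runs in a Python worker."""
    return x + 1


def time_it(action):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = default_timer()
    result = action()
    return result, default_timer() - start


def compare(spark, n_rows=1000000):
    """Time a built-in expression against a Python UDF doing the same work.

    `spark` is an existing SparkSession. pyspark is imported here, not at
    module level, so the two helpers above stay usable without Spark.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import LongType

    df = spark.range(n_rows)   # single LongType column named "id"
    df.count()                 # warm-up so start-up cost is not timed

    add_one_udf = F.udf(add_one, LongType())

    # Built-in arithmetic: evaluated inside the JVM.
    builtin_sum, builtin_secs = time_it(
        lambda: df.select(F.sum(F.col("id") + 1)).first()[0])

    # Python UDF: rows are shipped to a Python worker and back.
    udf_sum, udf_secs = time_it(
        lambda: df.select(F.sum(add_one_udf(F.col("id")))).first()[0])

    # Both paths must compute the same answer for the timing to mean anything.
    assert builtin_sum == udf_sum
    return builtin_secs, udf_secs
```

Calling `compare(spark)` with a session from `SparkSession.builder.master("local[*]").getOrCreate()` returns the two wall-clock timings. A single run like this is only a rough indication; repeating it several times and with realistic data would be needed before drawing conclusions.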