If you are using DataFrames in PySpark, performance will be essentially the same as in Scala, because DataFrame operations are planned and executed inside the JVM either way. However, if you need to implement your own UDF, or run a map() against a DataFrame in Python, you will pay a performance penalty when executing those functions, since all of your data has to pass from the JVM to Python worker processes and back.
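To make the difference concrete, here is a rough sketch of the two paths. It assumes the SparkSession entry point from Spark 2.0 and later (on 1.5.1 you would go through SQLContext instead); the names shout, demo, and the sample rows are made up for illustration.

```python
def shout(s):
    """Plain Python function: upper-cases a string, passing None through."""
    return None if s is None else s.upper()


def demo():
    # Imported here so this module still loads where PySpark is not installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Assumption: a local single-threaded session is enough for the illustration.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("udf-vs-builtin")
        .getOrCreate()
    )
    df = spark.createDataFrame([("alice",), ("bob",), (None,)], ["name"])

    # Built-in column function: evaluated inside the JVM, no Python round trip.
    builtin = df.select(F.upper(F.col("name")).alias("name_upper"))

    # Python UDF: each value is shipped to a Python worker and back.
    shout_udf = F.udf(shout, StringType())
    via_udf = df.select(shout_udf(F.col("name")).alias("name_upper"))

    # Both selections compute the same column; only the execution path differs.
    result = (builtin.collect(), via_udf.collect())
    spark.stop()
    return result
```

Calling demo() on a machine with PySpark installed runs both versions. Where a built-in function in pyspark.sql.functions does the job, preferring it over a Python UDF avoids the round trip.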
In regard to API features, Scala does get better treatment, but things are much better in the Python API than they were even 10 months ago.

-Don

On Tue, Oct 6, 2015 at 5:15 PM, dant <dan.tr...@gmail.com> wrote:

> Hi
>
> I'm hearing a common theme that I should only do serious programming
> in Scala on Spark (1.5.1). Real power users use Scala. It is said that
> Python is great for analytics but in the end the code should be written
> in Scala to finalise. There are a number of reasons I'm hearing:
>
> 1. Spark is written in Scala so will always be faster than any other
> language implementation on top of it.
> 2. Spark releases always favour more features being visible and enabled for
> the Scala API than the Python API.
>
> Is there any truth to the above? I'm a little sceptical.
>
> Apologies for the duplication, my previous message was held up due to
> a subscription issue. Reposting now.
>
> Thanks
> Dan

--
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake