If you are using DataFrames in PySpark, the performance will be the same
as Scala.  However, if you need to implement your own UDF, or run a
map() against a DataFrame in Python, you will pay a performance penalty
when executing those functions, since all of your data has to be
serialized out of the JVM to Python worker processes and back.
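
To make that concrete, here is a minimal sketch (illustrative column and
UDF names, assuming an existing SparkContext named sc). The first select
uses a built-in expression and stays entirely in the JVM; the second
routes every row through a Python worker:

    from pyspark.sql import SQLContext
    from pyspark.sql.functions import udf, upper
    from pyspark.sql.types import StringType

    sqlContext = SQLContext(sc)
    df = sqlContext.createDataFrame([("alice",), ("bob",)], ["name"])

    # Built-in expression: executed by the JVM, same plan as Scala.
    df.select(upper(df["name"])).show()

    # Python UDF: each row is serialized to a Python worker and back.
    to_upper = udf(lambda s: s.upper(), StringType())
    df.select(to_upper(df["name"])).show()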

As for API features, Scala does get better treatment, but things are
much better in the Python API than they were even 10 months ago.

-Don


On Tue, Oct 6, 2015 at 5:15 PM, dant <dan.tr...@gmail.com> wrote:

> Hi
>
> I'm hearing a common theme that I should only do serious programming
> in Scala on Spark (1.5.1). Real power users use Scala. It is said that
> Python is great for analytics, but in the end the code should be
> rewritten in Scala to finalise it. There are a number of reasons I'm
> hearing:
>
> 1. Spark is written in Scala, so it will always be faster than any other
> language implementation on top of it.
> 2. Spark releases always favour the Scala API, with more features visible
> and enabled there than in the Python API.
>
> Is there any truth to the above? I'm a little sceptical.
>
> Apologies for the duplication; my previous message was held up due to a
> subscription issue. Reposting now.
>
> Thanks
> Dan
>


-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143
