Re: Scala vs Python performance differences

2015-01-16 Thread Davies Liu
Hey Phil,

Thank you for sharing this. The result doesn't surprise me much. It's normal to
prototype in Python; once the code is stable and you really need the
performance, rewriting part of it in C, or all of it in another language,
makes sense and shouldn't cost you much time.
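Before rewriting part of a job in C, it usually pays to measure which part is actually hot. Below is a minimal sketch using the standard library's cProfile and pstats. It assumes plain Python running locally; `parse_record` and `process` are hypothetical stand-ins for your own per-record code, not Spark or PySpark APIs. It does not show where time goes inside Spark's executors.

```python
import cProfile
import io
import pstats


def parse_record(line):
    # Hypothetical per-record work done entirely in pure Python.
    fields = line.split(",")
    return sum(float(f) for f in fields)


def process(lines):
    # Hypothetical job: the kind of loop that might be worth moving to C.
    return [parse_record(line) for line in lines]


def profile_hotspots(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, report of top entries)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    buf = io.StringIO()
    # Sort by cumulative time so the functions that own the most work come first.
    stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
    stats.print_stats(top)
    return result, buf.getvalue()


if __name__ == "__main__":
    data = ["1.0,2.0,3.0"] * 10000
    totals, report = profile_hotspots(process, data)
    print(report)
```

Only the functions that dominate the report are candidates for a C (or Scala) rewrite; the rest can stay in Python.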

Davies

On Fri, Jan 16, 2015 at 7:38 AM, philpearl  wrote:
> I was interested in this as I had some Spark code in Python that was too slow
> and wanted to know whether Scala would fix it for me. So I rewrote my code
> in Scala.
>
> In my particular case the Scala version was 10 times faster. But I think
> that is because I did an awful lot of computation in my own code rather than
> in a library like numpy. (I put a bit more detail here
> <http://tttv-engineering.tumblr.com/post/108260351966/spark-python-vs-scala>
> in case you are interested.)
>
> So there's one data point, if only the obvious one: computation in Scala
> compared with computation in pure Python.
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-performance-differences-tp4247p21190.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>




Re: Scala vs Python performance differences

2015-01-16 Thread philpearl
I was interested in this as I had some Spark code in Python that was too slow
and wanted to know whether Scala would fix it for me. So I rewrote my code
in Scala.

In my particular case the Scala version was 10 times faster. But I think
that is because I did an awful lot of computation in my own code rather than
in a library like numpy. (I put a bit more detail here
<http://tttv-engineering.tumblr.com/post/108260351966/spark-python-vs-scala>
in case you are interested.)

So there's one data point, if only the obvious one: computation in Scala
compared with computation in pure Python.
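The gap between doing the arithmetic "in my own code" and in a C-backed library can be illustrated without Spark at all. The sketch below is an assumption-laden stand-in: to stay dependency-free it uses the C-implemented builtins `sum`, `map` and `operator.mul` in place of numpy, and the function names are hypothetical. It shows the principle only; it is not a Spark measurement.

```python
import operator
import timeit


def sum_of_squares_loop(values):
    # Every multiply and add runs as interpreted Python bytecode.
    total = 0
    for v in values:
        total += v * v
    return total


def sum_of_squares_builtin(values):
    # Same arithmetic, but map(), operator.mul and sum() are implemented in C,
    # so the per-element work never goes back through the interpreter loop.
    return sum(map(operator.mul, values, values))


if __name__ == "__main__":
    data = list(range(200_000))
    assert sum_of_squares_loop(data) == sum_of_squares_builtin(data)
    for fn in (sum_of_squares_loop, sum_of_squares_builtin):
        # Best of 3 repeats, each timing 5 calls.
        seconds = min(timeit.repeat(lambda: fn(data), number=5, repeat=3))
        print(f"{fn.__name__}: {seconds:.3f}s for 5 runs")
```

The exact ratio depends on the machine and Python version, so no particular speedup is claimed here.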








Re: Scala vs Python performance differences

2014-11-12 Thread Samarth Mailinglist
I was about to ask this question.

On Wed, Nov 12, 2014 at 3:42 PM, Andrew Ash  wrote:

> Jeremy,
>
> Did you complete this benchmark in a way that's shareable with those
> interested here?
>
> Andrew


Re: Scala vs Python performance differences

2014-11-12 Thread Andrew Ash
Jeremy,

Did you complete this benchmark in a way that's shareable with those
interested here?

Andrew

On Tue, Apr 15, 2014 at 2:50 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> I'd also be interested in seeing such a benchmark.


Re: Scala vs Python performance differences

2014-04-15 Thread Nicholas Chammas
I'd also be interested in seeing such a benchmark.


On Tue, Apr 15, 2014 at 9:25 AM, Ian Ferreira wrote:

> This would be super useful. Thanks.


Re: Scala vs Python performance differences

2014-04-15 Thread Ian Ferreira
This would be super useful. Thanks.

On 4/15/14, 1:30 AM, "Jeremy Freeman"  wrote:

>Hi Andrew,
>
>I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on
>ML algorithms, as I'm particularly curious about the relative performance of
>MLlib in Scala vs the Python MLlib API vs pure Python implementations.
>
>Will share real results as soon as I have them, but roughly, in our hands,
>that 40% number is ballpark correct, at least for some basic operations (e.g.
>textFile, count, reduce).
>
>-- Jeremy
>




Re: Scala vs Python performance differences

2014-04-14 Thread Jeremy Freeman
Hi Andrew,

I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on
ML algorithms, as I'm particularly curious about the relative performance of
MLlib in Scala vs the Python MLlib API vs pure Python implementations. 

Will share real results as soon as I have them, but roughly, in our hands,
that 40% number is ballpark correct, at least for some basic operations (e.g.
textFile, count, reduce).
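Timing basic operations like these mostly comes down to repeating each one and keeping a robust summary of the runs. Here is a minimal, Spark-free sketch of such a harness, assuming local wall-clock timing is what you want. `count_lines` and `total_length` are hypothetical local stand-ins for count() and reduce() over lines of text; they are not PySpark calls.

```python
import functools
import operator
import statistics
import time


def median_runtime(fn, *args, repeat=5):
    """Call fn(*args) `repeat` times; return (last result, median seconds)."""
    timings = []
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn(*args)
        timings.append(time.perf_counter() - start)
    return result, statistics.median(timings)


# Hypothetical stand-ins for count() and reduce() on lines of text.
def count_lines(lines):
    return sum(1 for _ in lines)


def total_length(lines):
    return functools.reduce(operator.add, (len(line) for line in lines), 0)


if __name__ == "__main__":
    lines = ["spark benchmark line"] * 100_000
    for op in (count_lines, total_length):
        value, seconds = median_runtime(op, lines)
        print(f"{op.__name__}: {value} in {seconds * 1000:.2f} ms (median)")
```

The median is used rather than the mean so that a single slow run, such as a cold first call, does not dominate the reported number.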

-- Jeremy

-
Jeremy Freeman, PhD
Neuroscientist
@thefreemanlab



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-performance-differences-tp4247p4261.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: Scala vs Python performance differences

2014-04-14 Thread Bin Wang
At least, Spark Streaming doesn't support Python at this moment, right?


On Mon, Apr 14, 2014 at 6:48 PM, Andrew Ash  wrote:

> Hi Spark users,
>
> I've always done all my Spark work in Scala, but occasionally people ask
> about Python and its performance impact vs the same algorithm
> implementation in Scala.
>
> Has anyone done tests to measure the difference?
>
> Anecdotally I've heard Python is a 40% slowdown but that's entirely
> hearsay.
>
> Cheers,
> Andrew
>


Scala vs Python performance differences

2014-04-14 Thread Andrew Ash
Hi Spark users,

I've always done all my Spark work in Scala, but occasionally people ask
about Python and its performance impact vs the same algorithm
implementation in Scala.

Has anyone done tests to measure the difference?

Anecdotally I've heard Python is a 40% slowdown but that's entirely hearsay.
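A "40% slowdown" can be read two ways: Python takes 40% longer for the same job, or Python gets 40% less work done per second. The sketch below computes both readings, assuming the inputs are wall-clock seconds for the same job; the function names are hypothetical.

```python
def percent_slowdown(t_reference, t_candidate):
    """Extra time taken by the candidate, in percent of the reference time.

    Assumed definition: (t_candidate / t_reference - 1) * 100.
    """
    if t_reference <= 0:
        raise ValueError("reference time must be positive")
    return (t_candidate / t_reference - 1.0) * 100.0


def percent_throughput_drop(t_reference, t_candidate):
    """Share of the reference throughput lost, in percent.

    Assumed definition: (1 - t_reference / t_candidate) * 100.
    """
    if t_candidate <= 0:
        raise ValueError("candidate time must be positive")
    return (1.0 - t_reference / t_candidate) * 100.0


if __name__ == "__main__":
    scala_seconds, python_seconds = 10.0, 14.0
    print(percent_slowdown(scala_seconds, python_seconds))
    print(percent_throughput_drop(scala_seconds, python_seconds))
```

For example, 10 s in Scala against 14 s in Python is a 40% slowdown under the first definition but roughly a 28.6% throughput drop under the second, so it is worth stating which one a benchmark reports.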

Cheers,
Andrew