Re: Scala vs Python performance differences
Hey Phil,

Thank you for sharing this. The result didn't surprise me much. It's normal to do the prototype in Python; once it is stable and you really need the performance, rewriting part of it in C, or all of it in another language, makes sense and will not cost you much time.

Davies

On Fri, Jan 16, 2015 at 7:38 AM, philpearl wrote:
> In my particular case the Scala version was 10 times faster. But I think
> that is because I did an awful lot of computation in my own code rather than
> in a library like numpy.
Re: Scala vs Python performance differences
I was interested in this as I had some Spark code in Python that was too slow and wanted to know whether Scala would fix it for me. So I rewrote my code in Scala.

In my particular case the Scala version was 10 times faster. But I think that is because I did an awful lot of computation in my own code rather than in a library like numpy. (I put a bit more detail here <http://tttv-engineering.tumblr.com/post/108260351966/spark-python-vs-scala> in case you are interested.)

So there's one data point, if only the obvious one: computations in Scala compared with computations in pure Python.
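To make the "own code rather than a library" point concrete, here is a minimal sketch. It is not the code from the linked post and uses neither Spark nor numpy; as a stand-in, it contrasts an explicit Python loop with the same calculation pushed into C-implemented built-ins. The function names are made up for illustration.

```python
import operator


def sum_of_products_loop(xs, ys):
    """Every multiply and add runs as interpreted Python bytecode."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def sum_of_products_builtin(xs, ys):
    """Same result, but the loop itself runs inside C-implemented built-ins."""
    return sum(map(operator.mul, xs, ys))


if __name__ == "__main__":
    # Integer inputs keep both versions exactly comparable.
    xs = list(range(1000))
    ys = list(range(1000))
    print(sum_of_products_loop(xs, ys) == sum_of_products_builtin(xs, ys))
```

Both functions return the same value; the difference is only where the loop runs. Timing them with the standard `timeit` module on your own data would show how much of a job's cost is interpreter overhead, which is the part a rewrite in Scala (or a library call) would be expected to remove.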
Re: Scala vs Python performance differences
I was about to ask this question.

On Wed, Nov 12, 2014 at 3:42 PM, Andrew Ash wrote:
> Did you complete this benchmark in a way that's shareable with those
> interested here?
Re: Scala vs Python performance differences
Jeremy,

Did you complete this benchmark in a way that's shareable with those interested here?

Andrew

On Tue, Apr 15, 2014 at 2:50 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> I'd also be interested in seeing such a benchmark.
Re: Scala vs Python performance differences
I'd also be interested in seeing such a benchmark.

On Tue, Apr 15, 2014 at 9:25 AM, Ian Ferreira wrote:
> This would be super useful. Thanks.
Re: Scala vs Python performance differences
This would be super useful. Thanks.

On 4/15/14, 1:30 AM, "Jeremy Freeman" wrote:
> I'm putting together some benchmarks for PySpark vs Scala.
Re: Scala vs Python performance differences
Hi Andrew,

I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on ML algorithms, as I'm particularly curious about the relative performance of MLlib in Scala vs the Python MLlib API vs pure Python implementations.

Will share real results as soon as I have them, but roughly, in our hands, that 40% number is ballpark correct, at least for some basic operations (e.g. textFile, count, reduce).

-- Jeremy

Jeremy Freeman, PhD
Neuroscientist
@thefreemanlab
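For anyone reproducing a figure like this, a small sketch of the arithmetic may help. It assumes that "40% slowdown" means the Python run takes 1.4 times as long as the Scala run; the thread does not define the term, so that reading is an assumption. The helper names are hypothetical and the numbers in the example are illustrative, not measurements.

```python
import time


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time, in seconds, over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def slowdown_percent(baseline_seconds, candidate_seconds):
    """How much longer the candidate takes than the baseline, as a percentage."""
    return 100.0 * (candidate_seconds - baseline_seconds) / baseline_seconds


if __name__ == "__main__":
    # Illustrative timings only: a candidate at 1.4x the baseline time,
    # and one at 10x the baseline time.
    print(slowdown_percent(10.0, 14.0))
    print(slowdown_percent(1.0, 10.0))
```

In a real comparison, `fn` would wrap the same Spark job (for example a textFile, count, or reduce over identical input) submitted once from Scala and once from Python, with the Scala time as the baseline. Taking the best of several runs is one common way to reduce noise from JVM warm-up and caching; averaging is an equally defensible choice.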
Re: Scala vs Python performance differences
At least, Spark Streaming doesn't support Python at this moment, right?

On Mon, Apr 14, 2014 at 6:48 PM, Andrew Ash wrote:
> Has anyone done tests to measure the difference?
Scala vs Python performance differences
Hi Spark users,

I've always done all my Spark work in Scala, but occasionally people ask about Python and its performance impact vs the same algorithm implementation in Scala.

Has anyone done tests to measure the difference?

Anecdotally I've heard Python is a 40% slowdown but that's entirely hearsay.

Cheers,
Andrew