Re: Python vs Scala performance

2014-10-22 Davies Liu
Sorry, there isn't; you can try cloning it from GitHub and building it from scratch, see [1]. [1] https://github.com/apache/spark Davies On Wed, Oct 22, 2014 at 2:31 PM, Marius Soutier wrote: > Can’t install that on our cluster, but I can try locally. Is there a > pre-built binary available? > > On 22

Re: Python vs Scala performance

2014-10-22 Marius Soutier
Can’t install that on our cluster, but I can try locally. Is there a pre-built binary available? On 22.10.2014, at 19:01, Davies Liu wrote: > In master, you can easily profile your job and find the bottlenecks, > see https://github.com/apache/spark/pull/2556 > > Could you try it and show the s
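
The profiler Davies mentions lives in the pull request linked above. If that build can't be installed on the cluster, a rough local substitute (a swapped-in technique, not the PR's profiler) is to factor the per-record work into a plain Python function and run it over a sample of records under the standard-library `cProfile`. The function `process_record` and the `word` field are hypothetical placeholders for whatever the real job does per record.

```python
import cProfile
import io
import json
import pstats


def process_record(line):
    # Hypothetical per-record work: decode one JSON line and emit (key, 1).
    record = json.loads(line)
    return record.get("word"), 1


def profile_sample(lines, top=10):
    """Run process_record over a local sample under cProfile.

    Returns the per-record results and a text report of the `top`
    entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    results = [process_record(line) for line in lines]
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return results, out.getvalue()


if __name__ == "__main__":
    # Synthetic sample; in practice, take a few thousand lines of real input.
    sample = ['{"word": "spark"}', '{"word": "python"}'] * 1000
    _, report = profile_sample(sample)
    print(report)
```

If most of the cumulative time shows up under `json` functions, that supports the JSON-decoding hypothesis raised later in the thread.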

Re: Python vs Scala performance

2014-10-22 Marius Soutier
Yeah, we’re using Python 2.7.3. On 22.10.2014, at 20:06, Nicholas Chammas wrote: > On Wed, Oct 22, 2014 at 11:34 AM, Eustache DIEMERT > wrote: > > > > Wild guess maybe, but do you decode the JSON records in Python? It could be > much slower, as the default lib is quite slow. > > > Oh yea

Re: Python vs Scala performance

2014-10-22 Nicholas Chammas
On Wed, Oct 22, 2014 at 11:34 AM, Eustache DIEMERT wrote: Wild guess maybe, but do you decode the JSON records in Python? It could > be much slower, as the default lib is quite slow. > Oh yeah, this is a good place to look. Also, just upgrading to Python 2.7 may be enough performance improvement
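
One way to test the "slow default JSON lib" guess is to time the standard-library decoder on a sample of records and, if it happens to be installed, compare it with a faster third-party decoder. `ujson` is used here purely as an assumption (it is not mentioned in the thread and may not be available on the cluster); the sketch simply skips it when the import fails.

```python
import json
import timeit

try:
    import ujson  # optional third-party decoder; may not be installed
except ImportError:
    ujson = None


def time_decoder(loads, lines, repeat=3):
    """Best wall-clock time (seconds) to decode every line with `loads`."""
    return min(timeit.repeat(lambda: [loads(l) for l in lines],
                             number=1, repeat=repeat))


def compare_decoders(lines):
    """Map decoder name -> best decode time for the given JSON lines."""
    timings = {"json": time_decoder(json.loads, lines)}
    if ujson is not None:
        timings["ujson"] = time_decoder(ujson.loads, lines)
    return timings


if __name__ == "__main__":
    # Synthetic records; real input would be lines read from the gzipped files.
    sample = [json.dumps({"id": i, "text": "word " * 20}) for i in range(10000)]
    for name, seconds in compare_decoders(sample).items():
        print("%s: %.3fs" % (name, seconds))
```

Absolute numbers will depend on the machine and record shape; only the relative difference is informative.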

Re: Python vs Scala performance

2014-10-22 Davies Liu
vm. The >>>> python >>>> server bit "translates" the python calls to those in the jvm. The python >>>> spark context is like an adapter to the jvm spark context. If you're seeing >>>> performance discrepancies, this might be the reason

Re: Python vs Scala performance

2014-10-22 Eustache DIEMERT
bit "translates" the python calls to those in the jvm. >>>> The python spark context is like an adapter to the jvm spark context. If >>>> you're seeing performance discrepancies, this might be the reason why. If >>>> the code can be organised to

Re: Python vs Scala performance

2014-10-22 Marius Soutier
he jvm spark context. If you're seeing >> performance discrepancies, this might be the reason why. If the code can be >> organised to require fewer interactions with the adapter, that may improve >> things. Take this with a pinch of salt...I might be way off on this :) >
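
Whatever the exact cause, a common way to reduce per-record overhead in PySpark is to hand `RDD.mapPartitions` a function that processes a whole partition iterator, so setup work runs once per partition instead of once per record. The sketch below only defines such a partition function and exercises it on a plain Python list, so it runs without Spark; the function name and the `word` field are hypothetical. On a cluster it would be used as `rdd.mapPartitions(parse_partition)`.

```python
import json


def parse_partition(lines):
    """Decode one partition of JSON lines and yield (word, 1) pairs.

    Spark calls a mapPartitions function once per partition with an
    iterator, so the decoder below is created once per partition.
    """
    decoder = json.JSONDecoder()  # per-partition setup
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = decoder.decode(line)
        word = record.get("word")  # hypothetical field name
        if word is not None:
            yield word, 1


if __name__ == "__main__":
    # Local check without Spark: treat a list as a single partition.
    partition = ['{"word": "spark"}', "", '{"word": "scala"}', '{"other": 1}']
    print(list(parse_partition(partition)))
```

The per-partition setup here is cheap; the pattern pays off more when the setup is expensive (compiled regexes, lookup tables, client connections).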

Re: Python vs Scala performance

2014-10-22 Marius Soutier
Didn’t seem to help:

conf = SparkConf().set("spark.shuffle.spill", "false").set("spark.default.parallelism", "12")
sc = SparkContext(appName='app_name', conf=conf)

but it's still taking as much time. On 22.10.2014, at 14:17, Nicholas Chammas wrote: > Total guess without knowing anything about you

Re: Python vs Scala performance

2014-10-22 Arian Pasquali
> The python spark context is like an adapter to the jvm spark context. If >>> you're seeing performance discrepancies, this might be the reason why. If >>> the code can be organised to require fewer interactions with the adapter, >>> that may improve things. Take this w

Re: Python vs Scala performance

2014-10-22 Nicholas Chammas
is like an adapter to the jvm spark context. If you're >> seeing performance discrepancies, this might be the reason why. If the code >> can be organised to require fewer interactions with the adapter, that may >> improve things. Take this with a pinch of salt...I might be way of

Re: Python vs Scala performance

2014-10-22 Marius Soutier
why. If the code can be > organised to require fewer interactions with the adapter, that may improve > things. Take this with a pinch of salt...I might be way off on this :) > > Cheers, > Ashic. > > > From: mps@gmail.com > > Subject: Python vs Scala performa

Re: Python vs Scala performance

2014-10-22 Nicholas Chammas
things. Take this with a pinch of salt...I might be way off on this > :) > > Cheers, > Ashic. > > > From: mps@gmail.com > > Subject: Python vs Scala performance > > Date: Wed, 22 Oct 2014 12:00:41 +0200 > > To: user@spark.apache.org > > > >

RE: Python vs Scala performance

2014-10-22 Ashic Mahtab
t...I might be way off on this :) Cheers, Ashic. > From: mps....@gmail.com > Subject: Python vs Scala performance > Date: Wed, 22 Oct 2014 12:00:41 +0200 > To: user@spark.apache.org > > Hi there, > > we have a small Spark cluster running and are processing around 40 GB of

Python vs Scala performance

2014-10-22 Marius Soutier
Hi there, we have a small Spark cluster running and are processing around 40 GB of Gzip-compressed JSON data per day. I have written a couple of word-count-like Scala jobs that essentially pull in all the data, do some joins, group-bys and aggregations. A job takes around 40 minutes to complete
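
For comparison, the per-record work of a word-count-like job over gzip-compressed JSON can be sketched in plain Python on a single machine. This is only an illustration, assuming one JSON object per line and a hypothetical `text` field, not the actual job from the post. In PySpark the same steps would map onto `sc.textFile(...)` (which decompresses `.gz` input), `flatMap`, and `reduceByKey`.

```python
import collections
import gzip
import json
import os
import tempfile


def count_words(path, field="text"):
    """Count whitespace-separated words in `field` across a gzipped JSON-lines file."""
    counts = collections.Counter()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            counts.update(record.get(field, "").split())
    return counts


if __name__ == "__main__":
    # Write a tiny sample file so the example is self-contained.
    path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write(json.dumps({"text": "spark python spark"}) + "\n")
        fh.write(json.dumps({"text": "scala spark"}) + "\n")
    print(count_words(path).most_common(2))
```

Timing a function like this on one day's sample gives a rough single-core baseline to compare against the cluster runtime.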