Sorry, there isn't; you can clone it from GitHub and build it from
scratch, see [1].
[1] https://github.com/apache/spark
Davies
On Wed, Oct 22, 2014 at 2:31 PM, Marius Soutier wrote:
> Can’t install that on our cluster, but I can try locally. Is there a
> pre-built binary available?
Can’t install that on our cluster, but I can try locally. Is there a pre-built
binary available?
On 22.10.2014, at 19:01, Davies Liu wrote:
> In the master, you can easily profile your job, find the bottlenecks,
> see https://github.com/apache/spark/pull/2556
>
> Could you try it and show the s…
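(For reference: a minimal sketch of using the profiler added in that pull
request, assuming a build of master that includes it. The input path and job
are made up for illustration.)

from pyspark import SparkConf, SparkContext

conf = SparkConf().set("spark.python.profile", "true")
sc = SparkContext(appName="profiling_example", conf=conf)

# Run the job to be profiled; the profiler collects cProfile stats per RDD.
sc.textFile("hdfs:///some/input").map(lambda line: len(line)).count()

sc.show_profiles()        # print the collected stats to stdout
# sc.dump_profiles(path)  # or write them to a directory for later inspection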
Yeah we’re using Python 2.7.3.
On 22.10.2014, at 20:06, Nicholas Chammas wrote:
> On Wed, Oct 22, 2014 at 11:34 AM, Eustache DIEMERT wrote:
>
>> Wild guess maybe, but do you decode the JSON records in Python? It could
>> be much slower as the default lib is quite slow.
>
> Oh yeah, this is a good place to look…
On Wed, Oct 22, 2014 at 11:34 AM, Eustache DIEMERT wrote:

> Wild guess maybe, but do you decode the JSON records in Python? It could
> be much slower as the default lib is quite slow.

Oh yeah, this is a good place to look. Also, just upgrading to Python 2.7
may be enough of a performance improvement.
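(A minimal sketch of that suggestion: decode each record with a faster JSON
parser than the standard library's json module. ujson is just one example of
a drop-in replacement, and the input path is made up.)

from pyspark import SparkContext

try:
    import ujson as json   # C-based parser, noticeably faster per record
except ImportError:
    import json            # fall back to the (slower) standard library

sc = SparkContext(appName="json_decode_example")

# One JSON record per line; decoding happens once per record, so the parser's
# speed dominates when the input is tens of GB per day.
records = sc.textFile("hdfs:///data/events/2014-10-22/*.json.gz") \
            .map(json.loads)

print(records.take(1))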
Didn’t seem to help:
conf = SparkConf().set("spark.shuffle.spill", "false") \
                  .set("spark.default.parallelism", "12")
sc = SparkContext(appName='app_name', conf=conf)

but it's still taking as much time.
On 22.10.2014, at 14:17, Nicholas Chammas wrote:
> Total guess without knowing anything about you…
... The python server bit "translates" the python calls to those in the jvm.
The python spark context is like an adapter to the jvm spark context. If
you're seeing performance discrepancies, this might be the reason why. If the
code can be organised to require fewer interactions with the adapter, that may
improve things. Take this with a pinch of salt...I might be way off on this :)

Cheers,
Ashic.

> From: mps....@gmail.com
> Subject: Python vs Scala performance
> Date: Wed, 22 Oct 2014 12:00:41 +0200
> To: user@spark.apache.org
>
> Hi there,
>
> we have a small Spark cluster running and are processing around 40 GB of…
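(A minimal sketch of the "fewer interactions with the adapter" idea, as one
reading of it: keep the per-record work inside RDD operations that run on the
executors, rather than looping over collected data in the Python driver. The
input path is made up.)

from pyspark import SparkContext

sc = SparkContext(appName="adapter_example")
lines = sc.textFile("hdfs:///data/some_text")

# Chatty version: collect() pulls every record back through the Python/JVM
# boundary and the loop then runs entirely in the driver.
# counts = {}
# for line in lines.collect():
#     for word in line.split():
#         counts[word] = counts.get(word, 0) + 1

# Quieter version: the per-record work stays on the executors; only the final,
# much smaller result crosses back to the Python driver.
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b)
               .collect())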
Hi there,
we have a small Spark cluster running and are processing around 40 GB of
Gzip-compressed JSON data per day. I have written a couple of word count-like
Scala jobs that essentially pull in all the data, do some joins, group bys and
aggregations. A job takes around 40 minutes to complete…