No reference. I opened a ticket about the missing documentation for it, and
Sean Owen answered that this is not meant for Spark users. I explained that
it's an issue, but no news so far.
As for the memory management, I'm not experienced with it, but I suggest
you read: http://0x0fff.com/spark-m
With the locality issue resolved, I am still struggling with the new memory
management.
I am seeing tasks on tiny amounts of data take 15 seconds, of which 14 are
spent in GC. With the legacy memory management (spark.memory.useLegacyMode
= true) they complete in 1-2 seconds.
since we are perm
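The two modes differ by a single flag, so the comparison is easy to reproduce. A minimal spark-shell sketch (Spark 1.6.x; the app name is just an example):

```scala
// Sketch: forcing Spark 1.6 back to the pre-1.6 "legacy" memory manager.
// Requires a Spark 1.6.x distribution; paste into spark-shell or a driver program.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("legacy-memory-comparison")  // example name
  // true  -> pre-1.6 static storage/shuffle regions
  // false -> 1.6 unified memory manager (the default)
  .set("spark.memory.useLegacyMode", "true")
val sc = new SparkContext(conf)
```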
Setting spark.shuffle.reduceLocality.enabled=false worked for me, thanks.
Is there any reference to the benefits of setting reduceLocality to true? I
am tempted to disable it across the board.
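For completeness, the flag can be set on the conf object or at submit time; a sketch (Spark 1.6.x, where this flag exists per SPARK-10567):

```scala
// Sketch: disabling the reduce-task locality preference discussed in this
// thread (an undocumented flag; see SPARK-10567). Requires Spark 1.6.x.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("reduce-locality-off")  // example name
  .set("spark.shuffle.reduceLocality.enabled", "false")
val sc = new SparkContext(conf)

// Equivalently at submit time:
//   spark-submit --conf spark.shuffle.reduceLocality.enabled=false ...
```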
On Mon, Feb 29, 2016 at 9:51 AM, Yin Yang wrote:
> The default value for spark.shuffle.reduceLocality.enabled is true.
The default value for spark.shuffle.reduceLocality.enabled is true.
To reduce surprise for users of 1.5 and earlier releases, should the default
value be set to false?
On Mon, Feb 29, 2016 at 5:38 AM, Lior Chaga wrote:
> Hi Koret,
> Try spark.shuffle.reduceLocality.enabled=false
> This is an undocumented configuration.
Hi Koret,
Try spark.shuffle.reduceLocality.enabled=false
This is an undocumented configuration.
See:
https://github.com/apache/spark/pull/8280
https://issues.apache.org/jira/browse/SPARK-10567
It solved the problem for me (both with and without legacy memory mode).
On Sun, Feb 28, 2016 at 11:16 P
I find it particularly confusing that a new memory management module would
change the locations. It's not like the hash partitioner got replaced. I can
switch back and forth between legacy and "new" memory management and see
the distribution change... fully reproducible.
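The point about the hash partitioner can be made concrete: which partition a key lands in depends only on the key's hash and the partition count, never on the memory manager. A minimal sketch of that logic (mirroring the nonNegativeMod helper in Spark's Partitioner source; the function names here are mine):

```scala
// Sketch of how a hash partitioner assigns keys to partitions.
// Mirrors the nonNegativeMod logic in Spark's Partitioner; names are mine.
def nonNegativeMod(x: Int, mod: Int): Int = {
  val rawMod = x % mod
  rawMod + (if (rawMod < 0) mod else 0)  // keep the result in [0, mod)
}

def partitionFor(key: Any, numPartitions: Int): Int =
  nonNegativeMod(key.hashCode, numPartitions)
```

Nothing in this depends on how executor memory is managed, which is why a changed block distribution under the new memory manager is surprising.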
On Sun, Feb 28, 2016 at 11:2
Hi,
I've experienced a similar problem upgrading from Spark 1.4 to Spark 1.6.
The data is not evenly distributed across executors, but in my case it also
reproduced with legacy mode.
I also tried 1.6.1 RC-1, with the same results.
Still looking for resolution.
Lior
On Fri, Feb 19, 2016 at 2:01 AM, Koe
Looking at the cached RDD I see a similar story:
With useLegacyMode = true the cached RDD is spread out across 10 executors,
but with useLegacyMode = false the data for the cached RDD sits on only 3
executors (the rest all show 0s). My cached RDD is a key-value RDD that got
partitioned (hash partit
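A minimal sketch of that kind of setup (my own example, not the poster's code), for reproducing the storage spread in the executors tab:

```scala
// Sketch: a key-value RDD explicitly hash-partitioned and cached.
// Assumes an existing SparkContext `sc` (e.g. in spark-shell), Spark 1.6.x.
import org.apache.spark.HashPartitioner

val kv = sc.parallelize(1 to 100000).map(i => (i % 1000, i))
val cached = kv.partitionBy(new HashPartitioner(10)).cache()
cached.count()  // materializes the cache; the executors tab then shows where the blocks landed
```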
Hello all,
We are testing a semi-realtime application (it should return results in
less than 20 seconds from cached RDDs) on Spark 1.6.0. Before this it ran
on Spark 1.5.1.
In Spark 1.6.0 the performance is similar to 1.5.1 if I set
spark.memory.useLegacyMode = true, however if I switc