setting spark.shuffle.reduceLocality.enabled=false worked for me, thanks
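for anyone finding this thread later, a minimal sketch of where the flag can go (the submit-time line is equivalent; the PR and JIRA linked below describe the setting as undocumented, so treat it as subject to change):

```properties
# spark-defaults.conf -- disable reduce-side locality preference
spark.shuffle.reduceLocality.enabled  false

# or equivalently at submit time:
# spark-submit --conf spark.shuffle.reduceLocality.enabled=false ...
```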
is there any reference to the benefits of setting reduceLocality to true? i
am tempted to disable it across the board.

On Mon, Feb 29, 2016 at 9:51 AM, Yin Yang <yy201...@gmail.com> wrote:

> The default value for spark.shuffle.reduceLocality.enabled is true.
>
> To reduce surprise to users of 1.5 and earlier releases, should the
> default value be set to false?
>
> On Mon, Feb 29, 2016 at 5:38 AM, Lior Chaga <lio...@taboola.com> wrote:
>
>> Hi Koert,
>> Try spark.shuffle.reduceLocality.enabled=false
>> This is an undocumented configuration.
>> See:
>> https://github.com/apache/spark/pull/8280
>> https://issues.apache.org/jira/browse/SPARK-10567
>>
>> It solved the problem for me (both with and without legacy memory mode)
>>
>> On Sun, Feb 28, 2016 at 11:16 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i find it particularly confusing that a new memory management module
>>> would change the locations. it's not like the hash partitioner got
>>> replaced. i can switch back and forth between legacy and "new" memory
>>> management and see the distribution change... fully reproducible
>>>
>>> On Sun, Feb 28, 2016 at 11:24 AM, Lior Chaga <lio...@taboola.com> wrote:
>>>
>>>> Hi,
>>>> I've experienced a similar problem upgrading from spark 1.4 to spark 1.6.
>>>> The data is not evenly distributed across executors, but in my case it
>>>> also reproduced with legacy mode.
>>>> Also tried 1.6.1 rc-1, with the same results.
>>>>
>>>> Still looking for a resolution.
>>>>
>>>> Lior
>>>>
>>>> On Fri, Feb 19, 2016 at 2:01 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> looking at the cached rdd i see a similar story: with useLegacyMode =
>>>>> true the cached rdd is spread out across 10 executors, but with
>>>>> useLegacyMode = false the data for the cached rdd sits on only 3
>>>>> executors (the rest all show 0s). my cached RDD is a key-value RDD
>>>>> that got partitioned (hash partitioner, 50 partitions) before being
>>>>> cached.
>>>>> On Thu, Feb 18, 2016 at 6:51 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>
>>>>>> hello all,
>>>>>> we are just testing a semi-realtime application (it should return
>>>>>> results in less than 20 seconds from cached RDDs) on spark 1.6.0.
>>>>>> before this it used to run on spark 1.5.1
>>>>>>
>>>>>> in spark 1.6.0 the performance is similar to 1.5.1 if i set
>>>>>> spark.memory.useLegacyMode = true, however if i switch to
>>>>>> spark.memory.useLegacyMode = false the queries take about 50% to
>>>>>> 100% more time.
>>>>>>
>>>>>> the issue becomes clear when i focus on a single stage: the
>>>>>> individual tasks are not slower at all, but they run on fewer
>>>>>> executors. in my test query i have 50 tasks and 10 executors. both
>>>>>> with useLegacyMode = true and useLegacyMode = false the tasks finish
>>>>>> in about 3 seconds and show as running PROCESS_LOCAL. however when
>>>>>> useLegacyMode = false the tasks run on just 3 executors out of 10,
>>>>>> while with useLegacyMode = true they spread out across 10 executors.
>>>>>> all the tasks running on just a few executors leads to the slower
>>>>>> results.
>>>>>>
>>>>>> any idea why this would happen?
>>>>>> thanks! koert
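a back-of-the-envelope model of the slowdown described above: with uniform ~3 s tasks, packing 50 tasks onto 3 executors instead of 10 stretches the stage because the tasks run in more waves. this is a simplification (it assumes one task slot per executor and ignores scheduling overhead), but it shows the direction of the effect:

```python
import math

def stage_time(num_tasks: int, num_executors: int, task_secs: float) -> float:
    """Idealized stage wall-clock time: tasks run in sequential waves,
    one task per executor per wave."""
    waves = math.ceil(num_tasks / num_executors)
    return waves * task_secs

# 50 tasks at ~3 s each, as in the test query above
print(stage_time(50, 10, 3.0))  # 15.0 -- spread across 10 executors
print(stage_time(50, 3, 3.0))   # 51.0 -- squeezed onto 3 executors
```

real executors run multiple concurrent tasks, so the observed 50-100% slowdown is smaller than this idealized gap, but the mechanism is the same.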