i find it particularly confusing that a new memory management module would change where the cached data is located. it's not like the hash partitioner got replaced. i can switch back and forth between legacy and "new" memory management and see the distribution change... fully reproducible.
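to make the partitioner point concrete, here is a minimal python sketch (not spark code; `partition_for` is a made-up helper name) of how a hash partitioner maps keys to partitions. it assumes integer keys whose hash is the value itself and takes the non-negative remainder by the partition count, which is the general idea behind spark's HashPartitioner. the mapping depends only on the key and the number of partitions, so a memory management setting cannot change which partition a record lands in; what differs between the two modes is which executors end up holding those partitions.

```python
from collections import Counter

NUM_PARTITIONS = 50  # same partition count as in the thread


def partition_for(key: int, num_partitions: int) -> int:
    """Map a key to a partition index in [0, num_partitions).

    Assumption: the key's hash code is the integer itself. Python's %
    with a positive divisor never returns a negative value, so negative
    keys still map to a valid partition index.
    """
    return key % num_partitions


# Hypothetical keys, purely for illustration.
keys = list(range(-1000, 1000))

# Compute the assignment twice: it is a pure function of the key and the
# partition count, so both runs give the same answer.
first_run = [partition_for(k, NUM_PARTITIONS) for k in keys]
second_run = [partition_for(k, NUM_PARTITIONS) for k in keys]

# Number of keys per partition.
counts = Counter(first_run)

print("identical assignment:", first_run == second_run)
print("partitions used:", len(counts))
```

so if the distribution across executors changes between modes, the thing to look at is block placement and task scheduling, not the key-to-partition assignment.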
On Sun, Feb 28, 2016 at 11:24 AM, Lior Chaga <lio...@taboola.com> wrote:

> Hi,
> I've experienced a similar problem upgrading from spark 1.4 to spark 1.6.
> The data is not evenly distributed across executors, but in my case it
> also reproduced with legacy mode.
> Also tried 1.6.1 rc-1, with same results.
>
> Still looking for resolution.
>
> Lior
>
> On Fri, Feb 19, 2016 at 2:01 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> looking at the cached rdd i see a similar story:
>> with useLegacyMode = true the cached rdd is spread out across 10
>> executors, but with useLegacyMode = false the data for the cached rdd sits
>> on only 3 executors (the rest all show 0s). my cached RDD is a key-value
>> RDD that got partitioned (hash partitioner, 50 partitions) before being
>> cached.
>>
>> On Thu, Feb 18, 2016 at 6:51 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> hello all,
>>> we are just testing a semi-realtime application (it should return
>>> results in less than 20 seconds from cached RDDs) on spark 1.6.0. before
>>> this it used to run on spark 1.5.1
>>>
>>> in spark 1.6.0 the performance is similar to 1.5.1 if i set
>>> spark.memory.useLegacyMode = true, however if i switch to
>>> spark.memory.useLegacyMode = false the queries take about 50% to 100% more
>>> time.
>>>
>>> the issue becomes clear when i focus on a single stage: the individual
>>> tasks are not slower at all, but they run on less executors.
>>> in my test query i have 50 tasks and 10 executors. both with
>>> useLegacyMode = true and useLegacyMode = false the tasks finish in about 3
>>> seconds and show as running PROCESS_LOCAL. however when useLegacyMode =
>>> false the tasks run on just 3 executors out of 10, while with useLegacyMode
>>> = true they spread out across 10 executors. all the tasks running on just a
>>> few executors leads to the slower results.
>>>
>>> any idea why this would happen?
>>> thanks! koert