Re: spark 1.6 new memory management - some issues with tasks not using all executors

Koert Kuipers Sun, 28 Feb 2016 13:16:30 -0800

i find it particularly confusing that a new memory management module would
change where the data ends up. it's not like the hash partitioner got replaced.
i can switch back and forth between legacy and "new" memory management and
see the distribution change... fully reproducible

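To make the point about the hash partitioner concrete: in Spark, HashPartitioner sends a null key to partition 0 and any other key to nonNegativeMod(key.hashCode, numPartitions). Which of the 50 partitions a key lands in therefore depends only on the key and the partition count, not on spark.memory.useLegacyMode. What the thread reports changing is which executors end up holding (and running tasks for) those partitions. Below is a minimal Python sketch of that key-to-partition mapping. It assumes String keys with Java String.hashCode semantics, and the function names are hypothetical helpers, not Spark APIs.

```python
# Hypothetical helpers (not Spark APIs) mirroring, for String keys, what
# Spark's HashPartitioner.getPartition does: take key.hashCode, then a
# non-negative modulo by the partition count.

def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode with 32-bit signed overflow.

    Assumes s contains only BMP characters, so each Python character
    corresponds to one UTF-16 code unit, as in Java.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed int

def non_negative_mod(x: int, mod: int) -> int:
    """Java-style % (sign follows the dividend), shifted into [0, mod)."""
    raw = abs(x) % mod
    if x < 0:
        raw = -raw  # Java's % keeps the sign of the dividend
    return raw + mod if raw < 0 else raw

def hash_partition(key, num_partitions: int) -> int:
    """Partition index for a key; None stands in for a null key (partition 0)."""
    if key is None:
        return 0
    return non_negative_mod(java_string_hash(key), num_partitions)

if __name__ == "__main__":
    # Same keys, same partition count -> same partitions, regardless of
    # which memory manager the cluster is configured with.
    for k in ["abc", "user-1", "user-2", None]:
        print(k, hash_partition(k, 50))
```

Because the mapping is a pure function of the key and the partition count, any difference between legacy and unified memory mode has to come from where the cached partitions are placed, not from how keys are partitioned.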

On Sun, Feb 28, 2016 at 11:24 AM, Lior Chaga <lio...@taboola.com> wrote:

> Hi,
> I've run into a similar problem upgrading from spark 1.4 to spark 1.6.
> The data is not evenly distributed across executors, but in my case it
> also reproduces in legacy mode.
> I also tried 1.6.1 rc-1, with the same results.
>
> Still looking for resolution.
>
> Lior
>
> On Fri, Feb 19, 2016 at 2:01 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> looking at the cached rdd i see a similar story:
>> with useLegacyMode = true the cached rdd is spread out across 10
>> executors, but with useLegacyMode = false the data for the cached rdd sits
>> on only 3 executors (the rest all show 0s). my cached RDD is a key-value
>> RDD that got partitioned (hash partitioner, 50 partitions) before being
>> cached.
>>
>> On Thu, Feb 18, 2016 at 6:51 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> hello all,
>>> we are just testing a semi-realtime application (it should return
>>> results in less than 20 seconds from cached RDDs) on spark 1.6.0. before
>>> this it ran on spark 1.5.1.
>>>
>>> in spark 1.6.0 the performance is similar to 1.5.1 if i set
>>> spark.memory.useLegacyMode = true, however if i switch to
>>> spark.memory.useLegacyMode = false the queries take about 50% to 100% more
>>> time.
>>>
>>> the issue becomes clear when i focus on a single stage: the individual
>>> tasks are not slower at all, but they run on fewer executors.
>>> in my test query i have 50 tasks and 10 executors. with both
>>> useLegacyMode = true and useLegacyMode = false the tasks finish in about 3
>>> seconds and show as running PROCESS_LOCAL. however, with useLegacyMode =
>>> false the tasks run on just 3 of the 10 executors, while with useLegacyMode
>>> = true they spread out across all 10. having all the tasks run on just a
>>> few executors is what makes the queries slower.
>>>
>>> any idea why this would happen?
>>> thanks! koert
>>>
>>>
>>>
>>
>
