setting spark.shuffle.reduceLocality.enabled=false worked for me, thanks
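for anyone finding this thread later, a minimal sketch of where the flag can go (the submit-time line is equivalent; the PR and JIRA linked below describe the setting as undocumented, so treat it as subject to change):

```properties
# spark-defaults.conf -- disable reduce-side locality preference
spark.shuffle.reduceLocality.enabled  false

# or equivalently at submit time:
# spark-submit --conf spark.shuffle.reduceLocality.enabled=false ...
```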
is there any reference to the benefits of setting reduceLocality to true? i
am tempted to disable it across the board.

On Mon, Feb 29, 2016 at 9:51 AM, Yin Yang <yy201...@gmail.com> wrote:

> The default value for spark.shuffle.reduceLocality.enabled is true.
>
> To reduce surprise to users of 1.5 and earlier releases, should the
> default value be set to false?
>
> On Mon, Feb 29, 2016 at 5:38 AM, Lior Chaga <lio...@taboola.com> wrote:
>
>> Hi Koert,
>> Try spark.shuffle.reduceLocality.enabled=false
>> This is an undocumented configuration.
>> See:
>> https://github.com/apache/spark/pull/8280
>> https://issues.apache.org/jira/browse/SPARK-10567
>>
>> It solved the problem for me (both with and without legacy memory mode)
>>
>> On Sun, Feb 28, 2016 at 11:16 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i find it particularly confusing that a new memory management module
>>> would change the locations. it's not like the hash partitioner got
>>> replaced. i can switch back and forth between legacy and "new" memory
>>> management and see the distribution change... fully reproducible
>>>
>>> On Sun, Feb 28, 2016 at 11:24 AM, Lior Chaga <lio...@taboola.com> wrote:
>>>
>>>> Hi,
>>>> I've experienced a similar problem upgrading from spark 1.4 to spark 1.6.
>>>> The data is not evenly distributed across executors, but in my case it
>>>> also reproduced with legacy mode.
>>>> Also tried 1.6.1 rc-1, with the same results.
>>>>
>>>> Still looking for a resolution.
>>>>
>>>> Lior
>>>>
>>>> On Fri, Feb 19, 2016 at 2:01 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> looking at the cached rdd i see a similar story: with useLegacyMode =
>>>>> true the cached rdd is spread out across 10 executors, but with
>>>>> useLegacyMode = false the data for the cached rdd sits on only 3
>>>>> executors (the rest all show 0s). my cached RDD is a key-value RDD
>>>>> that got partitioned (hash partitioner, 50 partitions) before being
>>>>> cached.
>>>>> On Thu, Feb 18, 2016 at 6:51 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>>
>>>>>> hello all,
>>>>>> we are just testing a semi-realtime application (it should return
>>>>>> results in less than 20 seconds from cached RDDs) on spark 1.6.0.
>>>>>> before this it used to run on spark 1.5.1
>>>>>>
>>>>>> in spark 1.6.0 the performance is similar to 1.5.1 if i set
>>>>>> spark.memory.useLegacyMode = true, however if i switch to
>>>>>> spark.memory.useLegacyMode = false the queries take about 50% to
>>>>>> 100% more time.
>>>>>>
>>>>>> the issue becomes clear when i focus on a single stage: the
>>>>>> individual tasks are not slower at all, but they run on fewer
>>>>>> executors. in my test query i have 50 tasks and 10 executors. both
>>>>>> with useLegacyMode = true and useLegacyMode = false the tasks finish
>>>>>> in about 3 seconds and show as running PROCESS_LOCAL. however when
>>>>>> useLegacyMode = false the tasks run on just 3 executors out of 10,
>>>>>> while with useLegacyMode = true they spread out across 10 executors.
>>>>>> all the tasks running on just a few executors leads to the slower
>>>>>> results.
>>>>>>
>>>>>> any idea why this would happen?
>>>>>> thanks! koert
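a back-of-the-envelope model of the slowdown described above: with uniform ~3 s tasks, packing 50 tasks onto 3 executors instead of 10 stretches the stage because the tasks run in more waves. this is a simplification (it assumes one task slot per executor and ignores scheduling overhead), but it shows the direction of the effect:

```python
import math

def stage_time(num_tasks: int, num_executors: int, task_secs: float) -> float:
    """Idealized stage wall-clock time: tasks run in sequential waves,
    one task per executor per wave."""
    waves = math.ceil(num_tasks / num_executors)
    return waves * task_secs

# 50 tasks at ~3 s each, as in the test query above
print(stage_time(50, 10, 3.0))  # 15.0 -- spread across 10 executors
print(stage_time(50, 3, 3.0))   # 51.0 -- squeezed onto 3 executors
```

real executors run multiple concurrent tasks, so the observed 50-100% slowdown is smaller than this idealized gap, but the mechanism is the same.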