The default value for spark.shuffle.reduceLocality.enabled is true.

To reduce surprise for users of 1.5 and earlier releases, should the default
value be set to false?
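For anyone who wants to restore the pre-1.6 behavior in the meantime, the
workaround from the thread below can be applied at submit time. A minimal
sketch (the class name and jar path are placeholders for your own
application):

```shell
# Disable reduce-task locality preferences (workaround discussed below).
# com.example.MyApp and my-app.jar are placeholders, not names from this thread.
spark-submit \
  --conf spark.shuffle.reduceLocality.enabled=false \
  --class com.example.MyApp \
  my-app.jar
```

The same setting can also be put in spark-defaults.conf or set on the
SparkConf programmatically before the SparkContext is created.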

On Mon, Feb 29, 2016 at 5:38 AM, Lior Chaga <lio...@taboola.com> wrote:

> Hi Koret,
> Try spark.shuffle.reduceLocality.enabled=false
> This is an undocumented configuration.
> See:
> https://github.com/apache/spark/pull/8280
> https://issues.apache.org/jira/browse/SPARK-10567
>
> It solved the problem for me (both with and without memory legacy mode)
>
>
> On Sun, Feb 28, 2016 at 11:16 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> i find it particularly confusing that a new memory management module
>> would change the locations. it's not like the hash partitioner got replaced.
>> i can switch back and forth between legacy and "new" memory management and
>> see the distribution change... fully reproducible
>>
>> On Sun, Feb 28, 2016 at 11:24 AM, Lior Chaga <lio...@taboola.com> wrote:
>>
>>> Hi,
>>> I've experienced a similar problem upgrading from spark 1.4 to spark 1.6.
>>> The data is not evenly distributed across executors, but in my case it
>>> also reproduced with legacy mode.
>>> Also tried 1.6.1 rc-1, with same results.
>>>
>>> Still looking for resolution.
>>>
>>> Lior
>>>
>>> On Fri, Feb 19, 2016 at 2:01 AM, Koert Kuipers <ko...@tresata.com>
>>> wrote:
>>>
>>>> looking at the cached rdd i see a similar story:
>>>> with useLegacyMode = true the cached rdd is spread out across 10
>>>> executors, but with useLegacyMode = false the data for the cached rdd sits
>>>> on only 3 executors (the rest all show 0s). my cached RDD is a key-value
>>>> RDD that got partitioned (hash partitioner, 50 partitions) before being
>>>> cached.
>>>>
>>>> On Thu, Feb 18, 2016 at 6:51 PM, Koert Kuipers <ko...@tresata.com>
>>>> wrote:
>>>>
>>>>> hello all,
>>>>> we are just testing a semi-realtime application (it should return
>>>>> results in less than 20 seconds from cached RDDs) on spark 1.6.0. before
>>>>> this it used to run on spark 1.5.1
>>>>>
>>>>> in spark 1.6.0 the performance is similar to 1.5.1 if i set
>>>>> spark.memory.useLegacyMode = true, however if i switch to
>>>>> spark.memory.useLegacyMode = false the queries take about 50% to 100% more
>>>>> time.
>>>>>
>>>>> the issue becomes clear when i focus on a single stage: the individual
>>>>> tasks are not slower at all, but they run on less executors.
>>>>> in my test query i have 50 tasks and 10 executors. both with
>>>>> useLegacyMode = true and useLegacyMode = false the tasks finish in about
>>>>> 3 seconds and show as running PROCESS_LOCAL. however when useLegacyMode =
>>>>> false the tasks run on just 3 executors out of 10, while with
>>>>> useLegacyMode = true they spread out across 10 executors. all the tasks
>>>>> running on just a few executors leads to the slower results.
>>>>>
>>>>> any idea why this would happen?
>>>>> thanks! koert
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
