Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-03-03 Thread Lior Chaga
No reference that I know of. I opened a ticket about the missing
documentation for it and was answered by Sean Owen that it is not meant for
Spark users. I explained that it's an issue, but no news so far.

As for the memory management, I'm not experienced with it, but I suggest
you read: http://0x0fff.com/spark-memory-management/ and
http://0x0fff.com/spark-architecture/
It could be that the effective default storage memory in Spark 1.6 is a bit
lower than in Spark 1.5, and your application can't borrow from the
execution memory.
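
For context, a minimal sketch of the knobs behind the unified memory manager
introduced in 1.6. The values shown are just the 1.6 defaults and the snippet
is illustrative, not a recommendation:

    import org.apache.spark.{SparkConf, SparkContext}

    // Unified memory manager (the 1.6 default): storage and execution share
    // one pool of roughly (heap - 300MB) * spark.memory.fraction. Execution
    // can evict cached blocks that sit above spark.memory.storageFraction.
    val conf = new SparkConf()
      .setAppName("memory-tuning-sketch")          // illustrative app name
      .set("spark.memory.fraction", "0.75")        // 1.6 default
      .set("spark.memory.storageFraction", "0.5")  // cached data below this share is safe from eviction
    val sc = new SparkContext(conf)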





Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-03-02 Thread Koert Kuipers
with the locality issue resolved, i am still struggling with the new memory
management.

i am seeing tasks on tiny amounts of data take 15 seconds, of which 14 are
spent in GC. with the legacy memory management (spark.memory.useLegacyMode
= true) they complete in 1-2 seconds.

since we are permanently caching a very large number of RDDs, my suspicion
is that with the new memory management these cached RDDs happily gobble up
all the memory, and need to be evicted to run my small job, leading to the
slowness.

i can revert to legacy memory management mode, so this is not an issue, but
i am worried that at some point the legacy memory management will be
deprecated and then i am stuck with this performance issue.
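
One possible mitigation to sketch here (not something verified in this
thread) is to keep the unified manager but raise
spark.memory.storageFraction so that more of the long-lived cache sits below
the eviction threshold; the value below is only illustrative:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: under the unified manager, execution can evict cached blocks
    // that sit above spark.memory.storageFraction. Raising it protects more
    // of the permanently cached RDDs, at the cost of execution headroom
    // (which may simply move the GC pressure elsewhere).
    val conf = new SparkConf()
      .set("spark.memory.useLegacyMode", "false")  // stay on the new manager
      .set("spark.memory.storageFraction", "0.7")  // illustrative, up from the 0.5 default
    val sc = new SparkContext(conf)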



Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-29 Thread Koert Kuipers
setting spark.shuffle.reduceLocality.enabled=false worked for me, thanks


is there any reference to the benefits of setting reduceLocality to true? i
am tempted to disable it across the board.



Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-29 Thread Yin Yang
The default value for spark.shuffle.reduceLocality.enabled is true.

To reduce surprise to users of 1.5 and earlier releases, should the default
value be set to false?



Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-29 Thread Lior Chaga
Hi Koert,
Try spark.shuffle.reduceLocality.enabled=false
This is an undocumented configuration.
See:
https://github.com/apache/spark/pull/8280
https://issues.apache.org/jira/browse/SPARK-10567

It solved the problem for me (both with and without legacy memory mode)
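
For anyone who wants to try this, a minimal sketch of setting the flag
programmatically (it can equally be passed as --conf to spark-submit); the
description in the comment is taken from the PR and JIRA linked above:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: turn off the reduce-side locality preference so reduce tasks
    // are not steered toward the few executors that hold most of the map
    // output (see the linked PR/JIRA for the details of the feature).
    val conf = new SparkConf()
      .set("spark.shuffle.reduceLocality.enabled", "false")
    val sc = new SparkContext(conf)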




Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-28 Thread Koert Kuipers
i find it particularly confusing that a new memory management module would
change the task locations. it's not like the hash partitioner got replaced. i
can switch back and forth between legacy and "new" memory management and see
the distribution change... fully reproducible
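
To make the back-and-forth comparison concrete, a sketch of the single
setting that differs between the two runs (the helper and app name are
illustrative; the job itself is assumed identical):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: run the same job twice, flipping only this flag.
    //   useLegacyMode = true  -> pre-1.6 static memory manager
    //   useLegacyMode = false -> 1.6 unified memory manager (the default)
    def buildContext(useLegacyMode: Boolean): SparkContext = {
      val conf = new SparkConf()
        .setAppName("memory-manager-ab-test")
        .set("spark.memory.useLegacyMode", useLegacyMode.toString)
      new SparkContext(conf)
    }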



Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-28 Thread Lior Chaga
Hi,
I've experienced a similar problem upgrading from Spark 1.4 to Spark 1.6.
The data is not evenly distributed across executors, but in my case it also
reproduces with legacy mode.
I also tried 1.6.1 RC1, with the same results.

Still looking for resolution.

Lior



Re: spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-18 Thread Koert Kuipers
looking at the cached rdd i see a similar story:
with useLegacyMode = true the cached rdd is spread out across 10 executors,
but with useLegacyMode = false the data for the cached rdd sits on only 3
executors (the rest all show 0s). my cached RDD is a key-value RDD that got
partitioned (hash partitioner, 50 partitions) before being cached.
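
For reference, a sketch of the setup described above and one way to see
where the cached blocks actually live. The input path and key function are
made up, sc is assumed to be an existing SparkContext (e.g. the spark-shell
one), and getExecutorStorageStatus is a developer API in 1.x; the Storage tab
of the web UI shows the same information:

    import org.apache.spark.HashPartitioner

    // Sketch: a key-value RDD, hash-partitioned into 50 partitions and cached.
    val lines = sc.textFile("hdfs:///some/input")   // illustrative path
    val kv = lines.map(l => (l.hashCode, l))        // illustrative key
      .partitionBy(new HashPartitioner(50))
      .cache()
    kv.count()                                      // materialize the cache

    // Rough per-executor view of where the cached bytes ended up.
    sc.getExecutorStorageStatus.foreach { s =>
      println(s"${s.blockManagerId.executorId}\t${s.memUsed} bytes used")
    }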



spark 1.6 new memory management - some issues with tasks not using all executors

2016-02-18 Thread Koert Kuipers
hello all,
we are just testing a semi-realtime application (it should return results
in less than 20 seconds from cached RDDs) on spark 1.6.0. before this it
used to run on spark 1.5.1.

in spark 1.6.0 the performance is similar to 1.5.1 if i set
spark.memory.useLegacyMode = true, however if i switch to
spark.memory.useLegacyMode = false the queries take about 50% to 100% more
time.

the issue becomes clear when i focus on a single stage: the individual
tasks are not slower at all, but they run on fewer executors.
in my test query i have 50 tasks and 10 executors. both with useLegacyMode
= true and useLegacyMode = false the tasks finish in about 3 seconds and
show as running PROCESS_LOCAL. however when useLegacyMode = false the
tasks run on just 3 executors out of 10, while with useLegacyMode = true
they spread out across all 10 executors. having all the tasks run on just a
few executors leads to the slower results.

any idea why this would happen?
thanks! koert
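
A sketch of one way to quantify the skew described above: a small listener
that counts completed tasks per executor. The class and variable names are
made up, sc is assumed to be an existing SparkContext, and the executors page
of the web UI gives a similar picture:

    import scala.collection.mutable
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    // Sketch: count how many tasks each executor ran, to check whether the
    // 50 tasks spread over all 10 executors or pile up on just a few.
    class TasksPerExecutor extends SparkListener {
      val counts = mutable.Map.empty[String, Int].withDefaultValue(0)
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        counts(taskEnd.taskInfo.executorId) += 1
      }
    }

    val listener = new TasksPerExecutor
    sc.addSparkListener(listener)
    // ... run the cached-RDD query here ...
    listener.counts.toSeq.sortBy(_._1).foreach { case (exec, n) =>
      println(s"executor $exec ran $n tasks")
    }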