Re: bitten by spark.yarn.executor.memoryOverhead

2015-03-02 Thread Sean Owen
The problem is, you're left with two competing options then. You can
go through the process of deprecating the absolute one and removing it
eventually. That takes away the ability to set this value directly, though,
meaning you'd have to hit an absolute value indirectly, as a % of whatever
you set your app memory to. I think there's a non-trivial downside that
way too.

No value can always be right, or else it wouldn't be configurable. I
think of this one like any other param that's set in absolute terms,
but with an attempt to be smart about the default.
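
To make that concrete, here is a rough sketch (illustrative Scala with
made-up values, not anything in Spark itself) of what tuning looks like
under each scheme:

  // Today: the overhead is an absolute value in MB, set independently of
  // the executor memory.
  val executorMemoryMB  = 10 * 1024   // --executor-memory 10g
  val overheadMB        = 1024        // spark.yarn.executor.memoryOverhead=1024

  // If only a fraction were configurable, landing on that same 1024 MB would
  // mean backing the fraction out of whatever the app memory happens to be,
  // and re-deriving it whenever --executor-memory changes:
  val desiredOverheadMB = 1024
  val fraction          = desiredOverheadMB.toDouble / executorMemoryMB  // 0.1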

On Mon, Mar 2, 2015 at 4:36 PM, Ryan Williams wrote:
> For reference, the initial version of #3525 (still open) made this fraction
> a configurable value, but consensus went against that being desirable so I
> removed it and marked SPARK-4665 as "won't fix".
>
> My team wasted a lot of time on this failure mode as well and has settled in
> to passing "--conf spark.yarn.executor.memoryOverhead=1024" to most jobs
> (that works out to 10-20% of --executor-memory, depending on the job).
>
> I agree that learning about this the hard way is a weak part of the
> Spark-on-YARN onboarding experience.
>
> The fact that our instinct here is to increase the 0.07 minimum instead of
> the alternate 384MB minimum seems like evidence that the fraction is the
> thing we should allow people to configure, instead of absolute amount that
> is currently configurable.
>
> Finally, do we feel confident that 0.1 is "always" enough?
>
>
> On Sat, Feb 28, 2015 at 4:51 PM Corey Nolet  wrote:
>>
>> Thanks for taking this on Ted!
>>
>> On Sat, Feb 28, 2015 at 4:17 PM, Ted Yu  wrote:
>>>
>>> I have created SPARK-6085 with pull request:
>>> https://github.com/apache/spark/pull/4836
>>>
>>> Cheers
>>>
>>> On Sat, Feb 28, 2015 at 12:08 PM, Corey Nolet  wrote:

 +1 to a better default as well.

 We were working find until we ran against a real dataset which was much
 larger than the test dataset we were using locally. It took me a couple 
 days
 and digging through many logs to figure out this value was what was causing
 the problem.

 On Sat, Feb 28, 2015 at 11:38 AM, Ted Yu  wrote:
>
> Having good out-of-box experience is desirable.
>
> +1 on increasing the default.
>
>
> On Sat, Feb 28, 2015 at 8:27 AM, Sean Owen  wrote:
>>
>> There was a recent discussion about whether to increase or indeed make
>> configurable this kind of default fraction. I believe the suggestion
>> there too was that 9-10% is a safer default.
>>
>> Advanced users can lower the resulting overhead value; it may still
>> have to be increased in some cases, but a fatter default may make this
>> kind of surprise less frequent.
>>
>> I'd support increasing the default; any other thoughts?
>>
>> On Sat, Feb 28, 2015 at 3:34 PM, Koert Kuipers 
>> wrote:
>> > hey,
>> > running my first map-red like (meaning disk-to-disk, avoiding in
>> > memory
>> > RDDs) computation in spark on yarn i immediately got bitten by a too
>> > low
>> > spark.yarn.executor.memoryOverhead. however it took me about an hour
>> > to find
>> > out this was the cause. at first i observed failing shuffles leading
>> > to
>> > restarting of tasks, then i realized this was because executors
>> > could not be
>> > reached, then i noticed in containers got shut down and reallocated
>> > in
>> > resourcemanager logs (no mention of errors, it seemed the containers
>> > finished their business and shut down successfully), and finally i
>> > found the
>> > reason in nodemanager logs.
>> >
>> > i dont think this is a pleasent first experience. i realize
>> > spark.yarn.executor.memoryOverhead needs to be set differently from
>> > situation to situation. but shouldnt the default be a somewhat
>> > higher value
>> > so that these errors are unlikely, and then the experts that are
>> > willing to
>> > deal with these errors can tune it lower? so why not make the
>> > default 10%
>> > instead of 7%? that gives something that works in most situations
>> > out of the
>> > box (at the cost of being a little wasteful). it worked for me.
>>



Re: bitten by spark.yarn.executor.memoryOverhead

2015-03-02 Thread Ted Yu
bq. that 0.1 is "always" enough?

The answer is: it depends (on use cases).
The value of 0.1 has been validated by several users. I think it is a
reasonable default.
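
One rough way to check it for a particular workload (a back-of-the-envelope
sketch with hypothetical numbers; substitute what your NodeManager reports
as peak container usage):

  val executorHeapMB  = 8 * 1024   // --executor-memory 8g
  val peakContainerMB = 9100       // observed peak physical memory for the container

  val neededOverheadMB = peakContainerMB - executorHeapMB           // 908 MB
  val neededFraction   = neededOverheadMB.toDouble / executorHeapMB // ~0.11
  // Above 0.1, so a job like this one would still need an explicit override.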

Cheers

On Mon, Mar 2, 2015 at 8:36 AM, Ryan Williams  wrote:

> For reference, the initial version of #3525
>  (still open) made this
> fraction a configurable value, but consensus went against that being
> desirable so I removed it and marked SPARK-4665
>  as "won't fix".
>
> My team wasted a lot of time on this failure mode as well and has settled
> in to passing "--conf spark.yarn.executor.memoryOverhead=1024" to most
> jobs (that works out to 10-20% of --executor-memory, depending on the job).
>
> I agree that learning about this the hard way is a weak part of the
> Spark-on-YARN onboarding experience.
>
> The fact that our instinct here is to increase the 0.07 minimum instead of
> the alternate 384MB
> 
> minimum seems like evidence that the fraction is the thing we should allow
> people to configure, instead of absolute amount that is currently
> configurable.
>
> Finally, do we feel confident that 0.1 is "always" enough?
>
>
> On Sat, Feb 28, 2015 at 4:51 PM Corey Nolet  wrote:
>
>> Thanks for taking this on Ted!
>>
>> On Sat, Feb 28, 2015 at 4:17 PM, Ted Yu  wrote:
>>
>>> I have created SPARK-6085 with pull request:
>>> https://github.com/apache/spark/pull/4836
>>>
>>> Cheers
>>>
>>> On Sat, Feb 28, 2015 at 12:08 PM, Corey Nolet  wrote:
>>>
 +1 to a better default as well.

 We were working find until we ran against a real dataset which was much
 larger than the test dataset we were using locally. It took me a couple
 days and digging through many logs to figure out this value was what was
 causing the problem.

 On Sat, Feb 28, 2015 at 11:38 AM, Ted Yu  wrote:

> Having good out-of-box experience is desirable.
>
> +1 on increasing the default.
>
>
> On Sat, Feb 28, 2015 at 8:27 AM, Sean Owen  wrote:
>
>> There was a recent discussion about whether to increase or indeed make
>> configurable this kind of default fraction. I believe the suggestion
>> there too was that 9-10% is a safer default.
>>
>> Advanced users can lower the resulting overhead value; it may still
>> have to be increased in some cases, but a fatter default may make this
>> kind of surprise less frequent.
>>
>> I'd support increasing the default; any other thoughts?
>>
>> On Sat, Feb 28, 2015 at 3:34 PM, Koert Kuipers 
>> wrote:
>> > hey,
>> > running my first map-red like (meaning disk-to-disk, avoiding in
>> memory
>> > RDDs) computation in spark on yarn i immediately got bitten by a
>> too low
>> > spark.yarn.executor.memoryOverhead. however it took me about an
>> hour to find
>> > out this was the cause. at first i observed failing shuffles
>> leading to
>> > restarting of tasks, then i realized this was because executors
>> could not be
>> > reached, then i noticed in containers got shut down and reallocated
>> in
>> > resourcemanager logs (no mention of errors, it seemed the containers
>> > finished their business and shut down successfully), and finally i
>> found the
>> > reason in nodemanager logs.
>> >
>> > i dont think this is a pleasent first experience. i realize
>> > spark.yarn.executor.memoryOverhead needs to be set differently from
>> > situation to situation. but shouldnt the default be a somewhat
>> higher value
>> > so that these errors are unlikely, and then the experts that are
>> willing to
>> > deal with these errors can tune it lower? so why not make the
>> default 10%
>> > instead of 7%? that gives something that works in most situations
>> out of the
>> > box (at the cost of being a little wasteful). it worked for me.
>>


Re: bitten by spark.yarn.executor.memoryOverhead

2015-03-02 Thread Ryan Williams
For reference, the initial version of #3525 (still open) made this fraction
a configurable value, but consensus went against that being desirable, so I
removed it and marked SPARK-4665 as "won't fix".

My team wasted a lot of time on this failure mode as well and has settled
into passing "--conf spark.yarn.executor.memoryOverhead=1024" to most jobs
(that works out to 10-20% of --executor-memory, depending on the job).
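
(The same setting can also be made programmatically when building the
context; a minimal sketch, with a placeholder app name:)

  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setAppName("my-job")                               // placeholder
    .set("spark.executor.memory", "8g")                 // same as --executor-memory 8g
    .set("spark.yarn.executor.memoryOverhead", "1024")  // MB, same as the --conf above
  val sc = new SparkContext(conf)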

I agree that learning about this the hard way is a weak part of the
Spark-on-YARN onboarding experience.

The fact that our instinct here is to increase the 0.07 minimum, rather than
the alternate 384MB minimum, seems like evidence that the fraction is the
thing we should allow people to configure, instead of the absolute amount
that is currently configurable.
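
(For context, those two minimums combine as "whichever is larger"; a
simplified model of the documented default, not the literal Spark source:)

  // Roughly: 7% of executor memory, with a 384 MB floor. Values in MB.
  def defaultOverheadMB(executorMemoryMB: Int): Int =
    math.max((0.07 * executorMemoryMB).toInt, 384)

  defaultOverheadMB(2 * 1024)   // small executor: the 384 MB floor wins
  defaultOverheadMB(16 * 1024)  // 16g executor:   7% wins (~1146 MB)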

Finally, do we feel confident that 0.1 is "always" enough?


On Sat, Feb 28, 2015 at 4:51 PM Corey Nolet  wrote:

> Thanks for taking this on Ted!
>
> On Sat, Feb 28, 2015 at 4:17 PM, Ted Yu  wrote:
>
>> I have created SPARK-6085 with pull request:
>> https://github.com/apache/spark/pull/4836
>>
>> Cheers
>>
>> On Sat, Feb 28, 2015 at 12:08 PM, Corey Nolet  wrote:
>>
>>> +1 to a better default as well.
>>>
>>> We were working find until we ran against a real dataset which was much
>>> larger than the test dataset we were using locally. It took me a couple
>>> days and digging through many logs to figure out this value was what was
>>> causing the problem.
>>>
>>> On Sat, Feb 28, 2015 at 11:38 AM, Ted Yu  wrote:
>>>
 Having good out-of-box experience is desirable.

 +1 on increasing the default.


 On Sat, Feb 28, 2015 at 8:27 AM, Sean Owen  wrote:

> There was a recent discussion about whether to increase or indeed make
> configurable this kind of default fraction. I believe the suggestion
> there too was that 9-10% is a safer default.
>
> Advanced users can lower the resulting overhead value; it may still
> have to be increased in some cases, but a fatter default may make this
> kind of surprise less frequent.
>
> I'd support increasing the default; any other thoughts?
>
> On Sat, Feb 28, 2015 at 3:34 PM, Koert Kuipers 
> wrote:
> > hey,
> > running my first map-red like (meaning disk-to-disk, avoiding in
> memory
> > RDDs) computation in spark on yarn i immediately got bitten by a too
> low
> > spark.yarn.executor.memoryOverhead. however it took me about an hour
> to find
> > out this was the cause. at first i observed failing shuffles leading
> to
> > restarting of tasks, then i realized this was because executors
> could not be
> > reached, then i noticed in containers got shut down and reallocated
> in
> > resourcemanager logs (no mention of errors, it seemed the containers
> > finished their business and shut down successfully), and finally i
> found the
> > reason in nodemanager logs.
> >
> > i dont think this is a pleasent first experience. i realize
> > spark.yarn.executor.memoryOverhead needs to be set differently from
> > situation to situation. but shouldnt the default be a somewhat
> higher value
> > so that these errors are unlikely, and then the experts that are
> willing to
> > deal with these errors can tune it lower? so why not make the
> default 10%
> > instead of 7%? that gives something that works in most situations
> out of the
> > box (at the cost of being a little wasteful). it worked for me.
>


Re: bitten by spark.yarn.executor.memoryOverhead

2015-02-28 Thread Corey Nolet
Thanks for taking this on, Ted!

On Sat, Feb 28, 2015 at 4:17 PM, Ted Yu  wrote:

> I have created SPARK-6085 with pull request:
> https://github.com/apache/spark/pull/4836
>
> Cheers
>
> On Sat, Feb 28, 2015 at 12:08 PM, Corey Nolet  wrote:
>
>> +1 to a better default as well.
>>
>> We were working find until we ran against a real dataset which was much
>> larger than the test dataset we were using locally. It took me a couple
>> days and digging through many logs to figure out this value was what was
>> causing the problem.
>>
>> On Sat, Feb 28, 2015 at 11:38 AM, Ted Yu  wrote:
>>
>>> Having good out-of-box experience is desirable.
>>>
>>> +1 on increasing the default.
>>>
>>>
>>> On Sat, Feb 28, 2015 at 8:27 AM, Sean Owen  wrote:
>>>
 There was a recent discussion about whether to increase or indeed make
 configurable this kind of default fraction. I believe the suggestion
 there too was that 9-10% is a safer default.

 Advanced users can lower the resulting overhead value; it may still
 have to be increased in some cases, but a fatter default may make this
 kind of surprise less frequent.

 I'd support increasing the default; any other thoughts?

 On Sat, Feb 28, 2015 at 3:34 PM, Koert Kuipers 
 wrote:
 > hey,
 > running my first map-red like (meaning disk-to-disk, avoiding in
 memory
 > RDDs) computation in spark on yarn i immediately got bitten by a too
 low
 > spark.yarn.executor.memoryOverhead. however it took me about an hour
 to find
 > out this was the cause. at first i observed failing shuffles leading
 to
 > restarting of tasks, then i realized this was because executors could
 not be
 > reached, then i noticed in containers got shut down and reallocated in
 > resourcemanager logs (no mention of errors, it seemed the containers
 > finished their business and shut down successfully), and finally i
 found the
 > reason in nodemanager logs.
 >
 > i dont think this is a pleasent first experience. i realize
 > spark.yarn.executor.memoryOverhead needs to be set differently from
 > situation to situation. but shouldnt the default be a somewhat higher
 value
 > so that these errors are unlikely, and then the experts that are
 willing to
 > deal with these errors can tune it lower? so why not make the default
 10%
 > instead of 7%? that gives something that works in most situations out
 of the
 > box (at the cost of being a little wasteful). it worked for me.



Re: bitten by spark.yarn.executor.memoryOverhead

2015-02-28 Thread Ted Yu
I have created SPARK-6085 with pull request:
https://github.com/apache/spark/pull/4836
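
The gist, per the discussion below, is to raise the default fraction from 7%
to 10% while keeping the 384 MB floor; schematically (a paraphrase, not the
literal diff):

  // before: math.max((0.07 * executorMemoryMB).toInt, 384)
  // after:  math.max((0.10 * executorMemoryMB).toInt, 384)
  def proposedOverheadMB(executorMemoryMB: Int): Int =
    math.max((0.10 * executorMemoryMB).toInt, 384)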

Cheers

On Sat, Feb 28, 2015 at 12:08 PM, Corey Nolet  wrote:

> +1 to a better default as well.
>
> We were working find until we ran against a real dataset which was much
> larger than the test dataset we were using locally. It took me a couple
> days and digging through many logs to figure out this value was what was
> causing the problem.
>
> On Sat, Feb 28, 2015 at 11:38 AM, Ted Yu  wrote:
>
>> Having good out-of-box experience is desirable.
>>
>> +1 on increasing the default.
>>
>>
>> On Sat, Feb 28, 2015 at 8:27 AM, Sean Owen  wrote:
>>
>>> There was a recent discussion about whether to increase or indeed make
>>> configurable this kind of default fraction. I believe the suggestion
>>> there too was that 9-10% is a safer default.
>>>
>>> Advanced users can lower the resulting overhead value; it may still
>>> have to be increased in some cases, but a fatter default may make this
>>> kind of surprise less frequent.
>>>
>>> I'd support increasing the default; any other thoughts?
>>>
>>> On Sat, Feb 28, 2015 at 3:34 PM, Koert Kuipers 
>>> wrote:
>>> > hey,
>>> > running my first map-red like (meaning disk-to-disk, avoiding in memory
>>> > RDDs) computation in spark on yarn i immediately got bitten by a too
>>> low
>>> > spark.yarn.executor.memoryOverhead. however it took me about an hour
>>> to find
>>> > out this was the cause. at first i observed failing shuffles leading to
>>> > restarting of tasks, then i realized this was because executors could
>>> not be
>>> > reached, then i noticed in containers got shut down and reallocated in
>>> > resourcemanager logs (no mention of errors, it seemed the containers
>>> > finished their business and shut down successfully), and finally i
>>> found the
>>> > reason in nodemanager logs.
>>> >
>>> > i dont think this is a pleasent first experience. i realize
>>> > spark.yarn.executor.memoryOverhead needs to be set differently from
>>> > situation to situation. but shouldnt the default be a somewhat higher
>>> value
>>> > so that these errors are unlikely, and then the experts that are
>>> willing to
>>> > deal with these errors can tune it lower? so why not make the default
>>> 10%
>>> > instead of 7%? that gives something that works in most situations out
>>> of the
>>> > box (at the cost of being a little wasteful). it worked for me.
>>>


Re: bitten by spark.yarn.executor.memoryOverhead

2015-02-28 Thread Corey Nolet
+1 to a better default as well.

We were working fine until we ran against a real dataset that was much
larger than the test dataset we were using locally. It took me a couple of
days and digging through many logs to figure out that this value was what
was causing the problem.

On Sat, Feb 28, 2015 at 11:38 AM, Ted Yu  wrote:

> Having good out-of-box experience is desirable.
>
> +1 on increasing the default.
>
>
> On Sat, Feb 28, 2015 at 8:27 AM, Sean Owen  wrote:
>
>> There was a recent discussion about whether to increase or indeed make
>> configurable this kind of default fraction. I believe the suggestion
>> there too was that 9-10% is a safer default.
>>
>> Advanced users can lower the resulting overhead value; it may still
>> have to be increased in some cases, but a fatter default may make this
>> kind of surprise less frequent.
>>
>> I'd support increasing the default; any other thoughts?
>>
>> On Sat, Feb 28, 2015 at 3:34 PM, Koert Kuipers  wrote:
>> > hey,
>> > running my first map-red like (meaning disk-to-disk, avoiding in memory
>> > RDDs) computation in spark on yarn i immediately got bitten by a too low
>> > spark.yarn.executor.memoryOverhead. however it took me about an hour to
>> find
>> > out this was the cause. at first i observed failing shuffles leading to
>> > restarting of tasks, then i realized this was because executors could
>> not be
>> > reached, then i noticed in containers got shut down and reallocated in
>> > resourcemanager logs (no mention of errors, it seemed the containers
>> > finished their business and shut down successfully), and finally i
>> found the
>> > reason in nodemanager logs.
>> >
>> > i dont think this is a pleasent first experience. i realize
>> > spark.yarn.executor.memoryOverhead needs to be set differently from
>> > situation to situation. but shouldnt the default be a somewhat higher
>> value
>> > so that these errors are unlikely, and then the experts that are
>> willing to
>> > deal with these errors can tune it lower? so why not make the default
>> 10%
>> > instead of 7%? that gives something that works in most situations out
>> of the
>> > box (at the cost of being a little wasteful). it worked for me.
>>


Re: bitten by spark.yarn.executor.memoryOverhead

2015-02-28 Thread Ted Yu
Having a good out-of-the-box experience is desirable.

+1 on increasing the default.


On Sat, Feb 28, 2015 at 8:27 AM, Sean Owen  wrote:

> There was a recent discussion about whether to increase or indeed make
> configurable this kind of default fraction. I believe the suggestion
> there too was that 9-10% is a safer default.
>
> Advanced users can lower the resulting overhead value; it may still
> have to be increased in some cases, but a fatter default may make this
> kind of surprise less frequent.
>
> I'd support increasing the default; any other thoughts?
>
> On Sat, Feb 28, 2015 at 3:34 PM, Koert Kuipers  wrote:
> > hey,
> > running my first map-red like (meaning disk-to-disk, avoiding in memory
> > RDDs) computation in spark on yarn i immediately got bitten by a too low
> > spark.yarn.executor.memoryOverhead. however it took me about an hour to
> find
> > out this was the cause. at first i observed failing shuffles leading to
> > restarting of tasks, then i realized this was because executors could
> not be
> > reached, then i noticed in containers got shut down and reallocated in
> > resourcemanager logs (no mention of errors, it seemed the containers
> > finished their business and shut down successfully), and finally i found
> the
> > reason in nodemanager logs.
> >
> > i dont think this is a pleasent first experience. i realize
> > spark.yarn.executor.memoryOverhead needs to be set differently from
> > situation to situation. but shouldnt the default be a somewhat higher
> value
> > so that these errors are unlikely, and then the experts that are willing
> to
> > deal with these errors can tune it lower? so why not make the default 10%
> > instead of 7%? that gives something that works in most situations out of
> the
> > box (at the cost of being a little wasteful). it worked for me.
>


Re: bitten by spark.yarn.executor.memoryOverhead

2015-02-28 Thread Sean Owen
There was a recent discussion about whether to increase this kind of
default fraction, or indeed make it configurable. I believe the suggestion
there, too, was that 9-10% is a safer default.

Advanced users can lower the resulting overhead value; it may still
have to be increased in some cases, but a fatter default may make this
kind of surprise less frequent.

I'd support increasing the default; any other thoughts?
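
(For anyone hitting this cold: the overhead matters because YARN sizes each
executor container as the executor memory plus this overhead, and the
NodeManager kills containers that exceed their allocation. A schematic
sketch with illustrative numbers, not actual scheduler code:)

  val executorMemoryMB = 8 * 1024
  val overheadMB       = math.max((0.07 * executorMemoryMB).toInt, 384)  // 573
  val containerLimitMB = executorMemoryMB + overheadMB                   // 8765

  // Off-heap use (netty buffers, JVM threads, etc.) beyond overheadMB pushes
  // the container over containerLimitMB; the NodeManager then kills it, which
  // surfaces as unreachable executors and failed shuffles rather than an OOM.
  val observedPeakMB = 8900
  val exceedsLimit   = observedPeakMB > containerLimitMB                 // true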

On Sat, Feb 28, 2015 at 3:34 PM, Koert Kuipers  wrote:
> hey,
> running my first map-reduce-like (meaning disk-to-disk, avoiding in-memory
> RDDs) computation in Spark on YARN, I immediately got bitten by a too-low
> spark.yarn.executor.memoryOverhead. However, it took me about an hour to
> find out this was the cause. At first I observed failing shuffles leading
> to restarting of tasks, then I realized this was because executors could
> not be reached, then I noticed containers got shut down and reallocated in
> the resourcemanager logs (no mention of errors; it seemed the containers
> finished their business and shut down successfully), and finally I found
> the reason in the nodemanager logs.
>
> I don't think this is a pleasant first experience. I realize
> spark.yarn.executor.memoryOverhead needs to be set differently from
> situation to situation. But shouldn't the default be a somewhat higher
> value, so that these errors are unlikely, and the experts who are willing
> to deal with these errors can tune it lower? So why not make the default
> 10% instead of 7%? That gives something that works in most situations out
> of the box (at the cost of being a little wasteful). It worked for me.
