Thanks for taking this on, Ted!

On Sat, Feb 28, 2015 at 4:17 PM, Ted Yu <[email protected]> wrote:
> I have created SPARK-6085 with pull request:
> https://github.com/apache/spark/pull/4836
>
> Cheers
>
> On Sat, Feb 28, 2015 at 12:08 PM, Corey Nolet <[email protected]> wrote:
>
>> +1 to a better default as well.
>>
>> We were working fine until we ran against a real dataset which was much
>> larger than the test dataset we were using locally. It took me a couple of
>> days and digging through many logs to figure out that this value was what
>> was causing the problem.
>>
>> On Sat, Feb 28, 2015 at 11:38 AM, Ted Yu <[email protected]> wrote:
>>
>>> Having a good out-of-the-box experience is desirable.
>>>
>>> +1 on increasing the default.
>>>
>>> On Sat, Feb 28, 2015 at 8:27 AM, Sean Owen <[email protected]> wrote:
>>>
>>>> There was a recent discussion about whether to increase, or indeed make
>>>> configurable, this kind of default fraction. I believe the suggestion
>>>> there too was that 9-10% is a safer default.
>>>>
>>>> Advanced users can lower the resulting overhead value; it may still
>>>> have to be increased in some cases, but a fatter default may make this
>>>> kind of surprise less frequent.
>>>>
>>>> I'd support increasing the default; any other thoughts?
>>>>
>>>> On Sat, Feb 28, 2015 at 3:34 PM, Koert Kuipers <[email protected]> wrote:
>>>>
>>>>> hey,
>>>>> running my first map-reduce-like (meaning disk-to-disk, avoiding in-memory
>>>>> RDDs) computation in spark on yarn, i immediately got bitten by a too low
>>>>> spark.yarn.executor.memoryOverhead. however, it took me about an hour to
>>>>> find out this was the cause. at first i observed failing shuffles leading
>>>>> to restarting of tasks, then i realized this was because executors could
>>>>> not be reached, then i noticed in the resourcemanager logs that containers
>>>>> got shut down and reallocated (no mention of errors; it seemed the
>>>>> containers finished their business and shut down successfully), and
>>>>> finally i found the reason in the nodemanager logs.
>>>>>
>>>>> i don't think this is a pleasant first experience. i realize
>>>>> spark.yarn.executor.memoryOverhead needs to be set differently from
>>>>> situation to situation, but shouldn't the default be a somewhat higher
>>>>> value so that these errors are unlikely, and then the experts who are
>>>>> willing to deal with these errors can tune it lower? so why not make the
>>>>> default 10% instead of 7%? that gives something that works in most
>>>>> situations out of the box (at the cost of being a little wasteful). it
>>>>> worked for me.
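
In case it helps anyone who hits this before a new default lands, here is a rough
sketch of the explicit workaround (the 8g executor size is just an illustrative
assumption, not a figure from this thread): with the current default the overhead
works out to max(384, 0.07 * 8192) = 573 MB, and bumping it to roughly 10% means
asking for about 819 MB at submit time:

  # illustrative numbers: 8g executors, overhead raised from the ~7% default
  # (max(384, 0.07 * 8192) = 573 MB) to ~10% (819 MB); the value is in megabytes
  spark-submit \
    --master yarn \
    --executor-memory 8g \
    --conf spark.yarn.executor.memoryOverhead=819 \
    your-app.jar    # placeholder jar name

The same key can also go into spark-defaults.conf if a whole cluster tends to run
shuffle-heavy jobs like the one described above.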
