Hey, running my first map-reduce-like (meaning disk-to-disk, avoiding in-memory RDDs) computation in Spark on YARN, I immediately got bitten by a too-low spark.yarn.executor.memoryOverhead. However, it took me about an hour to find out this was the cause. At first I observed failing shuffles leading to restarted tasks, then I realized this was because executors could not be reached, then I noticed in the ResourceManager logs that containers got shut down and reallocated (no mention of errors; it looked like the containers finished their business and shut down cleanly), and finally I found the real reason in the NodeManager logs.
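For anyone hitting the same symptoms, the workaround that got me unstuck was to raise the overhead explicitly when submitting the job. The 1024 MB value below is just what happened to work for my particular job, not a recommendation:

    spark-submit --conf spark.yarn.executor.memoryOverhead=1024 ...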
I don't think this is a pleasant first experience. I realize spark.yarn.executor.memoryOverhead needs to be set differently from situation to situation, but shouldn't the default be a somewhat higher value, so that these errors are unlikely out of the box, and the experts who are willing to deal with them can tune it lower? Concretely: why not make the default 10% instead of 7%? That gives something that works in most situations (at the cost of being a little wasteful), and it worked for me.
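To make the proposal concrete, here is a minimal Scala sketch of how I understand the per-container request gets sized, with the constants (384 MB floor, 7% factor) assumed from my reading of the Spark-on-YARN allocation code; treat the names and numbers as assumptions, not the actual implementation:

    // sketch: overhead is the larger of a fixed floor and a fraction
    // of the executor memory, unless overridden by the user
    val MEMORY_OVERHEAD_MIN    = 384   // MB, assumed floor
    val MEMORY_OVERHEAD_FACTOR = 0.07  // the default I am proposing to raise to 0.10

    def overhead(executorMemoryMb: Int, factor: Double): Int =
      math.max((factor * executorMemoryMb).toInt, MEMORY_OVERHEAD_MIN)

    // e.g. for 8 GB (8192 MB) executors:
    //   overhead(8192, 0.07) == 573 MB   (the default that bit me)
    //   overhead(8192, 0.10) == 819 MB   (the proposed default)

So for an 8 GB executor the change costs roughly 250 MB per container, which seems a fair price for not losing an hour chasing silently killed containers.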