Hello,

In some of our Spark applications, when writing output to HDFS we run into a warning about the Spark YARN executor memory overhead:
WARN yarn.YarnAllocator: Container killed by YARN for exceeding memory limits. 3.0 GB of 3 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.

When we set a higher overhead everything works correctly, but we'd like to understand what this overhead actually contains so we can fix or change our code. The documentation only says it covers VM overheads, interned strings and other native overheads, which is quite vague.

We considered two likely culprits:
- Interned strings, but since Java 7 they are stored on the heap, so they should no longer count towards the overhead.
- Buffers used when writing to HDFS, since we use Hadoop MultipleOutputs.

So if someone could shed some light on what actually ends up in this overhead, it would be very helpful.

Regards,

--
*Olivier Devoisin*
Data Infrastructure Engineer
olivier.devoi...@contentsquare.com
http://www.contentsquare.com
50 Avenue Montaigne - 75008 Paris
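
P.S. For reference, the workaround we apply looks roughly like the sketch below. The values and app name are only illustrative, not our real settings; the overhead is specified in megabytes and, if left unset, defaults to a fraction of the executor memory with a floor of 384 MB.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: raise the off-heap overhead that YARN reserves on top of
    // the executor heap. The exact value depends on the job.
    val conf = new SparkConf()
      .setAppName("hdfs-output-job")                      // hypothetical app name
      .set("spark.executor.memory", "2g")
      .set("spark.yarn.executor.memoryOverhead", "1024")  // in MB
    val sc = new SparkContext(conf)

The same setting can be passed on the command line with spark-submit, e.g. --conf spark.yarn.executor.memoryOverhead=1024.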