liupc commented on issue #23672: [SPARK-26750]Estimate memory overhead with multi-cores URL: https://github.com/apache/spark/pull/23672#issuecomment-458846902

@srowen To my understanding, the overhead is composed of four parts:

- JVM overhead (thread stacks, GC data, mapped files (jars, files), etc.; thread stacks and GC data grow with the number of cores, because there are more threads and more metadata is needed to describe the GC area)
- Spark off-heap memory (mainly used in shuffle; grows with the number of tasks, because they may shuffle at the same time)
- third-party library off-heap memory (compression-related libraries, for instance Snappy; grows with the number of tasks, because they may compress at the same time)
- off-heap memory within user code (users may load native libraries, or use a DB client that allocates off-heap memory; grows with the number of tasks, because they may execute at the same time)

Except for shared libraries, jars, and some base native memory, all of these may grow with the number of tasks. Is that right?
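The breakdown above amounts to a simple additive model: a fixed base (shared jars, libraries, and base native memory) plus a per-task component (thread stacks, shuffle and compression buffers) that scales with the number of concurrent tasks. A minimal sketch, where the function name and all constants are illustrative assumptions and not Spark's actual overhead formula:

```python
def estimate_overhead_mb(base_mb: int, per_task_mb: int, concurrent_tasks: int) -> int:
    """Hypothetical model of executor memory overhead (all values are assumptions).

    base_mb          -- overhead that does not scale with tasks (shared jars,
                        native libraries, base JVM bookkeeping)
    per_task_mb      -- overhead attributable to one concurrent task (thread
                        stack, shuffle/compression buffers, user off-heap use)
    concurrent_tasks -- number of tasks running at once (roughly, executor cores)
    """
    return base_mb + per_task_mb * concurrent_tasks

# With an assumed 384 MB base and 100 MB per task:
print(estimate_overhead_mb(384, 100, 1))  # 484
print(estimate_overhead_mb(384, 100, 4))  # 784
```

Under this model, only the per-task term grows with cores, which matches the claim that everything except the shared base scales with task concurrency.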
