liupc commented on issue #23672: [SPARK-26750]Estimate memory overhead with 
multi-cores
URL: https://github.com/apache/spark/pull/23672#issuecomment-458846902
 
 
   @srowen 
   In my understanding, the overhead is composed of four parts:
   - JVM overhead (thread stacks, GC data, mapped files (jars, files), etc.; thread stacks and GC data increase with more cores, because there are more threads and more metadata is needed to describe the GC area)
   - Spark off-heap memory (mainly used in shuffle; grows with the number of tasks, because they may shuffle at the same time)
   - third-party library off-heap memory (compression-related libraries, for instance Snappy; grows with the number of tasks, because they may compress at the same time)
   - off-heap memory within user code (users may load native libraries, or use a DB client that allocates off-heap memory; grows with the number of tasks, because they may execute at the same time)
   
   Except for some shared libraries, jars, and base native memory, the other parts may all grow with the number of tasks.
   Is that right?
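   To make the claim concrete, the breakdown above amounts to a linear model: a fixed base (shared libraries, jars, base native memory) plus per-task components that scale with the number of concurrently running tasks (one per core, by default). The function and constants below are purely illustrative assumptions for discussion, not Spark's actual overhead formula or configuration values.

   ```python
   def estimate_overhead_mb(cores, base_mb=384, per_task_mb=64):
       """Back-of-envelope executor memory overhead estimate.

       base_mb:     parts that do NOT grow with task count
                    (shared libraries, jars, base native memory).
       per_task_mb: parts that DO grow with each concurrent task
                    (thread stacks, GC metadata, shuffle buffers,
                    compression buffers, user off-heap allocations).
       With the default of one task per core, the number of
       concurrent tasks equals the number of executor cores.
       """
       return base_mb + per_task_mb * cores

   # A single-core executor vs. a 4-core executor:
   print(estimate_overhead_mb(1))  # 448
   print(estimate_overhead_mb(4))  # 640
   ```

   Under this model, only the `per_task_mb` term scales with cores, which is the point of the question: today's flat overhead default ignores that term entirely.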

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 