liupc edited a comment on issue #23672: [SPARK-26750]Estimate memory overhead 
with multi-cores
URL: https://github.com/apache/spark/pull/23672#issuecomment-458846902
 
 
   @srowen 
   As I understand it, the overhead is made up of four parts:
   - JVM overhead (thread stacks, GC data, mapped files such as jars and 
files, etc.). Thread stacks and GC data grow with the number of tasks, because 
there are more threads and more metadata is needed to describe the GC areas.
   - Spark off-heap memory (mainly used in shuffle). It grows with the number 
of tasks, because tasks may shuffle at the same time.
   - Third-party library off-heap memory (compression libraries such as 
Snappy). It grows with the number of tasks, because tasks may compress data at 
the same time.
   - Off-heap memory within user code (users may load a native library, or use 
a DB client that allocates off-heap memory). It grows with the number of tasks, 
because tasks may execute at the same time.
   
   Except for shared libraries, jars, and base native memory, all of these may 
grow with the number of tasks.
   Is that right?
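
To make the question concrete, here is a rough sketch of the model described above: each part has a component shared by the whole executor and a component paid once per concurrently running task. The names `OverheadPart`, `estimate_overhead_mib`, and `PARTS`, and every MiB figure, are hypothetical placeholders chosen for illustration. They are not measured values, and this is not the formula used by Spark or by this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverheadPart:
    """One contributor to executor memory overhead (illustrative only)."""
    name: str
    shared_mib: int    # paid once per executor (hypothetical figure)
    per_task_mib: int  # paid per concurrently running task (hypothetical figure)


def estimate_overhead_mib(parts, concurrent_tasks):
    """Sum the shared cost plus the per-task cost of every part."""
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be >= 1")
    return sum(p.shared_mib + p.per_task_mib * concurrent_tasks for p in parts)


# The four parts from the list above, with made-up numbers.
PARTS = [
    OverheadPart("jvm", shared_mib=200, per_task_mib=8),
    OverheadPart("spark_offheap", shared_mib=0, per_task_mib=32),
    OverheadPart("third_party_native", shared_mib=16, per_task_mib=16),
    OverheadPart("user_code_native", shared_mib=0, per_task_mib=8),
]

if __name__ == "__main__":
    for cores in (1, 4, 8):
        print(f"{cores} concurrent task(s): {estimate_overhead_mib(PARTS, cores)} MiB")
```

With these placeholder numbers the shared portion stays at 216 MiB while the per-task portion adds 64 MiB for each additional concurrent task, which is the growth pattern the question is about.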
