Hello Jan, I am moving this thread from [email protected] to [email protected], since it's less a question of general usage and more a question of internal code implementation details and possible enhancements.
I think the issue is that it is not guaranteed, in the general case, that Private_Clean pages are easily evictable from the page cache by the kernel. For example, if the pages have been pinned into RAM by calling mlock [1], then the kernel cannot evict them. Since YARN can execute arbitrary code submitted by an application, including code that calls mlock, it takes a cautious approach and assumes that these pages must be counted towards the process footprint.

Although your Spark use case won't mlock the pages (I assume), YARN has no way to identify this. Perhaps there is room for improvement here. If there is a reliable way to count only mlock'ed pages, then perhaps that behavior could be added as another option in ProcfsBasedProcessTree. Off the top of my head, I can't think of a reliable way to do this, and I can't research it further immediately. Do others on the thread have ideas?

--Chris Nauroth

[1] http://linux.die.net/man/2/mlock

On 2/4/16, 5:11 AM, "Jan Lukavský" <[email protected]> wrote:

>Hello,
>
>I have a question about the way LinuxResourceCalculatorPlugin calculates
>the memory consumed by a process tree (it is calculated via the
>ProcfsBasedProcessTree class). When we enable disk caching in Apache
>Spark jobs run on a YARN cluster, the node manager starts to kill the
>containers while they read the cached data, because of "Container is
>running beyond memory limits ...". The reason is that even if we enable
>parsing of the smaps file
>(yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled),
>ProcfsBasedProcessTree counts mmapped read-only pages as consumed by the
>process tree, while Spark uses FileChannel.map(MapMode.READ_ONLY) to
>read the cached data. The JVM then consumes *a lot* more memory than the
>configured heap size (and this cannot really be controlled), but this
>memory is IMO not really consumed by the process; the kernel can reclaim
>these pages if needed.
>My question is: is there any explicit reason why "Private_Clean" pages
>are counted as consumed by the process tree? I patched
>ProcfsBasedProcessTree not to count them, but I don't know if this is
>the "correct" solution.
>
>Thanks for opinions,
> cheers,
> Jan
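For illustration, here is a rough sketch of what such an option could look like. It assumes that the per-mapping "Locked:" field which modern Linux kernels expose in /proc/<pid>/smaps is a usable signal for mlock'ed memory (whether it is reliable enough across kernel versions is exactly the open question above). The class and method names (SmapsSketch, chargeableKbFromBlock) are hypothetical, not part of ProcfsBasedProcessTree; the smaps excerpts are hard-coded strings rather than read from /proc, to keep the sketch self-contained:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch: charge a mapping's Private_Clean pages to the process only when
 * the mapping is mlock'ed, as indicated by a non-zero "Locked:" field in
 * its /proc/<pid>/smaps entry. Hypothetical names, not Hadoop code.
 */
public class SmapsSketch {

  /** kB of Private_Clean memory to charge: all of it when the mapping is
   *  locked (the kernel cannot evict it), none of it otherwise. */
  static long chargeablePrivateClean(long privateCleanKb, long lockedKb) {
    return lockedKb > 0 ? privateCleanKb : 0;
  }

  /** Parses one smaps mapping block and returns its chargeable kB. */
  static long chargeableKbFromBlock(String smapsBlock) {
    long privateClean = fieldKb(smapsBlock, "Private_Clean");
    long locked = fieldKb(smapsBlock, "Locked");
    return chargeablePrivateClean(privateClean, locked);
  }

  /** Extracts "<field>:   <n> kB" from an smaps block; 0 if absent. */
  private static long fieldKb(String block, String field) {
    Matcher m = Pattern.compile(field + ":\\s+(\\d+) kB").matcher(block);
    return m.find() ? Long.parseLong(m.group(1)) : 0;
  }

  public static void main(String[] args) {
    // A clean, unlocked read-only file mapping (e.g. Spark's
    // FileChannel.map(MapMode.READ_ONLY)) -> evictable, charge 0 kB.
    String unlocked =
        "Private_Clean:      2048 kB\n" +
        "Private_Dirty:         0 kB\n" +
        "Locked:                0 kB\n";
    // The same mapping after mlock -> not evictable, charge all 2048 kB.
    String locked =
        "Private_Clean:      2048 kB\n" +
        "Private_Dirty:         0 kB\n" +
        "Locked:             2048 kB\n";
    System.out.println(chargeableKbFromBlock(unlocked)); // prints 0
    System.out.println(chargeableKbFromBlock(locked));   // prints 2048
  }
}
```

Under this scheme Jan's unlocked Spark cache mappings would no longer count against the container limit, while genuinely mlock'ed pages still would, which is why the "Locked:" field (if trustworthy) could back the option proposed above.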
