Hi Jan,

YARN-1856 was recently committed; it allows admins to use cgroups for memory monitoring instead of the ProcfsBasedProcessTree-based monitor. Would that solve your problem? Note that it requires using the LinuxContainerExecutor.
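For reference, switching the NodeManager over means roughly the following in yarn-site.xml (property names are from memory - please double-check them against the docs for your release; the group value is just an example for your cluster):

    <property>
      <name>yarn.nodemanager.container-executor.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
    </property>
    <property>
      <!-- group the setuid container-executor binary belongs to; example value -->
      <name>yarn.nodemanager.linux-container-executor.group</name>
      <value>hadoop</value>
    </property>
    <property>
      <!-- cgroups hierarchy the NM uses for its containers -->
      <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
      <value>/hadoop-yarn</value>
    </property>

On top of that you need the switch that YARN-1856 added to enable the cgroups-based memory monitoring itself - check the current documentation for the exact property name in your version.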
-Varun

On 2/5/16, 6:45 PM, "Jan Lukavský" <[email protected]> wrote:

>Hi Chris,
>
>thanks for your reply. As far as I can see, newer Linux kernels show
>the locked memory in the "Locked" field.
>
>If I mmap a file and mlock it, I see the following in the 'smaps' file:
>
>7efd20aeb000-7efd2172b000 r--p 00000000 103:04 1870    /tmp/file.bin
>Size:              12544 kB
>Rss:               12544 kB
>Pss:               12544 kB
>Shared_Clean:          0 kB
>Shared_Dirty:          0 kB
>Private_Clean:     12544 kB
>Private_Dirty:         0 kB
>Referenced:        12544 kB
>Anonymous:             0 kB
>AnonHugePages:         0 kB
>Swap:                  0 kB
>KernelPageSize:        4 kB
>MMUPageSize:           4 kB
>Locked:            12544 kB
>
>...
># uname -a
>Linux XXXXXX 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u3 x86_64 GNU/Linux
>
>If I do this on an older kernel (2.6.x), the Locked field is missing.
>
>I can prepare a patch for ProcfsBasedProcessTree that counts the
>"Locked" pages instead of the "Private_Clean" ones (guarded by a
>configuration option). The question is - should further changes be made
>to the way the memory footprint is calculated? For instance, I believe
>the kernel can write even all dirty pages back to disk (if they are
>file-backed), making them clean, and can therefore free them later.
>Should I open a JIRA so we can discuss this?
>
>Regards,
> Jan
>
>
>On 02/04/2016 07:20 PM, Chris Nauroth wrote:
>> Hello Jan,
>>
>> I am moving this thread from [email protected] to
>> [email protected], since it's less a question of general usage
>> and more a question of internal code implementation details and possible
>> enhancements.
>>
>> I think the issue is that it's not guaranteed in the general case that
>> Private_Clean pages are easily evictable from the page cache by the
>> kernel. For example, if the pages have been pinned into RAM by calling
>> mlock [1], then the kernel cannot evict them. Since YARN can execute any
>> code submitted by an application, including possibly code that calls
>> mlock, it takes a cautious approach and assumes that these pages must be
>> counted towards the process footprint. Although your Spark use case won't
>> mlock the pages (I assume), YARN doesn't have a way to identify this.
>>
>> Perhaps there is room for improvement here. If there is a reliable way to
>> count only mlock'ed pages, then perhaps that behavior could be added as
>> another option in ProcfsBasedProcessTree. Off the top of my head, I can't
>> think of a reliable way to do this, and I can't research it further
>> immediately. Do others on the thread have ideas?
>>
>> --Chris Nauroth
>>
>> [1] http://linux.die.net/man/2/mlock
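Chris, regarding a reliable way to count only mlock'ed pages: newer kernels do expose a per-mapping "Locked" field in smaps, as Jan's output above shows. Summing it would look roughly like the sketch below (class and method names are mine, not anything existing in the codebase; on 2.6.x kernels the field is absent, so the sum degrades gracefully to 0, which may or may not be acceptable - that is exactly the question for the JIRA):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class SmapsLockedReader {
      // Sum the "Locked:" fields of all mappings in /proc/<pid>/smaps.
      // If the kernel does not report the field at all, this returns 0.
      public static long lockedKb(String pid) throws IOException {
        long totalKb = 0;
        try (BufferedReader r =
            new BufferedReader(new FileReader("/proc/" + pid + "/smaps"))) {
          String line;
          while ((line = r.readLine()) != null) {
            if (line.startsWith("Locked:")) {
              // line format: "Locked:            12544 kB"
              String[] parts = line.split("\\s+");
              totalKb += Long.parseLong(parts[1]);
            }
          }
        }
        return totalKb;
      }

      public static void main(String[] args) throws IOException {
        System.out.println(lockedKb(args[0]) + " kB locked");
      }
    }
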
>>
>> On 2/4/16, 5:11 AM, "Jan Lukavský" <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> I have a question about the way LinuxResourceCalculatorPlugin calculates
>>> the memory consumed by a process tree (it is calculated via the
>>> ProcfsBasedProcessTree class). When we enable (disk) caching in Apache
>>> Spark jobs run on a YARN cluster, the node manager starts to kill the
>>> containers while they read the cached data, because of "Container is
>>> running beyond memory limits ...". The reason is that even if we enable
>>> parsing of the smaps file
>>> (yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled),
>>> ProcfsBasedProcessTree counts mmapped read-only pages as consumed
>>> by the process tree, while Spark uses FileChannel.map(MapMode.READ_ONLY)
>>> to read the cached data. The JVM then consumes *a lot* more memory than
>>> the configured heap size (and this cannot really be controlled), but
>>> this memory is IMO not really consumed by the process - the kernel can
>>> reclaim these pages if needed. My question is - is there any explicit
>>> reason why "Private_Clean" pages are counted as consumed by the process
>>> tree? I patched ProcfsBasedProcessTree not to count them, but I don't
>>> know if this is the "correct" solution.
>>>
>>> Thanks for opinions,
>>> cheers,
>>> Jan
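In case anyone wants to reproduce Jan's scenario without running Spark: mapping a large file read-only and touching the pages is enough, since the smaps-based accounting then charges the mapped pages to the container even though the kernel is free to drop clean file-backed pages under memory pressure. A minimal sketch (the path is just an example; files over 2 GB would need to be mapped in chunks, because a single FileChannel.map call is limited to Integer.MAX_VALUE bytes):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MmapFootprint {
      public static void main(String[] args) throws Exception {
        // Map an existing large file read-only, the same way Spark's
        // disk cache reads data via FileChannel.map(MapMode.READ_ONLY, ...).
        try (RandomAccessFile f = new RandomAccessFile("/tmp/file.bin", "r");
             FileChannel ch = f.getChannel()) {
          MappedByteBuffer buf =
              ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
          long sum = 0;
          while (buf.hasRemaining()) {  // touch every page so it is faulted in
            sum += buf.get();
          }
          // The mapped pages now show up as Rss / Private_Clean (and are
          // counted against the container by ProcfsBasedProcessTree),
          // although the kernel can evict them at any time.
          System.out.println("checksum=" + sum);
          Thread.sleep(60_000);         // time to inspect /proc/<pid>/smaps
        }
      }
    }

-Varun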
