Re: FreeBSD 8.2 - active plus inactive memory leak!?
On Wed, 2012-03-07 at 10:23 +0200, Konstantin Belousov wrote:
> On Wed, Mar 07, 2012 at 12:36:21AM +, Luke Marsden wrote:
> > I'm trying to confirm that, on a system with no pages swapped out, that the following is a true statement: a page is accounted for in active + inactive if and only if it corresponds to one or more of the pages accounted for in the resident memory lists of all the processes on the system (as per the output of 'top' and 'ps')
>
> No. The pages belonging to vnode vm object can be active or inactive or cached but not mapped into any process address space.

Thank you, Konstantin. Does the number of vnodes we've got open on this machine (272011) fully explain away the memory gap?

Memory gap: 11264M active + 2598M inactive - 9297M sum-of-resident = 4565M
Active vnodes: vfs.numvnodes: 272011

That gives a lower bound at 17.18 KB per vnode (or higher if we take into account shared libs, etc); that seems a bit high for a vnode vm object, doesn't it?

If that doesn't fully explain it, what else might be chewing through active memory? Also, when are vnodes freed?

This system does have some tuning...

kern.maxfiles: 100
vm.pmap.pv_entry_max: 73296250

Could that be contributing to so much active + inactive memory (5GB+ more than expected), or do PV entries live in wired e.g. kernel memory?

On Tue, 2012-03-06 at 17:48 -0700, Ian Lepore wrote:
> In my experience, the bulk of the memory in the inactive category is cached disk blocks, at least for ufs (I think zfs does things differently). On this desktop machine I have 12G physical and typically have roughly 11G inactive, and I can unmount one particular filesystem where most of my work is done and instantly I have almost no inactive and roughly 11G free.

Okay, so this could be UFS disk cache, except the system is ZFS-on-root with no UFS filesystems active or mounted. Can I confirm that no double-caching of ZFS data is happening in active + inactive (+ cache) memory?

Thanks,
Luke

--
CTO, Hybrid Logic
+447791750420 | +1-415-449-1165 | www.hybrid-cluster.com

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org
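
The arithmetic in the message above can be checked with a short sketch. It only restates the figures quoted there (11264M active, 2598M inactive, 9297M summed resident, 272011 vnodes); the function names are made up for illustration, and it assumes, as the message does, that every unexplained megabyte belongs to a vnode.

```python
def memory_gap_mb(active_mb, inactive_mb, sum_resident_mb):
    """Memory on the active + inactive queues not covered by summed process RSS (MB)."""
    return active_mb + inactive_mb - sum_resident_mb


def kb_per_vnode(gap_mb, num_vnodes):
    """Average KB per vnode, assuming the whole gap is held by vnode pages."""
    return gap_mb * 1024 / num_vnodes


gap = memory_gap_mb(11264, 2598, 9297)
per_vnode = kb_per_vnode(gap, 272011)
print(f"gap: {gap}M, average per vnode: {per_vnode:.3f} KB")
```

The result is an average over all vnodes, so it says nothing about how the memory is distributed: a few large cached files could account for most of it.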
Re: FreeBSD 8.2 - active plus inactive memory leak!?
On Wed, 07 Mar 2012 10:23:38 +0200, Konstantin Belousov wrote:
> On Wed, Mar 07, 2012 at 12:36:21AM +, Luke Marsden wrote:
> > ...
> > I'm trying to confirm that, on a system with no pages swapped out, that the following is a true statement: a page is accounted for in active + inactive if and only if it corresponds to one or more of the pages accounted for in the resident memory lists of all the processes on the system (as per the output of 'top' and 'ps')
>
> No. The pages belonging to vnode vm object can be active or inactive or cached but not mapped into any process address space.

I wonder if some ideas by Denys Vlasenko contained in this thread
http://comments.gmane.org/gmane.linux.redhat.fedora.devel/157706
would be useful ?

...
Today, I'm looking at my process list, sorted by amount of dirtied pages (which very closely matches amount of malloced and used space - that is, malloced, but not-written to memory areas are not included). This is the most expensive type of pages, they can't be discarded. If we would be in memory squeeze, kernel will have to swap them out, if swap exists, otherwise kernel can't do anything at all.
...
Note that any shared pages (such as glibc) are not freed this way; also, non-mapped pages (such as large, but unused malloced space, or large, but unused file mappings) also do not contribute to MemFree increase.

jb
Re: FreeBSD 8.2 - active plus inactive memory leak!?
On Wed, 2012-03-07 at 13:33 +0100, J B wrote:
> On Wed, 07 Mar 2012 10:23:38 +0200, Konstantin Belousov wrote:
> > On Wed, Mar 07, 2012 at 12:36:21AM +, Luke Marsden wrote:
> > > ...
> > > I'm trying to confirm that, on a system with no pages swapped out, that the following is a true statement: a page is accounted for in active + inactive if and only if it corresponds to one or more of the pages accounted for in the resident memory lists of all the processes on the system (as per the output of 'top' and 'ps')
> >
> > No. The pages belonging to vnode vm object can be active or inactive or cached but not mapped into any process address space.
>
> I wonder if some ideas by Denys Vlasenko contained in this thread
> http://comments.gmane.org/gmane.linux.redhat.fedora.devel/157706
> would be useful ?
>
> https://github.com/pixelb/scripts/blob/master/scripts/ps_mem.py

This looks like a really useful script, and looks like it works under FreeBSD with linprocfs. Good find!

Cheers,
Luke

> ...
> Today, I'm looking at my process list, sorted by amount of dirtied pages (which very closely matches amount of malloced and used space - that is, malloced, but not-written to memory areas are not included). This is the most expensive type of pages, they can't be discarded. If we would be in memory squeeze, kernel will have to swap them out, if swap exists, otherwise kernel can't do anything at all.
> ...
> Note that any shared pages (such as glibc) are not freed this way; also, non-mapped pages (such as large, but unused malloced space, or large, but unused file mappings) also do not contribute to MemFree increase.
>
> jb

--
CTO, Hybrid Logic
+447791750420 | +1-415-449-1165 | www.hybrid-cluster.com
Re: FreeBSD 8.2 - active plus inactive memory leak!?
On 3/6/2012 2:13 PM, Luke Marsden wrote:
[ ... ]
> My current (probably quite simplistic) understanding of the FreeBSD virtual memory system is that, for each process as reported by top:
>
> * Size corresponds to the total size of all the text pages for the process (those belonging to code in the binary itself and linked libraries) plus data pages (including stack and malloc()'d but not-yet-written-to memory segments).

Size is the amount of the process's VM address space which has been assigned; the various things you mention indeed are the common things which consume address space, but there are others like shared memory (ie, SysV shmem stuff), memory-mapped hardware like a video card VRAM buffer, thread-local storage, etc.

> * Resident corresponds to a subset of the pages above: those pages which actually occupy physical/core memory. Notably pages may appear in size but not appear in resident for read-only text pages from libraries which have not been used yet or which have been malloc()'d but not yet written-to.

Yes.

> My understanding for the values for the system as a whole (at the top in 'top') is as follows:
>
> * Active / inactive memory is the same thing: resident memory from processes in use. Being in the inactive as opposed to active list simply indicates that the pages in question are less recently used and therefore more likely to get swapped out if the machine comes under memory pressure.

Well, they aren't exactly the same thing. The kernel implements a VM working set algorithm which periodically looks at all of the pages that are in memory and notes whether a process has accessed that page recently. If it has, the page is active; if the page has not been used for some time, it becomes inactive.

If the system has plenty of memory, it will not page or swap anything out. If it is under mild memory pressure, it will only consider pages which are inactive or cache as candidates for which it might page them out. Only under more severe memory pressure will it start looking to swap out entire processes rather than just page individual pages out.

[ Although, the FreeBSD implementation supposedly will try to balance the size of the active, inactive, and cache lists (or queues), so it is looking at the active list also-- but you don't want to page out an active page unless you really have to, and if you have to do that, maybe you might as well free up the whole process and let something have enough room to run. ]

> * Wired is mostly kernel memory.

It's normally all kernel memory; only a rare handful of userland programs such as crypto code like gnupg ever ask for wired memory, AFAIK.

> * Cache is freed memory which the kernel has decided to keep in case it corresponds to a useful page in future; it can be cheaply evicted into the free list.

Sort of, although this description fits the inactive memory category also. The major distinction is that the system is actively trying to flush any dirty pages in the cache category, so that they are available for reuse by something else immediately.

> * Free memory is actually not being used for anything.

Yes, although the system likes to have at least a few pre-zeroed pages handy in case an interrupt handler needs them.

> It seems that pages which occur in the active + inactive lists must occur in the resident memory of one or more processes (or more since processes can share pages in e.g. read-only shared libs or COW forked address space).

Everything in the active and inactive (and cache) lists is resident in physical memory.

> Conversely, if a page *does not* occur in the resident memory of any process, it must not occupy any space in the active + inactive lists.

Hmm...if a process gets swapped out entirely, the pages for it will be moved to the cache list, flushed, and then reused as soon as the disk I/O completes. But there is a window where the process can be marked as swapped out (and considered no longer resident), but still has some of its pages in physical memory.

> Therefore the active + inactive memory should always be less than or equal to the sum of the resident memory of all the processes on the system, right?

No. If you've got a lot of process pages shared (ie, a webserver with lots of httpd children, or a database pulling in a large common shmem area), then your process resident sizes can be very large compared to the system-wide active+inactive count.

> This missing memory is scary, because it seems to be increasing over time, and eventually when the system runs out of free memory, I'm certain it will crash in the same way described in my previous thread [1].

I don't have enough data to fully evaluate the interactions with ZFS; you can easily get system panics by running out of KVA on a 32-bit system, but that shouldn't apply
Re: FreeBSD 8.2 - active plus inactive memory leak!?
Thanks for your email, Chuck.

> > Conversely, if a page *does not* occur in the resident memory of any process, it must not occupy any space in the active + inactive lists.
>
> Hmm...if a process gets swapped out entirely, the pages for it will be moved to the cache list, flushed, and then reused as soon as the disk I/O completes. But there is a window where the process can be marked as swapped out (and considered no longer resident), but still has some of its pages in physical memory.

There's no swapping happening on these machines (intentionally so, because as soon as we hit swap everything goes tits up), so this window doesn't concern me.

I'm trying to confirm that, on a system with no pages swapped out, that the following is a true statement: a page is accounted for in active + inactive if and only if it corresponds to one or more of the pages accounted for in the resident memory lists of all the processes on the system (as per the output of 'top' and 'ps')

> > Therefore the active + inactive memory should always be less than or equal to the sum of the resident memory of all the processes on the system, right?
>
> No. If you've got a lot of process pages shared (ie, a webserver with lots of httpd children, or a database pulling in a large common shmem area), then your process resident sizes can be very large compared to the system-wide active+inactive count.

But that's what I'm saying...

sum(process resident sizes) >= active + inactive

Or as I said it above, equivalently:

active + inactive <= sum(process resident sizes)

The data I've got from this system, and what's killing us, shows the opposite: active + inactive > sum(process resident sizes) - by over 5GB now and growing, which is what keeps causing these machines to crash.

In particular:

Mem: 13G Active, 1129M Inact, 7543M Wired, 120M Cache, 1553M Free

But the total sum of resident memories is 9457M (according to summing the output from ps or top).

13G + 1129M = 14441M (active + inact) > 9457M (sum of res)

That's 4984M out, and that's almost enough to push us over the edge.

If my understanding of VM is correct, I don't see how this can happen. But it's happening, and it's causing real trouble here because our free memory keeps hitting zero and then we swap-spiral.

What can I do to investigate this discrepancy? Are there some tools that I can use to debug the memory allocated in active to find out where it's going, if not to resident process memory?

Thanks,
Luke

--
CTO, Hybrid Logic
+447791750420 | +1-415-449-1165 | www.hybrid-cluster.com
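
The comparison described above (summed resident sizes against active + inactive) could be automated along these lines. This is only a sketch under assumptions: it expects `ps -ax -o rss=` to print one RSS value per process in 1024-byte units, and `sysctl` to print the `vm.stats.vm.v_page_size`, `vm.stats.vm.v_active_count` and `vm.stats.vm.v_inactive_count` counters as `name: value` lines. The helper names are hypothetical, and none of this was run on the 8.2 machines in question.

```python
import subprocess


def sum_rss_kb(ps_output):
    """Sum per-process RSS values (KB), one number per line of ps output."""
    return sum(int(field) for field in ps_output.split())


def parse_sysctl(sysctl_output):
    """Turn `name: value` lines into a dict, keeping only integer values."""
    values = {}
    for line in sysctl_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value)
    return values


def unexplained_mb(ps_output, sysctl_output):
    """(active + inactive) minus summed RSS, in MB; positive means a gap."""
    vm = parse_sysctl(sysctl_output)
    pages = vm["vm.stats.vm.v_active_count"] + vm["vm.stats.vm.v_inactive_count"]
    queued_bytes = pages * vm["vm.stats.vm.v_page_size"]
    return (queued_bytes - sum_rss_kb(ps_output) * 1024) / (1024 * 1024)


def live_unexplained_mb():
    """Collect both inputs from the running system (assumed FreeBSD commands)."""
    def run(cmd):
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    ps_out = run(["ps", "-ax", "-o", "rss="])
    sysctl_out = run(["sysctl", "vm.stats.vm.v_page_size",
                      "vm.stats.vm.v_active_count",
                      "vm.stats.vm.v_inactive_count"])
    return unexplained_mb(ps_out, sysctl_out)
```

Calling `live_unexplained_mb()` periodically and logging the result would show whether the gap really grows over time. Summed RSS counts shared pages once per process, so the true gap is at least as large as the number reported.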
Re: FreeBSD 8.2 - active plus inactive memory leak!?
On Tue, 06 Mar 2012 18:30:07 -0500 Chuck Swiger wrote:
> On 3/6/2012 2:13 PM, Luke Marsden wrote:
> > * Resident corresponds to a subset of the pages above: those pages which actually occupy physical/core memory. Notably pages may appear in size but not appear in resident for read-only text pages from libraries which have not been used yet or which have been malloc()'d but not yet written-to.
>
> Yes.
>
> > My understanding for the values for the system as a whole (at the top in 'top') is as follows:
> >
> > * Active / inactive memory is the same thing: resident memory from processes in use. Being in the inactive as opposed to active list simply indicates that the pages in question are less recently used and therefore more likely to get swapped out if the machine comes under memory pressure.
>
> Well, they aren't exactly the same thing. The kernel implements a VM working set algorithm which periodically looks at all of the pages that are in memory and notes whether a process has accessed that page recently. If it has, the page is active; if the page has not been used for some time, it becomes inactive.

I think the previous poster has it about right, it's mostly about lifecycle. The inactive queue contains a mixture of resident and non-resident memory. It's commonly dominated by disk cache pages, and consequently is easily blown away by recursive greps etc.

> > * Cache is freed memory which the kernel has decided to keep in case it corresponds to a useful page in future; it can be cheaply evicted into the free list.
>
> Sort of, although this description fits the inactive memory category also. The major distinction is that the system is actively trying to flush any dirty pages in the cache category, so that they are available for reuse by something else immediately.

Only clean pages are added to cache. A dirty page will go twice around the inactive queue as dirty, get flushed and then do a third pass as a clean page. The point of cache is that it's a small stock of memory that's available for immediate reuse, the pages have nothing else in common.

On Wed, 07 Mar 2012 00:36:21 + Luke Marsden wrote:
> But that's what I'm saying...
>
> sum(process resident sizes) >= active + inactive

Inactive memory contains disc cache.
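
The two effects discussed in this thread pull in opposite directions, which a toy model can make concrete. This is a sketch only: pages are plain integers, the process names and page numbers are invented, and it is not how the kernel tracks pages. Shared pages inflate the sum of resident sizes, while cached file pages that no process maps inflate active + inactive.

```python
def accounting(process_pages, queued_pages):
    """Return (sum of RSS, active + inactive, pages mapped by no process).

    process_pages: dict of process name -> set of resident page ids.
    queued_pages:  set of page ids on the active and inactive queues.
    """
    sum_rss = sum(len(pages) for pages in process_pages.values())
    mapped = set().union(*process_pages.values())
    return sum_rss, len(queued_pages), len(queued_pages - mapped)


# Two httpd children sharing pages 1-3, each with one private page.
procs = {"httpd-1": {1, 2, 3, 10}, "httpd-2": {1, 2, 3, 11}}

# Only process pages are queued: sum of RSS exceeds active + inactive.
shared_only = accounting(procs, {1, 2, 3, 10, 11})

# Add six cached file pages that no process maps: the inequality flips.
file_cache = set(range(100, 106))
with_cache = accounting(procs, {1, 2, 3, 10, 11} | file_cache)

print(shared_only, with_cache)
```

In this model the sum of resident sizes is neither an upper nor a lower bound on active + inactive, consistent with both Chuck Swiger's point about shared pages and Konstantin Belousov's point about unmapped vnode pages.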