Re: FreeBSD 8.2 - active plus inactive memory leak!?

2012-03-07 Thread Luke Marsden
On Wed, 2012-03-07 at 10:23 +0200, Konstantin Belousov wrote:
 On Wed, Mar 07, 2012 at 12:36:21AM +0000, Luke Marsden wrote:
  I'm trying to confirm that, on a system with no pages swapped out,
  the following is a true statement:
  
  a page is accounted for in active + inactive if and only if it
  corresponds to one or more of the pages accounted for in the
  resident memory lists of all the processes on the system (as per
  the output of 'top' and 'ps')
 No.
 
 The pages belonging to a vnode vm object can be active, inactive, or cached,
 but not mapped into any process address space.

Thank you, Konstantin.  Does the number of vnodes we've got open on this
machine (272011) fully explain away the memory gap?

Memory gap:
11264M active + 2598M inactive - 9297M sum-of-resident = 4565M

Active vnodes:
vfs.numvnodes: 272011

That gives a lower bound of 17.18KB per vnode (or higher if we take into
account shared libs, etc.); that seems a bit high for a vnode vm object,
doesn't it?
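
Something like this rough Python sketch reproduces that arithmetic
(standard FreeBSD sysctl OIDs; note that summing per-process RSS counts
shared pages once per process, so if anything it understates the gap):

    # Rough sketch: the "memory gap" between active+inactive and the
    # summed resident sizes of all processes, and the per-vnode lower
    # bound that falls out of it.
    import subprocess

    def sysctl(name):
        return int(subprocess.check_output(["sysctl", "-n", name]).decode())

    page = sysctl("vm.stats.vm.v_page_size")
    act_inact = (sysctl("vm.stats.vm.v_active_count") +
                 sysctl("vm.stats.vm.v_inactive_count")) * page

    # ps reports RSS in kilobytes
    rss = sum(int(kb) for kb in
              subprocess.check_output(["ps", "-axo", "rss="]).decode().split()) * 1024

    vnodes = sysctl("vfs.numvnodes")
    gap = act_inact - rss
    print("gap %dM over %d vnodes = %.2fKB/vnode"
          % (gap // 2**20, vnodes, float(gap) / vnodes / 1024))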

If that doesn't fully explain it, what else might be chewing through
active memory?

Also, when are vnodes freed?

This system does have some tuning...
kern.maxfiles: 100
vm.pmap.pv_entry_max: 73296250

Could that be contributing to so much active + inactive memory (5GB+
more than expected), or do PV entries live in wired (i.e. kernel) memory?
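
For what it's worth, a sketch for eyeballing the PV entry counters --
this assumes the kernel exports vm.pmap.pv_entry_count next to
pv_entry_max, and the per-entry size below is a made-up illustrative
figure, not the real structure size:

    # Sketch: rough scale of PV-entry memory, under the assumptions above.
    import subprocess

    def sysctl(name):
        return int(subprocess.check_output(["sysctl", "-n", name]).decode())

    PV_ENTRY_BYTES = 48  # hypothetical figure, for scale only
    count = sysctl("vm.pmap.pv_entry_count")  # assumed to exist on this kernel
    limit = sysctl("vm.pmap.pv_entry_max")
    print("pv entries: %d / %d, roughly %dM"
          % (count, limit, count * PV_ENTRY_BYTES // 2**20))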


On Tue, 2012-03-06 at 17:48 -0700, Ian Lepore wrote:
 In my experience, the bulk of the memory in the inactive category is
 cached disk blocks, at least for ufs (I think zfs does things
 differently).  On this desktop machine I have 12G physical and
 typically have roughly 11G inactive, and I can unmount one particular
 filesystem where most of my work is done and instantly I have almost
 no inactive and roughly 11G free.

Okay, so this could be UFS disk cache, except the system is ZFS-on-root
with no UFS filesystems active or mounted.  Can I confirm that no
double-caching of ZFS data is happening in active + inactive (+ cache)
memory?
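
One quick check along those lines: on FreeBSD the ZFS ARC is accounted
as wired memory, and its size is exported as a kstat sysctl, so a
sketch like this shows how much of Wired is ARC:

    # Quick check: ARC size vs. wired memory.  ZFS data cached in the
    # ARC lives in Wired, so it should not also appear in active/inactive.
    import subprocess

    def sysctl(name):
        return int(subprocess.check_output(["sysctl", "-n", name]).decode())

    page = sysctl("vm.stats.vm.v_page_size")
    arc = sysctl("kstat.zfs.misc.arcstats.size")
    wired = sysctl("vm.stats.vm.v_wire_count") * page
    print("ARC %dM of %dM wired" % (arc // 2**20, wired // 2**20))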

Thanks,
Luke

-- 
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com 



___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: FreeBSD 8.2 - active plus inactive memory leak!?

2012-03-07 Thread J B
On Wed, 07 Mar 2012 10:23:38 +0200, Konstantin Belousov wrote:

 On Wed, Mar 07, 2012 at 12:36:21AM +0000, Luke Marsden wrote:
 ...
 I'm trying to confirm that, on a system with no pages swapped out,
 the following is a true statement:

 a page is accounted for in active + inactive if and only if it
 corresponds to one or more of the pages accounted for in the
 resident memory lists of all the processes on the system (as
 per the output of 'top' and 'ps')
 No.

 The pages belonging to a vnode vm object can be active, inactive, or
 cached, but not mapped into any process address space.

I wonder if some ideas by Denys Vlasenko contained in this thread
http://comments.gmane.org/gmane.linux.redhat.fedora.devel/157706
would be useful?

...
Today, I'm looking at my process list, sorted by the amount of dirtied pages
(which very closely matches the amount of malloced and used space - that is,
malloced but not-written-to memory areas are not included).
This is the most expensive type of page: they can't be discarded.
If we were in a memory squeeze, the kernel would have to swap them out,
if swap exists; otherwise the kernel can't do anything at all.
...
Note that any shared pages (such as glibc) are not freed this way;
also, non-mapped pages (such as large but unused malloced space, or large
but unused file mappings) do not contribute to the MemFree increase.
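
A simplified sketch of the idea -- summing the Private_Dirty fields
from /proc/<pid>/smaps on Linux, as discussed in that thread:

    # Simplified sketch: per-process private dirty totals from
    # /proc/<pid>/smaps (Linux).  These pages can only be reclaimed by
    # swapping them out.
    import os

    def private_dirty_kb(pid):
        total = 0
        try:
            with open("/proc/%s/smaps" % pid) as f:
                for line in f:
                    if line.startswith("Private_Dirty:"):
                        total += int(line.split()[1])  # value is in kB
        except (IOError, OSError):  # process exited, or access denied
            pass
        return total

    pids = [p for p in os.listdir("/proc") if p.isdigit()]
    for kb, pid in sorted((private_dirty_kb(p), p) for p in pids)[-10:]:
        print("%8d kB  pid %s" % (kb, pid))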

jb
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: FreeBSD 8.2 - active plus inactive memory leak!?

2012-03-07 Thread Luke Marsden
On Wed, 2012-03-07 at 13:33 +0100, J B wrote:
 On Wed, 07 Mar 2012 10:23:38 +0200, Konstantin Belousov wrote:
 
  On Wed, Mar 07, 2012 at 12:36:21AM +0000, Luke Marsden wrote:
  ...
  I'm trying to confirm that, on a system with no pages swapped out,
  the following is a true statement:
 
  a page is accounted for in active + inactive if and only if it
  corresponds to one or more of the pages accounted for in the
  resident memory lists of all the processes on the system (as
  per the output of 'top' and 'ps')
  No.
 
  The pages belonging to a vnode vm object can be active, inactive, or
  cached, but not mapped into any process address space.
 
 I wonder if some ideas by Denys Vlasenko contained in this thread
 http://comments.gmane.org/gmane.linux.redhat.fedora.devel/157706
 would be useful?

https://github.com/pixelb/scripts/blob/master/scripts/ps_mem.py

This looks like a really useful script, and it appears to work under
FreeBSD with linprocfs (conventionally mounted on /compat/linux/proc).

Good find!

Cheers,
Luke

 ...
 Today, I'm looking at my process list, sorted by the amount of dirtied pages
 (which very closely matches the amount of malloced and used space - that is,
 malloced but not-written-to memory areas are not included).
 This is the most expensive type of page: they can't be discarded.
 If we were in a memory squeeze, the kernel would have to swap them out,
 if swap exists; otherwise the kernel can't do anything at all.
 ...
 Note that any shared pages (such as glibc) are not freed this way;
 also, non-mapped pages (such as large but unused malloced space, or large
 but unused file mappings) do not contribute to the MemFree increase.
 
 jb

-- 
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com 

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: FreeBSD 8.2 - active plus inactive memory leak!?

2012-03-06 Thread Chuck Swiger

On 3/6/2012 2:13 PM, Luke Marsden wrote:
[ ... ]

My current (probably quite simplistic) understanding of the FreeBSD
virtual memory system is that, for each process as reported by top:

   * Size corresponds to the total size of all the text pages for the
 process (those belonging to code in the binary itself and linked
 libraries) plus data pages (including stack and malloc()'d but
 not-yet-written-to memory segments).


Size is the amount of the process's VM address space that has been assigned;
the things you mention are indeed the common consumers of address space, but
there are others, like shared memory (i.e., SysV shmem), memory-mapped
hardware such as a video card's VRAM buffer, thread-local storage, etc.



   * Resident corresponds to a subset of the pages above: those pages
 which actually occupy physical/core memory.  Notably, pages may
 appear in size but not in resident: for example, read-only text
 pages from libraries which have not been used yet, or memory which
 has been malloc()'d but not yet written to.


Yes.


My understanding of the values for the system as a whole (at the top in
'top') is as follows:

   * Active / inactive memory is the same thing: resident memory from
 processes in use.  Being in the inactive as opposed to active
 list simply indicates that the pages in question are less
 recently used and therefore more likely to get swapped out if
 the machine comes under memory pressure.


Well, they aren't exactly the same thing.  The kernel implements a VM working 
set algorithm which periodically looks at all of the pages that are in memory 
and notes whether a process has accessed that page recently.  If it has, the 
page is active; if the page has not been used for some time, it becomes 
inactive.


If the system has plenty of memory, it will not page or swap anything out.  If 
it is under mild memory pressure, it will only consider pages which are 
inactive or cache as candidates for which it might page them out.  Only under 
more severe memory pressure will it start looking to swap out entire processes 
rather than just page individual pages out.


[ Although, the FreeBSD implementation supposedly will try to balance the size 
of the active, inactive, and cache lists (or queues), so it is looking at the 
active list also-- but you don't want to page out an active page unless you 
really have to, and if you have to do that, you might as well free up
the whole process and let something have enough room to run. ]
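
The thresholds that drive this behaviour are visible as sysctls; a
small sketch, assuming the usual OID names:

    # Print the page-daemon thresholds (in pages) that decide when
    # reclamation starts; below v_free_target the kernel begins paging
    # from the inactive queue.
    import subprocess

    def sysctl(name):
        return int(subprocess.check_output(["sysctl", "-n", name]).decode())

    for oid in ("vm.v_free_min", "vm.v_free_target",
                "vm.v_inactive_target", "vm.v_cache_min"):
        print("%-22s %8d pages" % (oid, sysctl(oid)))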



   * Wired is mostly kernel memory.


It's normally all kernel memory; only a rare handful of userland programs
(crypto code like gnupg, for instance) ever ask for wired memory, AFAIK.



   * Cache is freed memory which the kernel has decided to keep in
 case it corresponds to a useful page in the future; it can be cheaply
 evicted into the free list.


Sort of, although this description fits the inactive memory category also.

The major distinction is that the system is actively trying to flush any dirty 
pages in the cache category, so that they are available for reuse by something 
else immediately.



   * Free memory is actually not being used for anything.


Yes, although the system likes to have at least a few pre-zeroed pages handy 
in case an interrupt handler needs them.



It seems that pages which occur in the active + inactive lists must
occur in the resident memory of one or more processes (possibly more
than one, since processes can share pages, e.g. read-only shared libs
or a COW forked address space).


Everything in the active and inactive (and cache) lists is resident in
physical memory.



Conversely, if a page *does not* occur in the resident
memory of any process, it must not occupy any space in the active +
inactive lists.


Hmm... if a process gets swapped out entirely, the pages for it will be
moved to the cache list, flushed, and then reused as soon as the disk I/O
completes.  But there is a window where the process can be marked as
swapped out (and considered no longer resident) but still has some of its
pages in physical memory.



Therefore the active + inactive memory should always be less than or
equal to the sum of the resident memory of all the processes on the
system, right?


No.  If you've got a lot of process pages shared (e.g., a webserver with
lots of httpd children, or a database pulling in a large common shmem
area), then your process resident sizes can be very large compared to the
system-wide active+inactive count.
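
This is easy to demonstrate with a toy sketch: one 64MB shared
anonymous mapping touched by a parent and four forked children shows
up in the RSS of all five processes, so the summed RSS counts that
64MB five times over:

    # Toy demonstration of RSS double counting across forked children.
    import mmap, os, subprocess, time

    SZ = 64 * 1024 * 1024
    buf = mmap.mmap(-1, SZ)            # anonymous MAP_SHARED mapping
    buf.write(b"\0" * SZ)              # fault every page in the parent

    kids = []
    for _ in range(4):
        pid = os.fork()
        if pid == 0:
            for off in range(0, SZ, 4096):
                buf[off]               # touch each page read-only
            time.sleep(10)
            os._exit(0)
        kids.append(pid)

    time.sleep(2)                      # let the children fault the pages in
    out = subprocess.check_output(
        ["ps", "-o", "rss=", "-p",
         ",".join(str(p) for p in kids + [os.getpid()])])
    print("summed RSS: %d kB" % sum(int(kb) for kb in out.decode().split()))
    for p in kids:
        os.waitpid(p, 0)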



This missing memory is scary, because it seems to be increasing over
time, and eventually when the system runs out of free memory, I'm
certain it will crash in the same way described in my previous thread
[1].


I don't have enough data to fully evaluate the interactions with ZFS; you can 
easily get system panics by running out of KVA on a 32-bit system, but that 
shouldn't apply 

Re: FreeBSD 8.2 - active plus inactive memory leak!?

2012-03-06 Thread Luke Marsden
Thanks for your email, Chuck.

  Conversely, if a page *does not* occur in the resident
  memory of any process, it must not occupy any space in the active +
  inactive lists.
 
 Hmm... if a process gets swapped out entirely, the pages for it will be
 moved to the cache list, flushed, and then reused as soon as the disk I/O
 completes.  But there is a window where the process can be marked as
 swapped out (and considered no longer resident) but still has some of its
 pages in physical memory.

There's no swapping happening on these machines (intentionally so,
because as soon as we hit swap everything goes tits up), so this window
doesn't concern me.

I'm trying to confirm that, on a system with no pages swapped out, the
following is a true statement:

a page is accounted for in active + inactive if and only if it
corresponds to one or more of the pages accounted for in the
resident memory lists of all the processes on the system (as per
the output of 'top' and 'ps')

  Therefore the active + inactive memory should always be less than or
  equal to the sum of the resident memory of all the processes on the
  system, right?
 
 No.  If you've got a lot of process pages shared (ie, a webserver with lots 
 of 
 httpd children, or a database pulling in a large common shmem area), then 
 your 
 process resident sizes can be very large compared to the system-wide 
 active+inactive count.

But that's what I'm saying...

sum(process resident sizes) >= active + inactive

Or as I said it above, equivalently:

active + inactive <= sum(process resident sizes)

The data I've got from this system, and what's killing us, shows the
opposite: active + inactive > sum(process resident sizes), by over 5GB
now and growing, which is what keeps causing these machines to crash.

In particular:
Mem: 13G Active, 1129M Inact, 7543M Wired, 120M Cache, 1553M Free

But the total sum of resident memories is 9457M (according to summing
the output from ps or top).

13G + 1129M = 14441M (active + inact) > 9457M (sum of res)

That's 4984M out, and that's almost enough to push us over the edge.

If my understanding of VM is correct, I don't see how this can happen.
But it's happening, and it's causing real trouble here because our free
memory keeps hitting zero and then we swap-spiral.

What can I do to investigate this discrepancy?  Are there some tools
that I can use to debug the memory allocated in active to find out
where it's going, if not to resident process memory?

Thanks,
Luke

-- 
CTO, Hybrid Logic
+447791750420  |  +1-415-449-1165  | www.hybrid-cluster.com 


___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: FreeBSD 8.2 - active plus inactive memory leak!?

2012-03-06 Thread RW
On Tue, 06 Mar 2012 18:30:07 -0500
Chuck Swiger wrote:

 On 3/6/2012 2:13 PM, Luke Marsden wrote:

 * Resident corresponds to a subset of the pages above: those
  pages which actually occupy physical/core memory.  Notably, pages may
  appear in size but not in resident: for example, read-only
  text pages from libraries which have not been used yet, or memory
  which has been malloc()'d but not yet written to.
 
 Yes.
 
  My understanding of the values for the system as a whole (at the
  top in 'top') is as follows:
 
 * Active / inactive memory is the same thing: resident
  memory from processes in use.  Being in the inactive as opposed to
  active list simply indicates that the pages in question are less
  recently used and therefore more likely to get swapped out
  if the machine comes under memory pressure.
 
 Well, they aren't exactly the same thing.  The kernel implements a VM
 working set algorithm which periodically looks at all of the pages
 that are in memory and notes whether a process has accessed that page
 recently.  If it has, the page is active; if the page has not been
 used for some time, it becomes inactive.

I think the previous poster has it about right; it's mostly about
lifecycle. The inactive queue contains a mixture of resident and
non-resident memory. It's commonly dominated by disk cache pages, and
consequently is easily blown away by recursive greps, etc.

 * Cache is freed memory which the kernel has decided to keep
  in case it corresponds to a useful page in the future; it can be cheaply
   evicted into the free list.
 
 Sort of, although this description fits the inactive memory
 category also.
 
 The major distinction is that the system is actively trying to flush
 any dirty pages in the cache category, so that they are available for
 reuse by something else immediately.

Only clean pages are added to cache. A dirty page will go twice around
the inactive queue as dirty, get flushed and then do a third pass as a
clean page. 

The point of cache is that it's a small stock of memory that's
available for immediate reuse; the pages have nothing else in common.



On Wed, 07 Mar 2012 00:36:21 +0000
Luke Marsden wrote:

 But that's what I'm saying...
 
 sum(process resident sizes) >= active + inactive


Inactive memory contains disc cache. 
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org