>>>>> "Tim" == Tim Wescott <[email protected]> writes:

Tim> This isn't what you want to hear, but I would be concerned about
Tim> an 18/14 difference between life and death.  Are you sure that
Tim> your problem isn't heap fragmentation, instead of lack of raw
Tim> memory?

The kernel allocates things in blocks (slabs) of like-sized objects to
minimize fragmentation.  So no, I don't think fragmentation is my
problem.

I think my problems with NoCatAuth were due to rapid forking from a
~4meg perl process, which put pressure on the memory system.
Empirically, the forks were more likely to fail when
free+cache+buffers was at or below 14meg.  That low figure was itself
a symptom of the memory leak.  Patching over the symptom could slow
the problem down and make it take longer to manifest (in fact, I
found a way to do that too, see below), but I really wanted to find
and fix the underlying problem.
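
For anyone who wants to watch the same free+cache+buffers figure over
time, here is a minimal Python sketch.  It assumes a Linux
/proc/meminfo with MemFree, Buffers and Cached fields reported in kB;
the function names are mine and purely illustrative, not part of
NoCatAuth or any existing tool.

```python
def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields; sizes are in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def free_cache_buffers_kb(fields):
    """Sum free memory, buffers and page cache (all in kB)."""
    return (fields.get("MemFree", 0)
            + fields.get("Buffers", 0)
            + fields.get("Cached", 0))


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print("free+cache+buffers: %d kB" % free_cache_buffers_kb(info))
    except OSError:
        print("no /proc/meminfo on this system")
```

Logging that one number from cron every few minutes should be enough
to see whether it is eroding.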

Tim> If the kernel has different flavors of heap it may be that it's
Tim> running out of one and not another, and you can shake the cereal
Tim> down in the box a bit by reapportioning how it uses them.

I think I have identified the problem with the gradually eroding
free+cache+buffers.  As usual, formulating the problem for others
helped to get me thinking clearly about what was happening and what I
was seeing.

Basically, busybox ps only shows VSZ, which is the size of the
process in virtual memory.  I was NOT seeing substantial changes in
VSZ.  But virtual memory isn't backed by physical memory until it is
actually touched.  What was putting pressure on the memory was
growing RSS (Resident Set Size), which was not readily visible to me.
When I installed procps and used its /usr/bin/ps, I saw that one of
my processes was indeed growing substantially RSS-wise over the
course of a few days, well beyond its initial warmup after boot.
Restarting the process recovers the memory.  Luckily it is a program
that can easily restore its own state, so restarting it once a day
(until its apparent leak is fixed) is not a hardship.
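
If installing procps is not an option, the same VSZ versus RSS
comparison can be made straight from /proc.  The following is a rough
Python sketch, assuming a Linux /proc/<pid>/status that carries VmSize
and VmRSS lines in kB; the helper names are hypothetical.

```python
import os


def parse_vm_fields(status_text):
    """Extract VmSize and VmRSS (kB) from the text of /proc/<pid>/status."""
    sizes = {}
    for line in status_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in ("VmSize", "VmRSS"):
            sizes[name] = int(rest.split()[0])
    return sizes


def snapshot(proc_dir="/proc"):
    """Map pid -> (VmSize kB, VmRSS kB) for every process we can read."""
    result = {}
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_dir, entry, "status")) as f:
                sizes = parse_vm_fields(f.read())
        except OSError:
            continue  # the process exited while we were looking
        if "VmRSS" in sizes:  # kernel threads have no Vm* lines
            result[int(entry)] = (sizes.get("VmSize", 0), sizes["VmRSS"])
    return result


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        procs = snapshot()
        # Largest resident set first
        for pid in sorted(procs, key=lambda p: procs[p][1], reverse=True):
            print("%6d  VSZ %8d kB  RSS %8d kB" % ((pid,) + procs[pid]))
```

Taking a snapshot shortly after boot and another a day or two later,
then comparing the RSS column per process, is one way to spot the
kind of growth described above.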

After setting /proc/sys/vm/overcommit_memory to 1 (always
overcommit), I seem to have solved the problem of NoCatAuth dying
while forking children.
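
To check which overcommit policy a box is currently running, something
like the Python sketch below should work.  The mode descriptions
follow my reading of the current kernel documentation (0 heuristic,
1 always, 2 strict); older kernels may interpret the values
differently, so treat the table as an assumption.  The function names
are illustrative only.

```python
OVERCOMMIT_PATH = "/proc/sys/vm/overcommit_memory"

# Assumed meanings, per current kernel documentation.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(raw):
    """Turn the file's contents (e.g. '1\\n') into (mode, description)."""
    mode = int(raw.strip())
    return mode, OVERCOMMIT_MODES.get(mode, "unknown mode")


def read_overcommit(path=OVERCOMMIT_PATH):
    """Read and describe the current setting."""
    with open(path) as f:
        return describe_overcommit(f.read())


if __name__ == "__main__":
    try:
        print("overcommit_memory = %d (%s)" % read_overcommit())
    except OSError:
        print("cannot read " + OVERCOMMIT_PATH)
```

Changing the value means writing the number to the same file as root,
and the setting does not survive a reboot unless something reapplies
it at startup.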

It is perhaps still slightly early to declare victory, but I'm
feeling pretty good about it right now.

Anyway, thanks for letting me ask, even if only because asking helped
me figure it out myself.


-- 
Russell Senior, President
[email protected]
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
