>>>>> "Tim" == Tim Wescott <[email protected]> writes:
Tim> This isn't what you want to hear, but I would be concerned about
Tim> an 18/14 difference between life and death.  Are you sure that
Tim> your problem isn't heap fragmentation, instead of lack of raw
Tim> memory?

The kernel allocates things in blocks (slabs) of like-sized objects to
minimize fragmentation.  So, no I don't think fragmentation is my
problem.  I think my problems with NoCatAuth were due to rapidly
forking from a ~4meg perl process putting pressure on the memory
system.  Empirically, this was more likely to trigger failure when the
free+cache+buffers was at or below 14meg.  This was a symptom of the
memory leak.  Patching over the symptom could slow down the problem,
make it take longer to manifest itself (in fact, I found a way to do
that too, see below), but I really wanted to find and fix the
underlying problem.

Tim> If the kernel has different flavors of heap it may be that it's
Tim> running out of one and not another, and you can shake the cereal
Tim> down in the box a bit by reapportioning how it uses them.

I think I have identified the problem with the gradually eroding
free+cache+buffers.  As usual, formulating the problem for others
helped to get me thinking clearly about what was happening and what I
was seeing.

Basically, busybox ps only shows VSZ which is the size of the process
in virtual memory.  I was NOT seeing substantial changes in VSZ.  All
of that virtual memory isn't allocated physical memory until it is
actually needed.  What was putting pressure on the memory was growing
RSS (Resident Set Size), which was not readily visible to me.  When I
installed procps and used its /usr/bin/ps, I saw that one of my
processes was indeed growing substantially RSS-wise from the initial
warmup after boot over the course of a few days.  Restarting the
process recovers the memory.  Luckily it is a program that can easily
restore its own state, so restarting it once a day (until its apparent
leak is fixed) is not a hardship.

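
For anyone without procps handy, the same numbers can be read straight
out of /proc: VmSize and VmRSS in /proc/<pid>/status, and MemFree,
Buffers and Cached in /proc/meminfo.  What follows is only a minimal
sketch, assuming a Python interpreter is available (not a given on a
small busybox system).  The function names are made up for
illustration and are not part of any existing tool.

```python
def parse_kb_fields(text, wanted):
    """Return {name: kB} for 'Name:   123 kB' lines in /proc-style text.

    Only exact field names listed in `wanted` are picked up, so
    'Cached' does not accidentally match 'SwapCached'.
    """
    out = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in wanted:
            parts = rest.split()
            if parts and parts[0].isdigit():
                out[name] = int(parts[0])
    return out


def process_memory(pid="self"):
    """Return (VSZ, RSS) in kB for one process, from /proc/<pid>/status.

    Either value is None if the kernel does not report it
    (kernel threads have no VmSize/VmRSS lines).
    """
    with open(f"/proc/{pid}/status") as f:
        fields = parse_kb_fields(f.read(), {"VmSize", "VmRSS"})
    return fields.get("VmSize"), fields.get("VmRSS")


def free_cache_buffers_kb():
    """Return free + cache + buffers in kB, from /proc/meminfo."""
    with open("/proc/meminfo") as f:
        fields = parse_kb_fields(f.read(), {"MemFree", "Buffers", "Cached"})
    return sum(fields.values())
```

Calling process_memory(pid) for the suspect process every so often and
logging the result next to free_cache_buffers_kb() should show the
pattern described above: VmSize staying roughly flat while VmRSS
climbs and the free+cache+buffers total shrinks.
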
Setting /proc/sys/vm/overcommit_memory to 1 (always overcommit), I
seem to have solved the problem of NoCatAuth dying while forking
children.  Still perhaps slightly early to declare victory, but I'm
feeling pretty good about it right now.

Anyway, thanks for letting me ask even if only because it helped me
figure it out myself.

-- 
Russell Senior, President
[email protected]
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
