I am working on a network management program written in python that has
multiple threads (typically 20+) spawning subprocesses which are used
to communicate with other systems on the network.  This runs fine for a
while, but eventually slows down to a crawl.  Running sar shows that
when it is running slowly there is an exceptionally large number of
minor page faults - there are continuously 14000 faults/sec, with a
variation of about +/-100.  There are no pages swapped to disk, these
are purely in-memory faults.

I have a hypothesis about what is happening, but have not been able to
prove or disprove it:
the theory is that when a subprocess is spawned, there is a small
window between the call to fork and the call to exec where the parent's
memory is shared between the two processes.  Linux marks the memory as
copy-on-write, so if the parent process then accesses memory during
that window a minor page fault is generated and the page is copied.
Normally this is not a problem, but with a large number of threads all
spawning subprocesses there is a chance of a another process being
spawned during that window and the whole of memory is copied.  This
slows everything else down so the probability of another collision
increases, and the whole thing snowballs.  This could also happen if
something else tries to write to large areas of memory (maybe the
python garbage collector?).

This is running on a Sun V40 64 bit SMP with Fedora Core 3.   The same
code has been run on intel systems and the problem has not been seen -
this could be because the problem is specific to that hardware or
because the intel systems are not fast enough for a collision to occur.

My questions are:

1) is the theory plausible/likely?

2) what could I do to prove/disprove it?

3) has anyone else seen this problem?

4) are there any other situations that could be causing a continuous
stream of minor page faults?

5) WTF can I do about it?

Dave Kirby
               (dave.x.kirby at
                gmail dot
                com)

-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to