Karthik Gurusamy wrote:
> On Jul 2, 10:57 pm, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>
>>>>>I have found the stop-and-go between two processes on the same machine
>>>>>leads to very poor throughput. By stop-and-go, I mean the producer and
>>>>>consumer are constantly getting on and off of the CPU since the pipe
>>>>>gets full (or empty for consumer). Note that a producer can't run at
>>>>>its top speed as the scheduler will pull it out since it's output pipe
>>>>>got filled up.
>>...
> If the problem does not require two way communication, which is
> typical of a producer-consumer, it is a lot faster to allow P to fully
> run before C is started.
>
> If P and C are tied using a pipe, in most linux like OS (QNX may be
> doing something really smart as noted by John Nagle), there is a big
> cost of scheduler swapping P and C constantly to use the CPU. You may
> ask why? because the data flowing between P and C, has a small finite
> space (the buffer). Once P fills it; it will block -- the scheduler
> sees C is runnable and puts C on the CPU.

The killer case is where there's another thread or process, other than C,
already ready to run when P blocks. The other thread, not C, usually gets
control, because it was ready to run first, and not until the other thread
runs out its time quantum does C get a turn. Then C gets to run briefly,
drains the pipe, and blocks. P gets to run, fills the pipe, and blocks.
The compute-bound thread gets to run, runs for a full time quantum, and
loses the CPU to C. Wash, rinse, repeat.

The effect is that pipe-like producer-consumer systems may get only a small
fraction of the available CPU time on a busy system.

When testing a producer-consumer system, put a busy loop in the background
and see if performance becomes terrible. It ought to drop by 50% against an
equal-priority compute-bound process; if it drops by far more than that,
you have the problem described here.

This problem is sometimes called "What you want is a subroutine call; what
the OS gives you is an I/O operation." When you make a subroutine call on
top of an I/O operation, you get these scheduling problems.

                                John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list
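
The busy-loop test suggested above could be sketched roughly as follows. This is only an illustration, not code from the thread: the names PRODUCER, BUSY_LOOP, pin_to_one_cpu and pipe_throughput are made up for the example, the producer and the busy loop are just child Python interpreters, and pinning everything to one CPU (available on Linux only) is an assumption added so that the processes actually compete for the same core on a multicore machine. Interpreter start-up time is included in the measurement, so the numbers are only rough.

```python
import os
import subprocess
import sys
import time

# Hypothetical producer P: writes `total` bytes to stdout in `chunk`-sized writes.
PRODUCER = (
    "import sys\n"
    "total, chunk = int(sys.argv[1]), int(sys.argv[2])\n"
    "buf = b'x' * chunk\n"
    "out = sys.stdout.buffer\n"
    "for _ in range(total // chunk):\n"
    "    out.write(buf)\n"
    "out.flush()\n"
)

# Hypothetical equal-priority compute-bound competitor.
BUSY_LOOP = "while True: pass"


def pin_to_one_cpu():
    """Restrict this process (and children started later) to one CPU, if supported."""
    if hasattr(os, "sched_setaffinity"):  # Linux only; skipped elsewhere
        cpu = min(os.sched_getaffinity(0))
        os.sched_setaffinity(0, {cpu})


def pipe_throughput(total=1 << 24, chunk=4096, busy=False):
    """Run P -> pipe -> C and return (bytes received, bytes per second).

    The calling process acts as the consumer C. If `busy` is true, a
    busy-loop process runs in the background for the whole measurement.
    """
    hog = subprocess.Popen([sys.executable, "-c", BUSY_LOOP]) if busy else None
    try:
        start = time.perf_counter()
        producer = subprocess.Popen(
            [sys.executable, "-c", PRODUCER, str(total), str(chunk)],
            stdout=subprocess.PIPE,
        )
        received = 0
        while True:
            data = producer.stdout.read(chunk)  # blocks while the pipe is empty
            if not data:                        # EOF: producer closed its end
                break
            received += len(data)
        producer.wait()
        elapsed = time.perf_counter() - start
    finally:
        if hog is not None:
            hog.kill()
            hog.wait()
    return received, received / elapsed


if __name__ == "__main__":
    pin_to_one_cpu()
    _, quiet = pipe_throughput(busy=False)
    _, loaded = pipe_throughput(busy=True)
    print("idle system:    %.1f MB/s" % (quiet / 1e6))
    print("with busy loop: %.1f MB/s" % (loaded / 1e6))
    print("ratio:          %.2f" % (loaded / quiet))
```

By the reasoning above, a ratio somewhere near 0.5 would be the expected cost of sharing the CPU with one compute-bound process; a ratio far below that would suggest the scheduling problem described in this message. What you actually see will depend on the OS scheduler and the pipe buffer size.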