Re: Possible degenerate case in trigger handling?

2001-07-27 Thread MarkD

> > A second and more defensive measure is to issue a non-blocking read on
> > the pipe to drain all qmail-queue bytes *prior* to the todo

This is the solution I eventually used. Works like a charm. I've
appended the patch so that it gets into the archives. Variations from
the original post include opening the trigger file just the once as
well as setting it non-blocking at the time it's opened.  I've called
it the trigger-happy patch :>


Regards.

*** Makefile.orig   Mon Jun 15 03:53:16 1998
--- Makefile        Fri Jul 27 11:23:47 2001
***************
*** 2114,2120 ****
  	./compile token822.c
  
  trigger.o: \
! compile trigger.c select.h open.h trigger.h hasnpbg1.h
  	./compile trigger.c
  
  triggerpull.o: \
--- 2114,2120 ----
  	./compile token822.c
  
  trigger.o: \
! compile trigger.c select.h open.h trigger.h hasnpbg1.h ndelay.h
  	./compile trigger.c
  
  triggerpull.o: \
*** trigger.orig.c  Mon Jun 15 03:53:16 1998
--- trigger.c   Thu Jul 26 18:02:07 2001
***************
*** 1,4 ****
--- 1,5 ----
  #include "select.h"
+ #include "ndelay.h"
  #include "open.h"
  #include "trigger.h"
  #include "hasnpbg1.h"
***************
*** 10,24 ****
  
  void trigger_set()
  {
!  if (fd != -1)
!    close(fd);
! #ifdef HASNAMEDPIPEBUG1
!  if (fdw != -1)
!    close(fdw);
! #endif
!  fd = open_read("lock/trigger");
  #ifdef HASNAMEDPIPEBUG1
!  fdw = open_write("lock/trigger");
  #endif
  }
  
--- 11,25 ----
  
  void trigger_set()
  {
!  if (fd == -1)
!   {
!    fd = open_read("lock/trigger");
!    if (fd != -1)
!      ndelay_on(fd);
!   }
  #ifdef HASNAMEDPIPEBUG1
!  if (fdw == -1)
!    fdw = open_write("lock/trigger");
  #endif
  }
  
***************
*** 36,41 ****
  int trigger_pulled(rfds)
  fd_set *rfds;
  {
!  if (fd != -1) if (FD_ISSET(fd,rfds)) return 1;
   return 0;
  }
--- 37,48 ----
  int trigger_pulled(rfds)
  fd_set *rfds;
  {
!  char buf[64];
! 
!  if ((fd != -1) && FD_ISSET(fd,rfds))
!   {
!    while (read(fd,buf,sizeof(buf)) == sizeof(buf)) ;
!    return 1;
!   }
   return 0;
  }



Re: Possible degenerate case in trigger handling?

2001-07-25 Thread MarkD

(To follow up on my own post)

> In other words, in the tiny window that qmail-send leaves for the
> kernel to flush the pipe, there is always at least one qmail-queue
> process with the trigger open. Ergo a resource burning spin that
> degenerates if the injection rate is high and regular (exactly the
> situation for the servers I noticed this on).
> 
...

> Fortunately there are a couple of remedies.
> 
> At the very least, the flush window can be made substantially larger
> by closing trigger as soon as the select returns.

This of course is not a good idea as it means that any qmail-queue
notify will be lost while trigger is closed. So the closed-window size
is a catch-22. Keep it small and a busy system may spin as
described. Make it larger and you increase the probability of missing a
notify from qmail-queue.
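
To make the lost-notify case concrete, here is a minimal sketch in plain
POSIX C (not qmail's own code). It assumes the notifier opens the FIFO
for writing in non-blocking mode; try_notify(), closed_window_demo() and
the /tmp path are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: attempt one notification on the FIFO at `path`.
   Returns 0 on success, otherwise the errno of the failing call. */
static int try_notify(const char *path)
{
  int fd = open(path, O_WRONLY | O_NONBLOCK);
  int err;

  if (fd == -1) return errno;
  err = (write(fd, "", 1) == 1) ? 0 : errno;
  close(fd);
  return err;
}

/* Hypothetical demo: record what a notifier sees while the reader has
   the FIFO closed, and again once a reader holds it open.
   Returns 0 if the FIFO could be set up, -1 otherwise. */
int closed_window_demo(int *without_reader, int *with_reader)
{
  char path[64];
  int rfd;

  assert(without_reader != 0 && with_reader != 0);

  /* Assumed scratch location, unrelated to qmail's lock/trigger. */
  snprintf(path, sizeof path, "/tmp/trigger-demo.%ld", (long) getpid());
  unlink(path);
  if (mkfifo(path, 0600) == -1) return -1;

  /* Nobody has the FIFO open for reading: the "closed window". */
  *without_reader = try_notify(path);

  /* A reader arrives; the same attempt is made again. */
  rfd = open(path, O_RDONLY | O_NONBLOCK);
  if (rfd == -1) { unlink(path); return -1; }
  *with_reader = try_notify(path);

  close(rfd);
  unlink(path);
  return 0;
}
```

On a POSIX system the attempt made without a reader should come back
as ENXIO, so the byte is never written, while the attempt made with a
reader present should succeed.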

> A second and more defensive measure is to issue a non-blocking read on
> the pipe to drain all qmail-queue bytes *prior* to the todo
> scan. Perhaps both of these could be done in the trigger_pulled
> routine. I've appended a patch that gives the idea in code (it's
> untested).

I still think this might be a viable solution. In fact I wonder
whether qmail-send can simply keep the pipe open and do a non-blocking
read to drain the notifies rather than rely on the open/close flush
semantics. That way there is no close-window at all, so losing a
notify becomes impossible rather than unlikely, it's simpler code, it
eliminates the false triggers I'm seeing and probably creates less
load on the OS by avoiding those repeated opens and closes.
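
A minimal sketch of that keep-it-open-and-drain idea, in plain POSIX C
rather than qmail's wrappers. The helper names set_nonblocking(),
is_readable() and drain() are hypothetical; set_nonblocking() stands in
for roughly what ndelay_on() is used for here. What select() reports
once every writer has closed differs between systems and is not
covered.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode.
   Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);

  if (flags == -1) return -1;
  return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) ? -1 : 0;
}

/* Hypothetical helper: poll fd with a zero timeout.
   Returns 1 if readable, 0 if not, -1 on error. */
int is_readable(int fd)
{
  fd_set rfds;
  struct timeval tv = { 0, 0 };

  FD_ZERO(&rfds);
  FD_SET(fd, &rfds);
  if (select(fd + 1, &rfds, 0, 0, &tv) == -1) return -1;
  return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Read and discard everything currently queued on a non-blocking fd.
   Returns the number of bytes thrown away. */
long drain(int fd)
{
  char buf[64];
  long total = 0;
  ssize_t n;

  assert(fd >= 0);
  for (;;) {
    n = read(fd, buf, sizeof buf);
    if (n > 0) { total += n; continue; }
    if (n == -1 && errno == EINTR) continue;  /* interrupted, retry */
    break;  /* 0: no writers left; -1: nothing more to read right now */
  }
  return total;
}
```

The intent is that the descriptor is opened once, made non-blocking
once, and drained each time select() reports it readable, so the next
select() blocks until a fresh byte arrives.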


Regards.



Possible degenerate case in trigger handling?

2001-07-25 Thread MarkD

(I originally found this problem on a very busy FreeBSD 4.3 system
running with the bigtodo patch - it is much less likely to occur with
a standard qmail.)

First some background regarding trigger. When qmail-queue has a mail
for qmail-send it opens the named pipe, trigger, writes a byte to it,
closes trigger and exits.
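
For illustration, the notifying side can be sketched in plain POSIX C
as below. This is not qmail's own source; notify_trigger() is a
hypothetical name, and opening the pipe in non-blocking mode is an
assumption made so that the sketch cannot hang when no reader is
present.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: poke the named pipe at `path` with one byte.
   Returns 0 if the byte was written, -1 if nobody could be notified. */
int notify_trigger(const char *path)
{
  int fd;
  ssize_t w;

  assert(path != 0);

  /* Assumption: non-blocking open, so a missing reader makes open()
     fail straight away instead of blocking the caller. */
  fd = open(path, O_WRONLY | O_NONBLOCK);
  if (fd == -1) return -1;

  w = write(fd, "", 1);  /* a single NUL byte; its value is irrelevant */
  close(fd);
  return (w == 1) ? 0 : -1;
}
```

A caller would use it as notify_trigger("lock/trigger") and carry on
regardless of the result, since the byte only serves as a wake-up.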

qmail-send notices this trigger in the following loop:

open trigger
select: is trigger readable?
...
todo_do()
...
close trigger
open trigger
...
select: is trigger readable?
etc.

A couple of notes on this loop:

o The todo_do() involves a potentially expensive directory scan - if
  lots of injections are occurring or if you use the bigtodo patch.

o The idea behind closing and opening trigger is to flush the byte
  written by qmail-queue so that next time around the loop the select
  blocks until another qmail-queue comes along.

The problem I've found relates to when the flush occurs on a named
pipe. At least on FreeBSD, a named pipe is only flushed when no other
process has the pipe opened.

On a very busy system the chance of this occurring is reduced, as there
is almost always one or more qmail-queue processes running. Furthermore,
the code order of qmail-send is such that the window in which no
qmail-queue process may have the pipe open is very, very small: it is
the tiny window between the close and the open that immediately follows
it in trigger_set().

The degenerate case I see is that qmail-send starts spinning on the
select()--todo_do() loop as select() always indicates that the trigger
is readable. This spin involves a directory scan of todo which slows
the qmail-queue processes as they too are writing to the same
directory/file system. Since the qmail-queue processes are further
slowed, qmail-send continues to spin on a readable trigger.

In other words, in the tiny window that qmail-send leaves for the
kernel to flush the pipe, there is always at least one qmail-queue
process with the trigger open. Ergo a resource burning spin that
degenerates if the injection rate is high and regular (exactly the
situation for the servers I noticed this on).

Returning to the bigtodo patch, that of course exacerbates the
situation as the window between the close and open in trigger_set
forms an even smaller part of the loop.

Fortunately there are a couple of remedies.

At the very least, the flush window can be made substantially larger
by closing trigger as soon as the select returns.

A second and more defensive measure is to issue a non-blocking read on
the pipe to drain all qmail-queue bytes *prior* to the todo
scan. Perhaps both of these could be done in the trigger_pulled
routine. I've appended a patch that gives the idea in code (it's
untested).

Question: has anyone else seen this? You most likely will only see it
on a very busy system that has bigtodo.


Regards.

*** trigger.orig.c  Mon Jun 15 03:53:16 1998
--- trigger.c   Wed Jul 25 16:50:40 2001
***************
*** 1,4 ****
--- 1,5 ----
  #include "select.h"
+ #include "ndelay.h"
  #include "open.h"
  #include "trigger.h"
  #include "hasnpbg1.h"
***************
*** 36,41 ****
  int trigger_pulled(rfds)
  fd_set *rfds;
  {
!  if (fd != -1) if (FD_ISSET(fd,rfds)) return 1;
   return 0;
  }
--- 37,55 ----
  int trigger_pulled(rfds)
  fd_set *rfds;
  {
!  char buf[64];
! 
!  if ((fd != -1) && FD_ISSET(fd,rfds))
!   {
!    ndelay_on(fd);
!    while (read(fd,buf,sizeof(buf)) > 0) ;
!    close(fd);
!    fd = -1;
! #ifdef HASNAMEDPIPEBUG1
!    if (fdw != -1)
!      close(fdw);
! #endif
!    return 1;
!   }
   return 0;
  }