On Sun, Jun 17, 2001 at 08:56:13PM +0100, James R Grinter wrote:

% I think it isn't relevant. qmail-remote doesn't seem to use select,
% or at least it's nowhere in the path where my qmail-remote wedges.

Go look at timeoutread(), which *is* in your path.  The select is in
the line right before where you wedge.

% As to different OS behaviour, Solaris 2.6 (and 7) both say:

[Man page claims it doesn't do this.]

% whereas SunOS 4.1.4 (my usual 'old bsd system' benchmark) says:

[Man page unclear.]

% and I can tell you that I've not seen the problem happen with
% qmail-remote on SunOS 4.1.4.

Well, I don't necessarily trust man pages to tell the truth,
especially if this was added accidentally (i.e. if it's a bug).

And I still haven't seen anything to really convince me that any OS
actually does this.  I've only seen that a few people think some do,
that it could easily happen as a bug, and that it could explain the
hung qmail-remotes.  And it's easily fixed if it is the problem.

In other words, I'm not saying that this is the cause, only that it's
possible.

%  Indeed, I think DJB's code (and most
% other people's) compensates for both behaviours by setting the
% necessary FD's each time anyway.

It doesn't.  (Don't know about other people's.)  It assumes that the
fd_sets will be cleared on timeout.  Setting the fd_sets each time is
always necessary and doesn't protect against this issue, anyway.
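
To illustrate the point, here is a minimal sketch of a read-with-timeout helper that trusts select's return value instead of the state of the fd_set after a timeout. It is not qmail's code: the name timeoutread_checked is hypothetical, and ETIMEDOUT is only a stand-in for whatever timeout error the caller expects.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, not qmail's timeoutread(): wait up to t seconds
   for fd to become readable, then read from it. */
ssize_t timeoutread_checked(int t, int fd, char *buf, size_t len)
{
  fd_set rfds;
  struct timeval tv;
  int r;

  tv.tv_sec = t;
  tv.tv_usec = 0;

  /* The set must be rebuilt before every select call in any case. */
  FD_ZERO(&rfds);
  FD_SET(fd, &rfds);

  r = select(fd + 1, &rfds, (fd_set *) 0, (fd_set *) 0, &tv);
  if (r == -1) return -1;          /* select failed, errno already set */
  if (r == 0) {                    /* timeout: do not consult rfds at all */
    errno = ETIMEDOUT;             /* assumption: stand-in timeout error */
    return -1;
  }
  return read(fd, buf, len);       /* r > 0 and fd is the only member */
}
```

With r == 0 handled explicitly, a select that left the bit set on timeout could no longer lead to a blocking read.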


In any case, since I did see (one) stuck process recently, I built
myself a test to see if I could reproduce it.  I couldn't.  At least on
a Red Hat Linux 2.2.19-6.2.1 or -6.2.1smp kernel, it looks like select
acts sanely on a timeout, at least some of the time.
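
A test of this kind could look roughly like the sketch below. It is not the program actually used, and the function name fdset_cleared_on_timeout is made up for illustration. It lets select time out on an empty pipe and reports whether the descriptor's bit was cleared.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if select cleared the fd_set on timeout,
   0 if the bit was left set, -1 if the probe itself failed. */
int fdset_cleared_on_timeout(void)
{
  int p[2];
  fd_set rfds;
  struct timeval tv;
  int r;
  int result;

  if (pipe(p) == -1) return -1;

  FD_ZERO(&rfds);
  FD_SET(p[0], &rfds);
  tv.tv_sec = 0;
  tv.tv_usec = 100000;             /* 100 ms; nothing is ever written */

  r = select(p[0] + 1, &rfds, (fd_set *) 0, (fd_set *) 0, &tv);
  if (r != 0)
    result = -1;                   /* error or unexpected readiness */
  else
    result = FD_ISSET(p[0], &rfds) ? 0 : 1;

  close(p[0]);
  close(p[1]);
  return result;
}
```

A single run returning 1 only shows sane behaviour for that run; an intermittent bug would need the probe repeated many times, possibly under load.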

I also put a debugging version of qmail-remote on my system, so if it
ever decides to hang again I can fling gdb at it.


Mark
