On Sun, Jun 17, 2001 at 08:56:13PM +0100, James R Grinter wrote:
% I think it isn't relevant. qmail-remote doesn't seem to use select,
% or at least it's nowhere in the path where my qmail-remote wedges.
Go look at timeoutread(), which *is* in your path. The select is in
the line right before where you wedge.
% As to different OS behaviour, Solaris 2.6 (and 7) both say:
[Man page claims it doesn't do this.]
% whereas SunOS 4.1.4 (my usual 'old bsd system' benchmark) says:
[Man page unclear.]
% and I can tell you that I've not seen the problem happen with
% qmail-remote on SunOS 4.1.4.
Well, I don't necessarily trust man pages to tell the truth,
especially if this was added accidentally (i.e. if it's a bug).
And I still haven't seen anything to really convince me that any OS
actually does this. I've only seen that a few people think some do,
that it could easily happen as a bug, and that it could explain the
hung qmail-remotes. And it's easily fixed if it is the problem.
In other words, I'm not saying that this is the cause, only that it's
possible.
% Indeed, I think DJB's code (and most
% other people's) compensates for both behaviours by setting the
% necessary FD's each time anyway.
It doesn't. (Don't know about other people's.) It assumes that the
fd_sets will be cleared on timeout. Setting the fd_sets before each
call is always necessary, but it doesn't protect against this issue:
the problem is what select leaves in them after a timeout.
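
For illustration, here is one way a read-with-timeout could avoid
depending on the fd_sets being cleared: treat select's return value of 0
as the timeout signal and never look at the fd_set in that case. This is
only a sketch, not qmail's actual code; the name read_with_timeout and
the use of ETIMEDOUT are assumptions made for the example.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>
#include <errno.h>

/* Hypothetical helper (not qmail's timeoutread): read with a timeout,
   trusting select()'s return value instead of the fd_set contents. */
ssize_t read_with_timeout(int timeout_secs, int fd, char *buf, size_t len)
{
    fd_set rfds;
    struct timeval tv;
    int r;

    /* The set and the timeout must be rebuilt before every call. */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_secs;
    tv.tv_usec = 0;

    r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r == -1)
        return -1;              /* select failed; errno set by select */
    if (r == 0) {
        /* Timed out. Do not consult rfds here: if the OS left the bit
           set, FD_ISSET would send us into a blocking read(). */
        errno = ETIMEDOUT;
        return -1;
    }
    if (!FD_ISSET(fd, &rfds)) {
        /* Should not happen with a single fd; treat it as a timeout. */
        errno = ETIMEDOUT;
        return -1;
    }
    return read(fd, buf, len);
}
```

With this shape, a select that returns 0 but leaves the bit set can no
longer lead to a blocking read.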
In any case, since I did see (one) stuck process recently I built
myself a test to see if I could reproduce it. I couldn't. At least on
RedHat Linux 2.2.19-6.2.1 or -6.2.1smp, it looks like select acts
sanely on a timeout, at least some of the time.
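
A probe along these lines can check the behaviour on a given system. It
is a hypothetical sketch, not the test mentioned above: it waits on the
read end of an empty pipe, lets select time out, and reports whether
the fd is still set afterwards. The function name and the 200 ms
timeout are arbitrary choices for the example.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical probe. Returns 1 if select() cleared the read set on
   timeout, 0 if the fd was left set, -1 if the probe itself failed. */
int fdset_cleared_on_timeout(void)
{
    int p[2];
    fd_set rfds;
    struct timeval tv;
    int r, result;

    if (pipe(p) == -1)
        return -1;

    /* Nothing is ever written to the pipe, so select must time out. */
    FD_ZERO(&rfds);
    FD_SET(p[0], &rfds);
    tv.tv_sec = 0;
    tv.tv_usec = 200000;

    r = select(p[0] + 1, &rfds, NULL, NULL, &tv);
    if (r != 0)
        result = -1;            /* error, or unexpectedly readable */
    else
        result = FD_ISSET(p[0], &rfds) ? 0 : 1;

    close(p[0]);
    close(p[1]);
    return result;
}
```

A single run only shows what happened that one time, so calling it in a
loop, ideally under load, says more about an intermittent bug than one
clean result does.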
I also put a debugging version of qmail-remote on my system, so if it
ever decides to hang again I can fling gdb at it.
Mark