I came across the following, which *might* explain some of these
deadlocking problems:
<http://kt.zork.net/kernel-traffic/kt20010611_121.html#6>
[Summary: Some systems leave the fd_sets alone when select times out.]
If I read this right, timeoutconn, timeoutread and timeoutwrite (and
anything else that uses select) have to check explicitly for a return
value of 0 from select, instead of only testing the fd_sets, to be
completely portable.
Even if an OS doesn't do this intentionally, it's quite easy to see
someone forgetting to clear the fd_sets on a timeout by accident, so
some defensive coding against the problem (explicitly checking for a
result of 0) may be worthwhile.
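
For what it's worth, here is a rough sketch of the kind of defensive
check I mean. This is not the qmail source: safe_timeoutread is a name
I made up, and using ETIMEDOUT as the timeout error is my assumption.
The only point is that a return of 0 from select is handled before
the fd_set is looked at.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not the qmail source): read from fd, giving up
   after t seconds. Returns bytes read, or -1 with errno set. */
ssize_t safe_timeoutread(int t, int fd, char *buf, size_t len)
{
  fd_set rfds;
  struct timeval tv;
  int r;

  tv.tv_sec = t;
  tv.tv_usec = 0;

  FD_ZERO(&rfds);
  FD_SET(fd, &rfds);

  r = select(fd + 1, &rfds, (fd_set *) 0, (fd_set *) 0, &tv);
  if (r == -1) return -1;      /* select failed; errno is already set */
  if (r == 0) {                /* timed out: do not trust rfds at all */
    errno = ETIMEDOUT;         /* assumption: ETIMEDOUT marks a timeout */
    return -1;
  }

  /* r > 0, so at least one descriptor is ready and rfds is meaningful */
  if (FD_ISSET(fd, &rfds)) return read(fd, buf, len);

  /* should not happen with a single descriptor; report it as a timeout */
  errno = ETIMEDOUT;
  return -1;
}
```

Without the r == 0 test, a system that leaves the fd_sets alone on a
timeout would make FD_ISSET succeed and send us straight into a
blocking read, which is the sort of hang being discussed.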
Or this may just be a red herring...
Mark
N.B. Although someone claimed to have seen a BSD man page reporting
that it wouldn't clear the fd_sets on a timeout, I was unable to find
any evidence of such a thing with Google. And at least one standard
(Single UNIX Specification v2) forbids this kind of weirdness: it
requires all bits in the fd_sets to be cleared when select times out.
P.S. And I just found one of these bloody hung qmail-remotes on one
of my systems! It is stuck in a read of fd 3, directed at email.com
(who clearly have no clue how to set up DNS records for email, and are
down anyway). Red Hat Linux, kernel 2.2.19-6.2.1smp.