Re: qmail-remote (cry wolf?)

2001-06-18 Thread Mark Jefferys

On Mon, Jun 18, 2001 at 11:20:36PM -0400, Troy Settle wrote:

% How would I need to go about building a debug version of qmail-remote?

I set conf-cc and conf-ld to 'gcc -g', edited timeoutread.c slightly
to save the return value of the select in a variable, then built
qmail-remote and put it in place of the live one.  I'll attach a patch
matching what I did to timeoutread.c.

% Also, how to terminate the process so that I can 'fling' gdb at it?

I wasn't planning on terminating it.  Rather I was thinking of using
gdb's "attach" command to take over the process, and then start
examining variables.  Mostly, I was going to wing it.

I expect the full attachment sequence to look something like this:

(gdb) attach 
(gdb) symbol-file /var/qmail/bin/qmail-remote
(gdb) directory 
(gdb) bt
(gdb) up   <-- repeat until at timeoutread() stack frame
(gdb) p res
(gdb) p fd
(gdb) p rfds   <-- or something like that

% With a little luck I can probably have output from gdb within a couple of hours.

Good luck, then.


Mark



--- timeoutread.c   Mon Jun 15 03:53:16 1998
+++ timeoutread.c   Mon Jun 18 22:23:24 2001
@@ -7,6 +7,7 @@
 {
   fd_set rfds;
   struct timeval tv;
+  int res;
 
   tv.tv_sec = t;
   tv.tv_usec = 0;
@@ -14,7 +15,8 @@
   FD_ZERO(&rfds);
   FD_SET(fd,&rfds);
 
-  if (select(fd + 1,&rfds,(fd_set *) 0,(fd_set *) 0,&tv) == -1) return -1;
+  res = select(fd + 1,&rfds,(fd_set *) 0,(fd_set *) 0,&tv);
+  if (res == -1) return -1;
   if (FD_ISSET(fd,&rfds)) return read(fd,buf,len);
 
   errno = error_timeout;



Re: qmail-remote (cry wolf?)

2001-06-18 Thread Mark Jefferys

On Sun, Jun 17, 2001 at 08:56:13PM +0100, James R Grinter wrote:

% I think it isn't relevant. qmail-remote doesn't seem to use select,
% or at least it's nowhere in the path where my qmail-remote wedges.

Go look at timeoutread(), which *is* in your path.  The select is in
the line right before where you wedge.

% As to different OS behaviour, Solaris 2.6 (and 7) both say:

[Man page claims it doesn't do this.]

% whereas SunOS 4.1.4 (my usual 'old bsd system' benchmark) says:

[Man page unclear.]

% and I can tell you that I've not seen the problem happen with
% qmail-remote on SunOS 4.1.4.

Well, I don't necessarily trust man pages to tell the truth,
especially if this was added accidentally (i.e. if it's a bug).

And I still haven't seen anything to really convince me that any OS
actually does this.  I've only seen that a few people think some do,
that it could easily happen as a bug, and that it could explain the
hung qmail-remotes.  And it's easily fixed if it is the problem.

In other words, I'm not saying that this is the cause, only that it's
possible.

%  Indeed, I think DJB's code (and most
% other people's) compensates for both behaviours by setting the
% necessary FD's each time anyway.

It doesn't.  (Don't know about other people's.)  It assumes that the
fd_sets will be cleared on timeout.  Setting the fd_sets each time is
always necessary and doesn't protect against this issue, anyway.


In any case, since I did see (one) stuck process recently, I built
myself a test to see if I could reproduce it.  I couldn't.  At least on
Red Hat Linux 2.2.19-6.2.1 or -6.2.1smp, it looks like select acts
sanely on a timeout, at least some of the time.
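
A test of this kind might look roughly like the sketch below.  It is not
the test described above, only an illustration under stated assumptions:
the function name fdset_survives_timeout is hypothetical, and an empty
pipe stands in for a silent remote host.  It waits on a descriptor that
never becomes readable, lets select() time out, and then reports whether
the descriptor is still marked in the fd_set.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, not part of qmail.
   Returns 1 if the fd is still set in rfds after select() times out,
   0 if select() cleared it, and -1 on error or if select() did not
   report a timeout. */
int fdset_survives_timeout(void)
{
  int p[2];
  fd_set rfds;
  struct timeval tv;
  int res;
  int still_set;

  if (pipe(p) == -1) return -1;

  tv.tv_sec = 0;
  tv.tv_usec = 100000;   /* 0.1 s; nothing is ever written to the pipe */

  FD_ZERO(&rfds);
  FD_SET(p[0],&rfds);

  res = select(p[0] + 1,&rfds,(fd_set *) 0,(fd_set *) 0,&tv);
  still_set = FD_ISSET(p[0],&rfds) ? 1 : 0;

  close(p[0]);
  close(p[1]);

  if (res != 0) return -1;   /* error, or not a timeout */
  return still_set;
}
```

A result of 0 means the system cleared the set on timeout, which is what
timeoutread() relies on.  A result of 1 would be the behaviour suspected
of causing the hang.  A single run only shows what happened that one
time, so it cannot rule out an intermittent bug.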

I also put a debugging version of qmail-remote on my system, so if it
ever decides to hang again I can fling gdb at it.


Mark




Re: qmail-remote (cry wolf?)

2001-06-17 Thread Mark Jefferys

I came across the following, which *might* explain some of these
deadlocking problems:



[Summary: Some systems leave the fd_sets alone when select times out.]

If I read this right, timeoutconn/read/write (and anything else that
uses select) have to check for a result of 0 explicitly to be
completely portable.

Even if an OS doesn't do this intentionally, it's quite easy to see
someone forgetting to clear the fd_sets on a timeout by accident, so
some defensive coding against the problem (explicitly checking for a
result of 0) may be worthwhile.
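
As a rough illustration of that defensive check, here is a sketch
modelled on the shape of timeoutread().  It is not DJB's code: the name
timeoutread_checked and the macro SKETCH_ERROR_TIMEOUT (standing in for
qmail's error_timeout) are assumptions made so the example compiles on
its own.

```c
#include <sys/types.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <errno.h>

/* Assumption: stand-in for qmail's error_timeout. */
#define SKETCH_ERROR_TIMEOUT ETIMEDOUT

/* Hypothetical variant of timeoutread() that tests select()'s return
   value for 0 instead of trusting the fd_set after a timeout. */
ssize_t timeoutread_checked(int t,int fd,char *buf,size_t len)
{
  fd_set rfds;
  struct timeval tv;
  int res;

  tv.tv_sec = t;
  tv.tv_usec = 0;

  FD_ZERO(&rfds);
  FD_SET(fd,&rfds);

  res = select(fd + 1,&rfds,(fd_set *) 0,(fd_set *) 0,&tv);
  if (res == -1) return -1;
  if (res == 0) {            /* timed out: do not consult rfds at all */
    errno = SKETCH_ERROR_TIMEOUT;
    return -1;
  }
  if (FD_ISSET(fd,&rfds)) return read(fd,buf,len);

  /* select() reported readiness but not for fd; treat as a timeout. */
  errno = SKETCH_ERROR_TIMEOUT;
  return -1;
}
```

With the explicit res == 0 test, a system that leaves the fd_sets
untouched on timeout can no longer send the function into a blocking
read().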

Or this may just be a red herring...


Mark

N.B.  Although someone claimed to have seen a BSD man page reporting
that it wouldn't clear the fd_sets on a timeout, I was unable to find
any evidence of such a thing with Google.  And at least one standard
(Single UNIX Specification v2) has forbidden this kind of weirdness.

P.S.  And I just found one of these bloody hung qmail-remotes on one
of my systems!@#$!  Stuck in read of fd 3; directed at email.com (who
clearly have no clue how to set up DNS records for email, and are down
anyway).  Redhat Linux kernel 2.2.19-6.2.1smp.




Re: qmail does not handle timezones properly? - More Info

2001-05-13 Thread Mark Jefferys

On Sun, May 13, 2001 at 06:28:53PM +0200, Patrick Starrenburg wrote:

% *Linux box*
% [root@linuxbox patrick]# date
% Sun May 13 17:02:55 GMT+2 2001 - Check

Your clock seems to be set wrong.  According to Solaris and at least
one web page I dug up, GMT+2 is a POSIX
time zone equivalent to GMT-0200 (!).  Your Linux box thinks that you
are sitting somewhere in the Atlantic Ocean.

Try setting your local TZ to "Europe/Amsterdam", and reset your clock.
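
The sign convention can be checked with a small sketch like the one
below.  The helper name tz_offset_hours is hypothetical, and the code
assumes a system with the POSIX setenv(), tzset(), localtime_r() and
gmtime_r().  It sets TZ, converts one fixed timestamp both ways, and
returns local time minus UTC in hours.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: returns (local hour - UTC hour) under the given
   TZ value, or 99 on failure.  The timestamp is noon UTC, so offsets of
   less than 12 hours stay on the same calendar day. */
int tz_offset_hours(const char *tz)
{
  time_t t = 10 * 86400 + 12 * 3600;   /* 1970-01-11 12:00:00 UTC */
  struct tm lt;
  struct tm gt;

  if (setenv("TZ",tz,1) == -1) return 99;
  tzset();
  if (!localtime_r(&t,&lt)) return 99;
  if (!gmtime_r(&t,&gt)) return 99;
  return lt.tm_hour - gt.tm_hour;
}
```

On a system that follows the POSIX rule, tz_offset_hours("GMT+2") comes
out as -2, i.e. two hours behind UTC rather than ahead of it.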


Mark