Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-17 Thread Chris Shenton
Don Lewis [EMAIL PROTECTED] writes:

 Thanks for doing the testing.  I just committed this patch.

Seems fine here too -- many thanks.
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Tim Robbins
On Sun, Jun 15, 2003 at 08:43:15PM -0400, Chris Shenton wrote:

 I've been running qmail for years and like it, installed pretty much
 per www.LifeWithQmail.org.  My main system was running FreeBSD
 5.0-RELEASE and -CURRENT and qmail was fine.  When I just upgraded to
 5.1-CURRENT a couple days back, the qmail-send process started using
 all CPU.

This looks like a bug in the named pipe code. Reverting
sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go
away. I haven't tracked down exactly what change between RELENG_5_0 and
RELENG_5_1 caused the problem.


Tim


Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Thorsten Schroeder
Hi,

On Mon, 15 Jun 2003, Chris Shenton wrote:

 [...] qmail is run under daemontools and all works fine (the configuration
 is 2 years old!), but when I deliver the first mail (locally or remote)
 the qmail-send process shoots up to 100% CPU indefinitely.

 All other mail is delivered correctly, and the CPU use is the only problem. I
 see in qmail-send.c that the select() function, after the first message,
 always returns 1.

same here too.
I don't know what it could be - perhaps a problem with named pipes
(lock/trigger)?

You can find my ktrace output here: http://cs.so36.net/~ths/kdump.txt

Would be nice if anyone has an idea :)

 A truss shows me it's running in a tight loop over this code:
 close(9) = 0 (0x0)
 select(0x9,0xbfbffcbc,0xbfbffc3c,0x0,0xbfbffc24) = 1 (0x1)

 Anyone else seen this or know what in FreeBSD-5.1 might have changed to cause
 this?  Any thoughts on how I might go about diagnosing this any better?

greetings,

  thorsten



Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Don Lewis
On 16 Jun, Thorsten Schroeder wrote:
 Hi,
 
 On Mon, 15 Jun 2003, Chris Shenton wrote:
 
 [...] qmail is run under daemontools and all works fine (the configuration
 is 2 years old!), but when I deliver the first mail (locally or remote)
 the qmail-send process shoots up to 100% CPU indefinitely.

 All other mail is delivered correctly, and the CPU use is the only problem. I
 see in qmail-send.c that the select() function, after the first message,
 always returns 1.
 
 same here too.
 I don't know what it could be - perhaps a problem with named pipes
 (lock/trigger)?
 
 You can find my ktrace output here: http://cs.so36.net/~ths/kdump.txt
 
 Would be nice if anyone has an idea :)
 
 A truss shows me it's running in a tight loop over this code:
 close(9) = 0 (0x0)
 select(0x9,0xbfbffcbc,0xbfbffc3c,0x0,0xbfbffc24) = 1 (0x1)
 
 Anyone else seen this or know what in FreeBSD-5.1 might have changed to cause
 this?  Any thoughts on how I might go about diagnosing this any better?

Which version of fifo_vnops.c?  If the problem is present in
5.1-RELEASE, then the problem is likely to be the change made in 1.79
and 1.85.  If the problem didn't show up until after the 5.1-RELEASE,
then the problem could be the changes in 1.87 or 1.88.


Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Tim Robbins
On Mon, Jun 16, 2003 at 04:09:51PM +1000, Tim Robbins wrote:

 On Sun, Jun 15, 2003 at 08:43:15PM -0400, Chris Shenton wrote:
 
  I've been running qmail for years and like it, installed pretty much
  per www.LifeWithQmail.org.  My main system was running FreeBSD
  5.0-RELEASE and -CURRENT and qmail was fine.  When I just upgraded to
  5.1-CURRENT a couple days back, the qmail-send process started using
  all CPU.
 
 This looks like a bug in the named pipe code. Reverting
 sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go
 away. I haven't tracked down exactly what change between RELENG_5_0 and
 RELENG_5_1 caused the problem.

Looks like revision 1.86 works, but it stops working with 1.87. Moving the
soclose() calls to fifo_inactive() may have caused it.


Tim


Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Thorsten Schroeder
Hi,

On Sun, 15 Jun 2003, Don Lewis wrote:

  I don't know what it could be - perhaps a problem with named pipes
  (lock/trigger)?
 
  You can find my ktrace output here: http://cs.so36.net/~ths/kdump.txt

 Which version of fifo_vnops.c?  If the problem is present in
 5.1-RELEASE, then the problem is likely to be the change made in 1.79
 and 1.85.  If the problem didn't show up until after the 5.1-RELEASE,
 then the problem could be the changes in 1.87 or 1.88.

FreeBSD 5.1-CURRENT #1: Thu Jun  5 19:29:29 CEST 2003

fifo_vnops.c:

$FreeBSD: src/sys/fs/fifofs/fifo_vnops.c,v 1.87 2003/06/01 06:24:32 truckman Exp $

bye,

  thorsten





Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Don Lewis
On 16 Jun, Tim Robbins wrote:
 On Mon, Jun 16, 2003 at 04:09:51PM +1000, Tim Robbins wrote:
 
 On Sun, Jun 15, 2003 at 08:43:15PM -0400, Chris Shenton wrote:
 
  I've been running qmail for years and like it, installed pretty much
  per www.LifeWithQmail.org.  My main system was running FreeBSD
  5.0-RELEASE and -CURRENT and qmail was fine.  When I just upgraded to
  5.1-CURRENT a couple days back, the qmail-send process started using
  all CPU.
 
 This looks like a bug in the named pipe code. Reverting
 sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go
 away. I haven't tracked down exactly what change between RELENG_5_0 and
 RELENG_5_1 caused the problem.
 
 Looks like revision 1.86 works, but it stops working with 1.87. Moving the
 soclose() calls to fifo_inactive() may have caused it.

This is an interesting observation, but I'm not sure why it would make a
difference.  I haven't looked at the qmail source, but it looks like it
is doing a non-blocking open on the fifo, calling select() on the fd,
and hoping that select() waits for a writer to open the fifo before
returning with an indication that the descriptor is readable.
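
[Editor's sketch] The open/select pattern described in the paragraph above can be illustrated roughly as follows. This is not taken from the qmail source; the helper name wait_for_trigger and its timeout parameter are hypothetical, and what select() reports when no writer has the fifo open is exactly the behaviour that differs between the FreeBSD versions discussed in this thread.

```c
#include <sys/types.h>
#include <sys/select.h>
#include <sys/time.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from qmail: open the fifo read-only and
 * non-blocking, then let select() do the waiting.
 * Returns 1 if select() reported the fifo readable, 0 on timeout,
 * -1 on error (including a descriptor too large for fd_set).
 */
static int wait_for_trigger(const char *path, long timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int fd, n;

    /* A non-blocking read-only open returns at once, writer or not. */
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd == -1)
        return -1;
    if (fd >= FD_SETSIZE) {
        close(fd);
        return -1;
    }

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    n = select(fd + 1, &readfds, NULL, NULL, &tv);

    close(fd);  /* closed again; a caller would reopen for the next round */
    return n > 0 ? 1 : n;
}
```

If select() reports the descriptor readable every time and the caller never consumes anything, a loop around a helper like this spins, which is consistent with the truss output quoted earlier.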

It looks like the select code is calling the soreadable() macro to
determine if the fifo descriptor is readable, and the soreadable() macro
returns a true value if the SS_CANTRCVMORE socket flag is set, which
would indicate an EOF condition.

I might believe that I accidentally changed the setting of this flag,
but I just compared fifo_vnops.c rev 1.78 with 1.87 and I believe this
flag should be set the same way in both versions.

In both versions, fifo_close() always calls socantrcvmore(), which sets
SS_CANTRCVMORE when the writer count drops to zero.  Prior to 1.87,
fifo_close() also destroyed the sockets when the reference count dropped
to zero, which caused fifo_open() to recreate the sockets when the fifo
was opened again, and when it did, fifo_open() set the SS_CANTRCVMORE
flag again.

The posted qmail syscall trace looks like what I would expect to see in
the present implementation.  I can't explain why it would behave any
differently prior to 1.87 ...



Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Don Lewis
On 16 Jun, I wrote:
 On 16 Jun, Tim Robbins wrote:

 This looks like a bug in the named pipe code. Reverting
 sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go
 away. I haven't tracked down exactly what change between RELENG_5_0 and
 RELENG_5_1 caused the problem.
 
 Looks like revision 1.86 works, but it stops working with 1.87. Moving the
 soclose() calls to fifo_inactive() may have caused it.
 
 This is an interesting observation, but I'm not sure why it would make a
 difference.  I haven't looked at the qmail source, but it looks like it
 is doing a non-blocking open on the fifo, calling select() on the fd,
 and hoping that select() waits for a writer to open the fifo before
 returning with an indication that the descriptor is readable.
 
 It looks like the select code is calling the soreadable() macro to
 determine if the fifo descriptor is readable, and the soreadable() macro
 returns a true value if the SS_CANTRCVMORE socket flag is set, which
 would indicate an EOF condition.
 
 I might believe that I accidentally changed the setting of this flag,
 but I just compared fifo_vnops.c rev 1.78 with 1.87 and I believe this
 flag should be set the same way in both versions.
 
 In both versions, fifo_close() always calls socantrcvmore(), which sets
 SS_CANTRCVMORE when the writer count drops to zero.  Prior to 1.87,
 fifo_close() also destroyed the sockets when the reference count dropped
 to zero, which caused fifo_open() to recreate the sockets when the fifo
 was opened again, and when it did, fifo_open() set the SS_CANTRCVMORE
 flag again.
 
 The posted qmail syscall trace looks like what I would expect to see in
 the present implementation.  I can't explain why it would behave any
 differently prior to 1.87 ...

The plot thickens ...

I ran this bit of code on both 5.1 current with version 1.88 of
fifo_vnops.c, and 4.8-stable:

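#include <sys/types.h>
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int
main(void)
{
    int fd;
    fd_set readfds;

    /* Non-blocking open: returns at once even if no writer exists. */
    fd = open("myfifo", O_RDONLY | O_NONBLOCK);

    printf("before the loop\n");
    while (1) {
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        /* Nothing is ever read, so a readable fifo makes this loop spin. */
        printf("%d %d\n", fd, select(20, &readfds, NULL, NULL, NULL));
    }
    exit(0);
}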

On 4.8-stable, select() immediately returns a 1, whether or not the
fifo has ever been opened for writing.

On 5.1-current, select() waits forever, even if the fifo has been opened
for writing by another process.  Select() only returns when something
has actually been written to the fifo, and since this process doesn't
read anything from the fifo, it spins on select() forever.

If some data is getting written to the fifo, it doesn't look like qmail
consumes it, and since fifo_close in 1.87 doesn't destroy the sockets,
it looks like the data is hanging around in the fifo while neither end
is open, and qmail stumbles across this data when it calls select()
after re-opening the fifo.

Now there are two questions that I can't answer:

Why is my analysis of select() and the SS_CANTRCVMORE flag
incorrect in 5.1-current with version 1.87 or 1.88 of
fifo_vnops.c?

Why doesn't qmail get stuck in a similar loop in 4.8-stable,
since select always returns true for reading on a fifo with no
writers?


Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Bruce Evans
On Mon, 16 Jun 2003, Don Lewis wrote:

 On 16 Jun, I wrote:
  On 16 Jun, Tim Robbins wrote:

  This looks like a bug in the named pipe code. Reverting
  sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go
  away. I haven't tracked down exactly what change between RELENG_5_0 and
  RELENG_5_1 caused the problem.
 
  Looks like revision 1.86 works, but it stops working with 1.87. Moving the
  soclose() calls to fifo_inactive() may have caused it.
 
  This is an interesting observation, but I'm not sure why it would make a
  difference.  I haven't looked at the qmail source, but it looks like it
  is doing a non-blocking open on the fifo, calling select() on the fd,
  and hoping that select() waits for a writer to open the fifo before
  returning with an indication that the descriptor is readable.

In my review of 1.87, I forgot to ask you how atomic the close is with part
of it moved out to fifo_inactive().  I think it's important that all
traces of the old open have gone away (as far as applications can tell)
when the last close returns.

  It looks like the select code is calling the soreadable() macro to
  determine if the fifo descriptor is readable, and the soreadable() macro
  returns a true value if the SS_CANTRCVMORE socket flag is set, which
  would indicate an EOF condition.

fifo_close() sets this flag and the corresponding send flag on last close,
so there is no direct problem here.

  ...
  The posted qmail syscall trace looks like what I would expect to see in
  the present implementation.  I can't explain why it would behave any
  differently prior to 1.87 ...

 The plot thickens ...

 I ran this bit of code on both 5.1 current with version 1.88 of
 fifo_vnops.c, and 4.8-stable:

 #include <sys/types.h>
 #include <sys/time.h>
 #include <unistd.h>
 #include <fcntl.h>
 main()
 {
 int fd;
 fd_set readfds;

 fd = open("myfifo", O_RDONLY | O_NONBLOCK);

 printf("before the loop\n");
 while (1) {
 FD_ZERO(&readfds);
 FD_SET(fd, &readfds);
 printf("%d %d\n", fd, select(20, &readfds, NULL, NULL, NULL));
 }
 exit(0);
 }

 On 4.8-stable, select() immediately returns a 1, whether or not the
 fifo has ever been opened for writing.

 On 5.1-current, select() waits forever, even if the fifo has been opened
 for writing by another process.  Select() only returns when something
 has actually been written to the fifo, and since this process doesn't
 read anything from the fifo, it spins on select() forever.

 If some data is getting written to the fifo, it doesn't look like qmail
 consumes it, and since fifo_close in 1.87 doesn't destroy the sockets,
 it looks like the data is hanging around in the fifo while neither end
 is open, and qmail stumbles across this data when it calls select()
 after re-opening the fifo.

 Now there are two questions that I can't answer:

   Why is my analysis of select() and the SS_CANTRCVMORE flag
 incorrect in 5.1-current with version 1.87 or 1.88 of
 fifo_vnops.c.

I think it is correct, assuming that something writes to the fifo.
Writing might be part of synchronization but actually reading the
data should not be necessary since the last close must discard the
data (POSIX spec).

   Why doesn't qmail get stuck in a similar loop in 4.8-stable,
 since select always returns true for reading on a fifo with no
 writers?

Don't know.  Maybe it uses autoconfig to handle the 4.8 behaviour.
The 4.8 behaviour is normal compared with the buggy behaviour of
not discarding data on last close, so applications should handle it
better :-).  Maybe qmail spins under 4.8 too, but only until
synchronization is achieved.

Bruce


Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Don Lewis
On 16 Jun, Bruce Evans wrote:
 On Mon, 16 Jun 2003, Don Lewis wrote:
 
 On 16 Jun, I wrote:
  On 16 Jun, Tim Robbins wrote:

  This looks like a bug in the named pipe code. Reverting
  sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go
  away. I haven't tracked down exactly what change between RELENG_5_0 and
  RELENG_5_1 caused the problem.
 
  Looks like revision 1.86 works, but it stops working with 1.87. Moving the
  soclose() calls to fifo_inactive() may have caused it.
 
  This is an interesting observation, but I'm not sure why it would make a
  difference.  I haven't looked at the qmail source, but it looks like it
  is doing a non-blocking open on the fifo, calling select() on the fd,
  and hoping that select() waits for a writer to open the fifo before
  returning with an indication that the descriptor is readable.
 
 In my review of 1.87, I forgot to ask you how atomic the close is with part
 of it moved out to fifo_inactive().  I think it's important that all
 traces of the old open have gone away (as far as applications can tell)
 when the last close returns.

I hadn't taken queued data into consideration.  Now that I've looked at
this more closely, there are other problems in both the old and new
code.  If a process calls fcntl(fd, F_SETOWN, ...) on one end of the
fifo, that should be undone when that end of the fifo is closed.  In the
old implementation, that only happens when both ends of the fifo are
closed and the sockets are deleted.


 On 5.1-current, select() waits forever, even if the fifo has been opened
 for writing by another process.  Select() only returns when something
 has actually been written to the fifo, and since this process doesn't
 read anything from the fifo, it spins on select() forever.

 If some data is getting written to the fifo, it doesn't look like qmail
 consumes it, and since fifo_close in 1.87 doesn't destroy the sockets,
 it looks like the data is hanging around in the fifo while neither end
 is open, and qmail stumbles across this data when it calls select()
 after re-opening the fifo.

 Now there are two questions that I can't answer:

  Why is my analysis of select() and the SS_CANTRCVMORE flag
 incorrect in 5.1-current with version 1.87 or 1.88 of
 fifo_vnops.c.
 
 I think it is correct, assuming that something writes to the fifo.
 Writing might be part of synchronization but actually reading the
 data should not be necessary since the last close must discard the
 data (POSIX spec).

It sure looks to me like SS_CANTRCVMORE is always set when the write end
of the fifo is closed, no matter whether the sockets were freshly
allocated by a fifo_open() call on the read end of the fifo, or because
the last writer closed the write end of the fifo.  It sure looks
like select() should immediately return if this flag is set, but it is
not returning ...

Actually, something seems broken.  I modified my little test program to
actually read the data, which works just fine, but select() still blocks
when the writer closes the fifo, so there doesn't seem to be a way to
detect the EOF.

  Why doesn't qmail get stuck in a similar loop in 4.8-stable,
 since select always returns true for reading on a fifo with no
 writers?
 
 Don't know.  Maybe it uses autoconfig to handle the 4.8 behaviour.
 The 4.8 behaviour is normal compared with the buggy behaviour of
 not discarding data on last close, so applications should handle it
 better :-).  Maybe qmail spins under 4.8 too, but only until
 synchronization is achieved.
 
 Bruce



Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Don Lewis
On 16 Jun, Thorsten Schroeder wrote:
 Hi,
 
 On Sun, 15 Jun 2003, Don Lewis wrote:
 
  I don't know what it could be - perhaps a problem with named pipes
  (lock/trigger)?
 
  You can find my ktrace output here: http://cs.so36.net/~ths/kdump.txt
 
 Which version of fifo_vnops.c?  If the problem is present in
 5.1-RELEASE, then the problem is likely to be the change made in 1.79
 and 1.85.  If the problem didn't show up until after the 5.1-RELEASE,
 then the problem could be the changes in 1.87 or 1.88.
 
 FreeBSD 5.1-CURRENT #1: Thu Jun  5 19:29:29 CEST 2003
 
 fifo_vnops.c:
 
 $FreeBSD: src/sys/fs/fifofs/fifo_vnops.c,v 1.87 2003/06/01 06:24:32 truckman Exp $


Try upgrading to 1.88 and applying this patch:

Index: sys/fs/fifofs/fifo_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/fs/fifofs/fifo_vnops.c,v
retrieving revision 1.88
diff -u -r1.88 fifo_vnops.c
--- sys/fs/fifofs/fifo_vnops.c  13 Jun 2003 06:58:11 -  1.88
+++ sys/fs/fifofs/fifo_vnops.c  16 Jun 2003 08:44:20 -
@@ -70,7 +70,6 @@
 static int fifo_lookup(struct vop_lookup_args *);
 static int fifo_open(struct vop_open_args *);
 static int fifo_close(struct vop_close_args *);
-static int fifo_inactive(struct vop_inactive_args *);
 static int fifo_read(struct vop_read_args *);
 static int fifo_write(struct vop_write_args *);
 static int fifo_ioctl(struct vop_ioctl_args *);
@@ -98,7 +97,6 @@
     { &vop_create_desc,         (vop_t *) vop_panic },
     { &vop_getattr_desc,        (vop_t *) vop_ebadf },
     { &vop_getwritemount_desc,  (vop_t *) vop_stdgetwritemount },
-    { &vop_inactive_desc,       (vop_t *) fifo_inactive },
     { &vop_ioctl_desc,          (vop_t *) fifo_ioctl },
     { &vop_kqfilter_desc,       (vop_t *) fifo_kqfilter },
     { &vop_lease_desc,          (vop_t *) vop_null },
@@ -556,32 +554,18 @@
         if (fip->fi_writers == 0)
             socantrcvmore(fip->fi_readsock);
     }
-    VOP_UNLOCK(vp, 0, td);
-    return (0);
-}
-
-static int
-fifo_inactive(ap)
-    struct vop_inactive_args /* {
-        struct vnode *a_vp;
-        struct thread *a_td;
-    } */ *ap;
-{
-    struct vnode *vp = ap->a_vp;
-    struct fifoinfo *fip = vp->v_fifoinfo;
-
     VI_LOCK(vp);
-    if (fip != NULL && vp->v_usecount == 0) {
+    if (vp->v_usecount == 1) {
         vp->v_fifoinfo = NULL;
         VI_UNLOCK(vp);
         (void)soclose(fip->fi_readsock);
         (void)soclose(fip->fi_writesock);
         FREE(fip, M_VNODE);
-    }
-    VOP_UNLOCK(vp, 0, ap->a_td);
+    } else
+        VI_UNLOCK(vp);
+    VOP_UNLOCK(vp, 0, td);
     return (0);
 }
-
 
 /*
  * Print out internal contents of a fifo vnode.



Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Terry Lambert
Don Lewis wrote:
 Actually, something seems broken.  I modified my little test program to
 actually read the data, which works just fine, but select() still blocks
 when the writer closes the fifo, so there doesn't seem to be a way to
 detect the EOF.

I think this should be covered under the exceptional event
and read select flags (a subsequent read will return 0).

Also, you should remember that qmail opens the thing with
non-blocking I/O, and then expects the select to block.  Very
odd program, qmail.

-- Terry


Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Bruce Evans
On Mon, 16 Jun 2003, Don Lewis wrote:

 On 16 Jun, Bruce Evans wrote:
  In my review of 1.87, I forgot to ask you how atomic the close is with part
  of it moved out to fifo_inactive().  I think it's important that all
  traces of the old open have gone away (as far as applications can tell)
  when the last close returns.

 I hadn't taken queued data into consideration.  Now that I've looked at
 this more closely, there are other problems in both the old and new
 code.  If a process calls fcntl(fd, F_SETOWN, ...) on one end of the
 fifo, that should be undone when that end of the fifo is closed.  In the
 old implementation, that only happens when both ends of the fifo are
 closed and the sockets are deleted.

F_SETOWN (and associated signal delivery) is even more broken than that :-].
This fcntl() should be applied to the file (though not just the file descriptor),
so its effect should be limited to fd's open in the file instance and go
away when all these are closed.  However, F_SETOWN (and associated signal
delivery) actually applies to the socket for fifos.  It doesn't work quite
right for ttys either.  F_SETOWN apparently isn't used in ways complicated
enough to require it to work right.

  Now there are two questions that I can't answer:
 
 Why is my analysis of select() and the SS_CANTRCVMORE flag
  incorrect in 5.1-current with version 1.87 or 1.88 of
  fifo_vnops.c.
 
  I think it is correct, assuming that something writes to the fifo.
  Writing might be part of synchronization but actually reading the
  data should not be necessary since the last close must discard the
  data (POSIX spec).

 It sure looks to me like SS_CANTRCVMORE is always set when the write end
 of the fifo is closed, no matter whether the sockets were freshly
 allocated by a fifo_open() call on the read end of the fifo, or because
 the last writer closed the write end of the fifo.  It sure looks
 like select() should immediately return if this flag is set, but it is
 not returning ...

Alfred changed the semantics for 5.x.  I thought that you knew this.
I finally gave up resisting this change after a lot of email :-).  In
5.x, SS_CANTRCVMORE often has no effect for fifos (it still works
normally for sockets).  fifo_poll() normally calls soo_poll() with
POLLIN converted to POLLINIGNEOF.  This causes soo_poll() (sopoll())
to skip the usual SS_CANTRCVMORE check (which is inside soreadable())
and check the watermark instead, so that select() on a fifo normally
waits for data even when the fifo is open in nonblocking mode and
SS_CANTRCVMORE is set.  Blocking in select() even in nonblocking mode
is usually what is wanted, but is not what is wanted for detecting
EOF.  4.8 handles EOF detection (== all writers going away in the context
of fifos) better at a cost of providing no good way to wait for the
first writer.  We changed it since all other systems seem to do it like
5.x and few applications understand this.
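
[Editor's sketch] The two readiness rules contrasted above can be modelled in user space roughly like this. It is a toy model, not kernel code: struct model_sock, MODEL_CANTRCVMORE and both function names are hypothetical stand-ins for the socket state, SS_CANTRCVMORE, and the checks made with and without POLLINIGNEOF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are NOT the kernel's structures. */
#define MODEL_CANTRCVMORE 0x1   /* stands in for SS_CANTRCVMORE */

struct model_sock {
    size_t bytes_queued;    /* data waiting in the receive buffer */
    size_t low_watermark;   /* minimum bytes that count as readable */
    int    state;           /* MODEL_CANTRCVMORE once no writers remain */
};

/* 4.8-style rule: enough queued data, or EOF, makes the fifo readable. */
static bool readable_with_eof(const struct model_sock *so)
{
    return so->bytes_queued >= so->low_watermark ||
           (so->state & MODEL_CANTRCVMORE) != 0;
}

/* 5.x-style rule for fifos: EOF is ignored, only the watermark counts. */
static bool readable_ignoring_eof(const struct model_sock *so)
{
    return so->bytes_queued >= so->low_watermark;
}
```

Under this model an empty fifo with no writers is readable by the first rule and not by the second, while a fifo that still holds unread data is readable by both. The second case matches the leftover-data explanation given earlier in the thread for the spin with 1.87.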

 Actually, something seems broken.  I modified my little test program to
 actually read the data, which works just fine, but select() still blocks
 when the writer closes the fifo, so there doesn't seem to be a way to
 detect the EOF.

Hmm, we may have changed too much.  EOF can be detected using poll() instead
of select() and setting POLLIN and POLLINIGNEOF in the poll flags (this stops
fifo_poll() clearing POLLIN -- see the comment), but POLLINIGNEOF is
not documented at the application level and is probably never used there.
I suspect that other systems have more magic to handle EOF.  I tried to
avoid such magic since I think the state of the fifo should be the same
when there are no writers (and no data) no matter how the state of having
no writers was reached (otherwise I think the state depends too much on
races between open() for reading and close() by the last writer).  POSIX
is clear enough on this for read/write but fuzzy for select/poll.
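
[Editor's sketch] A minimal illustration of the poll()-based approach mentioned above, assuming the caller asks for POLLIN together with POLLINIGNEOF and then lets read() distinguish data from EOF (a read of 0 bytes, as noted elsewhere in this thread). The helper name poll_then_read and its return convention are hypothetical. POLLINIGNEOF is FreeBSD-specific, so the fallback definition to 0 is an assumption made only so the sketch compiles on systems that lack the flag. How the 5.x kernels of the time actually respond has not been verified here.

```c
#include <poll.h>
#include <unistd.h>

#ifndef POLLINIGNEOF
#define POLLINIGNEOF 0  /* assumption: without the flag, plain POLLIN is used */
#endif

/*
 * Hypothetical helper: wait up to timeout_ms for fd, then read once.
 * Returns the byte count (> 0) if data arrived, 0 at EOF,
 * -1 on error, and -2 if poll() timed out.
 */
static long poll_then_read(int fd, char *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN | POLLINIGNEOF;
    pfd.revents = 0;

    n = poll(&pfd, 1, timeout_ms);
    if (n == -1)
        return -1;
    if (n == 0)
        return -2;

    /* Something was reported; read() tells data (> 0) from EOF (0). */
    return (long)read(fd, buf, len);
}
```

A caller would loop on this helper, treating 0 as "all writers have gone away" and -2 as "nothing happened yet".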

Bruce


Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Thorsten Schroeder
Hi,

On Mon, 16 Jun 2003, Don Lewis wrote:

  FreeBSD 5.1-CURRENT #1: Thu Jun  5 19:29:29 CEST 2003
 
  fifo_vnops.c:
 
  $FreeBSD: src/sys/fs/fifofs/fifo_vnops.c,v 1.87 2003/06/01 06:24:32 truckman Exp $

 Try upgrading to 1.88 and applying this patch:

 Index: sys/fs/fifofs/fifo_vnops.c
 ===
 RCS file: /home/ncvs/src/sys/fs/fifofs/fifo_vnops.c,v
 retrieving revision 1.88
 diff -u -r1.88 fifo_vnops.c
 --- sys/fs/fifofs/fifo_vnops.c13 Jun 2003 06:58:11 -  1.88
 +++ sys/fs/fifofs/fifo_vnops.c16 Jun 2003 08:44:20 -
[...]

Yes! This seems to work fine :)

qmail-send doesn't increase cpu usage after the first mail anymore.

Thanks a lot,

  Thorsten




Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Jesse Guardiani
Thorsten Schroeder wrote:

 Hi,
 
 On Mon, 16 Jun 2003, Don Lewis wrote:
 
  FreeBSD 5.1-CURRENT #1: Thu Jun  5 19:29:29 CEST 2003
 
  fifo_vnops.c:
 
  $FreeBSD: src/sys/fs/fifofs/fifo_vnops.c,v 1.87 2003/06/01 06:24:32
  truckman Exp $
 
 Try upgrading to 1.88 and applying this patch:

 Index: sys/fs/fifofs/fifo_vnops.c
 ===
 RCS file: /home/ncvs/src/sys/fs/fifofs/fifo_vnops.c,v
 retrieving revision 1.88
 diff -u -r1.88 fifo_vnops.c
 --- sys/fs/fifofs/fifo_vnops.c   13 Jun 2003 06:58:11 -  1.88
 +++ sys/fs/fifofs/fifo_vnops.c   16 Jun 2003 08:44:20 -
 [...]
 
 Yes! This seems to work fine :)

I run qmail on my 4.8 servers.

For my sanity, is this a problem in 5.1-RELEASE, or in code after 5.1-RELEASE?
We haven't upgraded to 5.1 yet (and don't intend to for a while), but I thought
I'd ask since this bug would cripple our mail server.

-- 
Jesse Guardiani, Systems Administrator
WingNET Internet Services,
P.O. Box 2605 // Cleveland, TN 37320-2605
423-559-LINK (v)  423-559-5145 (f)
http://www.wingnet.net




Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Don Lewis
On 16 Jun, Thorsten Schroeder wrote:
 Hi,
 
 On Mon, 16 Jun 2003, Don Lewis wrote:
 
  FreeBSD 5.1-CURRENT #1: Thu Jun  5 19:29:29 CEST 2003
 
  fifo_vnops.c:
 
  $FreeBSD: src/sys/fs/fifofs/fifo_vnops.c,v 1.87 2003/06/01 06:24:32 truckman Exp $
 
 Try upgrading to 1.88 and applying this patch:

 Index: sys/fs/fifofs/fifo_vnops.c
 ===
 RCS file: /home/ncvs/src/sys/fs/fifofs/fifo_vnops.c,v
 retrieving revision 1.88
 diff -u -r1.88 fifo_vnops.c
 --- sys/fs/fifofs/fifo_vnops.c   13 Jun 2003 06:58:11 -  1.88
 +++ sys/fs/fifofs/fifo_vnops.c   16 Jun 2003 08:44:20 -
 [...]
 
 Yes! This seems to work fine :)
 
 qmail-send doesn't increase cpu usage after the first mail anymore.
 
 Thanks a lot,

Thanks for doing the testing.  I just committed this patch.


Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Don Lewis
On 16 Jun, Jesse Guardiani wrote:

 I run qmail on my 4.8 servers.
 
 For my sanity, is this a problem in 5.1-RELEASE, or in code after 5.1-RELEASE?
 We haven't upgraded to 5.1 yet (and don't intend to for a while), but I thought
 I'd ask since this bug would cripple our mail server.

It was broken in 5.1-CURRENT shortly after 5.1-RELEASE, until I
committed a patch a few minutes ago.  5.1-RELEASE is fine.  The
problematic versions of fifo_vnops.c are 1.87 and 1.88.


Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-16 Thread Don Lewis
On 16 Jun, Bruce Evans wrote:
 On Mon, 16 Jun 2003, Don Lewis wrote:
 
 On 16 Jun, Bruce Evans wrote:
  In my review of 1.87, I forgot to ask you how atomic the close is with part
  of it moved out to fifo_inactive().  I think it's important that all
  traces of the old open have gone away (as far as applications can tell)
  when the last close returns.

 I hadn't taken queued data into consideration.  Now that I've looked at
 this more closely, there are other problems in both the old and new
 code.  If a process calls fcntl(fd, F_SETOWN, ...) on one end of the
 fifo, that should be undone when that end of the fifo is closed.  In the
 old implementation, that only happens when both ends of the fifo are
 closed and the sockets are deleted.
 
 F_SETOWN (and associated signal delivery) is even more broken than that :-].
 This fcntl() should be applied to the file (though not just the file descriptor),
 so its effect should be limited to fd's open in the file instance and go
 away when all these are closed.  However, F_SETOWN (and associated signal
 delivery) actually applies to the socket for fifos.  It doesn't work quite
 right for ttys either.  F_SETOWN apparently isn't used in ways complicated
 enough to require it to work right.

There is a fundamental architectural problem -- devices and files don't
have a list of the descriptors that have them open.  That would require
putting descriptors on another list (and dealing with the necessary
locking), which would also bloat the size of the descriptor structure.
Storing the F_SETOWN info there would bloat all descriptors even more
rather than the relative handful of device structures that support this
feature.

  Now there are two questions that I can't answer:
 
Why is my analysis of select() and the SS_CANTRCVMORE flag
  incorrect in 5.1-current with version 1.87 or 1.88 of
  fifo_vnops.c.
 
  I think it is correct, assuming that something writes to the fifo.
  Writing might be part of synchronization but actually reading the
  data should not be necessary since the last close must discard the
  data (POSIX spec).

 It sure looks to me like SS_CANTRCVMORE is always set when the write end
 of the fifo is closed, no matter whether the sockets were freshly
 allocated by a fifo_open() call on the read end of the fifo, or because
 the last writer closed the write end of the fifo.  It sure looks
 like select() should immediately return if this flag is set, but it is
 not returning ...
 
 Alfred changed the semantics for 5.x.  I thought that you knew this.
 I finally gave up resisting this change after a lot of email :-).  In
 5.x, SS_CANTRCVMORE often has no effect for fifos (it still works
 normally for sockets).  fifo_poll() normally calls soo_poll() with
 POLLIN converted to POLLINIGNEOF.  This causes soo_poll() (sopoll())
 to skip the usual SS_CANTRCVMORE check (which is inside soreadable())
 and check the watermark instead, so that select() on a fifo normally
 waits for data even when the fifo is open in nonblocking mode and
 SS_CANTRCVMORE is set.

Nope, I didn't know this, and I missed the POLLIN -> POLLINIGNEOF
conversion when I was tracing the code.


Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade

2003-06-15 Thread Fred Souza
 I've been running qmail for years and like it, installed pretty much
 per www.LifeWithQmail.org.  My main system was running FreeBSD
 5.0-RELEASE and -CURRENT and qmail was fine.  When I just upgraded to
 5.1-CURRENT a couple days back, the qmail-send process started using
 all CPU.

  [snip]

 Anyone else seen this or know what in FreeBSD-5.1 might have changed to cause
 this?  Any thoughts on how I might go about diagnosing this any better?

  I saw this too, but couldn't get it fixed either. My solution
  (hopefully temporary) was to switch to another MTA.


  Fred


-- 
I used to think romantic love was a neurosis shared by two, a supreme
foolishness.  I no longer thought that.  There's nothing foolish in
loving anyone.  Thinking you'll be loved in return is what's foolish.
-- Rita Mae Brown

