Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
Don Lewis [EMAIL PROTECTED] writes: Thanks for doing the testing. I just committed this patch. Seems fine here too -- many thanks. ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
On Sun, Jun 15, 2003 at 08:43:15PM -0400, Chris Shenton wrote:

I've been running qmail for years and like it, installed pretty much per www.LifeWithQmail.org. My main system was running FreeBSD 5.0-RELEASE and -CURRENT and qmail was fine. When I just upgraded to 5.1-CURRENT a couple days back, the qmail-send process started using all CPU.

This looks like a bug in the named pipe code. Reverting sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go away. I haven't tracked down exactly what change between RELENG_5_0 and RELENG_5_1 caused the problem.

Tim
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
Hi,

On Mon, 15 Jun 2003, Chris Shenton wrote:

[...]

qmail is run under daemontools and everything works fine (the configuration is 2 years old!), but when I deliver the first mail (locally or remotely) the qmail-send process goes to 100% CPU and stays there. All other mail is delivered correctly, and the CPU use is the only problem. I see in qmail-send.c that the select() function, after the first message, always returns 1.

Same here too. I don't know what it could be - perhaps a problem with named pipes (lock/trigger)? You can find my ktrace output here: http://cs.so36.net/~ths/kdump.txt

It would be nice if anyone has an idea :)

A truss shows me it's running in a tight loop over this code:

    close(9)                                         = 0 (0x0)
    select(0x9,0xbfbffcbc,0xbfbffc3c,0x0,0xbfbffc24) = 1 (0x1)

Anyone else seen this or know what in FreeBSD-5.1 might have changed to cause this? Any thoughts on how I might go about diagnosing this any better?

greetings,
thorsten
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
On 16 Jun, Thorsten Schroeder wrote:

Hi,

On Mon, 15 Jun 2003, Chris Shenton wrote:

[...]

qmail is run under daemontools and everything works fine (the configuration is 2 years old!), but when I deliver the first mail (locally or remotely) the qmail-send process goes to 100% CPU and stays there. All other mail is delivered correctly, and the CPU use is the only problem. I see in qmail-send.c that the select() function, after the first message, always returns 1.

Same here too. I don't know what it could be - perhaps a problem with named pipes (lock/trigger)? You can find my ktrace output here: http://cs.so36.net/~ths/kdump.txt

It would be nice if anyone has an idea :)

A truss shows me it's running in a tight loop over this code:

    close(9)                                         = 0 (0x0)
    select(0x9,0xbfbffcbc,0xbfbffc3c,0x0,0xbfbffc24) = 1 (0x1)

Anyone else seen this or know what in FreeBSD-5.1 might have changed to cause this? Any thoughts on how I might go about diagnosing this any better?

Which version of fifo_vnops.c? If the problem is present in 5.1-RELEASE, then the problem is likely to be the change made in 1.79 and 1.85. If the problem didn't show up until after the 5.1-RELEASE, then the problem could be the changes in 1.87 or 1.88.
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
On Mon, Jun 16, 2003 at 04:09:51PM +1000, Tim Robbins wrote:

On Sun, Jun 15, 2003 at 08:43:15PM -0400, Chris Shenton wrote:

I've been running qmail for years and like it, installed pretty much per www.LifeWithQmail.org. My main system was running FreeBSD 5.0-RELEASE and -CURRENT and qmail was fine. When I just upgraded to 5.1-CURRENT a couple days back, the qmail-send process started using all CPU.

This looks like a bug in the named pipe code. Reverting sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go away. I haven't tracked down exactly what change between RELENG_5_0 and RELENG_5_1 caused the problem.

Looks like revision 1.86 works, but it stops working with 1.87. Moving the soclose() calls to fifo_inactive() may have caused it.

Tim
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
Hi,

On Sun, 15 Jun 2003, Don Lewis wrote:

I don't know what it could be - perhaps a problem with named pipes (lock/trigger)? You can find my ktrace output here: http://cs.so36.net/~ths/kdump.txt

Which version of fifo_vnops.c? If the problem is present in 5.1-RELEASE, then the problem is likely to be the change made in 1.79 and 1.85. If the problem didn't show up until after the 5.1-RELEASE, then the problem could be the changes in 1.87 or 1.88.

FreeBSD 5.1-CURRENT #1: Thu Jun 5 19:29:29 CEST 2003

fifo_vnops.c:
$FreeBSD: src/sys/fs/fifofs/fifo_vnops.c,v 1.87 2003/06/01 06:24:32 truckman Exp $

bye,
thorsten
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
On 16 Jun, Tim Robbins wrote: On Mon, Jun 16, 2003 at 04:09:51PM +1000, Tim Robbins wrote: On Sun, Jun 15, 2003 at 08:43:15PM -0400, Chris Shenton wrote: I've been running qmail for years and like it, installed pretty much per www.LifeWithQmail.org. My main system was running FreeBSD 5.0-RELEASE and -CURRENT and qmail was fine. When I just upgraded to 5.1-CURRENT a couple days back, the qmail-send process started using all CPU. This looks like a bug in the named pipe code. Reverting sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go away. I haven't tracked down exactly what change between RELENG_5_0 and RELENG_5_1 caused the problem. Looks like revision 1.86 works, but it stops working with 1.87. Moving the soclose() calls to fifo_inactive() may have caused it. This is an interesting observation, but I'm not sure why it would make a difference. I haven't looked at the qmail source, but it looks like it is doing a non-blocking open on the fifo, calling select() on the fd, and hoping that select() waits for a writer to open the fifo before returning with an indication that the descriptor is readable. It looks like the select code is calling the soreadable() macro to determine if the fifo descriptor is readable, and the soreadable() macro returns a true value if the SS_CANTRCVMORE socket flag is set, which would indicate an EOF condition. I might believe that I accidentally changed the setting of this flag, but I just compared fifo_vnops.c rev 1.78 with 1.87 and I believe this flag should be set the same way in both versions. In both versions, fifo_close() always calls socantrcvmore(), which sets SS_CANTRCVMORE when the writer count drops to zero. Prior to 1.87, fifo_close() also destroyed the sockets when the reference count dropped to zero, which caused fifo_open() to recreate the sockets when the fifo was opened again, and when it did, fifo_open() set the SS_CANTRCVMORE flag again. 
The posted qmail syscall trace looks like what I would expect to see in the present implementation. I can't explain why it would behave any differently prior to 1.87 ...
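
The readiness rule described in the message above (enough queued data, or the "can't receive more" flag) can be written down as a small user-space model. This is only a sketch for illustration, not the kernel code: `toy_sockbuf`, `toy_readable()` and `TOY_CANTRCVMORE` are made-up names, and the real soreadable() macro checks further conditions that are left out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's SS_CANTRCVMORE state bit. */
#define TOY_CANTRCVMORE 0x1

/* Hypothetical, simplified view of the read side of a fifo's socket. */
struct toy_sockbuf {
	size_t	cc;	/* bytes currently queued */
	size_t	lowat;	/* low-water mark: bytes needed to count as readable */
	int	state;	/* TOY_CANTRCVMORE is set once there are no writers */
};

/*
 * Readable if enough data is queued, or if EOF has been flagged.
 * The second clause is what lets select() return at once on a fifo
 * that has no writers.
 */
static bool
toy_readable(const struct toy_sockbuf *sb)
{
	return (sb->cc >= sb->lowat || (sb->state & TOY_CANTRCVMORE) != 0);
}
```

Under this model an empty buffer is not readable until either data arrives or the EOF flag is set, which is the behaviour the analysis above expects from select().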
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
On 16 Jun, I wrote: On 16 Jun, Tim Robbins wrote: This looks like a bug in the named pipe code. Reverting sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go away. I haven't tracked down exactly what change between RELENG_5_0 and RELENG_5_1 caused the problem. Looks like revision 1.86 works, but it stops working with 1.87. Moving the soclose() calls to fifo_inactive() may have caused it. This is an interesting observation, but I'm not sure why it would make a difference. I haven't looked at the qmail source, but it looks like it is doing a non-blocking open on the fifo, calling select() on the fd, and hoping that select() waits for a writer to open the fifo before returning with an indication that the descriptor is readable. It looks like the select code is calling the soreadable() macro to determine if the fifo descriptor is readable, and the soreadable() macro returns a true value if the SS_CANTRCVMORE socket flag is set, which would indicate an EOF condition. I might believe that I accidentally changed the setting of this flag, but I just compared fifo_vnops.c rev 1.78 with 1.87 and I believe this flag should be set the same way in both versions. In both versions, fifo_close() always calls socantrcvmore(), which sets SS_CANTRCVMORE when the writer count drops to zero. Prior to 1.87, fifo_close() also destroyed the sockets when the reference count dropped to zero, which caused fifo_open() to recreate the sockets when the fifo was opened again, and when it did, fifo_open() set the SS_CANTRCVMORE flag again. The posted qmail syscall trace looks like what I would expect to see in the present implementation. I can't explain why it would behave any differently prior to 1.87 ... The plot thickens ... 
I ran this bit of code on both 5.1-current with version 1.88 of fifo_vnops.c, and 4.8-stable:

    #include <sys/types.h>
    #include <sys/time.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(void)
    {
        int fd;
        fd_set readfds;

        /* A non-blocking open succeeds even when the fifo has no writer. */
        fd = open("myfifo", O_RDONLY | O_NONBLOCK);
        printf("before the loop\n");
        while (1) {
            FD_ZERO(&readfds);
            FD_SET(fd, &readfds);
            /* Nothing is ever read, so a readable fifo makes this spin. */
            printf("%d %d\n", fd, select(20, &readfds, NULL, NULL, NULL));
        }
        exit(0);
    }

On 4.8-stable, select() immediately returns a 1, whether or not the fifo has ever been opened for writing.

On 5.1-current, select() waits forever, even if the fifo has been opened for writing by another process. Select() only returns when something has actually been written to the fifo, and since this process doesn't read anything from the fifo, it spins on select() forever.

If some data is getting written to the fifo, it doesn't look like qmail consumes it, and since fifo_close in 1.87 doesn't destroy the sockets, it looks like the data is hanging around in the fifo while neither end is open, and qmail stumbles across this data when it calls select() after re-opening the fifo.

Now there are two questions that I can't answer:

Why is my analysis of select() and the SS_CANTRCVMORE flag incorrect in 5.1-current with version 1.87 or 1.88 of fifo_vnops.c?

Why doesn't qmail get stuck in a similar loop in 4.8-stable, since select always returns true for reading on a fifo with no writers?
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
On Mon, 16 Jun 2003, Don Lewis wrote: On 16 Jun, I wrote: On 16 Jun, Tim Robbins wrote: This looks like a bug in the named pipe code. Reverting sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go away. I haven't tracked down exactly what change between RELENG_5_0 and RELENG_5_1 caused the problem. Looks like revision 1.86 works, but it stops working with 1.87. Moving the soclose() calls to fifo_inactive() may have caused it. This is an interesting observation, but I'm not sure why it would make a difference. I haven't looked at the qmail source, but it looks like it is doing a non-blocking open on the fifo, calling select() on the fd, and hoping that select() waits for a writer to open the fifo before returning with an indication that the descriptor is readable. In my review of 1.87, I forgot to ask you how atomic the close is with part of it moved out to fifo_inactive(). I think it's important that all traces of the old open have gone away (as far as applications can tell) when the last close returns. It looks like the select code is calling the soreadable() macro to determine if the fifo descriptor is readable, and the soreadable() macro returns a true value if the SS_CANTRCVMORE socket flag is set, which would indicate an EOF condition. fifo_close() sets this flag and the corresponding send flag on last close, so there is no direct problem here. ... The posted qmail syscall trace looks like what I would expect to see in the present implementation. I can't explain why it would behave any differently prior to 1.87 ... The plot thickens ... 
I ran this bit of code on both 5.1-current with version 1.88 of fifo_vnops.c, and 4.8-stable:

    #include <sys/types.h>
    #include <sys/time.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(void)
    {
        int fd;
        fd_set readfds;

        /* A non-blocking open succeeds even when the fifo has no writer. */
        fd = open("myfifo", O_RDONLY | O_NONBLOCK);
        printf("before the loop\n");
        while (1) {
            FD_ZERO(&readfds);
            FD_SET(fd, &readfds);
            /* Nothing is ever read, so a readable fifo makes this spin. */
            printf("%d %d\n", fd, select(20, &readfds, NULL, NULL, NULL));
        }
        exit(0);
    }

On 4.8-stable, select() immediately returns a 1, whether or not the fifo has ever been opened for writing.

On 5.1-current, select() waits forever, even if the fifo has been opened for writing by another process. Select() only returns when something has actually been written to the fifo, and since this process doesn't read anything from the fifo, it spins on select() forever.

If some data is getting written to the fifo, it doesn't look like qmail consumes it, and since fifo_close in 1.87 doesn't destroy the sockets, it looks like the data is hanging around in the fifo while neither end is open, and qmail stumbles across this data when it calls select() after re-opening the fifo.

Now there are two questions that I can't answer:

Why is my analysis of select() and the SS_CANTRCVMORE flag incorrect in 5.1-current with version 1.87 or 1.88 of fifo_vnops.c?

I think it is correct, assuming that something writes to the fifo. Writing might be part of synchronization but actually reading the data should not be necessary since the last close must discard the data (POSIX spec).

Why doesn't qmail get stuck in a similar loop in 4.8-stable, since select always returns true for reading on a fifo with no writers?

Don't know. Maybe it uses autoconfig to handle the 4.8 behaviour. The 4.8 behaviour is normal compared with the buggy behaviour of not discarding data on last close, so applications should handle it better :-). Maybe qmail spins under 4.8 too, but only until synchronization is achieved.
Bruce
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
On 16 Jun, Bruce Evans wrote: On Mon, 16 Jun 2003, Don Lewis wrote: On 16 Jun, I wrote: On 16 Jun, Tim Robbins wrote: This looks like a bug in the named pipe code. Reverting sys/fs/fifofs/fifo_vnops.c to the RELENG_5_0 version makes the problem go away. I haven't tracked down exactly what change between RELENG_5_0 and RELENG_5_1 caused the problem. Looks like revision 1.86 works, but it stops working with 1.87. Moving the soclose() calls to fifo_inactive() may have caused it. This is an interesting observation, but I'm not sure why it would make a difference. I haven't looked at the qmail source, but it looks like it is doing a non-blocking open on the fifo, calling select() on the fd, and hoping that select() waits for a writer to open the fifo before returning with an indication that the descriptor is readable. In my review of 1.87, I forgot to ask you how atomic the close is with part of it moved out to fifo_inactive(). I think it's important that all traces of the old open have gone away (as far as applications can tell) when the last close returns. I hadn't taken queued data into consideration. Now that I've looked at this more closely, there are other problems in both the old and new code. If a process calls fcntl(fd, F_SETOWN, ...) on one end of the fifo, that should be undone when that end of the fifo is closed. In the old implementation, that only happens when both ends of the fifo are closed and the sockets are deleted. On 5.1-current, select() waits forever, even if the fifo has been opened for writing by another process. Select() only returns when something has actually been written to the fifo, and since this process doesn't read anything from the fifo, it spins on select() forever. 
If some data is getting written to the fifo, it doesn't look like qmail consumes it, and since fifo_close in 1.87 doesn't destroy the sockets, it looks like the data is hanging around in the fifo while neither end is open, and qmail stumbles across this data when it calls select() after re-opening the fifo.

Now there are two questions that I can't answer:

Why is my analysis of select() and the SS_CANTRCVMORE flag incorrect in 5.1-current with version 1.87 or 1.88 of fifo_vnops.c?

I think it is correct, assuming that something writes to the fifo. Writing might be part of synchronization but actually reading the data should not be necessary since the last close must discard the data (POSIX spec).

It sure looks to me like SS_CANTRCVMORE is always set when the write end of the fifo is closed, no matter whether the sockets were freshly allocated by a fifo_open() call on the read end of the fifo, or because the last writer closed the write end of the fifo. It sure looks like select() should immediately return if this flag is set, but it is not returning ...

Actually, something seems broken. I modified my little test program to actually read the data, which works just fine, but select() still blocks when the writer closes the fifo, so there doesn't seem to be a way to detect the EOF.

Why doesn't qmail get stuck in a similar loop in 4.8-stable, since select always returns true for reading on a fifo with no writers?

Don't know. Maybe it uses autoconfig to handle the 4.8 behaviour. The 4.8 behaviour is normal compared with the buggy behaviour of not discarding data on last close, so applications should handle it better :-). Maybe qmail spins under 4.8 too, but only until synchronization is achieved.

Bruce
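
The modified test program mentioned above was not posted. A minimal sketch of the same idea, waiting in select() and then actually consuming the data, might look like the following. The helper names `open_fifo_reader()` and `wait_and_read()` and the timeout parameter are assumptions made for this illustration. On a system where select() does not report EOF on a fifo as readable (the 5.1-current behaviour described in this thread), the wait simply times out instead of letting read() return 0.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/select.h>
#include <sys/time.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create the fifo if needed, open it for non-blocking reads. */
static int
open_fifo_reader(const char *path)
{
	if (mkfifo(path, 0600) == -1 && errno != EEXIST)
		return (-1);
	return (open(path, O_RDONLY | O_NONBLOCK));
}

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then try one read(). Returns the number of bytes read, 0 on EOF
 * (no writers and no data), or -1 on timeout or error.
 */
static ssize_t
wait_and_read(int fd, char *buf, size_t len, int timeout_ms)
{
	fd_set readfds;
	struct timeval tv;
	int n;

	FD_ZERO(&readfds);
	FD_SET(fd, &readfds);
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = (timeout_ms % 1000) * 1000;

	n = select(fd + 1, &readfds, NULL, NULL, &tv);
	if (n <= 0)
		return (-1);		/* timed out, or select() failed */
	return (read(fd, buf, len));	/* consume the data; 0 means EOF */
}
```

Draining the fifo this way keeps stale data from making select() return over and over, which is the loop the original test program falls into.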
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
On 16 Jun, Thorsten Schroeder wrote:

Hi,

On Sun, 15 Jun 2003, Don Lewis wrote:

I don't know what it could be - perhaps a problem with named pipes (lock/trigger)? You can find my ktrace output here: http://cs.so36.net/~ths/kdump.txt

Which version of fifo_vnops.c? If the problem is present in 5.1-RELEASE, then the problem is likely to be the change made in 1.79 and 1.85. If the problem didn't show up until after the 5.1-RELEASE, then the problem could be the changes in 1.87 or 1.88.

FreeBSD 5.1-CURRENT #1: Thu Jun 5 19:29:29 CEST 2003

fifo_vnops.c:
$FreeBSD: src/sys/fs/fifofs/fifo_vnops.c,v 1.87 2003/06/01 06:24:32 truckman Exp $

Try upgrading to 1.88 and applying this patch:

Index: sys/fs/fifofs/fifo_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/fs/fifofs/fifo_vnops.c,v
retrieving revision 1.88
diff -u -r1.88 fifo_vnops.c
--- sys/fs/fifofs/fifo_vnops.c	13 Jun 2003 06:58:11 -0000	1.88
+++ sys/fs/fifofs/fifo_vnops.c	16 Jun 2003 08:44:20 -0000
@@ -70,7 +70,6 @@
 static int	fifo_lookup(struct vop_lookup_args *);
 static int	fifo_open(struct vop_open_args *);
 static int	fifo_close(struct vop_close_args *);
-static int	fifo_inactive(struct vop_inactive_args *);
 static int	fifo_read(struct vop_read_args *);
 static int	fifo_write(struct vop_write_args *);
 static int	fifo_ioctl(struct vop_ioctl_args *);
@@ -98,7 +97,6 @@
 	{ &vop_create_desc,		(vop_t *) vop_panic },
 	{ &vop_getattr_desc,		(vop_t *) vop_ebadf },
 	{ &vop_getwritemount_desc,	(vop_t *) vop_stdgetwritemount },
-	{ &vop_inactive_desc,		(vop_t *) fifo_inactive },
 	{ &vop_ioctl_desc,		(vop_t *) fifo_ioctl },
 	{ &vop_kqfilter_desc,		(vop_t *) fifo_kqfilter },
 	{ &vop_lease_desc,		(vop_t *) vop_null },
@@ -556,32 +554,18 @@
 		if (fip->fi_writers == 0)
 			socantrcvmore(fip->fi_readsock);
 	}
-	VOP_UNLOCK(vp, 0, td);
-	return (0);
-}
-
-static int
-fifo_inactive(ap)
-	struct vop_inactive_args /* {
-		struct vnode *a_vp;
-		struct thread *a_td;
-	} */ *ap;
-{
-	struct vnode *vp = ap->a_vp;
-	struct fifoinfo *fip = vp->v_fifoinfo;
-
 	VI_LOCK(vp);
-	if (fip != NULL && vp->v_usecount == 0) {
+	if (vp->v_usecount == 1) {
 		vp->v_fifoinfo = NULL;
 		VI_UNLOCK(vp);
 		(void)soclose(fip->fi_readsock);
 		(void)soclose(fip->fi_writesock);
 		FREE(fip, M_VNODE);
-	}
-	VOP_UNLOCK(vp, 0, ap->a_td);
+	} else
+		VI_UNLOCK(vp);
+	VOP_UNLOCK(vp, 0, td);
 	return (0);
 }
 
-
 /*
  * Print out internal contents of a fifo vnode.
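
What the patch restores, tearing the sockets down inside the close path when the last reference goes away so that queued data cannot survive until the next open, can be illustrated with a user-space toy. This is a sketch under stated assumptions rather than the kernel logic: `toy_fifo`, `toy_open()` and `toy_close()` are invented names, and locking is ignored entirely.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for a fifo vnode and its socket pair. */
struct toy_fifo {
	int	usecount;	/* open references, loosely like v_usecount */
	int	has_sockets;	/* nonzero while the socket pair exists */
	size_t	queued;		/* bytes written but never read */
};

/* Open: create the socket pair on first use, then take a reference. */
static void
toy_open(struct toy_fifo *f)
{
	if (!f->has_sockets) {
		f->has_sockets = 1;
		f->queued = 0;	/* fresh sockets start empty */
	}
	f->usecount++;
}

/* Close: on the last reference, destroy the sockets and any queued data. */
static void
toy_close(struct toy_fifo *f)
{
	if (f->usecount == 1) {
		f->has_sockets = 0;
		f->queued = 0;
	}
	f->usecount--;
}
```

With this shape, a byte left unread when both ends close is gone by the time the fifo is opened again, so a later select() has nothing stale to report.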
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
Don Lewis wrote:

Actually, something seems broken. I modified my little test program to actually read the data, which works just fine, but select() still blocks when the writer closes the fifo, so there doesn't seem to be a way to detect the EOF.

I think this should be covered under the exceptional event and read select flags (a subsequent read will return 0).

Also, you should remember that qmail opens the thing with non-blocking I/O, and then expects the select to block. Very odd program, qmail.

-- Terry
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
On Mon, 16 Jun 2003, Don Lewis wrote:

On 16 Jun, Bruce Evans wrote:

In my review of 1.87, I forgot to ask you how atomic the close is with part of it moved out to fifo_inactive(). I think it's important that all traces of the old open have gone away (as far as applications can tell) when the last close returns.

I hadn't taken queued data into consideration. Now that I've looked at this more closely, there are other problems in both the old and new code. If a process calls fcntl(fd, F_SETOWN, ...) on one end of the fifo, that should be undone when that end of the fifo is closed. In the old implementation, that only happens when both ends of the fifo are closed and the sockets are deleted.

F_SETOWN (and associated signal delivery) is even more broken than that :-]. This fcntl() should be applied to the file (though not just the file descriptor), so its effect should be limited to fd's open in the file instance and go away when all these are closed. However, F_SETOWN (and associated signal delivery) actually applies to the socket for fifos. It doesn't work quite right for ttys either. F_SETOWN apparently isn't used in ways complicated enough to require it to work right.

Now there are two questions that I can't answer:

Why is my analysis of select() and the SS_CANTRCVMORE flag incorrect in 5.1-current with version 1.87 or 1.88 of fifo_vnops.c?

I think it is correct, assuming that something writes to the fifo. Writing might be part of synchronization but actually reading the data should not be necessary since the last close must discard the data (POSIX spec).

It sure looks to me like SS_CANTRCVMORE is always set when the write end of the fifo is closed, no matter whether the sockets were freshly allocated by a fifo_open() call on the read end of the fifo, or because the last writer closed the write end of the fifo. It sure looks like select() should immediately return if this flag is set, but it is not returning ...

Alfred changed the semantics for 5.x. I thought that you knew this. I finally gave up resisting this change after a lot of email :-). In 5.x, SS_CANTRCVMORE often has no effect for fifos (it still works normally for sockets). fifo_poll() normally calls soo_poll() with POLLIN converted to POLLINIGNEOF. This causes soo_poll() (sopoll()) to skip the usual SS_CANTRCVMORE check (which is inside soreadable()) and check the watermark instead, so that select() on a fifo normally waits for data even when the fifo is open in nonblocking mode and SS_CANTRCVMORE is set.

Blocking in select() even in nonblocking mode is usually what is wanted, but is not what is wanted for detecting EOF. 4.8 handles EOF detection (== all writers going away in the context of fifos) better at a cost of providing no good way to wait for the first writer. We changed it since all other systems seem to do it like 5.x and few applications understand this.

Actually, something seems broken. I modified my little test program to actually read the data, which works just fine, but select() still blocks when the writer closes the fifo, so there doesn't seem to be a way to detect the EOF.

Hmm, we may have changed too much. EOF can be detected using poll() instead of select() and setting POLLIN and POLLINIGNEOF in the poll flags (this stops fifo_poll() clearing POLLIN -- see the comment), but the POLLINIGNEOF is not documented at the application level and is probably never used there.

I suspect that other systems have more magic to handle EOF. I tried to avoid such magic since I think the state of the fifo should be the same when there are no writers (and no data) no matter how the state of having no writers was reached (otherwise I think the state depends too much on races between open() for reading and close() by the last writer). POSIX is clear enough on this for read/write but fuzzy for select/poll.

Bruce
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
Hi,

On Mon, 16 Jun 2003, Don Lewis wrote:

FreeBSD 5.1-CURRENT #1: Thu Jun 5 19:29:29 CEST 2003

fifo_vnops.c:
$FreeBSD: src/sys/fs/fifofs/fifo_vnops.c,v 1.87 2003/06/01 06:24:32 truckman Exp $

Try upgrading to 1.88 and applying this patch:

Index: sys/fs/fifofs/fifo_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/fs/fifofs/fifo_vnops.c,v
retrieving revision 1.88
diff -u -r1.88 fifo_vnops.c
--- sys/fs/fifofs/fifo_vnops.c	13 Jun 2003 06:58:11 -0000	1.88
+++ sys/fs/fifofs/fifo_vnops.c	16 Jun 2003 08:44:20 -0000
[...]

Yes! This seems to work fine :)

qmail-send doesn't increase CPU usage after the first mail anymore.

Thanks a lot,
Thorsten
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
Thorsten Schroeder wrote:

Hi,

On Mon, 16 Jun 2003, Don Lewis wrote:

FreeBSD 5.1-CURRENT #1: Thu Jun 5 19:29:29 CEST 2003

fifo_vnops.c:
$FreeBSD: src/sys/fs/fifofs/fifo_vnops.c,v 1.87 2003/06/01 06:24:32 truckman Exp $

Try upgrading to 1.88 and applying this patch:

Index: sys/fs/fifofs/fifo_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/fs/fifofs/fifo_vnops.c,v
retrieving revision 1.88
diff -u -r1.88 fifo_vnops.c
--- sys/fs/fifofs/fifo_vnops.c	13 Jun 2003 06:58:11 -0000	1.88
+++ sys/fs/fifofs/fifo_vnops.c	16 Jun 2003 08:44:20 -0000
[...]

Yes! This seems to work fine :)

I run qmail on my 4.8 servers. For my sanity, is this a problem in 5.1-RELEASE, or in code after 5.1-RELEASE?

We haven't upgraded to 5.1 yet (and don't intend to for a while), but I thought I'd ask since this bug would cripple our mail server.

--
Jesse Guardiani, Systems Administrator
WingNET Internet Services,
P.O. Box 2605 // Cleveland, TN 37320-2605
423-559-LINK (v) 423-559-5145 (f)
http://www.wingnet.net
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
On 16 Jun, Thorsten Schroeder wrote:

Hi,

On Mon, 16 Jun 2003, Don Lewis wrote:

FreeBSD 5.1-CURRENT #1: Thu Jun 5 19:29:29 CEST 2003

fifo_vnops.c:
$FreeBSD: src/sys/fs/fifofs/fifo_vnops.c,v 1.87 2003/06/01 06:24:32 truckman Exp $

Try upgrading to 1.88 and applying this patch:

Index: sys/fs/fifofs/fifo_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/fs/fifofs/fifo_vnops.c,v
retrieving revision 1.88
diff -u -r1.88 fifo_vnops.c
--- sys/fs/fifofs/fifo_vnops.c	13 Jun 2003 06:58:11 -0000	1.88
+++ sys/fs/fifofs/fifo_vnops.c	16 Jun 2003 08:44:20 -0000
[...]

Yes! This seems to work fine :)

qmail-send doesn't increase CPU usage after the first mail anymore.

Thanks a lot,

Thanks for doing the testing. I just committed this patch.
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
On 16 Jun, Jesse Guardiani wrote:

I run qmail on my 4.8 servers. For my sanity, is this a problem in 5.1-RELEASE, or in code after 5.1-RELEASE?

We haven't upgraded to 5.1 yet (and don't intend to for a while), but I thought I'd ask since this bug would cripple our mail server.

It was broken in 5.1-CURRENT shortly after 5.1-RELEASE, until I committed a patch a few minutes ago. 5.1-RELEASE is fine. The problematic versions of fifo_vnops.c are 1.87 and 1.88.
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
On 16 Jun, Bruce Evans wrote:

On Mon, 16 Jun 2003, Don Lewis wrote:

On 16 Jun, Bruce Evans wrote:

In my review of 1.87, I forgot to ask you how atomic the close is with part of it moved out to fifo_inactive(). I think it's important that all traces of the old open have gone away (as far as applications can tell) when the last close returns.

I hadn't taken queued data into consideration. Now that I've looked at this more closely, there are other problems in both the old and new code. If a process calls fcntl(fd, F_SETOWN, ...) on one end of the fifo, that should be undone when that end of the fifo is closed. In the old implementation, that only happens when both ends of the fifo are closed and the sockets are deleted.

F_SETOWN (and associated signal delivery) is even more broken than that :-]. This fcntl() should be applied to the file (though not just the file descriptor), so its effect should be limited to fd's open in the file instance and go away when all these are closed. However, F_SETOWN (and associated signal delivery) actually applies to the socket for fifos. It doesn't work quite right for ttys either. F_SETOWN apparently isn't used in ways complicated enough to require it to work right.

There is a fundamental architectural problem -- devices and files don't have a list of the descriptors that have them open. That would require putting descriptors on another list (and dealing with the necessary locking), which would also bloat the size of the descriptor structure. Storing the F_SETOWN info there would bloat all descriptors even more rather than the relative handful of device structures that support this feature.

Now there are two questions that I can't answer:

Why is my analysis of select() and the SS_CANTRCVMORE flag incorrect in 5.1-current with version 1.87 or 1.88 of fifo_vnops.c?

I think it is correct, assuming that something writes to the fifo. Writing might be part of synchronization but actually reading the data should not be necessary since the last close must discard the data (POSIX spec).

It sure looks to me like SS_CANTRCVMORE is always set when the write end of the fifo is closed, no matter whether the sockets were freshly allocated by a fifo_open() call on the read end of the fifo, or because the last writer closed the write end of the fifo. It sure looks like select() should immediately return if this flag is set, but it is not returning ...

Alfred changed the semantics for 5.x. I thought that you knew this. I finally gave up resisting this change after a lot of email :-). In 5.x, SS_CANTRCVMORE often has no effect for fifos (it still works normally for sockets). fifo_poll() normally calls soo_poll() with POLLIN converted to POLLINIGNEOF. This causes soo_poll() (sopoll()) to skip the usual SS_CANTRCVMORE check (which is inside soreadable()) and check the watermark instead, so that select() on a fifo normally waits for data even when the fifo is open in nonblocking mode and SS_CANTRCVMORE is set.

Nope, I didn't know this, and I missed the POLLIN->POLLINIGNEOF conversion when I was tracing the code.
Re: qmail uses 100% cpu after FreeBSD-5.0 to 5.1 upgrade
I've been running qmail for years and like it, installed pretty much per www.LifeWithQmail.org. My main system was running FreeBSD 5.0-RELEASE and -CURRENT and qmail was fine. When I just upgraded to 5.1-CURRENT a couple days back, the qmail-send process started using all CPU.

[snip]

Anyone else seen this or know what in FreeBSD-5.1 might have changed to cause this? Any thoughts on how I might go about diagnosing this any better?

I saw this too, but couldn't get it fixed either. My solution (hopefully temporary) was to switch to another MTA.

Fred

--
I used to think romantic love was a neurosis shared by two, a supreme foolishness. I no longer thought that. There's nothing foolish in loving anyone. Thinking you'll be loved in return is what's foolish. -- Rita Mae Brown