Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
Don Lewis <[EMAIL PROTECTED]> writes:

> Try the very untested patch below ...

Well, it seems to be working now, but not necessarily due to this patch. I lost two of the four drives on my ATA RAID card (RAID-5), so I lost my entire system :-(. I rebuilt the box from the 5.0-RELEASE floppies/net, then cvsupped to 5.1-CURRENT, and reinstalled all the stuff like qmail and apache.

I'm no longer seeing the "unlocked" messages in the logs.

Thanks for all your help!
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
On 2003-06-19 08:13 -0700, Don Lewis <[EMAIL PROTECTED]> wrote:
> > In PR kern/46652 I reported that DEBUG_VFS_LOCKS never
> > checks the **vpp parameters. A patch is included in the PR and
> > it does generate the missing tests.
> >
> > I asked for feedback on the hackers mail list (IIRC), but did
> > not get any replies.
> >
> > Any objections against me committing the patch now?
> >
> > (A different fix is mentioned in the PR; the patch I suggested
> > was the minimal change to the code which made it work, the
> > alternative seems cleaner to me ...) Please read PR kern/46652!
>
> I think the alternative fix should be committed. That would do the
> correct thing if another pointer to a pointer to a vnode argument is
> ever added. I think this is better than adding magic to vpp.

Well, the alternative fix is much more work than I thought ... I spent an hour on it, but the parameter names are assumed to consist only of [a-z]* in a number of places, and fixing this would add complexity to the AWK script and make it harder to maintain.

Instead, I'm going to commit the trivial fix and add a comment to vnode_if.src noting that double indirection, as in **vpp, requires special code in the AWK script ...

This will fix the debug code that is generated, and whoever cares for a better solution may back out my two-line fix and implement the clean solution.

> Any idea if this change turns up more problems?

Sorry, no. I found this while looking for some other problem and just opened the PR and sent a note to the hackers mail list ...

Regards, STefan
Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
On 19 Jun, Stefan Eßer wrote:
> On 2003-06-18 20:41 -0700, Don Lewis <[EMAIL PROTECTED]> wrote:
>> On 18 Jun, Chris Shenton wrote:
>> > Don Lewis <[EMAIL PROTECTED]> writes:
>> >
>> >> Try the very untested patch below ...

[ snip ]

>> > Tried it, rebuilt kernel, rebooted, no effect :-(
>> >
>> > You were correct about apache using it. Doing a simple
>> >
>> > fetch http://pectopah/
>> >
>> > causes the error, dropping me into ddb if panic enabled. A "tr" shows
>> > the same trace as I submitted yesterday :-(
>>
>> Weird ... I just tested the patch with ftpd which also uses sendfile()
>> and didn't get any complaints from DEBUG_VFS_LOCKS.
>
> Not sure whether the following applies, but I think the patch
> should be committed anyway:

I don't think it applies, but ...

> In PR kern/46652 I reported that DEBUG_VFS_LOCKS never
> checks the **vpp parameters. A patch is included in the PR and
> it does generate the missing tests.
>
> I asked for feedback on the hackers mail list (IIRC), but did
> not get any replies.
>
> Any objections against me committing the patch now?
>
> (A different fix is mentioned in the PR; the patch I suggested
> was the minimal change to the code which made it work, the
> alternative seems cleaner to me ...) Please read PR kern/46652!

I think the alternative fix should be committed. That would do the correct thing if another pointer to a pointer to a vnode argument is ever added. I think this is better than adding magic to vpp.

Any idea if this change turns up more problems?
Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
On 2003-06-18 20:41 -0700, Don Lewis <[EMAIL PROTECTED]> wrote:
> On 18 Jun, Chris Shenton wrote:
> > Don Lewis <[EMAIL PROTECTED]> writes:
> >
> >> Try the very untested patch below ...
> >
> >> RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v
> >> retrieving revision 1.150
> >> diff -u -r1.150 uipc_syscalls.c
> >> --- uipc_syscalls.c 12 Jun 2003 05:52:09 - 1.150
> >> +++ uipc_syscalls.c 18 Jun 2003 03:14:42 -
> >> @@ -1775,10 +1775,13 @@
> >>  	 */
> >>  	if ((error = fgetvp_read(td, uap->fd, &vp)) != 0)
> >>  		goto done;
> >> +	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
> >>  	if (vp->v_type != VREG || VOP_GETVOBJECT(vp, &obj) != 0) {
> >>  		error = EINVAL;
> >> +		VOP_UNLOCK(vp, 0, td);
> >>  		goto done;
> >>  	}
> >> +	VOP_UNLOCK(vp, 0, td);
> >
> > Tried it, rebuilt kernel, rebooted, no effect :-(
> >
> > You were correct about apache using it. Doing a simple
> >
> > fetch http://pectopah/
> >
> > causes the error, dropping me into ddb if panic enabled. A "tr" shows
> > the same trace as I submitted yesterday :-(
>
> Weird ... I just tested the patch with ftpd which also uses sendfile()
> and didn't get any complaints from DEBUG_VFS_LOCKS.

Not sure whether the following applies, but I think the patch should be committed anyway:

In PR kern/46652 I reported that DEBUG_VFS_LOCKS never checks the **vpp parameters. A patch is included in the PR and it does generate the missing tests.

I asked for feedback on the hackers mail list (IIRC), but did not get any replies.

Any objections against me committing the patch now?

(A different fix is mentioned in the PR; the patch I suggested was the minimal change to the code which made it work, the alternative seems cleaner to me ...) Please read PR kern/46652!

If nobody complains, I'll do the commit tomorrow.

Regards, STefan

Index: /usr/src/sys/tools/vnode_if.awk
===
RCS file: /usr/cvs/src/sys/tools/vnode_if.awk,v
retrieving revision 1.37
diff -u -u -4 -r1.37 vnode_if.awk
--- /usr/src/sys/tools/vnode_if.awk 26 Sep 2002 04:48:43 - 1.37
+++ /usr/src/sys/tools/vnode_if.awk 31 Dec 2002 13:37:20 -
@@ -64,8 +64,10 @@
 function printh(s) {print s > hfile;}
 
 function add_debug_code(name, arg, pos)
 {
+	if (arg == "vpp")
+		arg = "*vpp";
 	if (lockdata[name, arg, pos]) {
 		printh("\tASSERT_VI_UNLOCKED("arg", \""uname"\");");
 		# Add assertions for locking
 		if (lockdata[name, arg, pos] == "L")
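A note for readers following the vnode_if.awk patch above: the two added lines rewrite the argument name "vpp" to "*vpp", so that the generated assertion examines the vnode that vpp points at rather than the pointer-to-pointer itself. The C sketch below only illustrates that idea in userspace; it is not the generated kernel code. The struct vnode here is a stand-in with a single field, and debug_expr() and vpp_is_locked() are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a kernel vnode; only the lock state matters here. */
struct vnode {
	bool locked;
};

/*
 * Mirrors the name rewrite in the two-line AWK fix: an argument
 * called "vpp" is a struct vnode **, so the expression placed in
 * the generated assertion has to be "*vpp". Other names pass
 * through unchanged.
 */
const char *
debug_expr(const char *arg)
{
	if (strcmp(arg, "vpp") == 0)
		return "*vpp";
	return arg;
}

/*
 * What a check on a double-indirect argument has to do: follow
 * the outer pointer first, then look at the vnode's lock state.
 */
bool
vpp_is_locked(struct vnode **vpp)
{
	return vpp != NULL && *vpp != NULL && (*vpp)->locked;
}
```

The special case on the literal name "vpp" is what Don calls "adding magic to vpp": a second vnode ** argument with a different name would not be covered, which is the argument for the cleaner alternative in the PR.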
Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
On 18 Jun, Chris Shenton wrote:
> Don Lewis <[EMAIL PROTECTED]> writes:
>
>> Try the very untested patch below ...
>
>> RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v
>> retrieving revision 1.150
>> diff -u -r1.150 uipc_syscalls.c
>> --- uipc_syscalls.c 12 Jun 2003 05:52:09 - 1.150
>> +++ uipc_syscalls.c 18 Jun 2003 03:14:42 -
>> @@ -1775,10 +1775,13 @@
>>  	 */
>>  	if ((error = fgetvp_read(td, uap->fd, &vp)) != 0)
>>  		goto done;
>> +	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
>>  	if (vp->v_type != VREG || VOP_GETVOBJECT(vp, &obj) != 0) {
>>  		error = EINVAL;
>> +		VOP_UNLOCK(vp, 0, td);
>>  		goto done;
>>  	}
>> +	VOP_UNLOCK(vp, 0, td);
>
> Tried it, rebuilt kernel, rebooted, no effect :-(
>
> You were correct about apache using it. Doing a simple
>
> fetch http://pectopah/
>
> causes the error, dropping me into ddb if panic enabled. A "tr" shows
> the same trace as I submitted yesterday :-(

Weird ... I just tested the patch with ftpd, which also uses sendfile(), and didn't get any complaints from DEBUG_VFS_LOCKS.

I'm going to go ahead and commit this patch.
Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
Don Lewis <[EMAIL PROTECTED]> writes:

> Try the very untested patch below ...

> RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v
> retrieving revision 1.150
> diff -u -r1.150 uipc_syscalls.c
> --- uipc_syscalls.c 12 Jun 2003 05:52:09 - 1.150
> +++ uipc_syscalls.c 18 Jun 2003 03:14:42 -
> @@ -1775,10 +1775,13 @@
>  	 */
>  	if ((error = fgetvp_read(td, uap->fd, &vp)) != 0)
>  		goto done;
> +	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
>  	if (vp->v_type != VREG || VOP_GETVOBJECT(vp, &obj) != 0) {
>  		error = EINVAL;
> +		VOP_UNLOCK(vp, 0, td);
>  		goto done;
>  	}
> +	VOP_UNLOCK(vp, 0, td);

Tried it, rebuilt kernel, rebooted, no effect :-(

You were correct about apache using it. Doing a simple

	fetch http://pectopah/

causes the error, dropping me into ddb if panic enabled. A "tr" shows the same trace as I submitted yesterday :-(

Time to find that null modem cable.

Thanks.
Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
On 18 Jun, Chris Shenton wrote:
> Don Lewis <[EMAIL PROTECTED]> writes:
>
>> Try the very untested patch below ...
>
>> RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v
>
> When I do the patch, how much of the OS do I need to rebuild, just do
> a "make install" in the ".../src/sys/kern" dir? Rebuild the OS from
> the top dir? Rebuild the kernel? I want to make sure I'm giving this
> a proper test.

If the only changes since the last buildworld have been in src/sys, then the slow but safe way is:

	make buildkernel
	make installkernel

The quicker way is:

	cd /usr/obj/usr/src/sys/KERNELCONFNAME
	make
	make install

You can do "make reinstall" instead of "make install" if you don't want /boot/kernel.old to be nuked and /boot/kernel to be renamed to /boot/kernel.old before the new kernel is installed.

You'll have to do it the slow way if you've changed your kernel configuration.
Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
Don Lewis <[EMAIL PROTECTED]> writes:

> Try the very untested patch below ...

> RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v

When I do the patch, how much of the OS do I need to rebuild, just do a "make install" in the ".../src/sys/kern" dir? Rebuild the OS from the top dir? Rebuild the kernel? I want to make sure I'm giving this a proper test.

Thanks.
Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
On 17 Jun, Chris Shenton wrote:
> Don Lewis <[EMAIL PROTECTED]> writes:
>
>> If you have another machine and a null modem cable you can redirect the
>> system console of the machine to be debugged to a serial port and run
>> some comm software on the other machine so that you can capture all the
>> output from ddb.
>
> OK, I'll give that a shot, probably tomorrow.
>
>> At the ddb prompt, you can do a "tr" command to get a stack trace,
>> which is likely to be very helpful in pointing out the offending
>> code.
>
> Just saw it again, did a tr. From chicken-scratch notes, the last
> bits are:
>
> VOP_GETVOBJECT(...)
> do_sendfile(...)
> sendfile(...)
> syscall(...)
> Xint0x80_syscall...
> --- syscall( 393, FreeBSD ELF32, sendfile) ...
>
> The next time it dropped into ddb, same "sendfile" thing.

Try the very untested patch below ...

> The main services I'm running are qmail, apache, and NFS. Also
> tftp, rarpd, lpd, sshd, bootparamd ... oh, well, I guess I'm running
> a bunch of stuff here. :-( Not sure which one, if any, this would be.
>
> Unless sendfile() is something in the OS?

It's a system call, and I believe apache uses it.

> I'll have to dig up a nullmodem and grab console output. I realise
> I'm not giving enough detailed info to be very helpful here.

It's good enough to squash one bug. I don't know if it will solve your problem, though.

>> If you are running the NFS *client* code on this machine, there is one
>> lock assertion that is easy to trigger.
>
> In my kernel config I have this, because a diskless box uses the same
> kernel, but my /etc/fstab doesn't mount anyone else's NFS exports.

You won't trigger the lock violation in the NFS client code unless you actually mount a file system from another machine using NFS and do some I/O on it.

Here's the patch:

Index: uipc_syscalls.c
===
RCS file: /home/ncvs/src/sys/kern/uipc_syscalls.c,v
retrieving revision 1.150
diff -u -r1.150 uipc_syscalls.c
--- uipc_syscalls.c 12 Jun 2003 05:52:09 - 1.150
+++ uipc_syscalls.c 18 Jun 2003 03:14:42 -
@@ -1775,10 +1775,13 @@
 	 */
 	if ((error = fgetvp_read(td, uap->fd, &vp)) != 0)
 		goto done;
+	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
 	if (vp->v_type != VREG || VOP_GETVOBJECT(vp, &obj) != 0) {
 		error = EINVAL;
+		VOP_UNLOCK(vp, 0, td);
 		goto done;
 	}
+	VOP_UNLOCK(vp, 0, td);
 	if ((error = fgetsock(td, uap->s, &so, NULL)) != 0)
 		goto done;
 	if (so->so_type != SOCK_STREAM) {
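For readers who want to see the shape of the patch above outside the kernel: it takes the vnode lock before calling VOP_GETVOBJECT() and drops it on both the error path and the success path. The userspace C sketch below models only that pattern. It assumes a toy struct vnode with a boolean lock flag and a violation counter; vn_lock_model(), vn_unlock_model(), getvobject_model(), check_unpatched() and check_patched() are hypothetical names for this illustration, not FreeBSD APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum vtype { VREG, VDIR };

/* Toy vnode: a type, a lock flag, and a count of lock violations. */
struct vnode {
	enum vtype v_type;
	bool locked;
	int violations;
};

void
vn_lock_model(struct vnode *vp)
{
	vp->locked = true;
}

void
vn_unlock_model(struct vnode *vp)
{
	vp->locked = false;
}

/*
 * Model of an operation that expects a locked vnode. Instead of
 * logging "is not locked but should be", it counts the violation.
 */
int
getvobject_model(struct vnode *vp)
{
	if (!vp->locked)
		vp->violations++;
	return 0;
}

/* Pre-patch shape: the operation is called without the lock held. */
int
check_unpatched(struct vnode *vp)
{
	if (vp->v_type != VREG || getvobject_model(vp) != 0)
		return EINVAL;
	return 0;
}

/*
 * Post-patch shape: lock, check, then unlock on every path. The
 * real patch unlocks separately before "goto done" and after the
 * if block; a single unlock after the check is equivalent here.
 */
int
check_patched(struct vnode *vp)
{
	int error = 0;

	vn_lock_model(vp);
	if (vp->v_type != VREG || getvobject_model(vp) != 0)
		error = EINVAL;
	vn_unlock_model(vp);
	return error;
}
```

Calling check_unpatched() on an unlocked regular-file vnode bumps the violation counter once, while check_patched() leaves it unchanged and returns with the vnode unlocked again, including when it fails with EINVAL.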
Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
Don Lewis <[EMAIL PROTECTED]> writes:

> If you have another machine and a null modem cable you can redirect the
> system console of the machine to be debugged to a serial port and run
> some comm software on the other machine so that you can capture all the
> output from ddb.

OK, I'll give that a shot, probably tomorrow.

> At the ddb prompt, you can do a "tr" command to get a stack trace,
> which is likely to be very helpful in pointing out the offending
> code.

Just saw it again, did a tr. From chicken-scratch notes, the last bits are:

	VOP_GETVOBJECT(...)
	do_sendfile(...)
	sendfile(...)
	syscall(...)
	Xint0x80_syscall...
	--- syscall( 393, FreeBSD ELF32, sendfile) ...

The next time it dropped into ddb, same "sendfile" thing.

The main services I'm running are qmail, apache, and NFS. Also tftp, rarpd, lpd, sshd, bootparamd ... oh, well, I guess I'm running a bunch of stuff here. :-( Not sure which one, if any, this would be.

Unless sendfile() is something in the OS?

I'll have to dig up a nullmodem and grab console output. I realise I'm not giving enough detailed info to be very helpful here.

> If you are running the NFS *client* code on this machine, there is one
> lock assertion that is easy to trigger.

In my kernel config I have this, because a diskless box uses the same kernel, but my /etc/fstab doesn't mount anyone else's NFS exports.

	options NFSCLIENT #Network Filesystem Client

	[EMAIL PROTECTED]<101> ps -axww|grep nfs
	   42 ?? IL 0:00.00 (nfsiod 0)
	   43 ?? IL 0:00.00 (nfsiod 1)
	   44 ?? IL 0:00.00 (nfsiod 2)
	   45 ?? IL 0:00.00 (nfsiod 3)
	  428 ?? Is 0:00.03 nfsd: master (nfsd)
	  429 ?? I 0:00.09 nfsd: server (nfsd)
	  430 ?? I 0:00.00 nfsd: server (nfsd)
	  431 ?? I 0:00.00 nfsd: server (nfsd)
	  432 ?? I 0:00.00 nfsd: server (nfsd)
	35366 p0 R+ 0:00.00 grep nfs

> At the ddb prompt you should be able to use the write command to tweak a
> couple of variables to modify this behavior. If you set the
> vfs_badlock_panic variable to zero, the kernel will no longer drop into
> DDB when one of these lock violations occurs. If you set the
> vfs_badlock_print variable to zero, the kernel will stop printing the
> warnings.

OK, I've done an

	examine vfs_badlock_panic

which shows it zero, then

	write vfs_badlock_panic 0

at least for now.

Thanks again.
Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
On 17 Jun, Chris Shenton wrote:
> Don Lewis <[EMAIL PROTECTED]> writes:
>
>> I doubt it. I checked in a fix for this problem today so you should get
>> the fix when you next cvsup.
>
> Yup, many thanks.
>
>> Can you break into ddb and do a ps to find out what state all the
>> processes are in?
>
> I'm a newbie to ddb. Was able to get a ps from a hung system but
> didn't know how to capture it to send to you. Any hints?

If you have another machine and a null modem cable you can redirect the system console of the machine to be debugged to a serial port and run some comm software on the other machine so that you can capture all the output from ddb. Lacking that, there's the pencil and paper method that I used for far too long.

>> You might want to try adding the DEBUG_VFS_LOCKS options to your
>> kernel config to see if that turns up anything.
>
> Oh, man, I'm getting killed here now. Rebuilt the kernel with that
> option (not found in GENERIC or other examples in /usr/src/sys/i386/conf/).
>
> Now the system is dropping into ddb every minute or so with complaints
> like the following on the screen, and in /var/log/messages:
>
> Jun 17 21:06:08 PECTOPAH kernel: VOP_GETVOBJECT: 0xc584eb68 is not locked but should be
> Jun 17 21:08:04 PECTOPAH last message repeated 3 times
> ...
> Jun 17 21:18:55 PECTOPAH kernel: VOP_GETVOBJECT: 0xc59346d8 is not locked but should be
> Jun 17 21:18:59 PECTOPAH last message repeated 5 times
>
> Lots 'n' lots of 'em, with a few of the same hex value then another
> set for a different hex value.

Been there, but that was quite a while ago. I run this way all the time and hardly ever see problems these days. You must be exercising some file system code that I don't.

At the ddb prompt, you can do a "tr" command to get a stack trace, which is likely to be very helpful in pointing out the offending code. If you're getting a lot of VFS lock violation reports, the underlying locking violations could be the reason that your machine deadlocks. Post some representative stack traces. These problems are generally easy to fix.

>> There is also a ddb command to list the locked vnodes "show
>> lockedvnods".
>
> After I type "cont" at ddb a few times the system runs for a while
> again, only to repeat. When it drops to ddb again that show command
> doesn't list anything.
>
> I may have to remove that option from my kernel just to get it to run a
> bit, even tho eventually the system will hang. It's (of course) my
> main box which the other systems NFS off, mail server, etc. :-(

At the ddb prompt you should be able to use the write command to tweak a couple of variables to modify this behavior. If you set the vfs_badlock_panic variable to zero, the kernel will no longer drop into DDB when one of these lock violations occurs. If you set the vfs_badlock_print variable to zero, the kernel will stop printing the warnings.

If you are running the NFS *client* code on this machine, there is one lock assertion that is easy to trigger. The stack trace will show the nfsiod process calling nfssvc_iod(), which calls nfs_doio(), which complains about a lock not being held. If you run into that problem, just comment out the line:

	ASSERT_VOP_LOCKED(vp, "nfs_doio");

in nfs_doio(), in the file sys/nfsclient/nfs_bio.c. I haven't been able to figure out the correct fix for this problem, and so far I haven't seen any ill effects from leaving it unfixed.

>> Are you using nullfs or unionfs which are a bit fragile?
>
> Nope. I'd be happy to mail you my kernel config if you want. I've
> posted it to http://chris.shenton.org/PECTOPAH but if the system's
> hung again, naturally it won't be available :-(
>
> Thanks for your help. Any other things I might try?
>
> Dunno if this matters, but I'm using a DELL CERC ATA RAID card with
> disks showing up as amrd*. Was flawless at
> 5.0-{CURRENT,RELEASE}.
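To make the effect of the two variables described above concrete, here is a small userspace C model. It assumes the behaviour exactly as Don describes it: vfs_badlock_print gates the warning message and vfs_badlock_panic gates the drop into the debugger. The variable names come from the thread; report_badlock() and the debugger_entries counter are hypothetical and stand in for whatever the kernel actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Names taken from the thread; here they are plain globals. */
int vfs_badlock_print = 1;
int vfs_badlock_panic = 1;

/* Hypothetical counter standing in for "dropped into ddb". */
int debugger_entries;

/*
 * Report one lock violation. Returns true if a warning was
 * printed. Each knob is consulted independently, so warnings
 * can be kept while the debugger entry is switched off.
 */
bool
report_badlock(const char *vop, const void *vp, const char *msg)
{
	bool printed = false;

	if (vfs_badlock_print) {
		printf("%s: %p %s\n", vop, vp, msg);
		printed = true;
	}
	if (vfs_badlock_panic)
		debugger_entries++;
	return printed;
}
```

Setting vfs_badlock_panic to 0 in this model corresponds to the "write vfs_badlock_panic 0" step at the ddb prompt: violations are still reported, but the counter that represents debugger entries stops increasing.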
Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
Oh, FWIW, I did a cvsup and rebuilt the OS and kernel, then did a mergemaster about 30 minutes ago in order to get your fix to my qmail issue. So I'm running about as CURRENT as possible.
Re: 5.1-CURRENT hangs on disk i/o? sysctl_old_user() non-sleepable locks
Don Lewis <[EMAIL PROTECTED]> writes:

> I doubt it. I checked in a fix for this problem today so you should get
> the fix when you next cvsup.

Yup, many thanks.

> Can you break into ddb and do a ps to find out what state all the
> processes are in?

I'm a newbie to ddb. Was able to get a ps from a hung system but didn't know how to capture it to send to you. Any hints?

> You might want to try adding the DEBUG_VFS_LOCKS options to your
> kernel config to see if that turns up anything.

Oh, man, I'm getting killed here now. Rebuilt the kernel with that option (not found in GENERIC or other examples in /usr/src/sys/i386/conf/).

Now the system is dropping into ddb every minute or so with complaints like the following on the screen, and in /var/log/messages:

	Jun 17 21:06:08 PECTOPAH kernel: VOP_GETVOBJECT: 0xc584eb68 is not locked but should be
	Jun 17 21:08:04 PECTOPAH last message repeated 3 times
	...
	Jun 17 21:18:55 PECTOPAH kernel: VOP_GETVOBJECT: 0xc59346d8 is not locked but should be
	Jun 17 21:18:59 PECTOPAH last message repeated 5 times

Lots 'n' lots of 'em, with a few of the same hex value then another set for a different hex value.

> There is also a ddb command to list the locked vnodes "show
> lockedvnods".

After I type "cont" at ddb a few times the system runs for a while again, only to repeat. When it drops to ddb again that show command doesn't list anything.

I may have to remove that option from my kernel just to get it to run a bit, even tho eventually the system will hang. It's (of course) my main box which the other systems NFS off, mail server, etc. :-(

> Are you using nullfs or unionfs which are a bit fragile?

Nope. I'd be happy to mail you my kernel config if you want. I've posted it to http://chris.shenton.org/PECTOPAH but if the system's hung again, naturally it won't be available :-(

Thanks for your help. Any other things I might try?

Dunno if this matters, but I'm using a DELL CERC ATA RAID card with disks showing up as amrd*. Was flawless at 5.0-{CURRENT,RELEASE}.