Re: Frequent VFS crashes with RELENG_6
On Tue, Nov 14, 2006 at 05:10:23PM +0200, Vlad Galu wrote:
> On 11/1/06, Vlad Galu <[EMAIL PROTECTED]> wrote:
> >On 10/31/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:
> >> On Tue, Oct 31, 2006 at 04:34:59PM +0200, Vlad Galu wrote:
> >>
> >> > Yes, but for objective reasons I can't publish it :( The only
> >> > debugging option that I didn't use was INVARIANTS.
> >>
> >> Which is coincidentally the most useful one ;-)
> >>
> >> Also turn on DEBUG_LOCKS and DEBUG_VFS_LOCKS then report the output of
> >> 'show lockedvnods' at the time of crash, as well.
>
> Upon Tor Egge's suggestion, I removed ZERO_COPY_SOCKETS from my
> kernel and the machine has been running nicely ever since.

Glad to hear it. Depending on what Tor had to say, you might want to
file a PR about that.

Kris
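For readers following along, the change described above amounts to a one-line edit of the kernel configuration file followed by a kernel rebuild. A minimal sketch, assuming a custom config that previously enabled the option (the surrounding contents of the file are not shown in the thread and are hypothetical here):

```
# Kernel config fragment (sketch, not the poster's actual file).
# ZERO_COPY_SOCKETS was enabled before; commenting it out disables
# zero-copy socket send/receive in the rebuilt kernel.
#options        ZERO_COPY_SOCKETS
```

The thread does not say why removing the option helped, so this only shows the mechanics of the change, not a diagnosis.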
Re: Frequent VFS crashes with RELENG_6
On 11/1/06, Vlad Galu <[EMAIL PROTECTED]> wrote:
> On 10/31/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:
>> On Tue, Oct 31, 2006 at 04:34:59PM +0200, Vlad Galu wrote:
>>
>>> Yes, but for objective reasons I can't publish it :( The only
>>> debugging option that I didn't use was INVARIANTS.
>>
>> Which is coincidentally the most useful one ;-)
>>
>> Also turn on DEBUG_LOCKS and DEBUG_VFS_LOCKS then report the output of
>> 'show lockedvnods' at the time of crash, as well.

Upon Tor Egge's suggestion, I removed ZERO_COPY_SOCKETS from my
kernel and the machine has been running nicely ever since.

--
If it's there, and you can see it, it's real.
If it's not there, and you can see it, it's virtual.
If it's there, and you can't see it, it's transparent.
If it's not there, and you can't see it, you erased it.
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Frequent VFS crashes with RELENG_6
On 10/31/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:

It now crashes in a different place. Unfortunately I don't have
physical access to the machine. A bt full is available at
http://night.rdslink.ro/dudu/freebsd/03_11_2006.txt. The stack was
corrupted though :(
Re: Frequent VFS crashes with RELENG_6
On 10/31/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:
> On Tue, Oct 31, 2006 at 04:34:59PM +0200, Vlad Galu wrote:
>
>> Yes, but for objective reasons I can't publish it :( The only
>> debugging option that I didn't use was INVARIANTS.
>
> Which is coincidentally the most useful one ;-)
>
> Also turn on DEBUG_LOCKS and DEBUG_VFS_LOCKS then report the output of
> 'show lockedvnods' at the time of crash, as well.

I've applied a patch suggested by Eric and I'll see how it goes
with it. If it crashes again, I'll add the things you mentioned to my
kernel configuration and get back to the list with further details.
Re: Frequent VFS crashes with RELENG_6
On Tue, Oct 31, 2006 at 04:34:59PM +0200, Vlad Galu wrote:

> Yes, but for objective reasons I can't publish it :( The only
> debugging option that I didn't use was INVARIANTS.

Which is coincidentally the most useful one ;-)

Also turn on DEBUG_LOCKS and DEBUG_VFS_LOCKS then report the output of
'show lockedvnods' at the time of crash, as well.

Kris
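As a rough illustration of the advice above, a debugging kernel configuration might include lines like the following. This is a sketch under assumptions: the option names are the ones documented in sys/conf/NOTES for the 6.x branch, but the exact set the poster ended up using is not given in the thread, and NOTES for the branch in use should be checked before relying on it.

```
# Kernel config fragment (sketch; not taken from the thread).
makeoptions     DEBUG=-g            # build with debugging symbols for kgdb
options         KDB                 # kernel debugger framework
options         DDB                 # interactive debugger; provides 'show lockedvnods'
options         INVARIANTS          # extra run-time consistency checks
options         INVARIANT_SUPPORT   # required by INVARIANTS
options         WITNESS             # lock order verification (already used earlier in the thread)
options         DEBUG_LOCKS         # lock debugging, as suggested above
options         DEBUG_VFS_LOCKS     # vnode lock assertions, as suggested above
```

With such a kernel, 'show lockedvnods' would be typed at the DDB prompt after the panic drops into the debugger, which on a machine without physical access generally assumes a serial console or similar remote access.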
Re: Frequent VFS crashes with RELENG_6
On 10/31/06, Eric Anderson <[EMAIL PROTECTED]> wrote:
> On 10/31/06 08:03, Vlad Galu wrote:
>> On 10/1/06, Cy Schubert <[EMAIL PROTECTED]> wrote:
>>> In message <[EMAIL PROTECTED]>, "Vlad GALU" writes:
>>>> On 9/30/06, Martin Blapp <[EMAIL PROTECTED]> wrote:
>>>>> Hi,
>>>>>
>>>>> 1.) Bad RAM? Have you run a memory tester?
>>>>
>>>> Yes, memtest86 didn't show anything weird.
>>>>
>>>>> 2.) Do you have background fsck running on this disk? If so,
>>>>> try booting into single-user mode and running a full fsck on
>>>>> this disk.
>>>>
>>>> I have background_fsck="NO" in rc.conf and I checked the whole disk
>>>> several times.
>>>> Something I forgot to mention earlier: the crash is easier to
>>>> reproduce when running rtorrent. The machine did crash without
>>>> running it as well, but far less often.
>>>
>>> I've been experiencing the same problem as well. I discovered that
>>> the disk holding the filesystem had some bad sectors, which caused
>>> dump -0Lauf to fail while taking a snapshot and the system to panic.
>>> Running smartctl on the device showed bad sectors 40% of the way
>>> into the surface scan being performed by SMART. The drive, an 80 GB
>>> Maxtor, was replaced with a 250 GB Western Digital (for a very good
>>> price, so good a price I purchased two of them). The Maxtor was 906
>>> days old, having only been powered off maybe a dozen times over the
>>> last three years.
>>
>> During the last 2 weeks I ran the same system with WITNESS turned
>> on. Because this machine's purpose is not I/O dependent, I could
>> run bonnie++ and iozone every second day for the whole 24 hours.
>> At the same time I ran several instances of rtorrent. This morning
>> I rebooted to a non-WITNESS kernel (the same sources from 2 weeks
>> ago) and the exact same crash occurred within a few hours from
>> bootup. In all this time, smartd didn't report anything suspicious.
>> WITNESS only reported a LOR related to kqueue that is already known.
>> Any ideas for further stress testing would be welcome. I am
>> familiar with a few parts of the kernel, but VFS is a total stranger
>> to me.
>
> Did you get a crash dump? If not, you might want to start with adding
> all the debugger options into the kernel.

Yes, but for objective reasons I can't publish it :( The only
debugging option that I didn't use was INVARIANTS. However, I posted
the output of "bt full" at the beginning of this thread. See
http://lists.freebsd.org/pipermail/freebsd-stable/2006-September/028985.html.

> Eric
>
> --
> Eric Anderson    Sr. Systems Administrator    Centaur Technology
> Anything that works is better than anything that doesn't.
Re: Frequent VFS crashes with RELENG_6
On 10/31/06 08:03, Vlad Galu wrote:
> On 10/1/06, Cy Schubert <[EMAIL PROTECTED]> wrote:
>> In message <[EMAIL PROTECTED]>, "Vlad GALU" writes:
>>> On 9/30/06, Martin Blapp <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>>
>>>> 1.) Bad RAM? Have you run a memory tester?
>>>
>>> Yes, memtest86 didn't show anything weird.
>>>
>>>> 2.) Do you have background fsck running on this disk? If so,
>>>> try booting into single-user mode and running a full fsck on
>>>> this disk.
>>>
>>> I have background_fsck="NO" in rc.conf and I checked the whole disk
>>> several times.
>>> Something I forgot to mention earlier: the crash is easier to
>>> reproduce when running rtorrent. The machine did crash without
>>> running it as well, but far less often.
>>
>> I've been experiencing the same problem as well. I discovered that
>> the disk holding the filesystem had some bad sectors, which caused
>> dump -0Lauf to fail while taking a snapshot and the system to panic.
>> Running smartctl on the device showed bad sectors 40% of the way
>> into the surface scan being performed by SMART. The drive, an 80 GB
>> Maxtor, was replaced with a 250 GB Western Digital (for a very good
>> price, so good a price I purchased two of them). The Maxtor was 906
>> days old, having only been powered off maybe a dozen times over the
>> last three years.
>
> During the last 2 weeks I ran the same system with WITNESS turned
> on. Because this machine's purpose is not I/O dependent, I could
> run bonnie++ and iozone every second day for the whole 24 hours.
> At the same time I ran several instances of rtorrent. This morning
> I rebooted to a non-WITNESS kernel (the same sources from 2 weeks
> ago) and the exact same crash occurred within a few hours from
> bootup. In all this time, smartd didn't report anything suspicious.
> WITNESS only reported a LOR related to kqueue that is already known.
> Any ideas for further stress testing would be welcome. I am
> familiar with a few parts of the kernel, but VFS is a total stranger
> to me.

Did you get a crash dump? If not, you might want to start with adding
all the debugger options into the kernel.

Eric

--
Eric Anderson    Sr. Systems Administrator    Centaur Technology
Anything that works is better than anything that doesn't.
Re: Frequent VFS crashes with RELENG_6
On 10/1/06, Cy Schubert <[EMAIL PROTECTED]> wrote:
> In message <[EMAIL PROTECTED]>, "Vlad GALU" writes:
>> On 9/30/06, Martin Blapp <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> 1.) Bad RAM? Have you run a memory tester?
>>
>> Yes, memtest86 didn't show anything weird.
>>
>>> 2.) Do you have background fsck running on this disk? If so,
>>> try booting into single-user mode and running a full fsck on
>>> this disk.
>>
>> I have background_fsck="NO" in rc.conf and I checked the whole disk
>> several times.
>> Something I forgot to mention earlier: the crash is easier to
>> reproduce when running rtorrent. The machine did crash without
>> running it as well, but far less often.
>
> I've been experiencing the same problem as well. I discovered that
> the disk holding the filesystem had some bad sectors, which caused
> dump -0Lauf to fail while taking a snapshot and the system to panic.
> Running smartctl on the device showed bad sectors 40% of the way
> into the surface scan being performed by SMART. The drive, an 80 GB
> Maxtor, was replaced with a 250 GB Western Digital (for a very good
> price, so good a price I purchased two of them). The Maxtor was 906
> days old, having only been powered off maybe a dozen times over the
> last three years.

During the last 2 weeks I ran the same system with WITNESS turned
on. Because this machine's purpose is not I/O dependent, I could
run bonnie++ and iozone every second day for the whole 24 hours.
At the same time I ran several instances of rtorrent. This morning
I rebooted to a non-WITNESS kernel (the same sources from 2 weeks
ago) and the exact same crash occurred within a few hours from
bootup. In all this time, smartd didn't report anything suspicious.
WITNESS only reported a LOR related to kqueue that is already known.

Any ideas for further stress testing would be welcome. I am
familiar with a few parts of the kernel, but VFS is a total stranger
to me.
Re: Frequent VFS crashes with RELENG_6
In message <[EMAIL PROTECTED]>, "Vlad GALU" writes:
> On 9/30/06, Martin Blapp <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> 1.) Bad RAM? Have you run a memory tester?
>
> Yes, memtest86 didn't show anything weird.
>
>> 2.) Do you have background fsck running on this disk? If so,
>> try booting into single-user mode and running a full fsck on
>> this disk.
>
> I have background_fsck="NO" in rc.conf and I checked the whole disk
> several times.
> Something I forgot to mention earlier: the crash is easier to
> reproduce when running rtorrent. The machine did crash without
> running it as well, but far less often.

I've been experiencing the same problem as well. I discovered that
the disk holding the filesystem had some bad sectors, which caused
dump -0Lauf to fail while taking a snapshot and the system to panic.
Running smartctl on the device showed bad sectors 40% of the way
into the surface scan being performed by SMART. The drive, an 80 GB
Maxtor, was replaced with a 250 GB Western Digital (for a very good
price, so good a price I purchased two of them). The Maxtor was 906
days old, having only been powered off maybe a dozen times over the
last three years.

--
Cheers,
Cy Schubert <[EMAIL PROTECTED]>
FreeBSD UNIX: <[EMAIL PROTECTED]>   Web: http://www.FreeBSD.org

e**(i*pi)+1=0
Re: Frequent VFS crashes with RELENG_6
On 9/30/06, Martin Blapp <[EMAIL PROTECTED]> wrote:
> Hi,
>
> 1.) Bad RAM? Have you run a memory tester?

Yes, memtest86 didn't show anything weird.

> 2.) Do you have background fsck running on this disk? If so,
> try booting into single-user mode and running a full fsck on
> this disk.

I have background_fsck="NO" in rc.conf and I checked the whole disk
several times.

Something I forgot to mention earlier: the crash is easier to
reproduce when running rtorrent. The machine did crash without running
it as well, but far less often.

> Martin
>
> Martin Blapp, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
> --
> ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
> Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
> PGP: PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
> --
>
> On Sat, 30 Sep 2006, Vlad GALU wrote:
>
>> I've been getting random crashes like the one below, once or twice a
>> week, always in the same code path. The system is a RELENG_6 as of Wed
>> Sep 27 11:42:57 EEST 2006, running on amd64.
>>
>> -- cut here --
>> #0  doadump () at pcpu.h:172
>> No locals.
>> #1  0x8022d033 in boot (howto=260) at
>> ../../../kern/kern_shutdown.c:409
>>         first_buf_printf = 1
>> #2  0x8022d687 in panic (fmt=0xff002bb6e260 "°ö¾\"") at
>> ../../../kern/kern_shutdown.c:565
>>         bootopt = 260
>>         newpanic = 0
>>         ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area =
>> 0xa7995790, reg_save_area = 0xa79956b0}}
>>         buf = "vm_page_unwire: invalid wire count: 0", '\0'
Re: Frequent VFS crashes with RELENG_6
Hi,

1.) Bad RAM? Have you run a memory tester?

2.) Do you have background fsck running on this disk? If so,
try booting into single-user mode and running a full fsck on
this disk.

Martin

Martin Blapp, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
--
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 61 826 93 00 Fax: +41 61 826 93 01
PGP: PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
--

On Sat, 30 Sep 2006, Vlad GALU wrote:

> I've been getting random crashes like the one below, once or twice a
> week, always in the same code path. The system is a RELENG_6 as of Wed
> Sep 27 11:42:57 EEST 2006, running on amd64.
>
> -- cut here --
> #0  doadump () at pcpu.h:172
> No locals.
> #1  0x8022d033 in boot (howto=260) at ../../../kern/kern_shutdown.c:409
>         first_buf_printf = 1
> #2  0x8022d687 in panic (fmt=0xff002bb6e260 "°ö¾\"") at
> ../../../kern/kern_shutdown.c:565
>         bootopt = 260
>         newpanic = 0
>         ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area =
> 0xa7995790, reg_save_area = 0xa79956b0}}
>         buf = "vm_page_unwire: invalid wire count: 0", '\0'
Frequent VFS crashes with RELENG_6
I've been getting random crashes like the one below, once or twice a
week, always in the same code path. The system is a RELENG_6 as of Wed
Sep 27 11:42:57 EEST 2006, running on amd64.

-- cut here --
#0  doadump () at pcpu.h:172
No locals.
#1  0x8022d033 in boot (howto=260) at ../../../kern/kern_shutdown.c:409
        first_buf_printf = 1
#2  0x8022d687 in panic (fmt=0xff002bb6e260 "°ö¾\"") at ../../../kern/kern_shutdown.c:565
        bootopt = 260
        newpanic = 0
        ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area = 0xa7995790, reg_save_area = 0xa79956b0}}
        buf = "vm_page_unwire: invalid wire count: 0", '\0'
#3  0x8036980b in vm_page_unwire (m=0xff003e5c79e8, activate=0) at ../../../vm/vm_page.c:1265
No locals.
#4  0x80282c15 in vfs_vmio_release (bp=0x9a6c2430) at ../../../kern/vfs_bio.c:1470
        i = 1
        m = 0xff003e5c79e8
#5  0x80285f78 in getnewbuf (slpflag=0, slptimeo=0, size=0, maxsize=16384) at ../../../kern/vfs_bio.c:1779
        addr = 18446744072226429136
        bp = (struct buf *) 0x9a6c2430
        nbp = (struct buf *) 0x9a69ac48
        defrag = 0
        nqindex = 1
        flushingbufs = 0
#6  0x802863c0 in getblk (vp=0xff001015c5d0, blkno=0, size=2048, slpflag=0, slptimeo=0, flags=0) at ../../../kern/vfs_bio.c:2486
        bsize = 0
        maxsize = 0
        vmio = 1
        offset = 0
        bp = (struct buf *) 0x0
        bo = (struct bufobj *) 0xff001015c720
#7  0x802880ec in breadn (vp=0xff001015c5d0, blkno=0, size=0, rablkno=0x0, rabsize=0x0, cnt=0, cred=0x0, bpp=0x0) at ../../../kern/vfs_bio.c:738
        bp = (struct buf *) 0xa79958f0
        rabp = (struct buf *) 0x344
        i = -1
        rv = 0
        readwait = 0
#8  0x8028850e in bread (vp=0x0, blkno=0, size=0, cred=0x0, bpp=0x0) at ../../../kern/vfs_bio.c:719
No locals.
#9  0x803427a5 in ffs_read (ap=0x0) at ../../../ufs/ffs/ffs_vnops.c:523
        vp = (struct vnode *) 0xff001015c5d0
        ip = (struct inode *) 0xff0017978780
        uio = (struct uio *) 0xa7995b50
        fs = (struct fs *) 0xff0012347000
        bp = (struct buf *) 0x0
        lbn = 0
        nextlbn = 1
        bytesinfile = 0
        size = 2048
        xfersize = 836
        blkoffset = 0
        error = 0
        orig_resid = 4096
        seqcount = 2
        ioflag = 131072
#10 0x803b374a in VOP_READ_APV (vop=0x0, a=0x0) at vnode_if.c:643
        rc = 0
#11 0x802a74e0 in vn_read (fp=0xff001e5f8078, uio=0xa7995b50, active_cred=0x0, flags=0, td=0xff002bb6e260) at vnode_if.h:343
        vp = (struct vnode *) 0xff001015c5d0
        error = 0
        ioflag = 131072
#12 0x80257b64 in dofileread (td=0xff002bb6e260, fd=5, fp=0xff001e5f8078, auio=0xa7995b50, offset=0, flags=0) at file.h:240
        cnt = 4096
        error = 509575288
        ktruio = (struct uio *) 0x0
#13 0x80257de0 in kern_readv (td=0xff002bb6e260, fd=5, auio=0xa7995b50) at ../../../kern/sys_generic.c:192
        fp = (struct file *) 0xff001e5f8078
        error = 0
#14 0x80257eda in read (td=0x0, uap=0x0) at ../../../kern/sys_generic.c:116
        auio = {uio_iov = 0xa7995b40, uio_iovcnt = 1, uio_offset = 0, uio_resid = 4096, uio_segflg = UIO_USERSPACE, uio_rw = UIO_READ, uio_td = 0xff002bb6e260}
        aiov = {iov_base = 0x666000, iov_len = 4096}
#15 0x8038b2d8 in syscall (frame=
      {tf_rdi = 5, tf_rsi = 6709248, tf_rdx = 4096, tf_rcx = 542953472, tf_r8 = 1, tf_r9 = 0, tf_rax = 3, tf_rbx = 6151168, tf_rbp = 4294967295, tf_r10 = 3260, tf_r11 = 518, tf_r12 = 0, tf_r13 = 140737488327200, tf_r14 = 140737488327328, tf_r15 = 5, tf_trapno = 12, tf_addr = 9093168, tf_flags = 0, tf_err = 2, tf_rip = 550694412, tf_cs = 43, tf_rflags = 518, tf_rsp = 140737488327160, tf_ss = 35}) at ../../../amd64/amd64/trap.c:792
        params = 0x7fff9200
        callp = (struct sysent *) 0x80502ae8
        p = (struct proc *) 0xff0022bef6b0
        orig_tf_rflags = 518
        sticks = 116
        error = 0
        narg = 3
        args = {5, 6709248, 4096, 542953472, 1, 0, 140737488327328, 5}
        argp = (register_t *) 0x0
        code = 3
        reg = 48
        regcnt = 6
#16 0x80377bc8 in Xfast_syscall () at ../../../amd64/amd64/exception.S:270
-- and here --