On Sun, Jun 24, 2007 at 12:30:20AM -0400, Adam McDougall wrote:
On Mon, Apr 23, 2007 at 11:55:52AM -0400, Kris Kennaway wrote:
On Mon, Apr 23, 2007 at 05:35:47PM +0200, Kai wrote:
On Thu, Apr 19, 2007 at 02:33:29PM +0200, Kai wrote:
On Wed, Apr 11, 2007 at 12:53:32PM +0200, Kai wrote:
Hello all,
We're running into regular panics on our webserver after upgrading
from 4.x to 6.2-stable:
Hi all,
To continue this story, a colleague wrote a small program in C that
launches 40 threads to randomly append and write to 10 files on an
NFS-mounted filesystem.
If I keep removing the files on one of the other machines in a while loop,
the first system panics:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x34
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc06bdefa
stack pointer = 0x28:0xeb9f69b8
frame pointer = 0x28:0xeb9f69c4
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 73626 (nfscrash)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 3h2m14s
Sounds like a nice denial of service problem. I can hand the program to
developers on request.
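
For readers without access to that program, the following is a minimal
sketch of what such a stress test might look like. It is not the original
nfscrash source: the function names (run_stress, stress_worker), the file
naming scheme (stress.N), the write sizes, and the random number helper are
all assumptions made for illustration. Each thread repeatedly picks one of
the shared files, opens it either in append mode or for writing from offset
0, writes a random amount of data, and closes it again. Errors are ignored
on purpose, since the files are expected to disappear underneath the
writers.

```c
/* Hypothetical sketch of an NFS write stress test; NOT the original program. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_THREADS 64

struct stress_args {
    const char *dir;    /* directory on the NFS mount (assumed path) */
    int nfiles;         /* number of files shared by all threads */
    int iterations;     /* write attempts per thread */
    unsigned int seed;  /* per-thread pseudo-random state */
};

/* Small linear congruential generator so each thread has its own state. */
static unsigned int next_rand(unsigned int *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fff;
}

static void *stress_worker(void *arg)
{
    struct stress_args *a = arg;
    char path[1024];
    char buf[4096];

    memset(buf, 'x', sizeof(buf));
    for (int i = 0; i < a->iterations; i++) {
        /* Pick one of the shared files at random. */
        snprintf(path, sizeof(path), "%s/stress.%d", a->dir,
            (int)(next_rand(&a->seed) % (unsigned int)a->nfiles));

        /* Randomly choose between appending and writing from offset 0. */
        int flags = O_WRONLY | O_CREAT;
        if (next_rand(&a->seed) & 1)
            flags |= O_APPEND;

        int fd = open(path, flags, 0644);
        if (fd < 0)
            continue;   /* the file may just have been removed elsewhere */

        size_t len = 1 + next_rand(&a->seed) % sizeof(buf);
        ssize_t n = write(fd, buf, len);
        (void)n;        /* write errors (e.g. stale handles) are expected */
        close(fd);
    }
    return NULL;
}

/* Start nthreads writers and wait for them; returns 0 on success, -1 on error. */
int run_stress(const char *dir, int nthreads, int nfiles, int iterations)
{
    pthread_t tids[MAX_THREADS];
    struct stress_args args[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || nfiles < 1)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct stress_args){ dir, nfiles, iterations,
            (unsigned int)i + 1 };
        if (pthread_create(&tids[i], NULL, stress_worker, &args[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == nthreads ? 0 : -1;
}
```

Calling run_stress() from a small main() with 40 threads and 10 files on a
directory under the NFS mount would approximate the setup described above,
with a second client removing the stress.N files in a loop at the same
time. The iteration count needed to trigger anything is unknown and would
have to be found by experiment.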
Please send it to me. Panics are always much easier to get fixed if
they come with a test case that a developer can use to reproduce them.
Kris
I have been working on this problem all weekend, and I have a strong hunch
at this point that it is a result of revision 1.424 of sys/kern/vfs_bio.c,
which was committed between FreeBSD 5.1 and 5.2. This hunch is currently
being verified by a system that was cvsupped to code just before 1.424; it
has been running about 7 times longer than the usual time required to
crash. I am currently attempting to craft a patch for 6.2 that essentially
backs out the change to see if that works, but if this information can help
send a FreeBSD developer down the right trail to a proper fix, great. I
will follow up with more detailed findings and results tonight or soon.
links:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_bio.c.diff?r1=1.423;r2=1.424
related to 1.424:
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/vfs_bio.c.diff?r1=1.420r2=1.421
Commit emails:
http://docs.freebsd.org/cgi/mid.cgi?200311150845.hAF8jawU027349
http://docs.freebsd.org/cgi/mid.cgi?20030445.hAB4jbYw093253
___
If I turn on INVARIANTS, I get the following panic instead, much more
quickly, and it happens at least as far back as 5.0-RELEASE:
panic: bundirty: buffer 0x8e2e95f8 still on queue 1
cpuid = 1
Uptime: 35s
Dumping 511 MB (2 chunks)
chunk 0: 1MB (153 pages) ... ok
chunk 1: 511MB (130816 pages) 496 480 464 448 432 416 400 384 368 352 336 320
304 288 272 256 240 224 208 192 176
160 144 128 112 96 80 64 48 32 16
#0 doadump () at pcpu.h:172
172 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:172
#1 0x8028d699 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0x8028d12b in panic (fmt=0x80443458 bundirty: buffer %p still on queue %d) at /usr/src/sys/kern/kern_shutdown.c:565
#3 0x802e1e78 in bundirty (bp=0x8e2e95f8) at /usr/src/sys/kern/vfs_bio.c:1055
#4 0x802e3eb1 in brelse (bp=0x8e2e95f8) at /usr/src/sys/kern/vfs_bio.c:1370
#5 0x803550e8 in nfs_writebp (bp=0x8e2e95f8, force=0, td=0x0) at /usr/src/sys/nfsclient/nfs_vnops.c:3005
#6 0x802e5197 in getblk (vp=0xff000c23e5d0, blkno=0, size=14400, slpflag=256, slptimeo=0, flags=0) at buf.h:412
#7 0x80344f13 in nfs_getcacheblk (vp=0xff000c23e5d0, bn=0, size=14400, td=0xff0015b274c0) at /usr/src/sys/nfsclient/nfs_bio.c:1252
#8 0x8034616c in nfs_write (ap=0x0) at /usr/src/sys/nfsclient/nfs_bio.c:1068
#9 0x80405ee4 in VOP_WRITE_APV (vop=0x805a0260, a=0x976bfa10) at vnode_if.c:698
#10 0x80303d2c in vn_write (fp=0xff000f524000, uio=0x976bfb50, active_cred=0x0, flags=0, td=0xff0015b274c0) at vnode_if.h:372
#11 0x802ba2e5 in dofilewrite (td=0xff0015b274c0, fd=3, fp=0xff000f524000, auio=0x976bfb50, offset=0, flags=0) at file.h:253
#12 0x802ba5e1 in kern_writev (td=0xff0015b274c0, fd=3, auio=0x976bfb50) at /usr/src/sys/kern/sys_generic.c:402
#13 0x802ba6da in write (td=0x0, uap=0x0) at /usr/src/sys/kern/sys_generic.c:326
#14 0x803c6db2 in syscall (frame=
{tf_rdi = 3, tf_rsi = 140737488344336,