Re: Sound familiar? 5.0-RC hangs on dual athlon
Jacques A. Vidrine, cc current@

I have an ASUS P2L97-DS (ACPI BIOS Revision 1008), a dual-CPU box:

FreeBSD 5.0-DP2 #0: Wed Dec 4 00:26:02 CET 2002
CPU: Pentium II/Pentium II Xeon/Celeron (334.09-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x650  Stepping = 0
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory = 536858624 (511 MB)
avail memory = 514600960 (490 MB)

I also experienced unusable instability with the 5.0-DP2 dual-CPU kernel. It crashed with lots of different stack traces, so I didn't chase or report it for lack of time. (I might have found time if it had been one thing consistently, but not for a variety.)

The easiest way to get it to crash within minutes was to run several jobs at once, e.g.

    cd /usr/src ; make -j 10

Without the -j 10 it reduced to `just' a handful of crashes during make.

I dropped back to a generic single-CPU kernel. (Which cancelled the main reason I moved to 5.0-DP2: to get the ATA bus working with dual CPUs; see my Nov. 22 message, Subject: 5.0-DP2: SMP+ATA OK. But 4.7 stable boot panic with ASUS P2L97-DS, To: freebsd-current@.)

I'm downloading 5.0-RC1-i386-disc1.iso.

Julian Stacey jhs @ berklix.com
Computer Systems Engineer, Unix Net Consultant, Munich.
Your smoking = my allergic headache! Try snuff.
Munich BSD Conference: http://berklix.org/conf/
Spam phrases triggering deletion: http://berklix.com/jhs/mail/

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Sound familiar? 5.0-RC hangs on dual athlon
On Mon, Dec 09, 2002 at 10:47:24PM +0100, Julian H. Stacey wrote:
> I dropped back to a generic single CPU kernel. ( Which cancelled main
> reason I moved to 5.0-DP2: to get ATA bus working with dual, see my
> Nov. 22 Subject: 5.0-DP2: SMP+ATA OK. But 4.7 stable boot panic with
> ASUS P2L97-DS To: freebsd-current@ )
> I'm downloading 5.0-RC1-i386-disc1.iso

Well, I tried again, this time:

= I built with DDB, INVARIANTS, INVARIANT_SUPPORT, WITNESS, and WITNESS_SKIPSPIN --- none of these were enabled previously.
= I did not try to use ccd or vinum --- I tried one and then the other previously.
= I did not use UFS2 --- I formatted all large filesystems with UFS2 previously.

So far things are peachy ... I've gotten along much further than previously (restored all files from backup while building GNOME 2).

Later (much later) I'll try to narrow the problem down further.

Cheers,
--
Jacques A. Vidrine [EMAIL PROTECTED] http://www.celabo.org/
NTT/Verio SME . FreeBSD UNIX . Heimdal Kerberos
[EMAIL PROTECTED] . [EMAIL PROTECTED] . [EMAIL PROTECTED]
Sound familiar? 5.0-RC hangs on dual athlon
Hello All,

I finally managed to put some time aside to redo my main development/desktop machine to run FreeBSD 5.0. (I've been running 5.x on my laptop for some months.) I had to retreat back to 4.7 because I could not get through some simple tasks without the system hanging.

The system is a dual Athlon box with 1 GB RAM. The dmesg output is below.

At first the system hung while I was building GNOME 2.0 and restoring some files from tape. It wasn't _completely_ hung: I could switch VTYs and enter new commands (though it might take tens of seconds to echo my typing, and longer to actually execute, say, `ps'). I noticed that both an `ld' process and the `restore' process seemed to be stuck in state `wdrain'. I attempted to reboot the system, but after several minutes it still did not appear to have halted --- so I pulled the plug.

I then tried again. This time, I thought perhaps that I would be gentler. I tried checking out the ports tree (over ssh), which I had previously done successfully. Within two minutes, the system was `hung' again. The `cvs' process appeared to be stuck in `wdrain'.

One more time. Again, I attempted to check out the ports tree. After 20 minutes or so, the system was `hung' again, although this time I couldn't check whether there were any processes in `wdrain', because it was hung hard and completely.

Does this ring bells for anyone? What should I look for when I get a few hours again to waste?

Cheers,
--
Jacques A. Vidrine [EMAIL PROTECTED] http://www.celabo.org/
NTT/Verio SME . FreeBSD UNIX . Heimdal Kerberos
[EMAIL PROTECTED] . [EMAIL PROTECTED] . [EMAIL PROTECTED]

Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-RC #1: Sat Dec 7 19:30:52 CST 2002
[EMAIL PROTECTED]:/spare1/obj/usr/src/sys/GENERIC
Preloaded elf kernel /boot/kernel/kernel at 0xc06a4000.
Preloaded elf module /boot/kernel/acpi.ko at 0xc06a40a8.
Calibrating clock(s) ... TSC clock: 1194486067 Hz, i8254 clock: 1192995 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter i8254 frequency 1193182 Hz
CLK_USE_TSC_CALIBRATION not specified - using old calibration method
Timecounter TSC frequency 1194678840 Hz
CPU: AMD Athlon(tm) Processor (1194.68-MHz 686-class CPU)
  Origin = AuthenticAMD  Id = 0x661  Stepping = 1
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc044<RSVD,AMIE,DSP,3DNow!>
Data TLB: 32 entries, fully associative
Instruction TLB: 16 entries, fully associative
L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L2 internal cache: 256 kbytes, 64 bytes/line, 1 lines/tag, 8-way associative
real memory = 1073676288 (1023 MB)
Physical memory chunk(s):
0x1000 - 0x0009efff, 647168 bytes (158 pages)
0x006cb000 - 0x3ffe7fff, 1066520576 bytes (260381 pages)
avail memory = 1036201984 (988 MB)
bios32: Found BIOS32 Service Directory header at 0xc00f7440
bios32: Entry = 0xfd6a0 (c00fd6a0) Rev = 0 Len = 1
pcibios: PCI BIOS entry at 0xfd6a0+0x120
pnpbios: Found PnP BIOS data at 0xc00f7490
pnpbios: Entry = f:9ea2 Rev = 1.0
Other BIOS signatures found:
Initializing GEOMetry subsystem
random: entropy source
mem: memory I/O
Pentium Pro MTRR support enabled
null: null device, zero device
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: PTLTDRSDT on motherboard
ACPI-0625: *** Info: GPE Block0 defined as GPE0 to GPE15
pci_open(1):mode 1 addr port (0x0cf8) is 0x80008004
pci_open(1a): mode1res=0x8000 (0x8000)
pci_cfgcheck: device 0 [class=06] [hdr=00] is there (id=700c1022)
acpi0: power button is handled as a fixed feature programming model.
acpi0: sleep button is handled as a fixed feature programming model.
ACPI timer looks BAD min = 1, max = 4, width = 4
ACPI timer looks BAD min = 1, max = 4, width = 4
ACPI timer looks GOOD min = 1, max = 3, width = 3
ACPI timer looks GOOD min = 1, max = 3, width = 3
ACPI timer looks GOOD min = 1, max = 2, width = 2
ACPI timer looks GOOD min = 1, max = 3, width = 3
ACPI timer looks GOOD min = 1, max = 3, width = 3
ACPI timer looks BAD min = 1, max = 4, width = 4
ACPI timer looks GOOD min = 1, max = 3, width = 3
ACPI timer looks BAD min = 1, max = 4, width = 4
Timecounter ACPI-safe frequency 3579545 Hz
acpi_timer0: 24-bit timer at 3.579545MHz port 0x8008-0x800b on acpi0
acpi_cpu0: CPU on acpi0
acpi_cpu1: CPU on acpi0
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0x8080-0x80ff,0x8000-0x807f,0xcf8-0xcff iomem 0xdc000-0xd on acpi0
initial configuration
\\_SB_.PCI0.ISA_.LNKA irq 10: [ 3 5 10 11] low,level,sharable 0.13.0
\\_SB_.PCI0.ISA_.LNKB irq
Re: Sound familiar? 5.0-RC hangs on dual athlon
Jacques A. Vidrine wrote:
> Hello All, I finally managed to put some time aside to redo my main
> development/desktop machine to run FreeBSD 5.0. (I've been running
> 5.x on my laptop for some months.) I had to retreat back to 4.7
> because I could not get through some simple tasks without the system
> hanging. The system is a dual Athlon box with 1 GB RAM. The dmesg
> output is below. At first the system hung while I was building GNOME
> 2.0 and restoring some files from tape. It wasn't _completely_ hung:
> I could switch VTYs, and enter new commands (though it might take
> tens of seconds to echo my typing...

I seem to remember something similar from a few months ago that affected machines with lots of RAM because of something to do with high-order address bits. I thought it got fixed, but I can't really recall. Was it an Athlon problem, or a gcc problem, or both? Hmpf, can't remember!

Anyone else think this might be the same thing?
Familiar?
cc -O -pipe -mcpu=pentiumpro -DLIBC_SCCS -I/dell/imp/p4/newcard/src/lib/libkvm -c /dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c -o kvm_proc.o
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c: In function `kvm_proclist':
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:327: `KI_MTXBLOCK' undeclared (first use in this function)
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:327: (Each undeclared identifier is reported only once
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:327: for each function it appears in.)
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:328: structure has no member named `td_mtxname'
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:330: structure has no member named `td_mtxname'
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:331: structure has no member named `ki_mtxname'
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:331: `MTXNAMELEN' undeclared (first use in this function)
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:332: structure has no member named `ki_mtxname'
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:359: `SMTX' undeclared (first use in this function)
*** Error code 1

This is with last night's sources.

Warner
This look familiar to anyone? (bug in 4.11 maybe)
I know this is not a -current problem, but if it was fixed by someone, they are likely to be reading here and not in -stable.

We have a hybrid (4.11+patches) kernel that sometimes crashes. The crash always has the same symptoms, and I'm hoping that they look familiar to someone. The message is below, followed by analysis.

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0xe6b95cc8
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc01846d9
stack pointer = 0x10:0xc954de64
frame pointer = 0x10:0xc954de84
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 10326 (qftListener)
interrupt mask = none
trap number = 12

In a VFS operation, %ecx gets corrupted (maybe by an interrupt?) between the instruction where it's loaded with a constant and the instruction where it's used. It's always the same instruction, though often in DIFFERENT VFS operations (fsync, bwrite so far).

The trap frame usually looks like:

#4 0xc0251813 in trap (frame={tf_fs = 0x10, tf_es = 0x10, tf_ds = 0x10, tf_edi = 0x0, tf_esi = 0x1, tf_ebp = 0xc954de84, tf_isp = 0xc954de50, tf_ebx = 0xc27d6d80, tf_edx = 0xc1344600, tf_ecx = 0xc96145b2, tf_eax = 0xc954de78, tf_trapno = 0xc, tf_err = 0x0, tf_eip = 0xc01846d9, tf_cs = 0x8, tf_eflags = 0x10286, tf_esp = 0xc954de78, tf_ss = 0xc27d6d80}) at /usr/src/sys/i386/i386/trap.c:443
#5 0xc01846d9 in bwrite (bp=0xc27d6d80) at vnode_if.h:923
#6 0xc0189be2 in vop_stdbwrite (ap=0xc954deb4) at /usr/src/sys/kern/vfs_default.c:319

The code there looks like:

(kgdb) up 5
#5 0xc01846d9 in bwrite (bp=0xc27d6d80) at vnode_if.h:923
923 rc = VCALL(vp, VOFFSET(vop_strategy), a);
(kgdb) list
918 struct vop_strategy_args a;
919 int rc;
920 a.a_desc = VDESC(vop_strategy);
921 a.a_vp = vp;
922 a.a_bp = bp;
923 rc = VCALL(vp, VOFFSET(vop_strategy), a);   ---here
924 return (rc);
925 }
926 struct vop_print_args {
927 struct vnodeop_desc *a_desc;

In assembler:

0xc01846cc bwrite+460: mov 0xc029dcc0,%ecx
0xc01846d2 bwrite+466: mov 0x18(%eax),%edx
0xc01846d5 bwrite+469: lea 0xfff4(%ebp),%eax
0xc01846d8 bwrite+472: push %eax
0xc01846d9 bwrite+473: mov (%edx,%ecx,4),%eax   **POW**
0xc01846dc bwrite+476: call *%eax
0xc01846de bwrite+478: add $0x4,%esp
0xc01846e1 bwrite+481: mov 0xfff0(%ebp),%eax

Looking at the regs, dx = 0xc1344600, cx = 0xc96145b2, and C1344600+(4*C96145B2) = 3E6B95CC8, the lower 32 bits of which are the same as the fault address. But in the code above we see that %cx was just loaded from location 0xc029dcc0, which contains:

(kgdb) x/x 0xc029dcc0
0xc029dcc0 vop_strategy_desc: 0x12

0x12 is the correct offset for a strategy call, so cx got corrupted between the instruction at 0xc01846cc and that at 0xc01846d9.

Note that the contents of cx (0xc96145b2) is an address somewhat higher than the kernel stack at the time in question. A dump of RAM in that area shows:

(kgdb) x/64xw 0xc96145a0
0xc96145a0: 0xc954e900 0xc9709c00 0x 0xc96145a8
0xc96145b0:[0xc9580660] 0xc95c7370 0xc04d7504 0xc04d47d4
0xc96145c0: 0xaa26 0x0020 0x 0x
0xc96145d0: 0xfc812c38 0x0002 0x00040010 0x0020
0xc96145e0: 0x 0x 0x 0x
0xc96145f0: 0x 0xc9636a40 0x0001fc93 0x
0xc9614600: 0xc02ed7c0 0xc95b4120 0x 0xc9614608
0xc9614610: 0x 0xc948 0x 0xc9614618
0xc9614620: 0x3f5b 0x0003 0x 0x
0xc9614630: 0xfe37c115 0x2188 0x000e 0x
0xc9614640: 0x 0x 0x 0x
0xc9614650: 0x 0x 0x 0x
0xc9614660: 0xc9722ae0 0xc961c600 0x 0xc9614668
0xc9614670: 0xc9690660 0xc97091f0 0x 0xc9614678
0xc9614680: 0xcabf 0x0012 0x 0x
0xc9614690: 0xfc8189f2 0x0002 0x001d 0x

This is obviously SOMETHING, but what? And why does %cx point HALF WAY THROUGH an obvious 32-bit pointer?

Thoughts of hardware problems do come to mind... but..
RE: This look familiar to anyone? (bug in 4.11 maybe)
On 24-Jul-01 Julian Elischer wrote:
> In a VFS operation, %ecx gets corrupted (maybe by an interrupt?)
> between the instruction where it's loaded with a constant and the
> instruction where it's used. It's always the same instruction, though
> often in DIFFERENT VFS operations (fsync, bwrite so far).
>
> The trap frame usually looks like:
>
> #4 0xc0251813 in trap (frame={tf_fs = 0x10, tf_es = 0x10, tf_ds = 0x10, tf_edi = 0x0, tf_esi = 0x1, tf_ebp = 0xc954de84, tf_isp = 0xc954de50, tf_ebx = 0xc27d6d80, tf_edx = 0xc1344600, tf_ecx = 0xc96145b2, tf_eax = 0xc954de78, tf_trapno = 0xc, tf_err = 0x0, tf_eip = 0xc01846d9, tf_cs = 0x8, tf_eflags = 0x10286, tf_esp = 0xc954de78, tf_ss = 0xc27d6d80}) at /usr/src/sys/i386/i386/trap.c:443
> #5 0xc01846d9 in bwrite (bp=0xc27d6d80) at vnode_if.h:923
> #6 0xc0189be2 in vop_stdbwrite (ap=0xc954deb4) at /usr/src/sys/kern/vfs_default.c:319
>
> The code there looks like:
>
> (kgdb) up 5
> #5 0xc01846d9 in bwrite (bp=0xc27d6d80) at vnode_if.h:923
> 923 rc = VCALL(vp, VOFFSET(vop_strategy), a);
> (kgdb) list
> 918 struct vop_strategy_args a;
> 919 int rc;
> 920 a.a_desc = VDESC(vop_strategy);
> 921 a.a_vp = vp;
> 922 a.a_bp = bp;
> 923 rc = VCALL(vp, VOFFSET(vop_strategy), a);   ---here
> 924 return (rc);
> 925 }
> 926 struct vop_print_args {
> 927 struct vnodeop_desc *a_desc;
>
> In assembler:
>
> 0xc01846cc bwrite+460: mov 0xc029dcc0,%ecx
> 0xc01846d2 bwrite+466: mov 0x18(%eax),%edx
> 0xc01846d5 bwrite+469: lea 0xfff4(%ebp),%eax
> 0xc01846d8 bwrite+472: push %eax
> 0xc01846d9 bwrite+473: mov (%edx,%ecx,4),%eax   **POW**
> 0xc01846dc bwrite+476: call *%eax
> 0xc01846de bwrite+478: add $0x4,%esp
> 0xc01846e1 bwrite+481: mov 0xfff0(%ebp),%eax
>
> Looking at the regs, dx = 0xc1344600, cx = 0xc96145b2, and
> C1344600+(4*C96145B2) = 3E6B95CC8, the lower 32 bits of which are the
> same as the fault address. But in the code above we see that %cx was
> just loaded from location 0xc029dcc0, which contains:
>
> (kgdb) x/x 0xc029dcc0
> 0xc029dcc0 vop_strategy_desc: 0x12
>
> 0x12 is the correct offset for a strategy call, so cx got corrupted
> between the instruction at 0xc01846cc and that at 0xc01846d9.

Very weird. Note that traps and interrupts will save %ecx in the trapframe, so you aren't going to end up with those getting corrupted unless we somehow screw up ecx after popping the frame (or before pushing it).

> Note that the contents of cx (0xc96145b2) is an address somewhat
> higher than the kernel stack at the time in question.

Could be a stack of some other thread. All the 0xc9X addresses are pointers to automatic variables. The 0xc0[2-4]X are return addresses.

> A dump of RAM in that area shows:
>
> (kgdb) x/64xw 0xc96145a0
> 0xc96145a0: 0xc954e900 0xc9709c00 0x 0xc96145a8
> 0xc96145b0:[0xc9580660] 0xc95c7370 0xc04d7504 0xc04d47d4
> 0xc96145c0: 0xaa26 0x0020 0x 0x
> 0xc96145d0: 0xfc812c38 0x0002 0x00040010 0x0020
> 0xc96145e0: 0x 0x 0x 0x
> 0xc96145f0: 0x 0xc9636a40 0x0001fc93 0x
> 0xc9614600: 0xc02ed7c0 0xc95b4120 0x 0xc9614608
> 0xc9614610: 0x 0xc948 0x 0xc9614618
> 0xc9614620: 0x3f5b 0x0003 0x 0x
> 0xc9614630: 0xfe37c115 0x2188 0x000e 0x
> 0xc9614640: 0x 0x 0x 0x
> 0xc9614650: 0x 0x 0x 0x
> 0xc9614660: 0xc9722ae0 0xc961c600 0x 0xc9614668
> 0xc9614670: 0xc9690660 0xc97091f0 0x 0xc9614678
> 0xc9614680: 0xcabf 0x0012 0x 0x
> 0xc9614690: 0xfc8189f2 0x0002 0x001d 0x
>
> This is obviously SOMETHING, but what? And why does %cx point HALF WAY
> THROUGH an obvious 32-bit pointer? Thoughts of hardware problems do
> come to mind... but..

Is it just one machine that does this reliably?

--
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
This look familiar to anyone? (bug in 4.11 maybe)
My present line of attack is to change the page-fault handler to leave a 500-byte window untouched on the stack (except for the frame) so that I can try to see whether an interrupt occurred recently and, if so, what it was.