Re: Sound familiar? 5.0-RC hangs on dual athlon

2002-12-09 Thread Julian H. Stacey
Jacques A. Vidrine 
cc  current@

I have an ASUS P2L97-DS ACPI BIOS Revision 1008 Dual cpu box
FreeBSD 5.0-DP2 #0: Wed Dec  4 00:26:02 CET 2002
CPU: Pentium II/Pentium II Xeon/Celeron (334.09-MHz 686-class CPU)
Origin = GenuineIntel  Id = 0x650  Stepping = 0
Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,
MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory  = 536858624 (511 MB)
avail memory = 514600960 (490 MB)

I also experienced unusable instability with a 5.0-DP2 dual-CPU kernel.
It crashed with lots of different stack traces,
so I didn't chase or report it for lack of time. (I might have found time
if it had been one thing consistently, but not for a variety.)
The easiest way to get it to crash within minutes was to run several jobs
at once, e.g.
cd /usr/src ; make -j 10
Without the -j 10 it was reduced to `just' a handful of crashes during make.

I dropped back to a generic single CPU kernel.

(Which cancelled the main reason I moved to 5.0-DP2:
to get the ATA bus working with dual CPUs; see my Nov. 22 mail
   Subject: 5.0-DP2: SMP+ATA OK.  But 4.7  stable boot panic with ASUS P2L97-DS
   To: freebsd-current@
)

I'm downloading 5.0-RC1-i386-disc1.iso

Julian Stacey
jhs @ berklix.com   Computer Systems Engineer, Unix  Net Consultant, Munich.
  Ihr Rauchen = mein allergischer Kopfschmerz !  Schnupftabak probieren.
  Munich BSD Conference:http://berklix.org/conf/
  Spam phrases triggering deletion: http://berklix.com/jhs/mail/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: Sound familiar? 5.0-RC hangs on dual athlon

2002-12-09 Thread Jacques A. Vidrine
On Mon, Dec 09, 2002 at 10:47:24PM +0100, Julian H. Stacey wrote:
> I dropped back to a generic single CPU kernel.
>
> (Which cancelled the main reason I moved to 5.0-DP2:
> to get the ATA bus working with dual CPUs; see my Nov. 22 mail
>    Subject: 5.0-DP2: SMP+ATA OK.  But 4.7  stable boot panic with ASUS P2L97-DS
>    To: freebsd-current@
> )
>
> I'm downloading 5.0-RC1-i386-disc1.iso

Well, I tried again, this time:

   =  I built with DDB, INVARIANTS, INVARIANT_SUPPORT, WITNESS, and
  WITNESS_SKIPSPIN  --- none of these were enabled previously.

   =  I did not try to use ccd nor vinum --- I tried one and then the
  other previously.  

   =  I did not use UFS2 --- I formatted all large filesystems with UFS2
  previously.
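
For anyone wanting to repeat the test, the debugging options named above would go into the kernel configuration file roughly as follows. This is only a sketch: the option names are the ones listed in this message, while the idea of adding them to a copy of GENERIC and the comments are assumptions, not something stated here.

```
# Assumed to be added to a copy of the GENERIC kernel config
options 	DDB			# in-kernel debugger
options 	INVARIANTS		# extra run-time sanity checks
options 	INVARIANT_SUPPORT	# support code needed by INVARIANTS
options 	WITNESS			# lock-order verification
options 	WITNESS_SKIPSPIN	# do not run WITNESS on spin locks
```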

So far things are peachy ... I've gotten along much further than
previously (restored all files from backup while building GNOME 2).
Later (much later) I'll try to narrow the problem down further.

Cheers,
-- 
Jacques A. Vidrine [EMAIL PROTECTED]  http://www.celabo.org/
NTT/Verio SME  . FreeBSD UNIX .   Heimdal Kerberos
[EMAIL PROTECTED] .  [EMAIL PROTECTED]  .  [EMAIL PROTECTED]




Sound familiar? 5.0-RC hangs on dual athlon

2002-12-08 Thread Jacques A. Vidrine
Hello All,

I finally managed to put some time aside to redo my main
development/desktop machine to run FreeBSD 5.0.  (I've been running
5.x on my laptop for some months.)  I had to retreat back to 4.7
because I could not get through some simple tasks without the system
hanging.  The system is a dual Athlon box with 1 GB RAM.  The dmesg
output is below.

At first the system hung while I was building GNOME 2.0 and restoring
some files from tape.  It wasn't _completely_ hung:  I could switch
VTYs, and enter new commands (though it might take tens of seconds to
echo my typing, and longer to actually execute, say, `ps').  I noticed
that both an `ld' process and the `restore' process seemed to be stuck in
state `wdrain'.  I attempted to reboot the system, but after several
minutes it still did not appear to have halted --- so I pulled the
plug.

I then tried again.  This time, I thought perhaps that I would be
gentler.  I tried checking out the ports tree (over ssh) (I had done
this previously successfully).  Within two minutes, the system was
`hung' again.  The `cvs' process appeared to be stuck in `wdrain'.

One more time.  Again, I attempted to check out the ports tree.  After
20 minutes or so, again the system was `hung', although this time I
couldn't check whether there were any processes in `wdrain', because it
was hung hard and completely.

Does this ring bells for anyone?  What should I look for when I get a
few hours again to waste?

Cheers,
-- 
Jacques A. Vidrine [EMAIL PROTECTED]  http://www.celabo.org/
NTT/Verio SME  . FreeBSD UNIX .   Heimdal Kerberos
[EMAIL PROTECTED] .  [EMAIL PROTECTED]  .  [EMAIL PROTECTED]

Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-RC #1: Sat Dec  7 19:30:52 CST 2002
[EMAIL PROTECTED]:/spare1/obj/usr/src/sys/GENERIC
Preloaded elf kernel /boot/kernel/kernel at 0xc06a4000.
Preloaded elf module /boot/kernel/acpi.ko at 0xc06a40a8.
Calibrating clock(s) ... TSC clock: 1194486067 Hz, i8254 clock: 1192995 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter i8254  frequency 1193182 Hz
CLK_USE_TSC_CALIBRATION not specified - using old calibration method
Timecounter TSC  frequency 1194678840 Hz
CPU: AMD Athlon(tm) Processor (1194.68-MHz 686-class CPU)
  Origin = AuthenticAMD  Id = 0x661  Stepping = 1
  
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0440000<RSVD,AMIE,DSP,3DNow!>
Data TLB: 32 entries, fully associative
Instruction TLB: 16 entries, fully associative
L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L2 internal cache: 256 kbytes, 64 bytes/line, 1 lines/tag, 8-way associative
real memory  = 1073676288 (1023 MB)
Physical memory chunk(s):
0x00001000 - 0x0009efff, 647168 bytes (158 pages)
0x006cb000 - 0x3ffe7fff, 1066520576 bytes (260381 pages)
avail memory = 1036201984 (988 MB)
bios32: Found BIOS32 Service Directory header at 0xc00f7440
bios32: Entry = 0xfd6a0 (c00fd6a0)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xfd6a0+0x120
pnpbios: Found PnP BIOS data at 0xc00f7490
pnpbios: Entry = f:9ea2  Rev = 1.0
Other BIOS signatures found:
Initializing GEOMetry subsystem
random: entropy source
mem: memory  I/O
Pentium Pro MTRR support enabled
null: null device, zero device
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: PTLTDRSDT   on motherboard
ACPI-0625: *** Info: GPE Block0 defined as GPE0 to GPE15
pci_open(1):mode 1 addr port (0x0cf8) is 0x80008004
pci_open(1a):   mode1res=0x8000 (0x8000)
pci_cfgcheck:   device 0 [class=06] [hdr=00] is there (id=700c1022)
acpi0: power button is handled as a fixed feature programming model.
acpi0: sleep button is handled as a fixed feature programming model.
ACPI timer looks BAD  min = 1, max = 4, width = 4
ACPI timer looks BAD  min = 1, max = 4, width = 4
ACPI timer looks GOOD min = 1, max = 3, width = 3
ACPI timer looks GOOD min = 1, max = 3, width = 3
ACPI timer looks GOOD min = 1, max = 2, width = 2
ACPI timer looks GOOD min = 1, max = 3, width = 3
ACPI timer looks GOOD min = 1, max = 3, width = 3
ACPI timer looks BAD  min = 1, max = 4, width = 4
ACPI timer looks GOOD min = 1, max = 3, width = 3
ACPI timer looks BAD  min = 1, max = 4, width = 4
Timecounter ACPI-safe  frequency 3579545 Hz
acpi_timer0: 24-bit timer at 3.579545MHz port 0x8008-0x800b on acpi0
acpi_cpu0: CPU on acpi0
acpi_cpu1: CPU on acpi0
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0x8080-0x80ff,0x8000-0x807f,0xcf8-0xcff iomem 
0xdc000-0xd on acpi0
 initial configuration 
\\_SB_.PCI0.ISA_.LNKA irq  10: [  3  5 10 11] low,level,sharable 0.13.0
\\_SB_.PCI0.ISA_.LNKB irq  

Re: Sound familiar? 5.0-RC hangs on dual athlon

2002-12-08 Thread walt
Jacques A. Vidrine wrote:

> Hello All,
>
> I finally managed to put some time aside to redo my main
> development/desktop machine to run FreeBSD 5.0.  (I've been running
> 5.x on my laptop for some months.)  I had to retreat back to 4.7
> because I could not get through some simple tasks without the system
> hanging.  The system is a dual Athlon box with 1 GB RAM.  The dmesg
> output is below.
>
> At first the system hung while I was building GNOME 2.0 and restoring
> some files from tape.  It wasn't _completely_ hung:  I could switch
> VTYs, and enter new commands (though it might take tens of seconds to
> echo my typing...


I seem to remember something similar from a few months ago that
affected machines with lots of RAM because of something to do
with high order address bits.  I thought it got fixed, but I can't
really recall.

Was it an Athlon problem, or a gcc problem, or both?  Hmpf, can't
remember!

Anyone else think this might be the same thing?






Familiar?

2002-10-07 Thread M. Warner Losh

cc -O -pipe -mcpu=pentiumpro -DLIBC_SCCS -I/dell/imp/p4/newcard/src/lib/libkvm  -c 
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c -o kvm_proc.o
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c: In function `kvm_proclist':
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:327: `KI_MTXBLOCK' undeclared (first 
use in this function)
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:327: (Each undeclared identifier is 
reported only once
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:327: for each function it appears in.)
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:328: structure has no member named 
`td_mtxname'
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:330: structure has no member named 
`td_mtxname'
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:331: structure has no member named 
`ki_mtxname'
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:331: `MTXNAMELEN' undeclared (first use 
in this function)
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:332: structure has no member named 
`ki_mtxname'
/dell/imp/p4/newcard/src/lib/libkvm/kvm_proc.c:359: `SMTX' undeclared (first use in 
this function)
*** Error code 1

This is with last night's sources.

Warner




RE: This look familiar to anyone? (bug in 4.11 maybe)

2001-07-24 Thread John Baldwin


On 24-Jul-01 Julian Elischer wrote:
 
> In a VFS operation, %ecx gets corrupted (maybe from an interrupt?)
> between the instruction where it's loaded with a constant
> and the instruction where it's used...  It's always the same instruction,
> though often in DIFFERENT VFS operations (fsync, bwrite so far).
>
> the trap frame  usually looks like:
>
> #4  0xc0251813 in trap (frame={tf_fs = 0x10, tf_es = 0x10, tf_ds = 0x10,
> tf_edi = 0x0, tf_esi = 0x1, tf_ebp = 0xc954de84,
>   tf_isp = 0xc954de50, tf_ebx = 0xc27d6d80, tf_edx = 0xc1344600,
> tf_ecx = 0xc96145b2, tf_eax = 0xc954de78, tf_trapno = 0xc,
>   tf_err = 0x0, tf_eip = 0xc01846d9, tf_cs = 0x8, tf_eflags = 0x10286,
> tf_esp = 0xc954de78, tf_ss = 0xc27d6d80})
> at /usr/src/sys/i386/i386/trap.c:443
> #5  0xc01846d9 in bwrite (bp=0xc27d6d80) at vnode_if.h:923
> #6  0xc0189be2 in vop_stdbwrite (ap=0xc954deb4) at
> /usr/src/sys/kern/vfs_default.c:319
>
>
> the code there looks like:
>
> (kgdb) up 5
> #5  0xc01846d9 in bwrite (bp=0xc27d6d80) at vnode_if.h:923
> 923   rc = VCALL(vp, VOFFSET(vop_strategy), a);
> (kgdb) list
> 918   struct vop_strategy_args a;
> 919   int rc;
> 920   a.a_desc = VDESC(vop_strategy);
> 921   a.a_vp = vp;
> 922   a.a_bp = bp;
> 923   rc = VCALL(vp, VOFFSET(vop_strategy), a); <---here
> 924   return (rc);
> 925   }
> 926   struct vop_print_args {
> 927   struct vnodeop_desc *a_desc;
>
> In Assembler:
>
> 0xc01846cc <bwrite+460>:  mov    0xc029dcc0,%ecx
> 0xc01846d2 <bwrite+466>:  mov    0x18(%eax),%edx
> 0xc01846d5 <bwrite+469>:  lea    0xfffffff4(%ebp),%eax
> 0xc01846d8 <bwrite+472>:  push   %eax
> 0xc01846d9 <bwrite+473>:  mov    (%edx,%ecx,4),%eax  **POW**
> 0xc01846dc <bwrite+476>:  call   *%eax
> 0xc01846de <bwrite+478>:  add    $0x4,%esp
> 0xc01846e1 <bwrite+481>:  mov    0xfffffff0(%ebp),%eax
>
> Looking at the registers,
> %edx = 0xc1344600,
> %ecx = 0xc96145b2,
> and
> 0xC1344600 + (4 * 0xC96145B2) = 0x3E6B95CC8,
> the lower 32 bits of which are the same as the fault address.
>
> But in the code above we see that %ecx was just loaded from
> location 0xc029dcc0, which contains:
> (kgdb) x/x 0xc029dcc0
> 0xc029dcc0 <vop_strategy_desc>:   0x12
>
> 0x12 is the correct offset for a strategy call.
>
> So %ecx got corrupted between the instruction at 0xc01846cc
> and that at 0xc01846d9.

Very weird.  Note that traps and interrupts will save %ecx in the trapframe,
so you aren't going to end up with those getting corrupted unless we somehow
screw up ecx after popping the frame (or before pushing it).

> Note that the contents of %ecx (0xc96145b2) is an address
> somewhat higher than the kernel stack at the time in question.

Could be a stack of some other thread.  All the 0xc9X addresses are
pointers to automatic variables.  The 0xc0[2-4]X are return addresses.

> A dump of RAM in that area shows:
> (kgdb) x/64xw 0xc96145a0
> 0xc96145a0:   0xc954e900  0xc9709c00  0x  0xc96145a8
> 0xc96145b0:[0xc9580660]   0xc95c7370  0xc04d7504  0xc04d47d4
> 0xc96145c0:   0xaa26  0x0020  0x  0x
> 0xc96145d0:   0xfc812c38  0x0002  0x00040010  0x0020
> 0xc96145e0:   0x  0x  0x  0x
> 0xc96145f0:   0x  0xc9636a40  0x0001fc93  0x
> 0xc9614600:   0xc02ed7c0  0xc95b4120  0x  0xc9614608
> 0xc9614610:   0x  0xc948  0x  0xc9614618
> 0xc9614620:   0x3f5b  0x0003  0x  0x
> 0xc9614630:   0xfe37c115  0x2188  0x000e  0x
> 0xc9614640:   0x  0x  0x  0x
> 0xc9614650:   0x  0x  0x  0x
> 0xc9614660:   0xc9722ae0  0xc961c600  0x  0xc9614668
> 0xc9614670:   0xc9690660  0xc97091f0  0x  0xc9614678
> 0xc9614680:   0xcabf  0x0012  0x  0x
> 0xc9614690:   0xfc8189f2  0x0002  0x001d  0x
>
> This is obviously SOMETHING, but what? And why does %ecx point HALF WAY
> THROUGH an obvious 32-bit pointer?
>
> Thoughts of hardware problems do come to mind... but..

Is it just one machine that does this reliably?

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




This look familiar to anyone? (bug in 4.11 maybe)

2001-07-24 Thread Julian Elischer



I know this is not a -current problem, but if it was fixed by someone they
are likely to be reading here, and not in -stable..


We have a hybrid (4.11+patches) kernel that sometimes crashes.
The crash always has the same symptoms and I'm hoping that
they look familiar to someone...

The message is below, followed by analysis.

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xe6b95cc8
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01846d9
stack pointer           = 0x10:0xc954de64
frame pointer           = 0x10:0xc954de84
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 10326 (qftListener)
interrupt mask          = none
trap number             = 12


In a VFS operation, %ecx gets corrupted (maybe from an interrupt?)
between the instruction where it's loaded with a constant
and the instruction where it's used...  It's always the same instruction,
though often in DIFFERENT VFS operations (fsync, bwrite so far).

the trap frame  usually looks like:

#4  0xc0251813 in trap (frame={tf_fs = 0x10, tf_es = 0x10, tf_ds = 0x10,
tf_edi = 0x0, tf_esi = 0x1, tf_ebp = 0xc954de84, 
  tf_isp = 0xc954de50, tf_ebx = 0xc27d6d80, tf_edx = 0xc1344600,
tf_ecx = 0xc96145b2, tf_eax = 0xc954de78, tf_trapno = 0xc, 
  tf_err = 0x0, tf_eip = 0xc01846d9, tf_cs = 0x8, tf_eflags = 0x10286,
tf_esp = 0xc954de78, tf_ss = 0xc27d6d80})
at /usr/src/sys/i386/i386/trap.c:443
#5  0xc01846d9 in bwrite (bp=0xc27d6d80) at vnode_if.h:923
#6  0xc0189be2 in vop_stdbwrite (ap=0xc954deb4) at
/usr/src/sys/kern/vfs_default.c:319


the code there looks like:

(kgdb) up 5
#5  0xc01846d9 in bwrite (bp=0xc27d6d80) at vnode_if.h:923
923             rc = VCALL(vp, VOFFSET(vop_strategy), a);
(kgdb) list
918             struct vop_strategy_args a;
919             int rc;
920             a.a_desc = VDESC(vop_strategy);
921             a.a_vp = vp;
922             a.a_bp = bp;
923             rc = VCALL(vp, VOFFSET(vop_strategy), a); <---here
924             return (rc);
925     }
926     struct vop_print_args {
927             struct vnodeop_desc *a_desc;

In Assembler:

0xc01846cc <bwrite+460>:        mov    0xc029dcc0,%ecx
0xc01846d2 <bwrite+466>:        mov    0x18(%eax),%edx
0xc01846d5 <bwrite+469>:        lea    0xfffffff4(%ebp),%eax
0xc01846d8 <bwrite+472>:        push   %eax
0xc01846d9 <bwrite+473>:        mov    (%edx,%ecx,4),%eax  **POW**
0xc01846dc <bwrite+476>:        call   *%eax
0xc01846de <bwrite+478>:        add    $0x4,%esp
0xc01846e1 <bwrite+481>:        mov    0xfffffff0(%ebp),%eax

Looking at the registers,
%edx = 0xc1344600,
%ecx = 0xc96145b2,
and
0xC1344600 + (4 * 0xC96145B2) = 0x3E6B95CC8,
the lower 32 bits of which are the same as the fault address.

But in the code above we see that %ecx was just loaded from
location 0xc029dcc0, which contains:
(kgdb) x/x 0xc029dcc0
0xc029dcc0 <vop_strategy_desc>: 0x12

0x12 is the correct offset for a strategy call.

So %ecx got corrupted between the instruction at 0xc01846cc
and that at 0xc01846d9.

Note that the contents of %ecx (0xc96145b2) is an address
somewhat higher than the kernel stack at the time in question.
A dump of RAM in that area shows:
(kgdb) x/64xw 0xc96145a0
0xc96145a0: 0xc954e900  0xc9709c00  0x  0xc96145a8
0xc96145b0:[0xc9580660] 0xc95c7370  0xc04d7504  0xc04d47d4
0xc96145c0: 0xaa26  0x0020  0x  0x
0xc96145d0: 0xfc812c38  0x0002  0x00040010  0x0020
0xc96145e0: 0x  0x  0x  0x
0xc96145f0: 0x  0xc9636a40  0x0001fc93  0x
0xc9614600: 0xc02ed7c0  0xc95b4120  0x  0xc9614608
0xc9614610: 0x  0xc948  0x  0xc9614618
0xc9614620: 0x3f5b  0x0003  0x  0x
0xc9614630: 0xfe37c115  0x2188  0x000e  0x
0xc9614640: 0x  0x  0x  0x
0xc9614650: 0x  0x  0x  0x
0xc9614660: 0xc9722ae0  0xc961c600  0x  0xc9614668
0xc9614670: 0xc9690660  0xc97091f0  0x  0xc9614678
0xc9614680: 0xcabf  0x0012  0x  0x
0xc9614690: 0xfc8189f2  0x0002  0x001d  0x

This is obviously SOMETHING, but what? And why does %ecx point HALF WAY
THROUGH an obvious 32-bit pointer?

Thoughts of hardware problems do come to mind... but..

My present line of attack is to change the page-fault handler
to leave a 500-byte window untouched on the stack (except for the
frame) so that I can try to see whether an interrupt occurred
recently, and if so, what it was.


