Hi Gavin,

Sorry for the long delay.


Gavin Maltby <[EMAIL PROTECTED]> wrote:

> Hi Joerg,
>
> On 05/30/06 23:51, Joerg Schilling wrote:
> [cut]
> > kmdb: stop at cpu.AuthenticAMD.15`ao_nb_cfg
> > kmdb: target stopped at:
> > cpu.AuthenticAMD.15`ao_nb_cfg:  pushl  %ebp
> > [1]> ao_nb_cfg_add/X
> > cpu.AuthenticAMD.15`ao_nb_cfg_add:
> > cpu.AuthenticAMD.15`ao_nb_cfg_add:              a000105         
> > [1]> :c
> > PCI-device: pci1106,[EMAIL PROTECTED], hci13940
> > hci13940 is /[EMAIL PROTECTED],0/pci1106,[EMAIL PROTECTED]
> > 
> > Here the machine hangs infinitely and I am unable to go into 
> > kmdb via ~# from tip.
>
> OK let's try divide-and-conquer at a higher level.  We'll stop the
> cpu.AuthenticAMD.15 module from initializing thereby forcing us
> to fall back to the generic cpu module (as used on Intel cpus
> and any AMD family 0xf systems not explicitly supported by the
> model range we specify).
>
> boot kmdb -d as before
>
> set breakpoint at module init:
>
> ::bp cpu.AuthenticAMD.15`ao_init
>
> when the breakpoint triggers change the model limit from 0x40 to 0x0
>
> ao_model_limit/W0
>
> You'll hit that breakpoint for each cpu - repeat for each.
>
> If that fails as before then we know I'm barking up the wrong tree
> with our NB config stuff.  If it does boot then we can do some more
> thinking on the next steps.  I'd suggest maybe we nop out the
> call to ao_pcicfg_write made from ao_nb_cfg - then we can see what
> value the BIOS had installed (ao_nb_cfg reads and preserves that)
> and not overwrite it at all.

I tried changing the ao_model_limit variable to 0 and got the following:

SunOS Release 5.11 Version schily35 32-bit
Copyright 1983-2006 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
DEBUG enabled
features: 
1027fdf<cpuid,nx,sse2,sse,sep,pat,cx8,pae,mca,mmx,cmov,pge,mtrr,msr,tsc,lgpg>
Using default device instance data
Loaded modules: [ specfs cpu.AuthenticAMD.15 ]
kmdb: stop at cpu.AuthenticAMD.15`ao_init
kmdb: target stopped at:
cpu.AuthenticAMD.15`ao_init:    pushl  %ebp
[0]> ao_model_limit/W0
cpu.AuthenticAMD.15`ao_model_limit:             0x40            =       0x0
[0]> :c
cpuid 0: initialized cpumod: cpu.AuthenticAMD.15
mem = 1047036K (0x3fe7f000)
root nexus = i86pc
pseudo0 at root
pseudo0 is /pseudo
scsi_vhci0 at root
scsi_vhci0 is /scsi_vhci
isa0 at root
ramdisk0 at root
ramdisk0 is /ramdisk
SMBIOS v2.3 loaded (1417 bytes)pseudo-device: dld0
dld0 is /pseudo/[EMAIL PROTECTED]
pci0 at root: space 0 offset 0
pci0 is /[EMAIL PROTECTED],0
PCI-device: pci1106,[EMAIL PROTECTED], pci_pci0
pci_pci0 is /[EMAIL PROTECTED],0/pci1106,[EMAIL PROTECTED]
ISA-device: asy0
asy0 is /isa/[EMAIL PROTECTED],3f8
8042 device:  [EMAIL PROTECTED], kb8042 # 0
kb80420 is /isa/[EMAIL PROTECTED],60/[EMAIL PROTECTED]
kb8042 #0: version 1.65 (05/11/02)
boot scratch memory used: 0x56b138
PCI-device: pci1462,[EMAIL PROTECTED],4, ehci0
ehci0 is /[EMAIL PROTECTED],0/pci1462,[EMAIL PROTECTED],4
PCI-device: pci1462,[EMAIL PROTECTED], uhci0
uhci0 is /[EMAIL PROTECTED],0/pci1462,[EMAIL PROTECTED]
PCI-device: pci1462,[EMAIL PROTECTED],1, uhci1
uhci1 is /[EMAIL PROTECTED],0/pci1462,[EMAIL PROTECTED],1
PCI-device: pci1462,[EMAIL PROTECTED],2, uhci2
uhci2 is /[EMAIL PROTECTED],0/pci1462,[EMAIL PROTECTED],2
PCI-device: pci1106,[EMAIL PROTECTED],3, uhci3
uhci3 is /[EMAIL PROTECTED],0/pci1106,[EMAIL PROTECTED],3
cpu0: x86 (AuthenticAMD family 15 model 5 step 1 clock 1600 MHz)
cpu0: AMD Opteron(tm) Processor 242
cpu1: x86 (AuthenticAMD family 15 model 5 step 1 clock 1600 MHz)
cpu1: AMD Opteron(tm) Processor 242
Loaded modules: [ uppc ufs uhci ip usba pcplusmp sctp ]
kmdb: stop at cpu.AuthenticAMD.15`ao_init
kmdb: target stopped at:
cpu.AuthenticAMD.15`ao_init:    pushl  %ebp
[1]> ao_model_limit/W0
cpu.AuthenticAMD.15`ao_model_limit:             0               =       0x0
[1]> :c
cpuid 1: initialized cpumod: cpu.AuthenticAMD.15
PCI-device: pci1106,[EMAIL PROTECTED], hci13940
hci13940 is /[EMAIL PROTECTED],0/pci1106,[EMAIL PROTECTED]
USB 2.0 device (usb7cc,340) operating at hi speed (USB 2.x) on USB 2.0 root 
hub: [EMAIL PROTECTED], scsa2usb0 at bus address 2
                Ltd Winter Ver1.3    657590181666
scsa2usb0 is /[EMAIL PROTECTED],0/pci1462,[EMAIL PROTECTED],4/[EMAIL PROTECTED]
/[EMAIL PROTECTED],0/pci1462,[EMAIL PROTECTED],4/[EMAIL PROTECTED] (scsa2usb0) 
online
sd0 at scsa2usb0: target 0 lun 0
SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x44c546f9.0x3ae98d7 (0x1a20fcad1e)
PLATFORM: i86pc, CSN: -, HOSTNAME: 
SOURCE: SunOS, REV: 5.11 schily35
DESC: Errors have been detected that require a reboot to ensure system
integrity.  See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved

ereport.cpu.amd.nb.wdog ena=1a20fc614700001 detector=[ version=0 scheme="hc"
 hc-list=[...] ] bank-status=b200000000070f0f bank-number=4 addr=800008248
 addr-valid=0 ip=0 privileged=1


panic[cpu0]/thread=c7898600: Unrecoverable Machine-Check Exception

c78f1e38 unix:cmi_mca_trap+46 (c78f1e44)
c78f1e44 unix:mcetrap+59 (1b0, c78f0000, fe93)
c78f1ef4 unix:atomic_cas_64+1a (c7898600, 0)
c78f1f90 unix:trap+11da (c78f1fa4, 0, 0)
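
As an aside, the bank-status value in the ereport above can be decoded by hand.
The following is a minimal sketch, assuming the architectural x86 MCi_STATUS
flag layout (bits 63..57) and assuming that bits 19:16 carry the extended
error code for the family 0xf northbridge bank. The helper name
decode_mci_status is made up for illustration and is not part of Solaris.

```python
# Flag bits in the upper part of an MCi_STATUS register (architectural layout).
FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register valid
    58: "ADDRV",  # ADDR register valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status):
    """Split a 64-bit bank status value into flag names and error codes."""
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "error_code": status & 0xFFFF,
        # Assumption: bits 19:16 hold the NB extended error code.
        "ext_error_code": (status >> 16) & 0xF,
    }


# bank-status value copied from the ereport above
info = decode_mci_status(0xb200000000070f0f)
print(info["flags"], hex(info["error_code"]), hex(info["ext_error_code"]))
```

ADDRV is clear, which agrees with addr-valid=0 in the ereport, and the
extended error code 0x7 appears to correspond to the nb.wdog class reported.
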

Are you sure that the 

ao_model_limit/W0

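command did the right thing? The write shows 0x40 = 0x0 on cpu0, and on cpu1
the value is already 0, yet both CPUs still report
"initialized cpumod: cpu.AuthenticAMD.15".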

BTW: in most cases I do not see this stack trace, because the kernel dies
completely, like this:

...
cpu0: x86 (AuthenticAMD family 15 model 5 step 1 clock 1600 MHz)
cpu0: AMD Opteron(tm) Processor 242
cpu1: x86 (AuthenticAMD family 15 model 5 step 1 clock 1600 MHz)
cpu1: AMD Opteron(tm) Processor 242
cpuid 1: initialized cpumod: cpu.AuthenticAMD.15
USB 2.0 device (usb7cc,340) operating at hi speed (USB 2.x) on USB 2.0 root 
hub: [EMAIL PROTECTED], scsa2usb0 at bus address 2
                Ltd Winter Ver1.3    657590181666
scsa2usb0 is /[EMAIL PROTECTED],0/pci1462,[EMAIL PROTECTED],4/[EMAIL PROTECTED]
/[EMAIL PROTECTED],0/pci1462,[EMAIL PROTECTED],4/[EMAIL PROTECTED] (scsa2usb0) 
online
sd0 at scsa2usb0: target 0 lun 0
sd0 is /[EMAIL PROTECTED],0/pci1462,[EMAIL PROTECTED],4/[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0
/[EMAIL PROTECTED],0/pci1462,[EMAIL PROTECTED],4/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 (sd0) online
sd1 at scsa2usb0: target 0 lun 1
sd1 is /[EMAIL PROTECTED],0/


and at this point the watchdog reset hits.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
       [EMAIL PROTECTED]                (uni)  
       [EMAIL PROTECTED]     (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily