RE: Recent -current hangs on Tyan S2460 before finishing boot

2003-11-13 Thread John Baldwin

On 13-Nov-2003 Benjamin Lewis wrote:
 Hello,
 
 I'm having trouble getting recent (post- device apic, pre- turnstile) 
 kernels to boot on my Tyan S2460 (Tiger MP) system with dual AMD 
 Athlons.  What happens is that the machine seems to get stuck soon
 after the Waiting for SCSI devices to settle message is printed -- it
 appears to be willing to wait forever rather than the SCSI_DELAY time.
 
 Disabling ACPI in the BIOS has no apparent effect on the hang.  Using
 SCHED_4BSD or SCHED_ULE likewise makes no difference.  I've been 
 following the current@ list hoping to see someone else report a problem
 similar to mine but haven't seen anything yet.
 
 I do have a serial console attached to the machine and DDB enabled so
 I'm able to provide some information and get more if needed.  I'm
 including a copy of the boot messages from my last attempt to boot
 FreeBSD 5.1-CURRENT #2: Tue Nov 11 17:35:40 EST 2003 which was 
 cvsup'ed shortly prior to the build date.  Included in the messages are
 the output of ps and trace once I broke into ddb.
 
 I'm also including output from acpidump -t and mptable -verbose 
 since I've seen that information requested in the past.
 
 Some details about the system that may be pertinent:
   1. It has two 1GHz Athlon Thunderbird (Not MP) processors.
      That hasn't been a problem so far.
   2. The BIOS is version 1.04 (latest is 1.05).  The last time I
      tried updating to 1.05 (some time ago) I saw lots of error
      messages complaining about undefined ACPI stuff so I
      reverted.
   3. There is a Tekram 390F (I think that's the model -- it uses
      the sym driver) and an Adaptec 3944 SCSI controller.  A
      single internal SCSI drive is connected to the Tekram and 10
      external drives are connected to the two ports on the 3944.
      The external drives are configured as a Vinum Raid10 array.
      There's also a single IDE drive connected to one of the
      built-in IDE controllers.
 
 Please let me know if there is anything more you want to know.
 
 Thanks,

Can you do a 'show intrcnt' from the ddb prompt?  It sounds like you
may be getting an interrupt storm due to a mis-routed PCI interrupt.

Actually, I think the problem is in the ata driver.  Well, there are
possibly bugs in the interrupt code in that interrupts that don't exist
in the mptable (IRQ's 11 and 15) still get created, but, the fact that
the mptable has no IRQ 15 to me means that there is no IRQ 15 and thus
there should not be an ata1.  Note that in your dmesg, ata1 does say
that it doesn't do DMA because it has been disabled.  Perhaps the ata
driver needs to disable ata1 altogether on that chipset if it sees that
condition.  My guess is that the ata driver is waiting forever for an
interrupt from ata1 which is never going to arrive, hence the hang.
Do you have a boot -v dmesg from a working kernel?

-- 

John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Recent -current hangs on Tyan S2460 before finishing boot

2003-11-13 Thread Benjamin Lewis
On November 13 2003, John Baldwin wrote:

 
 On 13-Nov-2003 Benjamin Lewis wrote:

[trouble booting -current on Tyan Tiger MP]

 Can you do a 'show intrcnt' from the ddb prompt?  It sounds like you
 may be getting an interrupt storm due to a mis-routed PCI interrupt.

I didn't see any unexpectedly large numbers, but IRQ 15 isn't in the
list:

[...]
GEOM: create disk ad0 dp=0xc2e19370
ad0: 57241MB WDC WD600AB-00BVA0 [116301/16/63] at ata0-master UDMA100
Waiting 10 seconds for SCSI devices to settle
[halt - sent]
Stopped at  siointr1+0xec:  jmp siointr1+0x220
db> show intrcnt
irq4: sio0  2
irq8: rtc   1725
irq13: npx0 1
stray irq13 1
irq14: ata0 17
irq19: ohci0 ahc0   1
irq0: clk   1
db> cont
[halt - sent]
Stopped at  siointr1+0xec:  jmp siointr1+0x220
db> show intrcnt
irq4: sio0  4
irq8: rtc   3407
irq13: npx0 1
stray irq13 1
irq14: ata0 17
irq19: ohci0 ahc0   1
irq0: clk   1
db> cont
[halt - sent]
Stopped at  siointr1+0xec:  jmp siointr1+0x220
db> show intrcnt
irq4: sio0  6
irq8: rtc   7418
irq13: npx0 1
stray irq13 1
irq14: ata0 17
irq19: ohci0 ahc0   1
irq0: clk   1
db> 

 Actually, I think the problem is in the ata driver.  Well, there are
 possibly bugs in the interrupt code in that interrupts that don't exist
 in the mptable (IRQ's 11 and 15) still get created, but, the fact that
 the mptable has no IRQ 15 to me means that there is no IRQ 15 and thus
 there should not be an ata1.  Note that in your dmesg, ata1 does say
 that it doesn't do DMA because it has been disabled.  Perhaps the ata
 driver needs to disable ata1 altogether on that chipset if it sees that
 condition.  My guess is that the ata driver is waiting forever for an
 interrupt from ata1 which is never going to arrive, hence the hang.
 Do you have a boot -v dmesg from a working kernel?

Unfortunately, I'll have to wait until I get home to try enabling the
ata1 controller in the BIOS.  dmesg output from a boot -v with my working
kernel is below.

Thanks for looking at this!

-Ben

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.1-CURRENT #1: Sat Oct 18 13:47:46 EST 2003
[EMAIL PROTECTED]:/export/obj/usr/src-all/current/src/sys/AKIRA
Preloaded elf kernel /boot/kernel.ok/kernel at 0xc086.
Preloaded elf module /boot/kernel.ok/snd_pcm.ko at 0xc0860254.
Preloaded elf module /boot/kernel.ok/snd_es137x.ko at 0xc0860304.
Preloaded elf module /boot/kernel.ok/usb.ko at 0xc08603b8.
Preloaded elf module /boot/kernel.ok/ums.ko at 0xc0860464.
Preloaded elf module /boot/kernel.ok/agp.ko at 0xc0860510.
Preloaded elf module /boot/kernel.ok/random.ko at 0xc08605bc.
Preloaded elf module /boot/kernel.ok/acpi.ko at 0xc086066c.
Calibrating clock(s) ... i8254 clock: 1192965 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter i8254 frequency 1193182 Hz quality 0
Calibrating TSC clock ... TSC clock: 138815 Hz
CPU: AMD Athlon(tm) Processor (1000.04-MHz 686-class CPU)
  Origin = AuthenticAMD  Id = 0x642  Stepping = 2
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
  AMD Features=0xc044<RSVD,AMIE,DSP,3DNow!>
Data TLB: 24 entries, fully associative
Instruction TLB: 16 entries, fully associative
L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L2 internal cache: 256 kbytes, 64 bytes/line, 1 lines/tag, 8-way associative
real memory  = 268369920 (255 MB)
Physical memory chunk(s):
0x1000 - 0x0009efff, 647168 bytes (158 pages)
0x0010 - 0x003f, 3145728 bytes (768 pages)
0x00c29000 - 0x0fb3dfff, 250695680 bytes (61205 pages)
avail memory = 251080704 (239 MB)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 - irq 0
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id:  1, version: 0x00040010, at 0xfee0
 cpu1 (AP):  apic id:  0, version: 0x00040010, at 0xfee0
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec0
bios32: Found BIOS32 Service Directory header at 0xc00f7480
bios32: Entry = 0xfd6c0 (c00fd6c0)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xfd6c0+0x120
pnpbios: Found PnP BIOS data at 0xc00f74d0
pnpbios: Entry = f:9ece  Rev = 1.0
Other BIOS signatures found:
null: null device, zero device
mem: memory  I/O
Pentium Pro MTRR support enabled
random: entropy source
SMP: CPU0 bsp_apic_configure():
 lint0: 0x00010700 lint1: 0x0400 TPR: 0x SVR: 0x01ff
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: PTLTDRSDT   on motherboard
pci_open(1):mode 

Re: Recent -current hangs on Tyan S2460 before finishing boot

2003-11-13 Thread John Baldwin

On 13-Nov-2003 Benjamin Lewis wrote:
 On November 13 2003, John Baldwin wrote:
 
 
 On 13-Nov-2003 Benjamin Lewis wrote:
 
 [trouble booting -current on Tyan Tiger MP]
 
 Can you do a 'show intrcnt' from the ddb prompt?  It sounds like you
 may be getting an interrupt storm due to a mis-routed PCI interrupt.
 
 I didn't see any unexpectedly large numbers, but IRQ 15 isn't in the
 list:
 
 [...]
 GEOM: create disk ad0 dp=0xc2e19370
 ad0: 57241MB WDC WD600AB-00BVA0 [116301/16/63] at ata0-master UDMA100
 Waiting 10 seconds for SCSI devices to settle
 [halt - sent]
 Stopped at  siointr1+0xec:  jmp siointr1+0x220
 db> show intrcnt
 irq4: sio0  2
 irq8: rtc   1725
 irq13: npx0 1
 stray irq13 1
 irq14: ata0 17
 irq19: ohci0 ahc0   1
 irq0: clk   1
 db> cont
 [halt - sent]
 Stopped at  siointr1+0xec:  jmp siointr1+0x220
 db> show intrcnt
 irq4: sio0  4
 irq8: rtc   3407
 irq13: npx0 1
 stray irq13 1
 irq14: ata0 17
 irq19: ohci0 ahc0   1
 irq0: clk   1
 db> cont
 [halt - sent]
 Stopped at  siointr1+0xec:  jmp siointr1+0x220
 db> show intrcnt
 irq4: sio0  6
 irq8: rtc   7418
 irq13: npx0 1
 stray irq13 1
 irq14: ata0 17
 irq19: ohci0 ahc0   1
 irq0: clk   1
 db> 

You aren't getting any clock interrupts it seems.  Very weird.  Can
you try using NO_MIXED_MODE instead?  I still think the ata1 is bogus,
but I don't think that is causing your hang now.
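
The diagnosis above rests on comparing the successive 'show intrcnt'
dumps: irq8 (rtc) keeps climbing while irq0 (clk) stays at 1.  A minimal
Python sketch of that comparison follows, using the first and last dumps
quoted in this thread as input.  The helper names parse_intrcnt and
deltas are made up for illustration; they are not part of DDB or any
FreeBSD tool.

```python
def parse_intrcnt(text):
    """Map each interrupt source name to its count from one 'show intrcnt' dump."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and the prompt line; data lines end in a number.
        if len(parts) < 2 or not parts[-1].isdigit():
            continue
        counts[" ".join(parts[:-1])] = int(parts[-1])
    return counts


def deltas(first, last):
    """Return how far each counter advanced between two dumps."""
    return {name: last[name] - first[name] for name in first if name in last}


# First and last dumps as posted in the thread.
SAMPLE_FIRST = """\
db> show intrcnt
irq4: sio0  2
irq8: rtc   1725
irq13: npx0 1
stray irq13 1
irq14: ata0 17
irq19: ohci0 ahc0   1
irq0: clk   1
"""

SAMPLE_LAST = """\
db> show intrcnt
irq4: sio0  6
irq8: rtc   7418
irq13: npx0 1
stray irq13 1
irq14: ata0 17
irq19: ohci0 ahc0   1
irq0: clk   1
"""

growth = deltas(parse_intrcnt(SAMPLE_FIRST), parse_intrcnt(SAMPLE_LAST))
for name, delta in sorted(growth.items()):
    flag = "" if delta else "   <-- not advancing"
    print(f"{name:20s} +{delta}{flag}")
```

On this data the clk delta is 0 while rtc advances by several thousand,
which is the pattern that points at a missing clock interrupt rather
than an interrupt storm.  Several other counters are also flat (ata0,
npx0), but only the clock is expected to tick continuously.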

-- 

John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/


Re: Recent -current hangs on Tyan S2460 before finishing boot

2003-11-13 Thread Benjamin Lewis
On November 13 2003, John Baldwin wrote:

  On 13-Nov-2003 Benjamin Lewis wrote:
  [trouble booting -current on Tyan Tiger MP]

  Can you do a 'show intrcnt' from the ddb prompt?  It sounds like you
  may be getting an interrupt storm due to a mis-routed PCI interrupt.


  db> show intrcnt
[...]
  irq0: clk   1

 You aren't getting any clock interrupts it seems.  Very weird.  Can
 you try using NO_MIXED_MODE instead?  I still think the ata1 is bogus,
 but I don't think that is causing your hang now.

NO_MIXED_MODE did the trick -- an otherwise identical kernel to the one
that was hanging now boots to multiuser successfully.
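
For anyone hitting the same hang: NO_MIXED_MODE is a kernel
configuration option, so trying it means adding one line to the kernel
config file and rebuilding.  A minimal sketch, assuming an i386
5.x-CURRENT config of this era; the comment text is illustrative, not
copied from NOTES:

```
# Added to the kernel configuration file used for the hanging kernel.
# Assumption: this stops the kernel from delivering the clock (IRQ 0)
# in "mixed mode" via the 8259A and uses the I/O APIC pin instead.
options 	NO_MIXED_MODE
```

The rest of the config can stay unchanged, which matches the "otherwise
identical kernel" described above.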

Thank you so much for your help!

-Ben

-- 
Benjamin Lewis [EMAIL PROTECTED]
Database Analyst/Programmer
IT Security and Policy - Identity and Access Management
Purdue University



Recent -current hangs on Tyan S2460 before finishing boot

2003-11-12 Thread Benjamin Lewis
Hello,

I'm having trouble getting recent (post- device apic, pre- turnstile) 
kernels to boot on my Tyan S2460 (Tiger MP) system with dual AMD 
Athlons.  What happens is that the machine seems to get stuck soon
after the Waiting for SCSI devices to settle message is printed -- it
appears to be willing to wait forever rather than the SCSI_DELAY time.

Disabling ACPI in the BIOS has no apparent effect on the hang.  Using
SCHED_4BSD or SCHED_ULE likewise makes no difference.  I've been 
following the current@ list hoping to see someone else report a problem
similar to mine but haven't seen anything yet.

I do have a serial console attached to the machine and DDB enabled so
I'm able to provide some information and get more if needed.  I'm
including a copy of the boot messages from my last attempt to boot
FreeBSD 5.1-CURRENT #2: Tue Nov 11 17:35:40 EST 2003 which was 
cvsup'ed shortly prior to the build date.  Included in the messages are
the output of ps and trace once I broke into ddb.

I'm also including output from acpidump -t and mptable -verbose 
since I've seen that information requested in the past.

Some details about the system that may be pertinent:
1. It has two 1GHz Athlon Thunderbird (Not MP) processors.
   That hasn't been a problem so far.
2. The BIOS is version 1.04 (latest is 1.05).  The last time I
   tried updating to 1.05 (some time ago) I saw lots of error
   messages complaining about undefined ACPI stuff so I
   reverted.
3. There is a Tekram 390F (I think that's the model -- it uses
   the sym driver) and an Adaptec 3944 SCSI controller.  A
   single internal SCSI drive is connected to the Tekram and 10
   external drives are connected to the two ports on the 3944.
   The external drives are configured as a Vinum Raid10 array.
   There's also a single IDE drive connected to one of the
   built-in IDE controllers.

Please let me know if there is anything more you want to know.

Thanks,

-Ben

Type '?' for a list of commands, 'help' for more detailed help.
OK boot -sv
-\|/-\|SMAP type=01 base= len=0009f400
SMAP type=02 base=0009f400 len=0c00
SMAP type=02 base=000e4800 len=0001b800
SMAP type=01 base=0010 len=0fef
SMAP type=03 base=0fff len=fc00
SMAP type=04 base=0c00 len=0400
SMAP type=02 base=fec0 len=0001
SMAP type=02 base=fee0 len=1000
SMAP type=02 base=fff8 len=0008
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.1-CURRENT #2: Tue Nov 11 17:35:40 EST 2003
[EMAIL PROTECTED]:/export/obj/usr/src-all/current/src/sys/AKIRA.ULE
Preloaded elf kernel /boot/kernel/kernel at 0xc089d000.
Preloaded elf module /boot/kernel/linux.ko at 0xc089d250.
Preloaded elf module /boot/kernel/snd_pcm.ko at 0xc089d2fc.
Preloaded elf module /boot/kernel/snd_es137x.ko at 0xc089d3a8.
Preloaded elf module /boot/kernel/usb.ko at 0xc089d458.
Preloaded elf module /boot/kernel/ums.ko at 0xc089d500.
Preloaded elf module /boot/kernel/agp.ko at 0xc089d5a8.
Preloaded elf module /boot/kernel/random.ko at 0xc089d650.
Calibrating clock(s) ... i8254 clock: 1192972 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter i8254 frequency 1193182 Hz quality 0
Calibrating TSC clock ... TSC clock: 141373 Hz
CPU: AMD Athlon(tm) Processor (1000.04-MHz 686-class CPU)
  Origin = AuthenticAMD  Id = 0x642  Stepping = 2
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
  AMD Features=0xc044<RSVD,AMIE,DSP,3DNow!>
Data TLB: 24 entries, fully associative
Instruction TLB: 16 entries, fully associative
L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L2 internal cache: 256 kbytes, 64 bytes/line, 1 lines/tag, 8-way associative
real memory  = 268369920 (255 MB)
Physical memory chunk(s):
0x1000 - 0x0009efff, 647168 bytes (158 pages)
0x0010 - 0x003f, 3145728 bytes (768 pages)
0x00c29000 - 0x0fb3dfff, 250695680 bytes (61205 pages)
avail memory = 251088896 (239 MB)
ACPI APIC Table: PTLTDAPIC  
APIC ID: physical 0, logical 0:0
APIC ID: physical 1, logical 0:1
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  1
 cpu1 (AP): APIC ID:  0
APIC: CPU 0 has ACPI ID 0
APIC: CPU 1 has ACPI ID 1
bios32: Found BIOS32 Service Directory header at 0xc00f7480
bios32: Entry = 0xfd6c0 (c00fd6c0)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xfd6c0+0x120
pnpbios: Found PnP BIOS data at 0xc00f74d0
pnpbios: Entry =