Re: Strange freezes (seems like SATA related)
Robert Hancock wrote:
> Can you post the full dmesg output? What kind of drive is this?

Sorry for the delay. I'm on vacation and have sporadic email access.
The full dmesg is pretty long. Here is the SATA-related section.

sata_nv :00:07.0: version 3.4
ACPI: PCI Interrupt Link [LSA0] enabled at IRQ 23
ACPI: PCI Interrupt :00:07.0[A] -> Link [LSA0] -> GSI 23 (level, high) -> IRQ 23
sata_nv :00:07.0: Using ADMA mode
PCI: Setting latency timer of device :00:07.0 to 64
scsi0 : sata_nv
scsi1 : sata_nv
ata1: SATA max UDMA/133 cmd 0xc2a16480 ctl 0xc2a164a0 bmdma 0x000158b0 irq 23
ata2: SATA max UDMA/133 cmd 0xc2a16580 ctl 0xc2a165a0 bmdma 0x000158b8 irq 23
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7: SAMSUNG HD080HJ, WT100-33, max UDMA/100
ata1.00: 156301488 sectors, multi 16: LBA48
ata1.00: configured for UDMA/100
ata2: SATA link down (SStatus 0 SControl 300)
scsi 0:0:0:0: Direct-Access ATA SAMSUNG HD080HJ WT10 PQ: 0 ANSI: 5
ata1: bounce limit 0x, segment boundary 0x, hw segs 61
sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 156301488 512-byte hardware sectors (80026 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3
sd 0:0:0:0: [sda] Attached SCSI disk
ACPI: PCI Interrupt Link [LSA1] enabled at IRQ 22
ACPI: PCI Interrupt :00:08.0[A] -> Link [LSA1] -> GSI 22 (level, high) -> IRQ 22
sata_nv :00:08.0: Using ADMA mode

Max

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
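For readers following the dmesg excerpt, two of its values can be cross-checked by hand. The sketch below is only an illustration, not something posted in the thread. It assumes that libata prints SStatus in hexadecimal and that the register uses the standard SATA field layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). The function names `decode_sstatus` and `capacity_mb` are made up for this example.

```python
# Sketch: cross-check values from the dmesg excerpt.
# Assumption: SStatus is printed in hex and follows the SATA layout
# DET = bits 3:0, SPD = bits 7:4, IPM = bits 11:8.

DET = {
    0x0: "no device detected",
    0x1: "device present, no PHY communication",
    0x3: "device present, PHY communication established",
    0x4: "PHY offline",
}
SPD = {0x0: "no negotiated speed", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps"}
IPM = {0x0: "not present", 0x1: "active", 0x2: "partial", 0x6: "slumber"}


def decode_sstatus(value):
    """Split an SStatus value into (DET, SPD, IPM) descriptions."""
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return (
        DET.get(det, "reserved"),
        SPD.get(spd, "reserved"),
        IPM.get(ipm, "reserved"),
    )


def capacity_mb(sectors, sector_size=512):
    """Capacity in decimal megabytes, as the sd driver reports it."""
    return sectors * sector_size // 10**6


print(decode_sstatus(0x123))   # ata1: link up
print(decode_sstatus(0x0))     # ata2: link down
print(capacity_mb(156301488))  # 80026
```

Under these assumptions, `SStatus 123` decodes to an established link at 3.0 Gbps in the active power state, which agrees with the "SATA link up 3.0 Gbps" text on the same line, and the sector count agrees with the reported 80026 MB.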
Re: Strange freezes (seems like SATA related)
Andrew Morton wrote:
> On Mon, 29 Oct 2007 09:54:27 -0700
> Max Krasnyansky <[EMAIL PROTECTED]> wrote:
>
>> A couple of HP xw9300 machines (dual Opterons) started freezing up.
>> We're running on 2.6.22.1 on them. Freezes a somewhere weird. VGA console is alive
>> (I can switch vts, etc) but everything else is dead (network, etc).
>> Unfortunately SYSRQ was not enabled and I could not get backtraces and stuff.
>>
>> Hooked up serial console and the only error that shows up is this.
>>
>> ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
>> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>> ata1.00: cmd ca/00:08:57:00:80/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out
>>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Descriptor sense data with sense descriptors (in hex):
>> end_request: I/O error, dev sda, sector 8388695
>> Buffer I/O error on device sda1, logical block 1048579
>> lost page write due to I/O error on sda1
>> sd 0:0:0:0: [sda] Write Protect is off
>>
>> I see a bunch of those and then the box just sits there spewing this periodically
>>
>> ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
>> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>> ata1.00: cmd ca/00:08:4f:00:f8/00:00:00:00:00/e1 tag 0 cdb 0x0 data 4096 out
>>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>>
>> SMART selftest on the drive passed without errors.
>>
>> Here is how this machine looks like
>>
>> ...
>
> So this happens on more than one machine?

Yep.

> The kernel shouldn't freeze, so even if both machines have magically
> identical hardware faults, there's a kernel bug there somewhere.
>
> I guess it would be useful to test a 2.6.23 kernel if poss. We've seen a
> very large number of reports like this one in recent months (many of which
> have not been responded to, btw) and perhaps someone has done something
> about them.

I may not be able to run an identical workload on 2.6.23, but I will try to
give it a shot sometime next week.
I also upgraded to 2.6.22.10 last week. There are a few fixes in there that
may affect those boxes.

Max
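The two numbers in the quoted I/O error (disk sector 8388695 and logical block 1048579 on sda1) can be related with a little arithmetic. The sketch below is a hedged illustration only. It assumes 512-byte sectors (as reported by the sd driver elsewhere in the thread) and 4096-byte buffer blocks on sda1, which the thread does not state. The helper name `implied_partition_start` is hypothetical.

```python
# Sketch: relate "sector 8388695" (whole disk) to "logical block 1048579"
# (relative to sda1).
# Assumptions: 512-byte sectors, 4096-byte buffer blocks on sda1.

SECTOR_SIZE = 512
BLOCK_SIZE = 4096  # assumption, not stated in the thread


def implied_partition_start(disk_sector, logical_block,
                            block_size=BLOCK_SIZE, sector_size=SECTOR_SIZE):
    """Return the partition start sector implied by the two log values."""
    sectors_per_block = block_size // sector_size
    return disk_sector - logical_block * sectors_per_block


print(implied_partition_start(8388695, 1048579))  # 63
```

If the block-size assumption holds, the result would be consistent with sda1 starting at sector 63, the traditional DOS partition offset. The thread itself does not show the partition table, so this is only a plausibility check on the log.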
Re: Strange freezes (seems like SATA related)
Heikki Orsila wrote:
> On Mon, Oct 29, 2007 at 09:54:27AM -0700, Max Krasnyansky wrote:
>> A couple of HP xw9300 machines (dual Opterons) started freezing up.
>> We're running on 2.6.22.1 on them. Freezes a somewhere weird.
>> VGA console is alive
>> (I can switch vts, etc) but everything else is dead (network, etc).
>
> I'm thinking this is not a coincidence. I was running 2.6.22.5, and looking
> at your problems, I just had a similar experience on tuesday.. The network
> was still fine after kernel errors so that I was able to login with SSH.
>
> See: http://lkml.org/lkml/2007/10/30/193
>
>> ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
>> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
>> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
>> ata1.00: cmd ca/00:08:57:00:80/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out
>>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Descriptor sense data with sense descriptors (in hex):
>> end_request: I/O error, dev sda, sector 8388695
>> Buffer I/O error on device sda1, logical block 1048579
>> lost page write due to I/O error on sda1
>> sd 0:0:0:0: [sda] Write Protect is off
>
> With ata_piix Intel SATA I got these errors:
>
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd ca/00:68:6f:3a:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 53248 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata1: port is slow to respond, please be patient (Status 0xd0)
> ata1: device not ready (errno=-16), forcing hardreset
> ata1: soft resetting port
> ata1.00: revalidation failed (errno=-2)
> ata1: failed to recover some devices, retrying in 5 secs
> ata1: soft resetting port
> ata1.00: configured for UDMA/133
> ata1: EH complete
> sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
> sd 0:0:0:0: [sda] Write Protect is off
> sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

These are two 100% different issues. The only thing they have in common is
that they spit out an error.

Jeff
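To make the `cmd` lines in both reports easier to read, here is a minimal decoding sketch. It is not from any poster in the thread. It assumes the libata report orders the taskfile as command/feature:nsect:lbal:lbam:lbah/(five HOB bytes)/device, and that opcode 0xca is the ATA WRITE DMA command with 28-bit LBA addressing. The function name `decode_lba28` is hypothetical.

```python
import re

# Sketch: decode the taskfile in a libata "cmd" error line.
# Assumed field order: command/feature:nsect:lbal:lbam:lbah/
#                      hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
TF_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")

COMMANDS = {0xCA: "WRITE DMA"}  # only the opcode seen in this thread


def decode_lba28(line):
    """Return command name, start LBA, sector count and byte count."""
    match = TF_RE.search(line)
    if match is None:
        raise ValueError("no taskfile found in line")
    fields = [int(x, 16) for x in re.split(r"[/:]", match.group(1))]
    command, _feature, nsect, lbal, lbam, lbah = fields[:6]
    device = fields[11]
    # LBA28: the low nibble of the device register holds LBA bits 27:24.
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    count = nsect or 256  # a count of 0 means 256 sectors in LBA28
    return {
        "command": COMMANDS.get(command, hex(command)),
        "lba": lba,
        "sectors": count,
        "bytes": count * 512,
    }


for line in (
    "ata1.00: cmd ca/00:08:57:00:80/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out",
    "ata1.00: cmd ca/00:08:4f:00:f8/00:00:00:00:00/e1 tag 0 cdb 0x0 data 4096 out",
    "ata1.00: cmd ca/00:68:6f:3a:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 53248 out",
):
    print(decode_lba28(line))
```

Under these assumptions, the first line decodes to LBA 8388695, which matches the `end_request` sector in the same report, and the byte counts match the `data 4096 out` and `data 53248 out` fields. Decoding the fields only makes the logs readable; it does not by itself say whether the sata_nv and ata_piix reports share a cause.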
Re: Strange freezes (seems like SATA related)
On Mon, Oct 29, 2007 at 09:54:27AM -0700, Max Krasnyansky wrote:
> A couple of HP xw9300 machines (dual Opterons) started freezing up.
> We're running on 2.6.22.1 on them. Freezes a somewhere weird.
> VGA console is alive
> (I can switch vts, etc) but everything else is dead (network, etc).

I'm thinking this is not a coincidence. I was running 2.6.22.5 and, looking
at your problems, I had a similar experience on Tuesday. The network was
still fine after the kernel errors, so I was able to log in with SSH.

See: http://lkml.org/lkml/2007/10/30/193

> ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd ca/00:08:57:00:80/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Descriptor sense data with sense descriptors (in hex):
> end_request: I/O error, dev sda, sector 8388695
> Buffer I/O error on device sda1, logical block 1048579
> lost page write due to I/O error on sda1
> sd 0:0:0:0: [sda] Write Protect is off

With ata_piix Intel SATA I got these errors:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd ca/00:68:6f:3a:00/00:00:00:00:00/e0 tag 0 cdb 0x0 data 53248 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: device not ready (errno=-16), forcing hardreset
ata1: soft resetting port
ata1.00: revalidation failed (errno=-2)
ata1: failed to recover some devices, retrying in 5 secs
ata1: soft resetting port
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

> Here is how this machine looks like
>
> 00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
> 00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
> 00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
> 00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
> 00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
> 00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
> 00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
> 00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
> 00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
> 00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
> 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
> 00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
> 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
> 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
> 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
> 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
> 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
> 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
> 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
> 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
> 05:04.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]
> 05:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
> 0a:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
> 40:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
> 40:01.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
> 40:02.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
> 40:02.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
> 61:04.0 PCI bridge: Intel Corporation Unknown device 537c (rev 07)
> 61:06.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
> 61:06.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
> 61:09.0 PCI bridge: Intel Corporation Unknown device 537c (rev 07)
> 62:09.0 Multimedia controller: BittWare, Inc. Unknown device 0035 (rev 01)
> 63:09.0 Multimedia controller: BittWare, Inc. Unknown device 0035 (rev 01)
> 80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
> 80:01.0 Memory
Re: Strange freezes (seems like SATA related)
On Mon, 29 Oct 2007 09:54:27 -0700 Max Krasnyansky <[EMAIL PROTECTED]> wrote:

> A couple of HP xw9300 machines (dual Opterons) started freezing up.
> We're running on 2.6.22.1 on them. Freezes a somewhere weird. VGA console is alive
> (I can switch vts, etc) but everything else is dead (network, etc).
> Unfortunately SYSRQ was not enabled and I could not get backtraces and stuff.
>
> Hooked up serial console and the only error that shows up is this.
>
> ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd ca/00:08:57:00:80/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Descriptor sense data with sense descriptors (in hex):
> end_request: I/O error, dev sda, sector 8388695
> Buffer I/O error on device sda1, logical block 1048579
> lost page write due to I/O error on sda1
> sd 0:0:0:0: [sda] Write Protect is off
>
> I see a bunch of those and then the box just sits there spewing this periodically
>
> ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd ca/00:08:4f:00:f8/00:00:00:00:00/e1 tag 0 cdb 0x0 data 4096 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>
> SMART selftest on the drive passed without errors.
>
> Here is how this machine looks like
>
> ...

So this happens on more than one machine?

The kernel shouldn't freeze, so even if both machines have magically
identical hardware faults, there's a kernel bug there somewhere.

I guess it would be useful to test a 2.6.23 kernel if poss. We've seen a
very large number of reports like this one in recent months (many of which
have not been responded to, btw) and perhaps someone has done something
about them.
Re: Strange freezes (seems like SATA related)
Max Krasnyansky wrote:
> A couple of HP xw9300 machines (dual Opterons) started freezing up.
> We're running on 2.6.22.1 on them. Freezes a somewhere weird. VGA console is alive
> (I can switch vts, etc) but everything else is dead (network, etc).
> Unfortunately SYSRQ was not enabled and I could not get backtraces and stuff.
>
> Hooked up serial console and the only error that shows up is this.
>
> ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd ca/00:08:57:00:80/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> Descriptor sense data with sense descriptors (in hex):
> end_request: I/O error, dev sda, sector 8388695
> Buffer I/O error on device sda1, logical block 1048579
> lost page write due to I/O error on sda1
> sd 0:0:0:0: [sda] Write Protect is off
>
> I see a bunch of those and then the box just sits there spewing this periodically
>
> ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
> ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: cmd ca/00:08:4f:00:f8/00:00:00:00:00/e1 tag 0 cdb 0x0 data 4096 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>
> SMART selftest on the drive passed without errors.
>
> Here is how this machine looks like
>
> 00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
> 00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
> 00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
> 00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
> 00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
> 00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
> 00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
> 00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
> 00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
> 00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
> 00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
> 00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
> 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
> 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
> 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
> 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
> 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
> 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
> 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
> 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
> 05:04.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]
> 05:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
> 0a:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
> 40:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
> 40:01.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
> 40:02.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
> 40:02.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
> 61:04.0 PCI bridge: Intel Corporation Unknown device 537c (rev 07)
> 61:06.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
> 61:06.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
> 61:09.0 PCI bridge: Intel Corporation Unknown device 537c (rev 07)
> 62:09.0 Multimedia controller: BittWare, Inc. Unknown device 0035 (rev 01)
> 63:09.0 Multimedia controller: BittWare, Inc. Unknown device 0035 (rev 01)
> 80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
> 80:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
> 80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
> 81:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
>
> As I mentioned dual Opteron, NUMA. Nothing fancy in the kernel config.
> Any ideas what might the problem be ?

Can you post the full dmesg output? What kind of drive is this?

--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
Strange freezes (seems like SATA related)
A couple of HP xw9300 machines (dual Opterons) started freezing up. We're running 2.6.22.1 on them. The freezes are somewhat weird: the VGA console is alive (I can switch VTs, etc.) but everything else is dead (network, etc.). Unfortunately SysRq was not enabled, so I could not get backtraces and stuff. I hooked up a serial console and the only error that shows up is this:

ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd ca/00:08:57:00:80/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Descriptor sense data with sense descriptors (in hex):
end_request: I/O error, dev sda, sector 8388695
Buffer I/O error on device sda1, logical block 1048579
lost page write due to I/O error on sda1
sd 0:0:0:0: [sda] Write Protect is off

I see a bunch of those and then the box just sits there spewing this periodically:

ata1: EH in ADMA mode, notifier 0x1 notifier_error 0x0 gen_ctl 0x1581000 status 0x1540 next cpb count 0x0 next cpb idx 0x0
ata1: CPB 0: ctl_flags 0xd, resp_flags 0x1
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd ca/00:08:4f:00:f8/00:00:00:00:00/e1 tag 0 cdb 0x0 data 4096 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

A SMART self-test on the drive passed without errors.

Here is what this machine looks like:

00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev f2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev f3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
05:04.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]
05:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
0a:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
40:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
40:01.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
40:02.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
40:02.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
61:04.0 PCI bridge: Intel Corporation Unknown device 537c (rev 07)
61:06.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
61:06.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 07)
61:09.0 PCI bridge: Intel Corporation Unknown device 537c (rev 07)
62:09.0 Multimedia controller: BittWare, Inc. Unknown device 0035 (rev 01)
63:09.0 Multimedia controller: BittWare, Inc. Unknown device 0035 (rev 01)
80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
81:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)

As I mentioned, dual Opteron, NUMA. Nothing fancy in the kernel config.
Any ideas what the problem might be?

Max
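The "cmd" lines in the error report above can be decoded by hand, which helps confirm that the timed-out commands are ordinary writes. The following is a minimal sketch, not part of the original thread. It assumes the usual libata report layout (command / feature:nsect:lbal:lbam:lbah / the five "hob" bytes / device) and 28-bit LBA addressing, which applies to non-EXT commands such as 0xca. The names decode_taskfile and ATA_COMMANDS are hypothetical, chosen only for illustration.

```python
# Decode the taskfile printed on a libata "cmd" line.
# Assumed field order (libata error-handler report):
#   COMMAND/FEATURE:NSECT:LBAL:LBAM:LBAH/HOB_FEATURE:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEVICE

# Small illustrative subset of ATA opcodes; not a complete table.
ATA_COMMANDS = {0xC8: "READ DMA", 0xCA: "WRITE DMA"}

SECTOR_SIZE = 512  # bytes, as reported for sda in the dmesg output


def decode_taskfile(text):
    """Return command name, LBA28 start sector, sector count and byte count."""
    command_s, current, _hob, device_s = text.split("/")
    command = int(command_s, 16)
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in current.split(":"))
    device = int(device_s, 16)

    # LBA28: the low nibble of the device register carries LBA bits 27:24.
    lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    # For LBA28 commands a sector count of 0 means 256 sectors.
    sectors = nsect if nsect else 256

    return {
        "command": ATA_COMMANDS.get(command, hex(command)),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
    }


if __name__ == "__main__":
    for line in ("ca/00:08:57:00:80/00:00:00:00:00/e0",
                 "ca/00:08:4f:00:f8/00:00:00:00:00/e1"):
        print(line, "->", decode_taskfile(line))
```

Under these assumptions the first command decodes to a WRITE DMA of 8 sectors (4096 bytes) starting at sector 8388695, which agrees with the "data 4096 out" and "end_request: I/O error, dev sda, sector 8388695" lines in the same report. The second decodes to a WRITE DMA of 8 sectors at sector 33030223.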