RE: "exception Emask: 0x42" errors with 2.6.22.x and SATA drives
> I've added Jeff to CC in case he's interested in the workaround for
> this drive (I assume you're using the AHCI driver with your ATI
> controller).

Yup, using AHCI.

I've just rebooted after adding that blacklist line to the kernel and
recompiling, but it doesn't seem to have taken effect:

ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7: WDC WD800BEVS-07RST0, 04.01G04, max UDMA/133
ata1.00: 156301488 sectors, multi 1: LBA48 NCQ (depth 31/32)

If NCQ is disabled/blacklisted, shouldn't the depth say 0/32?

I used the string "WDC WD800BEVS-07" in the patch; maybe I should use
"WDC WD800BEVS-07RST0" instead?

cat /sys/bus/scsi/devices/0\:0\:0\:0/model shows "WDC WD800BEVS-07"

Stirk, Lamont & Associates Ltd.
Registered Address: Thomas Andrews House, Queens Road, Belfast, BT3 9DU
Registered in Northern Ireland, Number: NI 47983. VAT Number: 832 2778 22
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
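On the question of which string belongs in the blacklist entry, here is
a minimal sketch. It assumes the lookup in libata-core.c compares the
whole model string with strcmp(), which is my understanding of 2.6.22
but is worth checking in your own tree. The struct, the flag value and
the function name are hypothetical stand-ins, not the kernel's own. The
SCSI model attribute is limited to 16 characters, which would explain
why sysfs shows the shorter name while the boot log shows
"WDC WD800BEVS-07RST0".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's blacklist types; the names and
 * the flag value are illustrative, not copied from libata-core.c. */
#define HORKAGE_NONCQ 0x4UL

struct blacklist_entry {
	const char *model_num;  /* full IDENTIFY model string */
	const char *model_rev;  /* firmware revision, NULL matches any */
	unsigned long horkage;  /* quirk flags to apply on a match */
};

/* Exact-match lookup: returns the flags of the first entry whose model
 * (and, if given, firmware revision) equals the device's strings, or 0
 * when nothing matches. */
unsigned long blacklist_lookup(const struct blacklist_entry *tbl,
			       const char *model, const char *rev)
{
	for (; tbl->model_num != NULL; tbl++) {
		if (strcmp(tbl->model_num, model) != 0)
			continue;
		if (tbl->model_rev == NULL || strcmp(tbl->model_rev, rev) == 0)
			return tbl->horkage;
	}
	return 0;
}

/* Entry as written in the patch (16-character, truncated model name). */
const struct blacklist_entry truncated_tbl[] = {
	{ "WDC WD800BEVS-07", NULL, HORKAGE_NONCQ },
	{ NULL, NULL, 0 },
};

/* Entry using the full model string the drive reports in the boot log. */
const struct blacklist_entry full_tbl[] = {
	{ "WDC WD800BEVS-07RST0", NULL, HORKAGE_NONCQ },
	{ NULL, NULL, 0 },
};
```

Under that assumption, looking up "WDC WD800BEVS-07RST0" in truncated_tbl
returns 0, so NCQ would stay enabled, while the same lookup in full_tbl
returns the NONCQ flag.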
Re: "exception Emask: 0x42" errors with 2.6.22.x and SATA drives
On Monday 27 August 2007 10:28:09 Dermot Bradley wrote:
[snip]
> Thanks for the help Alistair! One other point you may be able to help
> with - this is the first time I've used a dual core processor and I
> expected that /proc/interrupts would show interrupts distributed
> between both cores, whereas they actually seem to be mainly handled by
> the 1st core:
>
>            CPU0       CPU1
>   0:        251          0   IO-APIC-edge      timer
>   1:       2208         11   IO-APIC-edge      i8042
>   8:          0          1   IO-APIC-edge      rtc
>   9:          0          0   IO-APIC-fasteoi   acpi
>  16:       5291          3   IO-APIC-fasteoi   ehci_hcd:usb1, eth0
>  17:     223026         13   IO-APIC-fasteoi   ahci
>  18:          0          1   IO-APIC-fasteoi   ohci_hcd:usb2
>  19:          0        126   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb5
>  20:          0          2   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb6
>  21:     188591          1   IO-APIC-fasteoi   HFC-multi
> NMI:          0          0
> LOC:30363931527288
> ERR:          0
> MIS:          0
>
> Is this to be expected for dual core systems?

I assume that's with irqbalance installed (if not, try installing it).
My understanding is that not every interrupt can be balanced, and it's
not always valuable to do so. On an Athlon64, there's probably no point.

> I'm currently rebuilding the kernel with the following patch to disable
> NCQ for these drives:
>
> --- linux-2.6.22.5/drivers/ata/libata-core.c 2007-08-23 00:23:54.0 +0100
> +++ linux-2.6.22.5.new/drivers/ata/libata-core.c 2007-08-27 10:25:16.0 +0100
> @@ -3788,6 +3788,7 @@
>  	/* NCQ is broken */
>  	{ "Maxtor 6L250S0", "BANC1G10", ATA_HORKAGE_NONCQ },
>  	{ "Maxtor 6B200M0", "BANC1B10", ATA_HORKAGE_NONCQ },
> +	{ "WDC WD800BEVS-07", NULL, ATA_HORKAGE_NONCQ },
>  	/* NCQ hard hangs device under heavier load, needs hard power cycle */
>  	{ "Maxtor 6B250S0", "BANC1B70", ATA_HORKAGE_NONCQ },
>  	/* Blacklist entries taken from Silicon Image 3124/3132

I've added Jeff to CC in case he's interested in the workaround for this
drive (I assume you're using the AHCI driver with your ATI controller).
(Dermot's been having problems with his WD drive on an ATI chipset.)

> Aug 24 13:55:31 playpbx kernel: ata3.00: exception Emask 0x42 SAct 0x7fc77 SErr0x800 action 0x6 frozen
> Aug 24 13:55:31 playpbx kernel: ata3.00: (spurious completions during NCQ issue=0x0 SAct=0x7fc77 FIS=004040a1:0008)
> Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:00:9a:b7:fc/00:00:04:00:00/40 tag 0 cdb 0x0 data 4096 out
> Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
> Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:08:72:ba:fc/00:00:04:00:00/40 tag 1 cdb 0x0 data 4096 out
> Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
> Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/10:10:f2:bd:fc/00:00:04:00:00/40 tag 2 cdb 0x0 data 8192 out
> Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
> Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:20:8a:be:fc/00:00:04:00:00/40 tag 4 cdb 0x0 data 4096 out
> Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
> Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:28:12:bf:fc/00:00:04:00:00/40 tag 5 cdb 0x0 data 4096 out
> Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
> Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/10:30:4a:c0:fc/00:00:04:00:00/40 tag 6 cdb 0x0 data 8192 out
> Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
> Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:50:4a:b5:fc/00:00:04:00:00/40 tag 10 cdb 0x0 data 4096 out
> Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
> Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:58:d2:b5:fc/00:00:04:00:00/40 tag 11 cdb 0x0 data 4096 out
> Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
> Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/10:60:02:b7:fc/00:00:04:00:00/40 tag 12 cdb 0x0 data 8192 out
> Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
> Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:68:22:b8:fc/00:00:04:00:00/40 tag 13 cdb 0x0 data 4096 out
> Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
> Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/10:70:52:b9:fc/00:00:04:00:00/40 tag 14 cdb 0x0 data 8192 out
> Aug 24 13:55:31 playpbx kernel
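For anyone reading the "cmd" lines in that log: a small sketch of how
the leading fields can be decoded, assuming the usual libata dump order
(command/feature:nsect:lbal:lbam:lbah/hob_feature:...). Opcode 0x61 is
WRITE FPDMA QUEUED, where the FEATURE fields carry the sector count and
bits 7:3 of the sector-count field carry the NCQ tag. The struct and
function names are made up for illustration.

```c
#include <stdio.h>

/* Fields of interest from a libata "cmd" log line for an NCQ command. */
struct ncq_cmd {
	unsigned int command;  /* ATA opcode, 0x61 = WRITE FPDMA QUEUED */
	unsigned int sectors;  /* sector count from FEATURE (low) and HOB FEATURE (high) */
	unsigned int tag;      /* NCQ tag, bits 7:3 of the sector-count field */
};

/* Parse a taskfile dump such as "61/08:08:72:ba:fc/00:00:04:00:00/40".
 * Only the opcode, FEATURE, NSECT and HOB FEATURE bytes are read; the
 * three LBA bytes in between are skipped.
 * Returns 0 on success, -1 if the string does not match the format. */
int parse_ncq_cmd(const char *tf, struct ncq_cmd *out)
{
	unsigned int cmd, feature, nsect, hob_feature;

	if (sscanf(tf, "%2x/%2x:%2x:%*2x:%*2x:%*2x/%2x",
		   &cmd, &feature, &nsect, &hob_feature) != 4)
		return -1;

	out->command = cmd;
	out->sectors = (hob_feature << 8) | feature;
	out->tag = nsect >> 3;
	return 0;
}
```

Fed "61/10:60:02:b7:fc/00:00:04:00:00/40", this gives opcode 0x61, 16
sectors and tag 12, which agrees with the "tag 12 ... data 8192 out" in
the log if the sectors are 512 bytes.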
RE: "exception Emask: 0x42" errors with 2.6.22.x and SATA drives
> FWIW, I've got the HDMI version of this board and I have exactly the same
> problem (even with the newest BIOS) if nmi_watchdog is not set to zero.
> Try booting with nmi_watchdog=0 (default on x86-64, I think) and see if
> these go away.
>
> I guess the APIC has some difficulties handling NMIs.

On Friday I rebooted the box (remotely) after adding "noapic" to the
boot options and the machine didn't come back up - I came in this
morning to find a kernel panic after the reboot, to do with
nmi_watchdog.

This morning I removed the "noapic" option and instead set
"nmi_watchdog=0". It has booted and been up and running for 30 minutes
with no APIC errors, whereas previously these happened every time during
boot. So nmi_watchdog does indeed seem to be related to the root cause
of the problem.

Thanks for the help Alistair! One other point you may be able to help
with - this is the first time I've used a dual core processor and I
expected that /proc/interrupts would show interrupts distributed between
both cores, whereas they actually seem to be mainly handled by the 1st
core:

           CPU0       CPU1
  0:        251          0   IO-APIC-edge      timer
  1:       2208         11   IO-APIC-edge      i8042
  8:          0          1   IO-APIC-edge      rtc
  9:          0          0   IO-APIC-fasteoi   acpi
 16:       5291          3   IO-APIC-fasteoi   ehci_hcd:usb1, eth0
 17:     223026         13   IO-APIC-fasteoi   ahci
 18:          0          1   IO-APIC-fasteoi   ohci_hcd:usb2
 19:          0        126   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb5
 20:          0          2   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb6
 21:     188591          1   IO-APIC-fasteoi   HFC-multi
NMI:          0          0
LOC:30363931527288
ERR:          0
MIS:          0

Is this to be expected for dual core systems?

> I get the feeling this problem is independent of the APIC errors, and I
> don't see it here

Yeah, I thought as much. After disabling the NMI watchdog I didn't see
these NCQ messages initially, but after a few bonnie runs to load the
disks I'm now getting them again.

> As Alan said, it's very possibly just the drive not properly supporting
> NCQ.

Yeah, that seems to be the case - last Friday these errors seemed to be
occurring every 1-2 hours or so, even when I wasn't using the machine.

I'm currently rebuilding the kernel with the following patch to disable
NCQ for these drives:

--- linux-2.6.22.5/drivers/ata/libata-core.c 2007-08-23 00:23:54.0 +0100
+++ linux-2.6.22.5.new/drivers/ata/libata-core.c 2007-08-27 10:25:16.0 +0100
@@ -3788,6 +3788,7 @@
 	/* NCQ is broken */
 	{ "Maxtor 6L250S0", "BANC1G10", ATA_HORKAGE_NONCQ },
 	{ "Maxtor 6B200M0", "BANC1B10", ATA_HORKAGE_NONCQ },
+	{ "WDC WD800BEVS-07", NULL, ATA_HORKAGE_NONCQ },
 	/* NCQ hard hangs device under heavier load, needs hard power cycle */
 	{ "Maxtor 6B250S0", "BANC1B70", ATA_HORKAGE_NONCQ },
 	/* Blacklist entries taken from Silicon Image 3124/3132
Re: "exception Emask: 0x42" errors with 2.6.22.x and SATA drives
On Friday 24 August 2007 20:20:02 Alan Cox wrote:
> On Fri, 24 Aug 2007 14:39:10 +0100
> "Dermot Bradley" <[EMAIL PROTECTED]> wrote:
> > I've just built a new machine using an ASUS M2A-VM motherboard (ATI SB600
> > chipset), AMD X2 3800+ processor, and 2 Western Digital 2.5" 80GB drives
> > running in RAID-1 using MD. I've had these problems with both 2.6.22.1
> > and now 2.6.22.5 kernels.
> >
> > I'm getting the following errors on occasion:
> >
> > Aug 24 13:19:22 playpbx kernel: APIC error on CPU0: 00(40)
> > Aug 24 13:19:33 playpbx kernel: APIC error on CPU0: 40(40)
>
> This is not good.

FWIW, I've got the HDMI version of this board and I have exactly the
same problem (even with the newest BIOS) if nmi_watchdog is not set to
zero. Try booting with nmi_watchdog=0 (default on x86-64, I think) and
see if these go away.

I guess the APIC has some difficulties handling NMIs.

> > Aug 24 13:55:31 playpbx kernel: ata3.00: exception Emask 0x42 SAct 0x7fc77 SErr0x800 action 0x6 frozen
> > Aug 24 13:55:31 playpbx kernel: ata3.00: (spurious completions during NCQ issue=0x0 SAct=0x7fc77 FIS=004040a1:0008)
>
> Probably not connected - your drive seems to be talking rubbish
>
> Neither are good, the latter is probably a drive firmware problem and the
> kernel will give up using NCQ with it if it keeps doing that, which
> should be just fine.

I get the feeling this problem is independent of the APIC errors, and I
don't see it here. I'm using Hitachi Deskstars on the on-board
controller in AHCI mode, and everything works fine.

As Alan said, it's very possibly just the drive not properly supporting
NCQ.

-- 
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.
Re: "exception Emask: 0x42" errors with 2.6.22.x and SATA drives
> (1) there is 1 newer official BIOS release and 2 beta BIOSes out for
> this motherboard. The recent official BIOS (0901) does mention Linux:
>
>     "Fixed Red Hat Enterprise Linux 4 installation failed"
>
> So maybe a BIOS update will fix things.

Yep

> So I guess these drives should be added to the ata_blacklist_entry in
> libata-core.c with ATA_HORKAGE_NONCQ?

If you keep seeing the problem then yes
RE: "exception Emask: 0x42" errors with 2.6.22.x and SATA drives
Alan Cox wrote:
> > I'm getting the following errors on occasion:
> >
> > Aug 24 13:19:22 playpbx kernel: APIC error on CPU0: 00(40)
> > Aug 24 13:19:33 playpbx kernel: APIC error on CPU0: 40(40)
>
> This is not good.

I've two things in mind to try:

(1) there is 1 newer official BIOS release and 2 beta BIOSes out for
this motherboard. The recent official BIOS (0901) does mention Linux:

    "Fixed Red Hat Enterprise Linux 4 installation failed"

So maybe a BIOS update will fix things.

(2) I will try booting with the "noapic" kernel option to see what
difference that makes.

> Neither are good, the latter is probably a drive firmware problem and
> the kernel will give up using NCQ with it if it keeps doing that,
> which should be just fine.

So I guess these drives should be added to the ata_blacklist_entry in
libata-core.c with ATA_HORKAGE_NONCQ?
Re: "exception Emask: 0x42" errors with 2.6.22.x and SATA drives
On Fri, 24 Aug 2007 14:39:10 +0100
"Dermot Bradley" <[EMAIL PROTECTED]> wrote:
> I've just built a new machine using an ASUS M2A-VM motherboard (ATI SB600
> chipset), AMD X2 3800+ processor, and 2 Western Digital 2.5" 80GB drives
> running in RAID-1 using MD. I've had these problems with both 2.6.22.1
> and now 2.6.22.5 kernels.
>
> I'm getting the following errors on occasion:
>
> Aug 24 13:19:22 playpbx kernel: APIC error on CPU0: 00(40)
> Aug 24 13:19:33 playpbx kernel: APIC error on CPU0: 40(40)

This is not good.

> Aug 24 13:55:31 playpbx kernel: ata3.00: exception Emask 0x42 SAct 0x7fc77 SErr0x800 action 0x6 frozen
> Aug 24 13:55:31 playpbx kernel: ata3.00: (spurious completions during NCQ issue=0x0 SAct=0x7fc77 FIS=004040a1:0008)

Probably not connected - your drive seems to be talking rubbish

Neither are good, the latter is probably a drive firmware problem and
the kernel will give up using NCQ with it if it keeps doing that, which
should be just fine.
"exception Emask: 0x42" errors with 2.6.22.x and SATA drives
I've just built a new machine using an ASUS M2A-VM motherboard (ATI
SB600 chipset), AMD X2 3800+ processor, and 2 Western Digital 2.5" 80GB
drives running in RAID-1 using MD. I've had these problems with both
2.6.22.1 and now 2.6.22.5 kernels.

I'm getting the following errors on occasion:

Aug 24 13:19:22 playpbx kernel: APIC error on CPU0: 00(40)
Aug 24 13:19:33 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:20:01 playpbx last message repeated 5 times
Aug 24 13:20:54 playpbx last message repeated 2 times
Aug 24 13:21:40 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:23:23 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:24:35 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:25:51 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:26:37 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:29:11 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:30:01 playpbx last message repeated 4 times
Aug 24 13:30:46 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:32:21 playpbx last message repeated 2 times
Aug 24 13:33:33 playpbx last message repeated 2 times
Aug 24 13:35:14 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:36:10 playpbx last message repeated 3 times
Aug 24 13:37:04 playpbx last message repeated 2 times
Aug 24 13:38:25 playpbx last message repeated 2 times
Aug 24 13:38:43 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:40:08 playpbx last message repeated 3 times
Aug 24 13:40:57 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:42:28 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:43:37 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:45:04 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:45:11 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:46:47 playpbx last message repeated 2 times
Aug 24 13:47:33 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:48:15 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:49:56 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:51:34 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:51:44 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:53:28 playpbx last message repeated 2 times
Aug 24 13:54:02 playpbx kernel: APIC error on CPU0: 40(40)
Aug 24 13:55:31 playpbx kernel: ata3.00: exception Emask 0x42 SAct 0x7fc77 SErr0x800 action 0x6 frozen
Aug 24 13:55:31 playpbx kernel: ata3.00: (spurious completions during NCQ issue=0x0 SAct=0x7fc77 FIS=004040a1:0008)
Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:00:9a:b7:fc/00:00:04:00:00/40 tag 0 cdb 0x0 data 4096 out
Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:08:72:ba:fc/00:00:04:00:00/40 tag 1 cdb 0x0 data 4096 out
Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/10:10:f2:bd:fc/00:00:04:00:00/40 tag 2 cdb 0x0 data 8192 out
Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:20:8a:be:fc/00:00:04:00:00/40 tag 4 cdb 0x0 data 4096 out
Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:28:12:bf:fc/00:00:04:00:00/40 tag 5 cdb 0x0 data 4096 out
Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/10:30:4a:c0:fc/00:00:04:00:00/40 tag 6 cdb 0x0 data 8192 out
Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:50:4a:b5:fc/00:00:04:00:00/40 tag 10 cdb 0x0 data 4096 out
Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:58:d2:b5:fc/00:00:04:00:00/40 tag 11 cdb 0x0 data 4096 out
Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/10:60:02:b7:fc/00:00:04:00:00/40 tag 12 cdb 0x0 data 8192 out
Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:68:22:b8:fc/00:00:04:00:00/40 tag 13 cdb 0x0 data 4096 out
Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/10:70:52:b9:fc/00:00:04:00:00/40 tag 14 cdb 0x0 data 8192 out
Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42 (HSM violation)
Aug 24 13:55:31 playpbx kernel: ata3.00: cmd 61/08:78:ea:b9:fc/00:00:04:00:00/40 tag 15 cdb 0x0 data 4096 out
Aug 24 13:55:31 playpbx kernel: res 40/00:34:4a:c0:fc/00:00:04:00:00/40 Emask 0x42
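The SAct value in the exception line is a bitmask with one bit per
outstanding NCQ tag, so it can be checked against the per-tag "cmd"
lines. A minimal sketch of that decoding follows; the function name is
hypothetical and this is ordinary userspace C, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Collect the NCQ tags whose bits are set in an SActive value.
 * Writes at most 32 tag numbers into tags[] in ascending order and
 * returns how many were written. */
size_t sact_to_tags(uint32_t sact, unsigned int tags[32])
{
	size_t n = 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		if (sact & (UINT32_C(1) << bit))
			tags[n++] = bit;
	}
	return n;
}
```

For SAct 0x7fc77 this gives 15 tags: 0-2, 4-6 and 10-18. That lines up
with the tags logged above (0, 1, 2, 4, 5, 6, 10, 11, ...), with tags 3
and 7-9 not in flight.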