PCI Master Aborts affect multiple subsystems?
Hi,

I am trying to do a small amount of work on the wcfxo device driver (for an FXO card), which is part of Zaptel, which is used by Asterisk, the Linux open source PBX (hence the cross-post).

Question 1: Are PCI Master Aborts delivered to all subsystems? If they are, do I need to "fix" ALL the drivers in my system to handle them?

Here's more detail on my problem. Both of my (2) machines deliver interrupts from the FXO card to both the card's driver and to another subsystem: in one case the SCSI driver, in the other the eth0 driver. In both cases, of course, the drivers get a little upset. I cannot find out why my machines deliver these interrupts (they are PCI Master Aborts). I would be VERY grateful for any help in tracking that problem down. Some information on what could cause a PCI Master Abort would be helpful!

My approach has been to fix the driver to do the right thing in the case of a PCI Master Abort. I believe I now have a patch that does indeed fix the wcfxo driver (it picks the card up again and continues working). However, meanwhile the other subsystem has crashed and burnt. So, at the same time that the wcfxo driver receives an IRQ (reportedly because of a master abort), the eth0 driver (3c59x), for example, gives:

Aug 30 22:46:04 localhost kernel: eth0: Too much work in interrupt, status e003.
Aug 30 22:46:05 localhost kernel: ACPI: PCI interrupt :02:08.0[A] -> GSI 18 (level, low) -> IRQ 185
Aug 30 22:46:05 localhost last message repeated 32 times

(repeated over and over)

I have tried several kernel versions, but this one is:

Linux version 2.6.11-1-386 ([EMAIL PROTECTED]) (gcc version 3.3.6 (Debian 1:3.3.6-6)) #1 Mon Jun 20 20:53:17 MDT 2005

I'm afraid I don't have a record of /proc/interrupts from this run, but I did look at it, and the wcfxo driver was on a different IRQ from the 3c59x...

I have tried with and without APIC (noapic on the boot line), I've tried playing with BIOS options, and, with noapic (when the eth0 card is on IRQ 3), I've even tried reserving IRQ 3, forcing the eth0 card onto IRQ 7. But it still received the IRQ :-(

Can anybody help? Has anybody seen similar effects before?

Cheers
Mark.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
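On the question of what a PCI Master Abort is: on conventional PCI, a bus master terminates a transaction with a master abort when no target claims the address (nobody asserts DEVSEL#), and it records this by setting the Received Master Abort bit (bit 13) in its own 16-bit status register at configuration offset 0x06. The following is a minimal sketch, not anything taken from the wcfxo driver, of decoding the error bits of such a status word. The function and table names are made up for illustration, and how you obtain the raw value is left open.

```python
# Error-related bits of the 16-bit PCI status register (config offset 0x06).
# The names in this table are illustrative labels, not kernel identifiers.
STATUS_ERROR_BITS = {
    8: "Master Data Parity Error",
    11: "Signaled Target Abort",
    12: "Received Target Abort",
    13: "Received Master Abort",
    14: "Signaled System Error",
    15: "Detected Parity Error",
}


def decode_pci_status(status):
    """Return the names of the error bits set in a PCI status word."""
    if not 0 <= status <= 0xFFFF:
        raise ValueError("PCI status register is 16 bits wide")
    return [
        name
        for bit, name in sorted(STATUS_ERROR_BITS.items())
        if status & (1 << bit)
    ]


def has_master_abort(status):
    """True if the device recorded a master abort as the initiator."""
    return bool(status & (1 << 13))
```

For example, decode_pci_status(0x2280) reports only "Received Master Abort", because the other bits set in that value (fast back-to-back capable, DEVSEL timing) are not error flags. These error bits are sticky: they stay set until software clears them by writing a 1 to the bit, so a set bit says that an abort happened at some point, not when.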
Re: tx queue start entry x dirty entry y (was 8139too PCI IRQ issues)
For reference: thanks to Ogawa Hirofumi, the solution was to use noapic on the boot line, as my machine is too old to handle APIC well.

Cheers
Mark.

On 19 Jul 2005, at 18:00, Mark Burton wrote:

Hi,

I'm getting similar results to Nick Warne, in that when my ethernet is stressed at all (for instance by NFS), I end up with

nfs: server. not responding, still trying
nfs: server OK

With a Realtek card, I get errors in /var/spool/messages along the lines of:

Jul 3 14:31:13 localhost kernel: eth1: Transmit timeout, status 0c 0005 c07f media 00.
Jul 3 14:31:13 localhost kernel: eth1: Tx queue start entry 1160 dirty entry 1156.
Jul 3 14:31:13 localhost kernel: eth1: Tx descriptor 0 is 0008a03c. (queue head)

I have no TPM (as far as I can find). Hence I don't think this is the same problem, but its manifestation is identical.

I was using a Realtek card with the 8139too driver, hence I first suspected that. As a test, I have an even older 3Com 3c509B; using that gives exactly the same results (though it doesn't seem to be kind enough to output anything to /var/log/debug, so all you get are the "server not responding" messages under heavy NFS load).

lsmod, however, shows both modules loaded.

I'm running Debian, and recently got a recent kernel image. /proc/version gives:

Linux version 2.6.11-1-386 ([EMAIL PROTECTED]) (gcc version 3.3.6 (Debian 1:3.3.6-6)) #1 Mon Jun 20 20:53:17 MDT 2005

I'm not a kernel expert at all. Any help sorting this problem out would be appreciated, but it's only worth fixing if it's a general problem -- if I'm on my own, I'll fix it with a band-aid :-)

Cheers
Mark.
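To compare IRQ routing with and without noapic, it can help to save /proc/interrupts from each boot and list the lines that carry more than one driver. What follows is a rough sketch under stated assumptions: it expects the 2.6-era layout (one count column per CPU, then a single controller token such as XT-PIC or IO-APIC-level, then a comma-separated driver list), the function names are invented here, and the sample table is made up rather than captured from the machine in this thread.

```python
def parse_interrupts(text):
    """Map IRQ number -> (controller, [driver names]) for numbered IRQ rows.

    Assumes the 2.6-era /proc/interrupts layout: a header row naming the
    CPUs, then rows of 'IRQ: count... controller driver[, driver...]'.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        irq = fields[0][:-1]
        if not irq.isdigit():
            continue  # summary rows such as NMI: or ERR:
        rest = fields[1 + cpu_count:]
        if len(rest) < 2:
            continue  # no controller or no driver registered on this line
        controller = rest[0]
        drivers = [name.strip() for name in " ".join(rest[1:]).split(",")]
        table[int(irq)] = (controller, drivers)
    return table


def shared_irqs(table):
    """Return only the IRQ lines that have more than one driver attached."""
    return {
        irq: drivers
        for irq, (_controller, drivers) in table.items()
        if len(drivers) > 1
    }


# Made-up sample in the assumed layout (not output from the poster's machine).
SAMPLE_INTERRUPTS = """\
           CPU0
  0:    1234567          XT-PIC  timer
  3:       4321          XT-PIC  eth0
 11:      98765          XT-PIC  wcfxo, ohci_hcd
NMI:          0
ERR:          0
"""
```

Calling shared_irqs(parse_interrupts(SAMPLE_INTERRUPTS)) on the sample picks out IRQ 11 with its two drivers. On a shared line every registered handler is called for each interrupt, so a driver that does not check whether its own device raised the interrupt can misbehave when a neighbour's device is the real source.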
Re: tx queue start entry x dirty entry y (was 8139too PCI IRQ issues)
])
        Subsystem: Lucent Microelectronics USS-344S USB Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 248 (750ns min, 21500ns max)
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at ffbcc000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-

:00:0d.3 USB Controller: Lucent Microelectronics USS-344S USB Controller (rev 11) (prog-if 10 [OHCI])
        Subsystem: Lucent Microelectronics USS-344S USB Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 248 (750ns min, 21500ns max)
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at ffbbc000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-

:00:0e.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
        Subsystem: Realtek Semiconductor Co., Ltd. RT8139
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 248 (8000ns min, 16000ns max)
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at f400 [size=256]
        Region 1: Memory at ffbac000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

:00:14.0 RAM memory: Intel Corp. 450KX/GX [Orion] - 82453KX/GX Memory controller (rev 04)
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

:00:19.0 Host bridge: Intel Corp. 450KX/GX [Orion] - 82454KX/GX PCI bridge (rev 04)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 32, Cache Line Size: 0x08 (32 bytes)

--
On 23 Jul 2005, at 21:11, OGAWA Hirofumi wrote:

Mark Burton [EMAIL PROTECTED] writes:

Hi, I'm getting similar results to Nick Warne, in that when my ethernet is stressed at all (for instance by NFS), I end up with

nfs: server. not responding, still trying
nfs: server OK

With a Realtek card, I get errors in /var/spool/messages along the lines of:

Jul 3 14:31:13 localhost kernel: eth1: Transmit timeout, status 0c 0005 c07f media 00.
Jul 3 14:31:13 localhost kernel: eth1: Tx queue start entry 1160 dirty entry 1156.
Jul 3 14:31:13 localhost kernel: eth1: Tx descriptor 0 is 0008a03c. (queue head)

OGAWA Hirofumi [EMAIL PROTECTED]
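Two things stand out in the lspci listing: several devices are routed to IRQ 11, and the host bridge shows MAbort+. Below is a small sketch, assuming the usual lspci -vv text layout (an unindented header line per device, indented detail lines beneath it), that pulls both facts out of a saved listing. The function names are invented for illustration, and the sample is trimmed from the listing with the slot addresses written in plain bus:device.function form.

```python
import re

# Unindented header line: "<slot> <class>: <description>"
DEVICE_RE = re.compile(r"^(?P<slot>[0-9a-fA-F:.]+) (?P<desc>.+)$")
IRQ_RE = re.compile(r"routed to IRQ (\d+)")


def parse_lspci(text):
    """Return one dict per device with its slot, IRQ and master-abort flag."""
    devices = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():  # a new device header
            match = DEVICE_RE.match(line)
            current = None
            if match:
                current = {
                    "slot": match.group("slot"),
                    "desc": match.group("desc"),
                    "irq": None,
                    "master_abort": False,
                }
                devices.append(current)
            continue
        if current is None:
            continue
        detail = line.strip()
        irq = IRQ_RE.search(detail)
        if irq:
            current["irq"] = int(irq.group(1))
        # "<MAbort+" on the Status line means Received Master Abort is set.
        if detail.startswith("Status:") and "MAbort+" in detail:
            current["master_abort"] = True
    return devices


def devices_by_irq(devices):
    """Group device slots by the IRQ they are routed to."""
    groups = {}
    for dev in devices:
        if dev["irq"] is not None:
            groups.setdefault(dev["irq"], []).append(dev["slot"])
    return groups


# Trimmed sample based on the listing in this message.
SAMPLE_LSPCI = """\
00:0d.3 USB Controller: Lucent Microelectronics USS-344S USB Controller (rev 11)
\tStatus: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
\tInterrupt: pin A routed to IRQ 11
00:0e.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
\tStatus: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
\tInterrupt: pin A routed to IRQ 11
00:19.0 Host bridge: Intel Corp. 450KX/GX [Orion] - 82454KX/GX PCI bridge (rev 04)
\tStatus: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
"""
```

On the sample, devices_by_irq groups 00:0d.3 and 00:0e.0 under IRQ 11, and only the host bridge at 00:19.0 has master_abort set. A set MAbort flag on a host bridge is not necessarily a fault in itself: the bit is sticky, and configuration reads to empty slots during bus enumeration normally end in a master abort, so it is commonly seen set after boot.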
tx queue start entry x dirty entry y (was 8139too PCI IRQ issues)
Hi,

I'm getting similar results to Nick Warne, in that when my ethernet is stressed at all (for instance by NFS), I end up with

nfs: server. not responding, still trying
nfs: server OK

With a Realtek card, I get errors in /var/spool/messages along the lines of:

Jul 3 14:31:13 localhost kernel: eth1: Transmit timeout, status 0c 0005 c07f media 00.
Jul 3 14:31:13 localhost kernel: eth1: Tx queue start entry 1160 dirty entry 1156.
Jul 3 14:31:13 localhost kernel: eth1: Tx descriptor 0 is 0008a03c. (queue head)

I have no TPM (as far as I can find). Hence I don't think this is the same problem, but its manifestation is identical.

I was using a Realtek card with the 8139too driver, hence I first suspected that. As a test, I have an even older 3Com 3c509B; using that gives exactly the same results (though it doesn't seem to be kind enough to output anything to /var/log/debug, so all you get are the "server not responding" messages under heavy NFS load).

lsmod, however, shows both modules loaded.

I'm running Debian, and recently got a recent kernel image. /proc/version gives:

Linux version 2.6.11-1-386 ([EMAIL PROTECTED]) (gcc version 3.3.6 (Debian 1:3.3.6-6)) #1 Mon Jun 20 20:53:17 MDT 2005

I'm not a kernel expert at all. Any help sorting this problem out would be appreciated, but it's only worth fixing if it's a general problem -- if I'm on my own, I'll fix it with a band-aid :-)

Cheers
Mark.