PCI Master Aborts affect multiple subsystems?

2005-08-31 Thread Mark Burton
Hi, I am trying to do a small amount of work on the wcfxo device driver
(for an FXO card), which is part of Zaptel, which is used by Asterisk,
the Linux open source PBX (hence the cross-post).


Question 1: Are PCI Master Aborts delivered to all subsystems? If they
are, do I need to "fix" ALL the drivers in my system to handle them?


Here's more detail on my problem:

My problem is that both of my machines deliver interrupts from the FXO
card both to the card's driver and to another subsystem: in one case
the SCSI driver, in the other the eth0 driver. In both cases, of
course, the drivers get a little upset.


I cannot find out why my machines deliver these interrupts (they are
PCI Master Aborts). I would be VERY grateful for any help in tracking
that problem down. Some information on what could cause a PCI Master
Abort would be helpful!
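
As background to the question: a master abort is what a PCI initiator records when no target claims the address it drove onto the bus, and it shows up as the Received Master Abort bit in that initiator's Status register. A minimal Python sketch of decoding the error bits of a 16-bit Status value follows. The bit masks are the standard PCI ones; the function name and the sample value are made up for illustration and are not taken from the wcfxo driver.

```python
# Error bits of the PCI Status register (config space offset 0x06).
PCI_STATUS_BITS = {
    0x0100: "Master Data Parity Error",
    0x0800: "Signaled Target Abort",
    0x1000: "Received Target Abort",
    0x2000: "Received Master Abort",
    0x4000: "Signaled System Error",
    0x8000: "Detected Parity Error",
}


def decode_pci_status(status):
    """Return the names of the error bits set in a 16-bit PCI Status value."""
    return [name for mask, name in sorted(PCI_STATUS_BITS.items())
            if status & mask]


if __name__ == "__main__":
    # Hypothetical value: medium DEVSEL timing (0x0200) plus Received Master Abort.
    print(decode_pci_status(0x2200))
```

These bits are sticky and are cleared by writing a 1 back to them, so a set bit only says the event happened at some point since the last clear.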


My approach has been to fix the driver to do the right thing in the 
case of a PCI Master Abort. I believe I now have a patch that does 
indeed fix the wcfxo driver (it picks the card up again, and continues 
working). However, meanwhile the other subsystem has crashed and burnt.


So, at the same time that the wcfxo driver receives an IRQ
(reportedly because of a master abort), the other driver complains.

For example, the eth0 driver (3c59x) gives:
Aug 30 22:46:04 localhost kernel: eth0: Too much work in interrupt, status e003.
Aug 30 22:46:05 localhost kernel: ACPI: PCI interrupt :02:08.0[A] -> GSI 18 (level, low) -> IRQ 185
Aug 30 22:46:05 localhost last message repeated 32 times
(repeated over and over)

I have tried on several kernel versions, but this is Linux version 
2.6.11-1-386 ([EMAIL PROTECTED]) (gcc version 3.3.6 (Debian 1:3.3.6-6)) 
#1 Mon Jun 20 20:53:17 MDT 2005


I'm afraid I don't have a record of /proc/interrupts from this run,
but I did look at it, and the wcfxo driver was on a different IRQ
than the 3c59x...
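
Since no copy of /proc/interrupts survives from that run, here is only a sketch of how one could check it for shared lines next time. It assumes the 2.6-era layout (IRQ number, one count per CPU, controller type, then a comma-separated list of handlers); the sample text and the helper name `shared_irqs` are invented for illustration and are not the real table from either machine.

```python
def shared_irqs(text):
    """Map IRQ number -> handler names, for IRQs with more than one handler.

    Assumes rows of the form 'IRQ: count-per-CPU... controller name[, name...]'.
    """
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())            # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        irq, _, rest = line.partition(":")
        if not irq.strip().isdigit():       # skip rows such as NMI: or ERR:
            continue
        fields = rest.split()
        # Everything after the per-CPU counts and the controller type.
        names = " ".join(fields[ncpu + 1:]).split(", ")
        if len(names) > 1:
            shared[int(irq)] = names
    return shared


# Invented sample, NOT the real table from the machines discussed here.
SAMPLE = """\
           CPU0
  0:     123456          XT-PIC  timer
  3:       4321          XT-PIC  eth0
 11:       9876          XT-PIC  wcfxo, ohci_hcd
NMI:          0
"""

if __name__ == "__main__":
    print(shared_irqs(SAMPLE))
```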



I have tried with and without APIC (noapic on the boot line), and I've
tried playing with BIOS options. With noapic (when the eth0 card is on
IRQ 3) I've even tried reserving IRQ 3, forcing the eth0 card onto
IRQ 7. But it still received the IRQ :-(


Can anybody help?
Has anybody seen similar effects before?

Cheers

Mark.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: tx queue start entry x dirty entry y (was 8139too PCI IRQ issues)

2005-07-28 Thread Mark Burton

For reference,
Thanks to Ogawa Hirofumi,
the solution was to use noapic on the boot line, as my machine is too 
old to handle APIC well.
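
To confirm the option actually took effect after a reboot, the running kernel's command line can be read back from /proc/cmdline. A minimal Python sketch; the function name is made up for illustration, and how the option gets onto the boot line depends on the bootloader, which is not covered here.

```python
def has_boot_flag(cmdline, flag):
    """True if `flag` appears as a whole word on the kernel command line."""
    return flag in cmdline.split()


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("noapic set:", has_boot_flag(f.read(), "noapic"))
    except OSError:
        print("no readable /proc/cmdline on this system")
```

Matching whole words avoids confusing `noapic` with the separate `nolapic` option.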


Cheers

Mark.


Re: tx queue start entry x dirty entry y (was 8139too PCI IRQ issues)

2005-07-25 Thread Mark Burton
])
        Subsystem: Lucent Microelectronics USS-344S USB Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 248 (750ns min, 21500ns max)
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at ffbcc000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-

:00:0d.3 USB Controller: Lucent Microelectronics USS-344S USB Controller (rev 11) (prog-if 10 [OHCI])
        Subsystem: Lucent Microelectronics USS-344S USB Controller
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 248 (750ns min, 21500ns max)
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at ffbbc000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-

:00:0e.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
        Subsystem: Realtek Semiconductor Co., Ltd. RT8139
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 248 (8000ns min, 16000ns max)
        Interrupt: pin A routed to IRQ 11
        Region 0: I/O ports at f400 [size=256]
        Region 1: Memory at ffbac000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

:00:14.0 RAM memory: Intel Corp. 450KX/GX [Orion] - 82453KX/GX Memory controller (rev 04)
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-

:00:19.0 Host bridge: Intel Corp. 450KX/GX [Orion] - 82454KX/GX PCI bridge (rev 04)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
        Latency: 32, Cache Line Size: 0x08 (32 bytes)
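
To pick the error flags out of a long listing like the one above, something along these lines may help. It is a rough Python sketch, not a complete lspci parser: it assumes each device starts with a bus address at the start of a line and that the PCI status line is the one containing DEVSEL. The two sample entries are the Ethernet controller and host bridge status lines from the listing, written with lspci's usual '<' and '>' direction markers; the function name is hypothetical.

```python
import re

# Device header: bus address at column 0, e.g. ':00:19.0' or '0000:00:19.0'.
HEADER = re.compile(r"^[0-9a-f]*:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\s")
# A flag that is set: optional direction marker, a name, then '+'.
FLAG = re.compile(r"([<>]?)(\w+)\+")
ERROR_NAMES = {"TAbort", "MAbort", "SERR", "PERR"}


def devices_with_errors(lspci_text):
    """Return {bus address: [error flags set]} from `lspci -vv` style text."""
    result, current = {}, None
    for line in lspci_text.splitlines():
        if HEADER.match(line):
            current = line.split()[0]
        elif current and line.strip().startswith("Status:") and "DEVSEL" in line:
            flags = [mark + name for mark, name in FLAG.findall(line)
                     if name in ERROR_NAMES]
            if flags:
                result[current] = flags
    return result


SAMPLE = """\
:00:0e.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
:00:19.0 Host bridge: Intel Corp. 450KX/GX [Orion] - 82454KX/GX PCI bridge (rev 04)
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
"""

if __name__ == "__main__":
    print(devices_with_errors(SAMPLE))
```

Of the devices shown, only the host bridge reports a received master abort. That bit can also be left over from bus probing at boot, so on its own it may not point at the faulty device.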



--


On 23 Jul 2005, at 21:11, OGAWA Hirofumi wrote:


Mark Burton <[EMAIL PROTECTED]> writes:


Hi,
I'm getting similar results to Nick Warne, in that when my ethernet is
stressed at all (for instance by NFS), I end up with
nfs: server. not responding, still trying
nfs: server  OK

With a Realtek card, I get errors in /var/spool/messages along the
lines of:
Jul  3 14:31:13 localhost kernel: eth1: Transmit timeout, status 0c 0005 c07f media 00.
Jul  3 14:31:13 localhost kernel: eth1: Tx queue start entry 1160  dirty entry 1156.
Jul  3 14:31:13 localhost kernel: eth1:  Tx descriptor 0 is 0008a03c. (queue head)

OGAWA Hirofumi <[EMAIL PROTECTED]>



tx queue start entry x dirty entry y (was 8139too PCI IRQ issues)

2005-07-19 Thread Mark Burton

Hi,
I'm getting similar results to Nick Warne, in that when my ethernet is 
stressed at all (for instance by NFS), I end up with

nfs: server. not responding, still trying
nfs: server  OK

With a Realtek card, I get errors in /var/spool/messages along the
lines of:
Jul  3 14:31:13 localhost kernel: eth1: Transmit timeout, status 0c 0005 c07f media 00.
Jul  3 14:31:13 localhost kernel: eth1: Tx queue start entry 1160  dirty entry 1156.
Jul  3 14:31:13 localhost kernel: eth1:  Tx descriptor 0 is 0008a03c. (queue head)
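
One way to read the "Tx queue" line is as two running counters: packets handed to the card ("start entry") and packets whose transmission the driver has finished cleaning up ("dirty entry"). That reading is an assumption based on the names, not something the log itself states. Under it, the difference is the number of transmits still outstanding when the timeout fired. A small Python sketch, with a made-up helper name:

```python
import re

TX_LINE = re.compile(r"Tx queue start entry (\d+)\s+dirty entry (\d+)")


def tx_backlog(log_line):
    """Return (start, dirty, start - dirty) from a 'Tx queue' log line, or None."""
    m = TX_LINE.search(log_line)
    if m is None:
        return None
    start, dirty = int(m.group(1)), int(m.group(2))
    return start, dirty, start - dirty


# The line quoted above, rejoined onto one line.
LINE = ("Jul  3 14:31:13 localhost kernel: eth1: "
        "Tx queue start entry 1160  dirty entry 1156.")

if __name__ == "__main__":
    print(tx_backlog(LINE))
```

For the line above the difference is 4. If the card has only four transmit descriptors, as is commonly described for the RTL8139, that would mean the ring was completely full and nothing was being completed.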


I have no TPM (as far as I can find).

Hence I don't think this is the same problem, but its manifestation is
identical.


I was using a Realtek card with the 8139too driver, hence I first
suspected that. As a test, I tried an even older 3Com 3c509B; using that
gives exactly the same results (though it doesn't seem to be kind
enough to output anything to /var/log/debug, so all you get are the
"server not responding" messages under heavy NFS load).

lsmod, however, shows both modules loaded.

I'm running Debian, and recently installed a recent kernel image.
/proc/version gives:
Linux version 2.6.11-1-386 ([EMAIL PROTECTED]) (gcc version 3.3.6 (Debian 1:3.3.6-6)) #1 Mon Jun 20 20:53:17 MDT 2005


I'm not a kernel expert at all. Any help sorting out this problem would
be appreciated, but it's only worth fixing if it's a general problem --
if I'm on my own, I'll fix it with a band-aid :-)


Cheers

Mark.



