Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-13 Thread Michal Piotrowski

On 12/03/07, Tejun Heo [EMAIL PROTECTED] wrote:

> Stephen Hemminger wrote:
>> On Tue, 13 Mar 2007 04:03:00 +0900
>> Tejun Heo [EMAIL PROTECTED] wrote:
>>
>>> Stephen Hemminger wrote:
>>>>> 1. the controller has IRQ stuck high (infrequent but possible)
>>>>> 2. the IRQ is already requested by another device
>>>>> 3. the IRQ gets disabled due to screaming interrupts at the moment
>>>>> ata_piix does pci_enable_device().
>>>>>
>>>>> I think we can be much more resilient to screaming interrupts if we
>>>>> enable device with IRQ disabled and enable it after the device is
>>>>> initialized to some level, possibly when requesting IRQ.
>>>> The first thing the skge driver does is do a chip reset, and that should
>>>> cause IRQ to be disabled and cleared. The driver has no chance to
>>>> fix it if the BIOS left the IRQ screaming...
>>> What if we do something like...
>>>
>>>   pci_intx(pdev, 0);
>>>   pci_enable_device(pdev);
>>>   /* initialize */
>>>   request_irq(blah blah...);
>>>   pci_intx(pdev, 1);
>>>
>>> Would this work for skge?
>>
>> Okay for testing, but any change like this should be done in the base
>> PCI layer, not one off in a particular driver.
>
> Yeap, it was a proof-of-concept pseudo code.  I attached a patch to do
> above in skge.  Please point out if it is broken (e.g. intx needs to be
> enabled earlier).
>
> Michal, can you apply the attached patch and see whether it fixes the
> problem.

I think that problem is solved.

Thanks.

> Thanks.
>
> --
> tejun
>
> diff --git a/drivers/net/skge.c b/drivers/net/skge.c
> index eea75a4..2c990f2 100644
> --- a/drivers/net/skge.c
> +++ b/drivers/net/skge.c
> @@ -3585,6 +3585,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
>  	struct skge_hw *hw;
>  	int err, using_dac = 0;
>  
> +	pci_intx(pdev, 0);
>  	err = pci_enable_device(pdev);
>  	if (err) {
>  		dev_err(&pdev->dev, "cannot enable PCI device\n");
> @@ -3669,6 +3670,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
>  		       dev->name, pdev->irq);
>  		goto err_out_unregister;
>  	}
> +	pci_intx(pdev, 1);
>  	skge_show_addr(dev);
>  
>  	if (hw->ports > 1 && (dev1 = skge_devinit(hw, 1, using_dac))) {




Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-12 Thread Michal Piotrowski
Hi,

Tejun Heo wrote:
 Michal Piotrowski wrote:
 Hi Jeff,

 I've got some problems with my SATA controller on crashdump kernel.

 Calling initcall 0xc1916081: fc_transport_init+0x0/0x35()
 Calling initcall 0xc19160b6: init_sd+0x0/0xbc()
 Calling initcall 0xc19161ec: piix_init+0x0/0x27()
 ata_piix :00:1f.2: version 2.10
 ata_piix :00:1f.2: MAP [ P0 -- P1 -- ]
 ACPI: PCI Interrupt :00:1f.2[A] - Link [LNKC] - GSI 5 (level,
 low) - IRQ 5
 PCI: Setting latency timer of device :00:1f.2 to 64
 ata1: SATA max UDMA/133 cmd 0x0001cc00 ctl 0x0001c882 bmdma 0x0001c400
 irq 5
 ata2: SATA max UDMA/133 cmd 0x0001c800 ctl 0x0001c482 bmdma 0x0001c408
 irq 5
 scsi0 : ata_piix
 PM: Adding info for No Bus:host0
 ata1.00: ATA-7: ST3160811AS, 3.AAE, max UDMA/133
 ata1.00: 312581808 sectors, multi 16: LBA48 NCQ (depth 0/32)
 ata1.00: qc timeout (cmd 0xef)
 ata1.00: failed to set xfermode (err_mask=0x4)
 
 Does giving 'irqpoll' boot parameter fix the problem?
 

Hmmm... it works.

Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
Calling initcall 0xc191572e: ide_init+0x0/0x81()
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH5: IDE controller at PCI slot :00:1f.1
irq 5: nobody cared (try booting with the irqpoll option)
 [c1604556] show_trace_log_lvl+0x1a/0x2f
 [c1604c2c] show_trace+0x12/0x14
 [c1604cde] dump_stack+0x16/0x18
 [c164341c] __report_bad_irq+0x39/0x79
 [c16435eb] note_interrupt+0x18f/0x1c8
 [c1643ec6] handle_level_irq+0x95/0xcb
 [c1605dd8] do_IRQ+0xb4/0xe0
 ===
handlers:
[c174f55e] (skge_intr+0x0/0x3ff)
Disabling IRQ #5
ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
ACPI: PCI Interrupt :00:1f.1[A] -> Link [LNKC] -> GSI 5 (level, low) -> IRQ 5
ICH5: chipset revision 2
ICH5: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:DMA

Is this an IDE or skge bug?

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-rc3-git4-kdump/git-config

Thomas, Ingo - this soft lockup with irqpoll seems to be fixed
http://www.ussg.iu.edu/hypermail/linux/kernel/0703.0/index.html#1116
Thanks!

Regards,
Michal

-- 
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)


Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-12 Thread Tejun Heo
Michal Piotrowski wrote:
 Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
 Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
 Calling initcall 0xc191572e: ide_init+0x0/0x81()
 Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
 ICH5: IDE controller at PCI slot :00:1f.1
 irq 5: nobody cared (try booting with the irqpoll option)
  [c1604556] show_trace_log_lvl+0x1a/0x2f
  [c1604c2c] show_trace+0x12/0x14
  [c1604cde] dump_stack+0x16/0x18
  [c164341c] __report_bad_irq+0x39/0x79
  [c16435eb] note_interrupt+0x18f/0x1c8
  [c1643ec6] handle_level_irq+0x95/0xcb
  [c1605dd8] do_IRQ+0xb4/0xe0
  ===
 handlers:
 [c174f55e] (skge_intr+0x0/0x3ff)
 Disabling IRQ #5
 ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
 ACPI: PCI Interrupt :00:1f.1[A] - Link [LNKC] - GSI 5 (level, low) - 
 IRQ 5
 ICH5: chipset revision 2
 ICH5: not 100% native mode: will probe irqs later
 ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:DMA
 ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:DMA
 
 Is this an IDE or skge bug?

It seems skge's.  skge is screaming and kernel shuts down IRQ 5.
ata_piix is unfortunately sharing the IRQ, so its IRQ doesn't get
serviced and commands time out.

-- 
tejun
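
The "nobody cared" shutdown of IRQ 5 described above comes from the kernel's spurious-interrupt accounting. A rough user-space sketch of that accounting follows, assuming the thresholds used by kernel/irq/spurious.c of that era (windows of 100000 interrupts, line disabled if more than 99900 went unclaimed). The names irq_model, model_note_interrupt and model_run are made up for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative model only; not kernel code. */
struct irq_model {
    unsigned int irq_count;       /* interrupts seen in the current window */
    unsigned int irqs_unhandled;  /* of those, how many no handler claimed */
    bool disabled;                /* set once the line is shut down */
};

#define WINDOW    100000u  /* interrupts per accounting window */
#define BAD_LIMIT  99900u  /* more unclaimed than this: disable the line */

/* Call once per interrupt; 'handled' is true if any handler claimed it. */
static void model_note_interrupt(struct irq_model *m, bool handled)
{
    assert(m);
    if (m->disabled)
        return;
    if (!handled)
        m->irqs_unhandled++;
    if (++m->irq_count < WINDOW)
        return;
    /* End of window: decide, then start a fresh window. */
    m->irq_count = 0;
    if (m->irqs_unhandled > BAD_LIMIT) {
        printf("irq: nobody cared, disabling line\n");
        m->disabled = true;
    }
    m->irqs_unhandled = 0;
}

/* Feed n interrupts with the same outcome; report whether the line
 * ends up disabled. */
static bool model_run(unsigned int n, bool handled)
{
    struct irq_model m = {0};
    for (unsigned int i = 0; i < n; i++)
        model_note_interrupt(&m, handled);
    return m.disabled;
}
```

In this model, model_run(100000, false) ends with the line disabled, while the same number of claimed interrupts leaves it enabled. Once the line is off, every device sharing it stops getting interrupts, which matches the ata_piix command timeouts in the log.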


Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-12 Thread Thomas Gleixner
On Mon, 2007-03-12 at 17:31 +0100, Michal Piotrowski wrote:
 Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
 Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
 Calling initcall 0xc191572e: ide_init+0x0/0x81()
 Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
 ICH5: IDE controller at PCI slot :00:1f.1
 irq 5: nobody cared (try booting with the irqpoll option)
  [c1604556] show_trace_log_lvl+0x1a/0x2f
  [c1604c2c] show_trace+0x12/0x14
  [c1604cde] dump_stack+0x16/0x18
  [c164341c] __report_bad_irq+0x39/0x79
  [c16435eb] note_interrupt+0x18f/0x1c8
  [c1643ec6] handle_level_irq+0x95/0xcb
  [c1605dd8] do_IRQ+0xb4/0xe0
  ===
 handlers:
 [c174f55e] (skge_intr+0x0/0x3ff)
 Disabling IRQ #5

I know this one :( 

It seems to be related to the BIOS spinning up the CDROM and leaving the
IDE controller in some weird state. When we come back the interrupt is
screaming and nobody cares, so it gets disabled. I have no clue yet how
to handle this.

Disabling the interrupt across suspend/resume helps, but it does not work
when the interrupt is shared with some other device.

tglx




Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-12 Thread Thomas Gleixner
On Tue, 2007-03-13 at 01:37 +0900, Tejun Heo wrote:
 Michal Piotrowski wrote:
  Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
  Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
  Calling initcall 0xc191572e: ide_init+0x0/0x81()
  Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
  ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
  ICH5: IDE controller at PCI slot :00:1f.1
  irq 5: nobody cared (try booting with the irqpoll option)
   [c1604556] show_trace_log_lvl+0x1a/0x2f
   [c1604c2c] show_trace+0x12/0x14
   [c1604cde] dump_stack+0x16/0x18
   [c164341c] __report_bad_irq+0x39/0x79
   [c16435eb] note_interrupt+0x18f/0x1c8
   [c1643ec6] handle_level_irq+0x95/0xcb
   [c1605dd8] do_IRQ+0xb4/0xe0
   ===
  handlers:
  [c174f55e] (skge_intr+0x0/0x3ff)
  Disabling IRQ #5
  ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
  ACPI: PCI Interrupt :00:1f.1[A] - Link [LNKC] - GSI 5 (level, low) - 
  IRQ 5
  ICH5: chipset revision 2
  ICH5: not 100% native mode: will probe irqs later
  ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:DMA
  ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:DMA
  
  Is this an IDE or skge bug?
 
 It seems skge's.  skge is screaming and kernel shuts down IRQ 5.
 ata_piix is unfortunately sharing the IRQ, so its IRQ doesn't get
 serviced and commands time out.

I doubt that. On my box the interrupt is solely used by ata_piix.

tglx




Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-12 Thread Tejun Heo
Thomas Gleixner wrote:
 On Tue, 2007-03-13 at 01:37 +0900, Tejun Heo wrote:
 Michal Piotrowski wrote:
 Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
 Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
 Calling initcall 0xc191572e: ide_init+0x0/0x81()
 Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
 ICH5: IDE controller at PCI slot :00:1f.1
 irq 5: nobody cared (try booting with the irqpoll option)
  [c1604556] show_trace_log_lvl+0x1a/0x2f
  [c1604c2c] show_trace+0x12/0x14
  [c1604cde] dump_stack+0x16/0x18
  [c164341c] __report_bad_irq+0x39/0x79
  [c16435eb] note_interrupt+0x18f/0x1c8
  [c1643ec6] handle_level_irq+0x95/0xcb
  [c1605dd8] do_IRQ+0xb4/0xe0
  ===
 handlers:
 [c174f55e] (skge_intr+0x0/0x3ff)
 Disabling IRQ #5
 ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
 ACPI: PCI Interrupt :00:1f.1[A] - Link [LNKC] - GSI 5 (level, low) - 
 IRQ 5
 ICH5: chipset revision 2
 ICH5: not 100% native mode: will probe irqs later
 ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:DMA
 ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:DMA

 Is this an IDE or skge bug?
 It seems skge's.  skge is screaming and kernel shuts down IRQ 5.
 ata_piix is unfortunately sharing the IRQ, so its IRQ doesn't get
 serviced and commands time out.
 
 I doubt that. On my box the interrupt is solely used by ata_piix.

Ah right.  ata_piix could be screaming when the skge requested IRQ#5,
but ata_piix is in native mode meaning that the PCI device is probably
in disabled state when skge requests IRQ#5.

Michal, can you please test the machine with skge disabled?  If it's an
on board device, you can probably disable it in the BIOS configuration menu.

Thanks.

-- 
tejun


Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-12 Thread Tejun Heo
Thomas Gleixner wrote:
 On Mon, 2007-03-12 at 17:31 +0100, Michal Piotrowski wrote:
 Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
 Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
 Calling initcall 0xc191572e: ide_init+0x0/0x81()
 Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
 ICH5: IDE controller at PCI slot :00:1f.1
 irq 5: nobody cared (try booting with the irqpoll option)
  [c1604556] show_trace_log_lvl+0x1a/0x2f
  [c1604c2c] show_trace+0x12/0x14
  [c1604cde] dump_stack+0x16/0x18
  [c164341c] __report_bad_irq+0x39/0x79
  [c16435eb] note_interrupt+0x18f/0x1c8
  [c1643ec6] handle_level_irq+0x95/0xcb
  [c1605dd8] do_IRQ+0xb4/0xe0
  ===
 handlers:
 [c174f55e] (skge_intr+0x0/0x3ff)
 Disabling IRQ #5
 
 I know this one :( 
 
 It seems to be related to the BIOS spinning up the CDROM and leaving the
 IDE controller in some weird state. When we come back the interrupt is
 screaming and nobody cares, so it gets disabled. I have no clue yet, how
 to handle this.
 
 Disabling the interrupt across suspend/resume helps, but does not work,
 when the interrupt is shared with some other device.

A similar thing can happen during initialization.  I haven't actually
instrumented the code, but I think what happens is:

1. the controller has IRQ stuck high (infrequent but possible)
2. the IRQ is already requested by another device
3. the IRQ gets disabled due to screaming interrupts at the moment
ata_piix does pci_enable_device().

I think we can be much more resilient to screaming interrupts if we
enable device with IRQ disabled and enable it after the device is
initialized to some level, possibly when requesting IRQ.

-- 
tejun
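
The three-step scenario and the proposed remedy above can be played through in a toy user-space model. This is only a sketch of the hypothesis in the message, not driver code: dev_model, probe_plain and probe_masked are hypothetical names, and the model assumes that a stale interrupt left pending by firmware reaches the shared line as soon as the device is enabled, unless its INTx output is masked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one PCI device on a shared, level-triggered line. */
struct dev_model {
    bool irq_pending;         /* stale interrupt left over from firmware */
    bool intx_disabled;       /* INTx output masked at the device */
    bool enabled;             /* device enabled, interrupt routed */
    bool handler_registered;  /* the driver's own handler is installed */
};

/* The line is driven while the device is enabled, unmasked and pending. */
static bool line_asserted(const struct dev_model *d)
{
    assert(d);
    return d->enabled && d->irq_pending && !d->intx_disabled;
}

/* It "screams" if it is driven while no handler for this device exists
 * that could acknowledge the source. */
static bool line_screams(const struct dev_model *d)
{
    return line_asserted(d) && !d->handler_registered;
}

/* Plain order: enable, initialize, request IRQ. */
static bool probe_plain(struct dev_model *d)
{
    bool screamed = false;
    d->enabled = true;
    screamed |= line_screams(d);
    d->handler_registered = true;
    screamed |= line_screams(d);
    return screamed;
}

/* Proposed order: mask INTx, enable, initialize, request IRQ, unmask. */
static bool probe_masked(struct dev_model *d)
{
    bool screamed = false;
    d->intx_disabled = true;
    d->enabled = true;
    screamed |= line_screams(d);
    d->handler_registered = true;
    screamed |= line_screams(d);
    d->intx_disabled = false;
    screamed |= line_screams(d);
    return screamed;
}
```

With irq_pending set, probe_plain() reports a screaming line in the window between enabling the device and registering a handler, while probe_masked() does not. The model says nothing about whether a given device actually honours the mask.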


Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-12 Thread Michal Piotrowski

On 12/03/07, Tejun Heo [EMAIL PROTECTED] wrote:

> Thomas Gleixner wrote:
>> On Tue, 2007-03-13 at 01:37 +0900, Tejun Heo wrote:
>>> Michal Piotrowski wrote:
>>>> Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
>>>> Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
>>>> Calling initcall 0xc191572e: ide_init+0x0/0x81()
>>>> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
>>>> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
>>>> ICH5: IDE controller at PCI slot :00:1f.1
>>>> irq 5: nobody cared (try booting with the irqpoll option)
>>>>  [c1604556] show_trace_log_lvl+0x1a/0x2f
>>>>  [c1604c2c] show_trace+0x12/0x14
>>>>  [c1604cde] dump_stack+0x16/0x18
>>>>  [c164341c] __report_bad_irq+0x39/0x79
>>>>  [c16435eb] note_interrupt+0x18f/0x1c8
>>>>  [c1643ec6] handle_level_irq+0x95/0xcb
>>>>  [c1605dd8] do_IRQ+0xb4/0xe0
>>>>  ===
>>>> handlers:
>>>> [c174f55e] (skge_intr+0x0/0x3ff)
>>>> Disabling IRQ #5
>>>> ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 5
>>>> ACPI: PCI Interrupt :00:1f.1[A] - Link [LNKC] - GSI 5 (level, low) -
>>>> IRQ 5
>>>> ICH5: chipset revision 2
>>>> ICH5: not 100% native mode: will probe irqs later
>>>> ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:DMA
>>>> ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:pio, hdd:DMA
>>>>
>>>> Is this an IDE or skge bug?
>>> It seems skge's.  skge is screaming and kernel shuts down IRQ 5.
>>> ata_piix is unfortunately sharing the IRQ, so its IRQ doesn't get
>>> serviced and commands time out.
>>
>> I doubt that. On my box the interrupt is solely used by ata_piix.
>
> Ah right.  ata_piix could be screaming when the skge requested IRQ#5,
> but ata_piix is in native mode meaning that the PCI device is probably
> in disabled state when skge requests IRQ#5.
>
> Michal, can you please test the machine with skge disabled?

It seems to work fine with skge disabled.

> If it's an
> on board device, you can probably disable it in the BIOS configuration menu.
>
> Thanks.
>
> --
> tejun



Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)


Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-12 Thread Stephen Hemminger
On Tue, 13 Mar 2007 01:56:36 +0900
Tejun Heo [EMAIL PROTECTED] wrote:

 Thomas Gleixner wrote:
  On Mon, 2007-03-12 at 17:31 +0100, Michal Piotrowski wrote:
  Calling initcall 0xc19154d8: piix_ide_init+0x0/0xbb()
  Calling initcall 0xc19155b6: generic_ide_init+0x0/0x16()
  Calling initcall 0xc191572e: ide_init+0x0/0x81()
  Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
  ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
  ICH5: IDE controller at PCI slot :00:1f.1
  irq 5: nobody cared (try booting with the irqpoll option)
   [c1604556] show_trace_log_lvl+0x1a/0x2f
   [c1604c2c] show_trace+0x12/0x14
   [c1604cde] dump_stack+0x16/0x18
   [c164341c] __report_bad_irq+0x39/0x79
   [c16435eb] note_interrupt+0x18f/0x1c8
   [c1643ec6] handle_level_irq+0x95/0xcb
   [c1605dd8] do_IRQ+0xb4/0xe0
   ===
  handlers:
  [c174f55e] (skge_intr+0x0/0x3ff)
  Disabling IRQ #5
  
  I know this one :( 
  
  It seems to be related to the BIOS spinning up the CDROM and leaving the
  IDE controller in some weird state. When we come back the interrupt is
  screaming and nobody cares, so it gets disabled. I have no clue yet, how
  to handle this.
  
  Disabling the interrupt across suspend/resume helps, but does not work,
  when the interrupt is shared with some other device.
 
 Similar thing can happen during initialization.  I haven't actually
 instrumented the code but I think what happens is
 
 1. the controller has IRQ stuck high (infrequent but possible)
 2. the IRQ is already requested by another device
 3. the IRQ gets disabled due to screaming interrupts at the moment
 ata_piix does pci_enable_device().
 
 I think we can be much more resilient to screaming interrupts if we
 enable device with IRQ disabled and enable it after the device is
 initialized to some level, possibly when requesting IRQ.

The first thing the skge driver does is do a chip reset, and that should
cause IRQ to be disabled and cleared. The driver has no chance to
fix it if the BIOS left the IRQ screaming...

 


Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-12 Thread Tejun Heo
Stephen Hemminger wrote:
 1. the controller has IRQ stuck high (infrequent but possible)
 2. the IRQ is already requested by another device
 3. the IRQ gets disabled due to screaming interrupts at the moment
 ata_piix does pci_enable_device().

 I think we can be much more resilient to screaming interrupts if we
 enable device with IRQ disabled and enable it after the device is
 initialized to some level, possibly when requesting IRQ.
 
 The first thing the skge driver does is do a chip reset, and that should
 cause IRQ to be disabled and cleared. The driver has no chance to
 fix it if the BIOS left the IRQ screaming...

What if we do something like...

pci_intx(pdev, 0);
pci_enable_device(pdev);
/* initialize */
request_irq(blah blah...);
pci_intx(pdev, 1);

Would this work for skge?

-- 
tejun
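
For reference, pci_intx() in the pseudo code above toggles the INTx-disable bit (bit 10, mask 0x400) in the device's PCI command register: passing 0 sets the bit and masks the device's legacy interrupt, passing 1 clears it again. A small user-space sketch of that bit handling follows. It operates on a plain 16-bit value rather than real config space, and model_pci_intx, model_intx_allowed and model_probe_sequence are illustrative names, not kernel functions. Note that the bit was introduced with PCI 2.3, so older devices may ignore it.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND_INTX_DISABLE 0x400  /* bit 10 of the PCI command register */

/* Model of the effect of pci_intx(pdev, enable) on the command word:
 * enable == 0 sets the INTx-disable bit, enable != 0 clears it. */
static uint16_t model_pci_intx(uint16_t command, int enable)
{
    if (enable)
        return command & (uint16_t)~PCI_COMMAND_INTX_DISABLE;
    return command | PCI_COMMAND_INTX_DISABLE;
}

/* True if the device may currently assert its INTx line. */
static int model_intx_allowed(uint16_t command)
{
    return (command & PCI_COMMAND_INTX_DISABLE) == 0;
}

/* Walk the command word through the proposed probe order. */
static uint16_t model_probe_sequence(uint16_t command)
{
    command = model_pci_intx(command, 0);   /* mask before enabling */
    assert(!model_intx_allowed(command));
    /* pci_enable_device(), initialization and request_irq() go here */
    command = model_pci_intx(command, 1);   /* unmask as the last step */
    assert(model_intx_allowed(command));
    return command;
}
```

Starting from a command word of 0x0007, masking yields 0x0407 and unmasking restores 0x0007; the other command bits are left alone.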


Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-12 Thread Stephen Hemminger
On Tue, 13 Mar 2007 04:03:00 +0900
Tejun Heo [EMAIL PROTECTED] wrote:

 Stephen Hemminger wrote:
  1. the controller has IRQ stuck high (infrequent but possible)
  2. the IRQ is already requested by another device
  3. the IRQ gets disabled due to screaming interrupts at the moment
  ata_piix does pci_enable_device().
 
  I think we can be much more resilient to screaming interrupts if we
  enable device with IRQ disabled and enable it after the device is
  initialized to some level, possibly when requesting IRQ.
  
  The first thing the skge driver does is do a chip reset, and that should
  cause IRQ to be disabled and cleared. The driver has no chance to
  fix it if the BIOS left the IRQ screaming...
 
 What if we do something like...
 
   pci_intx(pdev, 0);
   pci_enable_device(pdev);
   /* initialize */
   request_irq(blah blah...);
   pci_intx(pdev, 1);
 
 Would this work for skge?
 

Okay for testing, but any change like this should be done in the base
PCI layer, not one off in a particular driver.

-- 
Stephen Hemminger [EMAIL PROTECTED]


Re: 2.6.21-rc3-git4 ata1.00: qc timeout (cmd 0xef) (crashdump kernel)

2007-03-12 Thread Tejun Heo
Stephen Hemminger wrote:
 On Tue, 13 Mar 2007 04:03:00 +0900
 Tejun Heo [EMAIL PROTECTED] wrote:
 
 Stephen Hemminger wrote:
 1. the controller has IRQ stuck high (infrequent but possible)
 2. the IRQ is already requested by another device
 3. the IRQ gets disabled due to screaming interrupts at the moment
 ata_piix does pci_enable_device().

 I think we can be much more resilient to screaming interrupts if we
 enable device with IRQ disabled and enable it after the device is
 initialized to some level, possibly when requesting IRQ.
 The first thing the skge driver does is do a chip reset, and that should
 cause IRQ to be disabled and cleared. The driver has no chance to
 fix it if the BIOS left the IRQ screaming...
 What if we do something like...

  pci_intx(pdev, 0);
  pci_enable_device(pdev);
  /* initialize */
  request_irq(blah blah...);
  pci_intx(pdev, 1);

 Would this work for skge?

 
 Okay for testing, but any change like this should be done in the base
 PCI layer, not one off in a particular driver.

Yeap, it was a proof-of-concept pseudo code.  I attached a patch to do
above in skge.  Please point out if it is broken (e.g. intx needs to be
enabled earlier).

Michal, can you apply the attached patch and see whether it fixes the
problem?

Thanks.

-- 
tejun

diff --git a/drivers/net/skge.c b/drivers/net/skge.c
index eea75a4..2c990f2 100644
--- a/drivers/net/skge.c
+++ b/drivers/net/skge.c
@@ -3585,6 +3585,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 	struct skge_hw *hw;
 	int err, using_dac = 0;
 
+	pci_intx(pdev, 0);
 	err = pci_enable_device(pdev);
 	if (err) {
 		dev_err(&pdev->dev, "cannot enable PCI device\n");
@@ -3669,6 +3670,7 @@ static int __devinit skge_probe(struct pci_dev *pdev,
 		       dev->name, pdev->irq);
 		goto err_out_unregister;
 	}
+	pci_intx(pdev, 1);
 	skge_show_addr(dev);
 
 	if (hw->ports > 1 && (dev1 = skge_devinit(hw, 1, using_dac))) {