Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)
Yeah, this is mostly copy-pasted from the sboot code and would need
some cleaning up.
I've been playing a little more with other bits of the hardware,
writing some test fw from scratch, mostly without using the built-in
ROM (except for interrupts).

Oleksij Rempel wrote:
> On 08.06.2017 at 00:39, Tobias Diedrich wrote:
> > Oleksij Rempel wrote:
> >> On 07.06.2017 at 02:12, Tobias Diedrich wrote:
> >>> Oleksij Rempel wrote:
> >>>> Yes, this is a "normal" problem. The firmware has no error handler for PCI
> >>>> bus related exceptions. So if we fail to read the PCI bus the first time, we
> >>>> have the choice to Ooops and stall or Ooops and reboot ASAP. So we reboot
> >>>> and provide a kernel "firmware panic!" message.
> >>>> Anyone who can or wants to fix this is welcome.
> >>>>
> >>>>> *
> >>>>> Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
> >>>>> exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
> >>> [...]
> >>>
> >>> >memdmp 50ae78 50ae88
> >>>
> >>> 50ae78: 6c10 0412 6aa2 0c02 0088 20c0 2008 1940 l...j..@
> >>>
> >>> [...copy to bin...]
> >>> $ bin/objdump -b binary -m xtensa -D /tmp/memdump.bin
> >>> [..]
> >>>    0: 6c1004   entry a1, 32
> >>>    3: 126aa2   l32r a2, 0xfffdaa8c
> >>>    6: 0c0200   memw
> >>>    9: 8820     l32i.n a8, a2, 0   <-- Exception cause
> >>>                                       PC still points at load
> >>>    b: c020     movi.n a2, 0
> >>>    d: 081940   extui a9, a8, 1, 1
> >>>
> >>> Judging from that it should be fairly simple to at least implement
> >>> some sort of retry, possibly after triggering a PCIe link retrain?
> >>
> >> I assume, yes.
> >>
> >>> There are some related PCIe root complex registers that may point to
> >>> what exactly failed if they were dumped.
> >>>
> >>> The root complex registers live at 0x00040000 and I think match the
> >>> registers described for the root complex in the AR9344 datasheet.
> >>
> >> Sadly, I don't have the ar7010 docs to tell.
> >>
> >>> PCIE_INT_MASK would map to 0x40050 and has a bit for SYS_ERR:
> >>> "A system error. The RC Core asserts CFG_SYS_ERR_RC if any device in
> >>> the hierarchy reports any of the following errors and the associated
> >>> enable bit is set in the Root Control register: ERR_COR, ERR_FATAL,
> >>> ERR_NONFATAL."
> >>>
> >>> AFAICS link retrain can be done by setting bit3 (INIT_RST,
> >>> "Application request to initiate a training reset") in
> >>> PCIE_APP (0x40000).
> >>>
> >>> See sboot/magpie_1_1/sboot/cmnos/eeprom/src/cmnos_eeprom.c (which
> >>> flips some bits in the RC to enable the PCIe bus for reading the
> >>> EEPROM).
> >>>
> >>> The root complex PCI configuration space is at 0x20000 which could
> >>> have further error details:
> >>> >memdmp 20000 20200
> >>>
> >>> 020000: a02a 168c 0010 0006 0001 0001 .*..
> >>> 020010:
> >>> 020020:
> >>> 020030: 0040 01ff ...@
> >>> 020040: 5bc3 5001 [.P.
> >>> 020050: 0080 7005 ..p.
> >>> 020060:
> >>> 020070: 0042 0010 8701 2010 0013 4411 .BD.
> >>> 020080: 3011 00c0 03c0 0...
> >>> 020090: 0010
> >>> 0200a0:
> >>> 0200b0:
> >>> 0200c0:
> >>> 0200d0:
> >>> 0200e0:
> >>> 0200f0:
> >>> 020100: 1401 0001 0006 20
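The retry-after-retrain idea quoted above could look roughly like the sketch below. It is not taken from the firmware: `struct pcie_io`, `pcie_read_retry()` and the `fake_*` helpers are hypothetical names, the base address 0x00040000 is an assumption inferred from PCIE_INT_MASK sitting at 0x40050, and whether INIT_RST clears itself or needs a delay afterwards is unknown. On the real chip `try_read()` would have to sit on top of an exception handler that catches the faulting load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed addresses (from the thread, unverified for the AR7010). */
#define PCIE_RC_REG_BASE  0x00040000u
#define PCIE_APP          (PCIE_RC_REG_BASE + 0x00u)
#define PCIE_APP_INIT_RST (1u << 3) /* "request to initiate a training reset" */

/* Hypothetical access hooks; try_read() returns false if the load faulted. */
struct pcie_io {
	bool (*try_read)(void *ctx, uint32_t addr, uint32_t *val);
	uint32_t (*reg_read)(void *ctx, uint32_t addr);
	void (*reg_write)(void *ctx, uint32_t addr, uint32_t val);
	void *ctx;
};

/* Read a PCIe-mapped word, requesting a link retrain before each retry. */
static bool pcie_read_retry(const struct pcie_io *io, uint32_t addr,
			    uint32_t *val, unsigned max_retries)
{
	assert(io && val);
	for (unsigned attempt = 0; ; attempt++) {
		if (io->try_read(io->ctx, addr, val))
			return true;
		if (attempt == max_retries)
			return false;
		/* Set INIT_RST in PCIE_APP, leaving the other bits untouched. */
		uint32_t app = io->reg_read(io->ctx, PCIE_APP);
		io->reg_write(io->ctx, PCIE_APP, app | PCIE_APP_INIT_RST);
	}
}

/* Fake bus for trying the logic off-target: the first `failures` reads fault. */
struct fake_bus {
	unsigned failures;
	unsigned retrains;
	uint32_t app_reg;
};

static bool fake_try_read(void *ctx, uint32_t addr, uint32_t *val)
{
	struct fake_bus *bus = ctx;

	(void)addr;
	if (bus->failures > 0) {
		bus->failures--;
		return false;
	}
	*val = 0x12345678u;
	return true;
}

static uint32_t fake_reg_read(void *ctx, uint32_t addr)
{
	struct fake_bus *bus = ctx;

	(void)addr;
	return bus->app_reg;
}

static void fake_reg_write(void *ctx, uint32_t addr, uint32_t val)
{
	struct fake_bus *bus = ctx;

	if (addr == PCIE_APP && (val & PCIE_APP_INIT_RST))
		bus->retrains++; /* count retrain requests */
	bus->app_reg = val;
}
```

Passing the accessors in as function pointers keeps the retry policy separate from the hardware access, so it can be exercised against the fake bus before touching the device.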
Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)
Oleksij Rempel wrote:
> On 07.06.2017 at 02:12, Tobias Diedrich wrote:
> > Oleksij Rempel wrote:
> >> Yes, this is a "normal" problem. The firmware has no error handler for PCI
> >> bus related exceptions. So if we fail to read the PCI bus the first time, we
> >> have the choice to Ooops and stall or Ooops and reboot ASAP. So we reboot
> >> and provide a kernel "firmware panic!" message.
> >> Anyone who can or wants to fix this is welcome.
> >>
> >>> *
> >>> Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
> >>> exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
> > [...]
> >
> > >memdmp 50ae78 50ae88
> >
> > 50ae78: 6c10 0412 6aa2 0c02 0088 20c0 2008 1940 l...j..@
> >
> > [...copy to bin...]
> > $ bin/objdump -b binary -m xtensa -D /tmp/memdump.bin
> > [..]
> >    0: 6c1004   entry a1, 32
> >    3: 126aa2   l32r a2, 0xfffdaa8c
> >    6: 0c0200   memw
> >    9: 8820     l32i.n a8, a2, 0   <-- Exception cause
> >                                       PC still points at load
> >    b: c020     movi.n a2, 0
> >    d: 081940   extui a9, a8, 1, 1
> >
> > Judging from that it should be fairly simple to at least implement
> > some sort of retry, possibly after triggering a PCIe link retrain?
>
> I assume, yes.
>
> > There are some related PCIe root complex registers that may point to
> > what exactly failed if they were dumped.
> >
> > The root complex registers live at 0x00040000 and I think match the
> > registers described for the root complex in the AR9344 datasheet.
>
> Sadly, I don't have the ar7010 docs to tell.
>
> > PCIE_INT_MASK would map to 0x40050 and has a bit for SYS_ERR:
> > "A system error. The RC Core asserts CFG_SYS_ERR_RC if any device in
> > the hierarchy reports any of the following errors and the associated
> > enable bit is set in the Root Control register: ERR_COR, ERR_FATAL,
> > ERR_NONFATAL."
> >
> > AFAICS link retrain can be done by setting bit3 (INIT_RST,
> > "Application request to initiate a training reset") in
> > PCIE_APP (0x40000).
> >
> > See sboot/magpie_1_1/sboot/cmnos/eeprom/src/cmnos_eeprom.c (which
> > flips some bits in the RC to enable the PCIe bus for reading the
> > EEPROM).
> >
> > The root complex PCI configuration space is at 0x20000 which could
> > have further error details:
> > >memdmp 20000 20200
> >
> > 020000: a02a 168c 0010 0006 0001 0001 .*..
> > 020010:
> > 020020:
> > 020030: 0040 01ff ...@
> > 020040: 5bc3 5001 [.P.
> > 020050: 0080 7005 ..p.
> > 020060:
> > 020070: 0042 0010 8701 2010 0013 4411 .BD.
> > 020080: 3011 00c0 03c0 0...
> > 020090: 0010
> > 0200a0:
> > 0200b0:
> > 0200c0:
> > 0200d0:
> > 0200e0:
> > 0200f0:
> > 020100: 1401 0001 0006 2030 ...0
> > 020110: 2000 00a0
> > 020120:
> > 020130:
> > 020140: 0001 0002
> > 020150: 8000 00ff
> > 020160:
> > 020170:
> > 020180:
> > 020190:
> > 0201a0:
> > 0201b0:
> > 0201c0:
> > 0
Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)
Oleksij Rempel wrote:
> Yes, this is a "normal" problem. The firmware has no error handler for PCI
> bus related exceptions. So if we fail to read the PCI bus the first time, we
> have the choice to Ooops and stall or Ooops and reboot ASAP. So we reboot
> and provide a kernel "firmware panic!" message.
> Anyone who can or wants to fix this is welcome.
>
> > *
> > Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
> > exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.

[...]

>memdmp 50ae78 50ae88

50ae78: 6c10 0412 6aa2 0c02 0088 20c0 2008 1940 l...j..@

[...copy to bin...]
$ bin/objdump -b binary -m xtensa -D /tmp/memdump.bin
[..]
   0: 6c1004   entry a1, 32
   3: 126aa2   l32r a2, 0xfffdaa8c
   6: 0c0200   memw
   9: 8820     l32i.n a8, a2, 0   <-- Exception cause
                                      PC still points at load
   b: c020     movi.n a2, 0
   d: 081940   extui a9, a8, 1, 1

Judging from that it should be fairly simple to at least implement
some sort of retry, possibly after triggering a PCIe link retrain?

There are some related PCIe root complex registers that may point to
what exactly failed if they were dumped.

The root complex registers live at 0x00040000 and I think match the
registers described for the root complex in the AR9344 datasheet.

PCIE_INT_MASK would map to 0x40050 and has a bit for SYS_ERR:
"A system error. The RC Core asserts CFG_SYS_ERR_RC if any device in
the hierarchy reports any of the following errors and the associated
enable bit is set in the Root Control register: ERR_COR, ERR_FATAL,
ERR_NONFATAL."

AFAICS link retrain can be done by setting bit3 (INIT_RST,
"Application request to initiate a training reset") in
PCIE_APP (0x40000).

See sboot/magpie_1_1/sboot/cmnos/eeprom/src/cmnos_eeprom.c (which
flips some bits in the RC to enable the PCIe bus for reading the
EEPROM).

The root complex PCI configuration space is at 0x20000 which could
have further error details:
>memdmp 20000 20200

020000: a02a 168c 0010 0006 0001 0001 .*..
020010:
020020:
020030: 0040 01ff ...@
020040: 5bc3 5001 [.P.
020050: 0080 7005 ..p.
020060:
020070: 0042 0010 8701 2010 0013 4411 .BD.
020080: 3011 00c0 03c0 0...
020090: 0010
0200a0:
0200b0:
0200c0:
0200d0:
0200e0:
0200f0:
020100: 1401 0001 0006 2030 ...0
020110: 2000 00a0
020120:
020130:
020140: 0001 0002
020150: 8000 00ff
020160:
020170:
020180:
020190:
0201a0:
0201b0:
0201c0:
0201d0:
0201e0:
0201f0:

Transformed into something suitable for feeding into lspci -F:

00:00.0 Description filled in by lspci
00: 8c 16 2a a0 06 00 10 00 01 00 00 00 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 00 42 00 01 87 00 00 10 20 00 00 11 44 13 00
80: 00 00 11 30 00 00 00 00 c0 03 c0 00 00 00 00 00
90: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00
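The conversion from the memdmp output to the lspci hex layout can be sketched as below. This is an illustration, not the script that was actually used: it assumes memdmp prints eight 16-bit groups per row with the most significant byte first, that each pair of groups forms one 32-bit register, and that lspci wants the bytes of each register least significant byte first. The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Convert one memdmp row (eight 16-bit groups, assumed to be printed most
 * significant byte first) into the 16 config-space bytes in lspci order.
 */
static void memdmp_row_to_bytes(const uint16_t groups[8], uint8_t out[16])
{
	assert(groups && out);
	for (size_t w = 0; w < 4; w++) {
		/* Two adjacent groups make up one 32-bit register. */
		uint32_t reg = ((uint32_t)groups[2 * w] << 16) | groups[2 * w + 1];

		/* Emit the register least significant byte first. */
		for (size_t b = 0; b < 4; b++)
			out[4 * w + b] = (uint8_t)(reg >> (8 * b));
	}
}

/* Format one row as "<offset>: xx xx ..."; returns the length written. */
static int format_lspci_row(unsigned offset, const uint8_t bytes[16],
			    char *buf, size_t len)
{
	assert(bytes && buf && len > 0);
	int n = snprintf(buf, len, "%02x:", offset);

	for (size_t i = 0; i < 16 && n > 0 && (size_t)n < len; i++)
		n += snprintf(buf + n, len - (size_t)n, " %02x", bytes[i]);
	return n;
}
```

Fed the first row with its zero groups in place (a02a 168c 0010 0006 0000 0001 0001 0000), this yields the bytes of the `00:` row in the listing above.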
Re: forcedeth problems on 2.6.20-rc6-mm3
Tobias Diedrich wrote:
> Jeff Garzik wrote:
> > Tobias Diedrich wrote:
> > > Tobias Diedrich wrote:
> > > > Ayaz Abdulla wrote:
> > > > > For all those who are having issues, please try out the
> > > > > attached patch.
> > > >
> > > > Will try.
> > >
> > > Does not apply cleanly against 2.6.20, is this one fixed up right?
> >
> > It probably needs to be on top of 2.6.20-git-latest or 2.6.20-rc6-mm3.
> > IOW, the forcedeth changes in question are not in 2.6.20, and you
> > need to apply the patch on top of the latest batch of forcedeth
> > changes.
>
> Well, it hasn't blown up on me despite being applied to 2.6.20...
> The problem I was seeing might even be fixed in 2.6.20 vanilla,
> since the last version I saw it in was 2.6.20-rc6 and then I
> reverted to 2.6.19 to make sure that one is ok (see [EMAIL PROTECTED]).

And having run vanilla 2.6.20 since my last mail, I haven't seen the
problem on that one either. So I _guess_ the particular problem I
was seeing was fixed somewhere between 2.6.20-rc6 and 2.6.20.
But since I can't reliably trigger it, I can't say that for sure.

-- 
Tobias						PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: forcedeth problems on 2.6.20-rc6-mm3
Jeff Garzik wrote:
> Tobias Diedrich wrote:
> > Tobias Diedrich wrote:
> > > Ayaz Abdulla wrote:
> > > > For all those who are having issues, please try out the
> > > > attached patch.
> > >
> > > Will try.
> >
> > Does not apply cleanly against 2.6.20, is this one fixed up right?
>
> It probably needs to be on top of 2.6.20-git-latest or 2.6.20-rc6-mm3.
> IOW, the forcedeth changes in question are not in 2.6.20, and you
> need to apply the patch on top of the latest batch of forcedeth
> changes.

Well, it hasn't blown up on me despite being applied to 2.6.20...
The problem I was seeing might even be fixed in 2.6.20 vanilla,
since the last version I saw it in was 2.6.20-rc6 and then I
reverted to 2.6.19 to make sure that one is ok (see [EMAIL PROTECTED]).

-- 
Tobias						PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.
Re: forcedeth problems on 2.6.20-rc6-mm3
Tobias Diedrich wrote:
> Ayaz Abdulla wrote:
> > For all those who are having issues, please try out the attached
> > patch.
>
> Will try.

Does not apply cleanly against 2.6.20, is this one fixed up right?

--- linux-2.6.20/drivers/net/forcedeth.c.orig	2007-02-09 13:02:02.000000000 +0100
+++ linux-2.6.20/drivers/net/forcedeth.c.new	2007-02-09 13:03:45.000000000 +0100
@@ -2603,10 +2603,16 @@
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
 	unsigned long flags;
+	u32 retcode;
 
-	pkts = nv_rx_process(dev, limit);
+	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2) {
+		pkts = nv_rx_process(dev, limit);
+		retcode = nv_alloc_rx(dev);
+	} else {
+		retcode = nv_alloc_rx_optimized(dev);
+	}
 
-	if (nv_alloc_rx(dev)) {
+	if (retcode) {
 		spin_lock_irqsave(&np->lock, flags);
 		if (!np->in_shutdown)
 			mod_timer(&np->oom_kick, jiffies + OOM_REFILL);

-- 
Tobias						PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.
Re: forcedeth problems on 2.6.20-rc6-mm3
Ayaz Abdulla wrote:
> For all those who are having issues, please try out the attached
> patch.

Will try.

I reverted to 2.6.19 w/o suspend/resume patch last weekend to make
sure on 2.6.19 forcedeth is stable and noticed something odd:
Because I didn't include the suspend/resume patch I obviously had to
do a down/rmmod/modprobe/up cycle after each resume, and I noticed
that the behaviour seems to alternate between resumes:

Behaviour 1: After modprobe I get two interfaces 'eth0' and 'eth1'
for the two ports, as expected.

Behaviour 2: After modprobe I get one interface 'eth3' (which should
be 'eth1') and one interface with increasing numbers (which should
be 'eth0', last resume it was 'eth12' IIRC).

As I said, if I get behaviour 1 on one resume I get behaviour 2 on
the next resume and vice versa. That seems rather odd to me.

On a not quite related note, forcedeth shows a different ethtool
output (compared to e100) when no cable is connected to the port:

forcedeth, no cable connected:
|Settings for eth1:
|	Supported ports: [ MII ]
|	Supported link modes:   10baseT/Half 10baseT/Full
|	                        100baseT/Half 100baseT/Full
|	                        1000baseT/Full
|	Supports auto-negotiation: Yes
|	Advertised link modes:  10baseT/Half 10baseT/Full
|	                        100baseT/Half 100baseT/Full
|	                        1000baseT/Full
|	Advertised auto-negotiation: Yes
|	Speed: Unknown! (65535)
|	Duplex: Unknown! (255)
|	Port: MII
|	PHYAD: 1
|	Transceiver: external
|	Auto-negotiation: on
|	Supports Wake-on: g
|	Wake-on: d
|	Link detected: no

e100, no cable connected:
|Settings for eth0:
|	Supported ports: [ TP MII ]
|	Supported link modes:   10baseT/Half 10baseT/Full
|	                        100baseT/Half 100baseT/Full
|	Supports auto-negotiation: Yes
|	Advertised link modes:  10baseT/Half 10baseT/Full
|	                        100baseT/Half 100baseT/Full
|	Advertised auto-negotiation: Yes
|	Speed: 10Mb/s
|	Duplex: Half
|	Port: MII
|	PHYAD: 1
|	Transceiver: internal
|	Auto-negotiation: on
|	Supports Wake-on: g
|	Wake-on: g
|	Current message level: 0x00000007 (7)
|	Link detected: no

Note that e100 returns the lowest possible speed if no link is
detected, while forcedeth seems to return -1, which ethtool doesn't
seem to recognise as a valid response (I guess, why else would it
show the number after 'Unknown!').

-- 
Tobias						PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.
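The numbers in the forcedeth output are consistent with the -1 guess: 65535 and 255 are exactly what -1 turns into when stored in an unsigned 16-bit and an unsigned 8-bit field. A minimal sketch, where `struct link_settings` and `set_link_unknown()` are hypothetical stand-ins and the field widths are an assumption derived from the printed values:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two reported fields; widths are assumed. */
struct link_settings {
	uint16_t speed;  /* Mb/s */
	uint8_t duplex;
};

/* Mark the link parameters as unknown by storing -1 in both fields. */
static void set_link_unknown(struct link_settings *ls)
{
	assert(ls);
	ls->speed = (uint16_t)-1;  /* wraps to 65535 */
	ls->duplex = (uint8_t)-1;  /* wraps to 255 */
}
```

A tool that only knows the regular speed and duplex values has no name for these, which would explain the raw number being printed after 'Unknown!'.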
forcedeth broken powermanagement/irq handling ?
Hi,

since there hasn't been much progress with the bugzilla entry I'm
bringing this issue to your attention here. :)

http://bugzilla.kernel.org/show_bug.cgi?id=6398

Vanilla forcedeth doesn't seem to support suspend, and an ifdown/up
cycle is needed to get it working again after suspend.
Francois Romieu's "Awfully experimental patch" is working just fine
for me (with message signalled interrupts disabled) and has survived
quite a few suspend/resume cycles.

So I'd very much like to see (at least partial, with MSI disabled)
suspend support for forcedeth in mainline.

Romieu's patch:

--- linux-2.6.18-rc6/drivers/net/forcedeth.c	2006-09-09 09:45:43.000000000 +0200
+++ linux-2.6.17.11-xen/drivers/net/forcedeth.c	2006-09-09 09:41:25.000000000 +0200
@@ -4433,6 +4433,50 @@
 	pci_set_drvdata(pci_dev, NULL);
 }
 
+
+#ifdef CONFIG_PM
+
+static int nv_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+	struct net_device *dev = pci_get_drvdata(pdev);
+	struct fe_priv *np = netdev_priv(dev);
+
+	if (!netif_running(dev))
+		goto out;
+
+	netif_device_detach(dev);
+
+	// Gross.
+	nv_close(dev);
+
+	pci_save_state(pdev);
+	pci_enable_wake(pdev, pci_choose_state(pdev, state), np->wolenabled);
+	pci_set_power_state(pdev, pci_choose_state(pdev, state));
+out:
+	return 0;
+}
+
+static int nv_resume(struct pci_dev *pdev)
+{
+	struct net_device *dev = pci_get_drvdata(pdev);
+	int rc = 0;
+
+	if (!netif_running(dev))
+		goto out;
+
+	netif_device_attach(dev);
+
+	pci_set_power_state(pdev, PCI_D0);
+	pci_restore_state(pdev);
+	pci_enable_wake(pdev, PCI_D0, 0);
+
+	rc = nv_open(dev);
+out:
+	return rc;
+}
+
+#endif /* CONFIG_PM */
+
 static struct pci_device_id pci_tbl[] = {
 	{	/* nForce Ethernet Controller */
 		PCI_DEVICE(PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_NVENET_1),
@@ -4534,6 +4578,10 @@
 	.id_table = pci_tbl,
 	.probe = nv_probe,
 	.remove = __devexit_p(nv_remove),
+#ifdef CONFIG_PM
+	.suspend	= nv_suspend,
+	.resume		= nv_resume,
+#endif
 };

-- 
Tobias						PGP: http://9ac7e0bc.uguu.de
This mail is made of 100% recycled bits.