Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-15 Thread Tobias Diedrich
Yeah, this is mostly copy-pasted from the sboot code and would need
some cleaning up.
I've been playing a little more with other bits of the hardware,
writing some test firmware from scratch, mostly without using the
built-in ROM (except for interrupts).

Oleksij Rempel wrote:
> On 08.06.2017 at 00:39, Tobias Diedrich wrote:
> > Oleksij Rempel wrote:
> >> On 07.06.2017 at 02:12, Tobias Diedrich wrote:
> >>> [...]

Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-07 Thread Tobias Diedrich
Oleksij Rempel wrote:
> On 07.06.2017 at 02:12, Tobias Diedrich wrote:
> > [...]
> > 
> > Judging from that, it should be fairly simple to at least implement
> > some sort of retry, possibly after triggering a PCIe link retrain?
> 
> I assume, yes.
> 
> > There are some related PCIe root complex registers that may point to
> > what exactly failed if they were dumped.
> > 
> > The root complex registers live at 0x00040000 and I think match the
> > registers described for the root complex in the AR9344 datasheet.
> 
> Unfortunately, I don't have the AR7010 docs to tell..
> 
> > [...]

Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-06 Thread Tobias Diedrich
Oleksij Rempel wrote:
> Yes, this is a "normal" problem. The firmware has no error handler for
> PCI bus related exceptions. So if we fail to read the PCI bus the first
> time, we have the choice to Oops and stall, or Oops and reboot ASAP. So
> we reboot and provide a kernel "firmware panic!" message.
> Anyone who can or wants to fix this is welcome.
> 
> > *
> > Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
> > exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
[...]

>memdmp 50ae78 50ae88

50ae78: 6c10 0412 6aa2 0c02 0088 20c0 2008 1940  l...j..........@

[...copy to bin...]
$ bin/objdump -b binary -m xtensa  -D /tmp/memdump.bin 
[..]
   0:   6c1004          entry   a1, 32
   3:   126aa2          l32r    a2, 0xfffdaa8c
   6:   0c0200          memw
   9:   8820            l32i.n  a8, a2, 0   <-- Exception cause, PC still points at load
   b:   c020            movi.n  a2, 0
   d:   081940          extui   a9, a8, 1, 1
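To read the panic line itself: a small decoder for the Xtensa EXCCAUSE value reported as "exccause: 0x000d". The table below is deliberately partial and uses the cause numbering from the Xtensa ISA (the same values Linux defines in arch/xtensa/include/asm/regs.h); the function names are hypothetical helpers for illustration, not anything from the ath9k_htc firmware.

```c
#include <stdint.h>
#include <stdio.h>

/* Partial table of Xtensa EXCCAUSE values (Xtensa ISA numbering). */
static const char *xtensa_exccause_name(uint32_t cause)
{
	switch (cause) {
	case 0:  return "IllegalInstruction";
	case 3:  return "LoadStoreError";
	case 6:  return "IntegerDivideByZero";
	case 9:  return "LoadStoreAlignment";
	case 12: return "InstrPIFDataError";
	case 13: return "LoadStorePIFDataError";
	case 14: return "InstrPIFAddrError";
	case 15: return "LoadStorePIFAddrError";
	case 28: return "LoadProhibited";
	case 29: return "StoreProhibited";
	default: return "unknown (not in this partial table)";
	}
}

/* Causes 12..15 report an error signalled on the processor interface
 * (PIF), i.e. the external bus, while fetching or loading/storing. */
static int xtensa_exccause_is_bus_error(uint32_t cause)
{
	return cause >= 12 && cause <= 15;
}

/* Pretty-print the fields of the kernel's "firmware panic!" message. */
static void print_fw_panic(uint32_t exccause, uint32_t pc, uint32_t badvaddr)
{
	printf("exccause 0x%04x = %s%s\n", (unsigned)exccause,
	       xtensa_exccause_name(exccause),
	       xtensa_exccause_is_bus_error(exccause) ? " (bus error)" : "");
	printf("pc 0x%08x, badvaddr 0x%08x\n", (unsigned)pc, (unsigned)badvaddr);
}
```

Called as print_fw_panic(0x000d, 0x0090ae81, 0x10ff4038), the first line printed is "exccause 0x000d = LoadStorePIFDataError (bus error)", which fits the reading above: the l32i.n load itself is fine, but the bus transaction behind it failed.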

Judging from that, it should be fairly simple to at least implement
some sort of retry, possibly after triggering a PCIe link retrain?
There are some related PCIe root complex registers that may point to
what exactly failed, if they were dumped.

The root complex registers live at 0x00040000 and I think match the
registers described for the root complex in the AR9344 datasheet.

PCIE_INT_MASK would map to 0x40050 and has a bit for SYS_ERR:
"A system error. The RC Core asserts CFG_SYS_ERR_RC if any device in
the hierarchy reports any of the following errors and the associated
enable bit is set in the Root Control register: ERR_COR, ERR_FATAL,
ERR_NONFATAL."

AFAICS link retrain can be done by setting bit 3 (INIT_RST,
"Application request to initiate a training reset") in
PCIE_APP (0x40000).
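The retry-after-retrain idea can be sketched as follows. This is only a sketch under loudly stated assumptions: the RC base of 0x00040000, PCIE_APP being the first register there, and INIT_RST being bit 3 all come from the AR9344 analogy above and are unverified for the AR7010. All function and type names are hypothetical, and the read hook presumes an exception handler that turns the bus error into a return value, which the current firmware does not have. A fake flaky bus is included so the logic can be exercised on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTIONS (unverified): RC block at 0x00040000, PCIE_APP at offset 0,
 * INIT_RST is bit 3. On the target, rc_base would point at 0x00040000. */
#define PCIE_APP_OFFSET		0x0u		/* byte offset in RC block */
#define PCIE_APP_INIT_RST	(1u << 3)

/* Hypothetical: ask the root complex to retrain the link. Whether INIT_RST
 * self-clears or must be cleared by software is not known here. */
static void pcie_request_retrain(volatile uint32_t *rc_base)
{
	rc_base[PCIE_APP_OFFSET / 4] |= PCIE_APP_INIT_RST;
}

/* Hypothetical read hook: returns false if the bus access failed. */
typedef bool (*bus_read_fn)(uint32_t addr, uint32_t *val, void *ctx);

/* Try a read up to max_tries times, requesting a retrain after each
 * failure. A real version would probably also wait for link-up. */
static bool read_with_retrain(bus_read_fn do_read, void *ctx,
			      volatile uint32_t *rc_base, uint32_t addr,
			      uint32_t *val, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (do_read(addr, val, ctx))
			return true;
		pcie_request_retrain(rc_base);
	}
	return false;
}

/* Host-side stand-in for a flaky bus: fails the first *ctx reads, then
 * returns the bitwise inverse of the address as a recognisable value. */
static bool fake_flaky_read(uint32_t addr, uint32_t *val, void *ctx)
{
	int *fails_left = ctx;

	if (*fails_left > 0) {
		(*fails_left)--;
		return false;
	}
	*val = ~addr;
	return true;
}
```

Giving up after a bounded number of tries keeps the existing "panic and reboot" behaviour as the last resort instead of stalling forever on a dead link.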

See sboot/magpie_1_1/sboot/cmnos/eeprom/src/cmnos_eeprom.c (which
flips some bits in the RC to enable the PCIe bus for reading the
EEPROM).

The root complex PCI configuration space is at 0x20000, which could
have further error details:
>memdmp 20000 20200

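020000: a02a 168c 0010 0006 0000 0001 0001 0000  .*..............
020010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020030: 0000 0000 0000 0040 0000 0000 0000 01ff  .......@........
020040: 5bc3 5001 0000 0000 0000 0000 0000 0000  [.P.............
020050: 0080 7005 0000 0000 0000 0000 0000 0000  ..p.............
020060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020070: 0042 0010 0000 8701 0000 2010 0013 4411  .B............D.
020080: 3011 0000 0000 0000 00c0 03c0 0000 0000  0...............
020090: 0000 0000 0000 0010 0000 0000 0000 0000  ................
0200a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0200b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0200c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0200d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0200e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0200f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020100: 1401 0001 0000 0000 0000 0000 0006 2030  ...............0
020110: 0000 0000 0000 2000 0000 00a0 0000 0000  ................
020120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020140: 0001 0002 0000 0000 0000 0000 0000 0000  ................
020150: 0000 0000 8000 00ff 0000 0000 0000 0000  ................
020160: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020170: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0201a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0201b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0201c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0201d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0201e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0201f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................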

Transformed into something suitable for feeding into lspci -F:

00:00.0 Description filled in by lspci
00: 8c 16 2a a0 06 00 10 00 01 00 00 00 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 00 42 00 01 87 00 00 10 20 00 00 11 44 13 00
80: 00 00 11 30 00 00 00 00 c0 03 c0 00 00 00 00 00
90: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 
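The transformation above is a per-dword byte swap: each pair of 4-digit groups in the memdmp output is one 32-bit register printed most-significant half first ("a02a 168c" is 0xa02a168c), while PCI configuration space, and therefore the lspci -F input, is little-endian byte by byte (8c 16 2a a0). A minimal sketch of that conversion for one row, assuming the row has already been parsed into eight 16-bit values (parsing is left out); the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* words[]: the 8 groups of one memdmp row, in display order.
 * out[]:   the same 16 bytes in PCI (little-endian) order. */
static void dump_row_to_bytes(const uint16_t words[8], uint8_t out[16])
{
	for (int d = 0; d < 4; d++) {
		/* Two display groups form one dword, high half first. */
		uint32_t dword = ((uint32_t)words[2 * d] << 16) | words[2 * d + 1];

		/* Emit the dword lowest byte first. */
		for (int b = 0; b < 4; b++)
			out[4 * d + b] = (uint8_t)(dword >> (8 * b));
	}
}

/* Print one line in the "offset: xx xx ..." form shown above. */
static void print_lspci_row(unsigned offset, const uint8_t bytes[16])
{
	printf("%02x:", offset);
	for (int i = 0; i < 16; i++)
		printf(" %02x", bytes[i]);
	printf("\n");
}
```

Feeding it the first row {0xa02a, 0x168c, 0x0010, 0x0006, 0x0000, 0x0001, 0x0001, 0x0000} with offset 0 prints "00: 8c 16 2a a0 06 00 10 00 01 00 00 00 00 00 01 00", matching the first line of the table.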

Re: forcedeth problems on 2.6.20-rc6-mm3

2007-02-16 Thread Tobias Diedrich
Tobias Diedrich wrote:
> Jeff Garzik wrote:
> > Tobias Diedrich wrote:
> > > Tobias Diedrich wrote:
> > > > Ayaz Abdulla wrote:
> > > > > For all those who are having issues, please try out the attached patch.
> > > > Will try.
> > > 
> > > Does not apply cleanly against 2.6.20, is this one fixed up right?
> > 
> > It probably needs to be on top of 2.6.20-git-latest or 2.6.20-rc6-mm3.
> > 
> > IOW, the forcedeth changes in question are not in 2.6.20, and you need
> > to apply the patch on top of the latest batch of forcedeth changes.
> 
> Well, it hasn't blown up on me despite being applied to 2.6.20...
> The problem I was seeing might even be fixed in 2.6.20 vanilla,
> since the last version I saw it in was 2.6.20-rc6 and then I
> reverted to 2.6.19 to make sure that one is ok (see
> [EMAIL PROTECTED]).

And having run vanilla 2.6.20 since my last mail, I haven't seen the
problem on that one either.
So I _guess_ the particular problem I was seeing was fixed somewhere
between 2.6.20-rc6 and 2.6.20.  But since I can't reliably trigger
it, I can't say that for sure.

-- 
Tobias  PGP: http://9ac7e0bc.uguu.de
This email is made from 100% recycled bits.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: forcedeth problems on 2.6.20-rc6-mm3

2007-02-11 Thread Tobias Diedrich
Jeff Garzik wrote:
> Tobias Diedrich wrote:
> > Tobias Diedrich wrote:
> > > Ayaz Abdulla wrote:
> > > > For all those who are having issues, please try out the attached patch.
> > > Will try.
> > 
> > Does not apply cleanly against 2.6.20, is this one fixed up right?
> 
> It probably needs to be on top of 2.6.20-git-latest or 2.6.20-rc6-mm3.
> 
> IOW, the forcedeth changes in question are not in 2.6.20, and you need
> to apply the patch on top of the latest batch of forcedeth changes.

Well, it hasn't blown up on me despite being applied to 2.6.20...
The problem I was seeing might even be fixed in 2.6.20 vanilla,
since the last version I saw it in was 2.6.20-rc6 and then I
reverted to 2.6.19 to make sure that one is ok (see
[EMAIL PROTECTED]).



Re: forcedeth problems on 2.6.20-rc6-mm3

2007-02-09 Thread Tobias Diedrich
Tobias Diedrich wrote:
> Ayaz Abdulla wrote:
> > For all those who are having issues, please try out the attached patch.
> 
> Will try.

Does not apply cleanly against 2.6.20, is this one fixed up right?
--- linux-2.6.20/drivers/net/forcedeth.c.orig	2007-02-09 13:02:02.000000000 +0100
+++ linux-2.6.20/drivers/net/forcedeth.c.new	2007-02-09 13:03:45.000000000 +0100
@@ -2603,10 +2603,16 @@
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
 	unsigned long flags;
+	u32 retcode;
 
-	pkts = nv_rx_process(dev, limit);
+	if (np->desc_ver == DESC_VER_1 || np->desc_ver == DESC_VER_2) {
+		pkts = nv_rx_process(dev, limit);
+		retcode = nv_alloc_rx(dev);
+	} else {
+		retcode = nv_alloc_rx_optimized(dev);
+	}
 
-	if (nv_alloc_rx(dev)) {
+	if (retcode) {
 		spin_lock_irqsave(&np->lock, flags);
 		if (!np->in_shutdown)
 			mod_timer(&np->oom_kick, jiffies + OOM_REFILL);



Re: forcedeth problems on 2.6.20-rc6-mm3

2007-02-09 Thread Tobias Diedrich
Ayaz Abdulla wrote:
> For all those who are having issues, please try out the attached patch.

Will try.
I reverted to 2.6.19 without the suspend/resume patch last weekend to
make sure forcedeth is stable on 2.6.19, and noticed something odd:

Because I didn't include the suspend/resume patch, I obviously had to
do a down/rmmod/modprobe/up cycle after each resume, and I noticed that
the behaviour seems to alternate between resumes:

Behaviour 1:
  After modprobe I get two interfaces, 'eth0' and 'eth1', for the two
  ports, as expected.

Behaviour 2:
  After modprobe I get one interface 'eth3' (which should be 'eth1')
  and one interface whose number keeps increasing (it should be 'eth0';
  after the last resume it was 'eth12', IIRC).

As I said, if I get behaviour 1 on one resume, I get behaviour 2 on
the next resume, and vice versa. That seems rather odd to me.

On a not-quite-related note, forcedeth shows different ethtool output
from e100 when no cable is connected to the port:

forcedeth, no cable connected:
|Settings for eth1:
|	Supported ports: [ MII ]
|	Supported link modes:   10baseT/Half 10baseT/Full
|	                        100baseT/Half 100baseT/Full
|	                        1000baseT/Full
|	Supports auto-negotiation: Yes
|	Advertised link modes:  10baseT/Half 10baseT/Full
|	                        100baseT/Half 100baseT/Full
|	                        1000baseT/Full
|	Advertised auto-negotiation: Yes
|	Speed: Unknown! (65535)
|	Duplex: Unknown! (255)
|	Port: MII
|	PHYAD: 1
|	Transceiver: external
|	Auto-negotiation: on
|	Supports Wake-on: g
|	Wake-on: d
|	Link detected: no

e100, no cable connected:
|Settings for eth0:
|	Supported ports: [ TP MII ]
|	Supported link modes:   10baseT/Half 10baseT/Full
|	                        100baseT/Half 100baseT/Full
|	Supports auto-negotiation: Yes
|	Advertised link modes:  10baseT/Half 10baseT/Full
|	                        100baseT/Half 100baseT/Full
|	Advertised auto-negotiation: Yes
|	Speed: 10Mb/s
|	Duplex: Half
|	Port: MII
|	PHYAD: 1
|	Transceiver: internal
|	Auto-negotiation: on
|	Supports Wake-on: g
|	Wake-on: g
|	Current message level: 0x0007 (7)
|	Link detected: no

Note that e100 reports the lowest possible speed (10Mb/s, half duplex)
when no link is detected, while forcedeth seems to report -1, which
shows up as 65535 in the 16-bit speed field and 255 in the 8-bit duplex
field. ethtool doesn't seem to recognise that as a valid response (I
guess; why else would it show the number after 'Unknown!').
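Why -1 turns into 65535 and 255: in struct ethtool_cmd of that era, speed is an unsigned 16-bit field and duplex an unsigned 8-bit field, so storing -1 leaves all bits set. The sketch below models just those two fields with a hypothetical struct (not the real ABI struct) and shows how a reporter can treat the all-ones values as "unknown" instead of printing the raw number.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the speed/duplex fields of struct ethtool_cmd. */
struct link_report {
	uint16_t speed;		/* Mb/s */
	uint8_t duplex;		/* 0 = half, 1 = full */
};

/* What a driver effectively does when it stores -1 for "no link". */
static void report_unknown_link(struct link_report *r)
{
	r->speed = (uint16_t)-1;	/* 0xffff == 65535 */
	r->duplex = (uint8_t)-1;	/* 0xff == 255 */
}

/* Print the fields, treating all-ones as "unknown". */
static void print_link(const struct link_report *r)
{
	if (r->speed == UINT16_MAX)
		printf("Speed: Unknown\n");
	else
		printf("Speed: %uMb/s\n", (unsigned)r->speed);

	if (r->duplex == 0)
		printf("Duplex: Half\n");
	else if (r->duplex == 1)
		printf("Duplex: Full\n");
	else
		printf("Duplex: Unknown (%u)\n", (unsigned)r->duplex);
}
```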



forcedeth broken powermanagement/irq handling ?

2006-09-23 Thread Tobias Diedrich
Hi,

since there hasn't been much progress with the bugzilla entry I'm
bringing this issue to your attention here. :)

http://bugzilla.kernel.org/show_bug.cgi?id=6398

Vanilla forcedeth doesn't seem to support suspend, and an
ifdown/ifup cycle is needed to get it working again after a suspend.
Francois Romieu's awfully experimental patch works just fine for me
(with message signalled interrupts disabled) and has survived
quite a few suspend/resume cycles.

So I'd very much like to see suspend support for forcedeth in mainline
(at least partial support, with MSI disabled).

Romieu's patch:

--- linux-2.6.18-rc6/drivers/net/forcedeth.c	2006-09-09 09:45:43.000000000 +0200
+++ linux-2.6.17.11-xen/drivers/net/forcedeth.c	2006-09-09 09:41:25.000000000 +0200
@@ -4433,6 +4433,50 @@
 	pci_set_drvdata(pci_dev, NULL);
 }
 
+
+#ifdef CONFIG_PM
+
+static int nv_suspend(struct pci_dev *pdev, pm_message_t state)
+{
+	struct net_device *dev = pci_get_drvdata(pdev);
+	struct fe_priv *np = netdev_priv(dev);
+
+	if (!netif_running(dev))
+		goto out;
+
+	netif_device_detach(dev);
+
+	// Gross.
+	nv_close(dev);
+
+	pci_save_state(pdev);
+	pci_enable_wake(pdev, pci_choose_state(pdev, state), np->wolenabled);
+	pci_set_power_state(pdev, pci_choose_state(pdev, state));
+out:
+	return 0;
+}
+
+static int nv_resume(struct pci_dev *pdev)
+{
+	struct net_device *dev = pci_get_drvdata(pdev);
+	int rc = 0;
+
+	if (!netif_running(dev))
+		goto out;
+
+	netif_device_attach(dev);
+
+	pci_set_power_state(pdev, PCI_D0);
+	pci_restore_state(pdev);
+	pci_enable_wake(pdev, PCI_D0, 0);
+
+	rc = nv_open(dev);
+out:
+	return rc;
+}
+
+#endif /* CONFIG_PM */
+
 static struct pci_device_id pci_tbl[] = {
 	{	/* nForce Ethernet Controller */
 		PCI_DEVICE(PCI_VENDOR_ID_NVIDIA, PCI_DEVICE_ID_NVIDIA_NVENET_1),
@@ -4534,6 +4578,10 @@
 	.id_table = pci_tbl,
 	.probe = nv_probe,
 	.remove = __devexit_p(nv_remove),
+#ifdef CONFIG_PM
+	.suspend	= nv_suspend,
+	.resume		= nv_resume,
+#endif
 };
 
 