Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2018-04-24 Thread Nathan Royce
I finally got around to applying your patch and building the toolchain
(based on the master source, gcc 8), but alas: while there is no firmware
panic in the log, the wifi drops off the face of the planet (the SSID
disappears and hostapd doesn't notice that the wifi failed; there is
nothing in the log either).

On Wed, Jun 7, 2017 at 5:39 PM, Tobias Diedrich wrote:
> Oleksij Rempel wrote:
>> [...]

Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-15 Thread Tobias Diedrich
Yeah, this is mostly copy-pasted from the sboot code and
would need some cleaning up.
I've been playing a little more with other bits of the hardware,
writing some test firmware from scratch, mostly without using the built-in
ROM (except for interrupts).

Oleksij Rempel wrote:
> On 08.06.2017 at 00:39, Tobias Diedrich wrote:
> [...]

Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-08 Thread Oleksij Rempel
On 08.06.2017 at 00:39, Tobias Diedrich wrote:
> Oleksij Rempel wrote:
>> [...]

Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-07 Thread Tobias Diedrich
Oleksij Rempel wrote:
> On 07.06.2017 at 02:12, Tobias Diedrich wrote:
> [...]

Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-07 Thread Oleksij Rempel
On 07.06.2017 at 02:12, Tobias Diedrich wrote:
> Oleksij Rempel wrote:
> [...]
> 
> Judging from that, it should be fairly simple to at least implement
> some sort of retry, possibly after triggering a PCIe link retrain?

I assume, yes.

> There are some related PCIe root complex registers that may point to
> what exactly failed if they were dumped.
> 
> The root complex registers live at 0x00040000 and I think match the
> registers described for the root complex in the AR9344 datasheet.

Unfortunately, I don't have the AR7010 docs to tell.

> [...]

Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-06 Thread Tobias Diedrich
Oleksij Rempel wrote:
> Yes, this is a "normal" problem. The firmware has no error handler for PCI
> bus related exceptions. So if we fail to read the PCI bus the first time, we
> have the choice to Oops and stall or Oops and reboot ASAP. So we reboot
> and provide a kernel "firmware panic!" message.
> Anyone who can or wants to fix this is welcome.
> 
> > *
> > Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
> > exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
[...]
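The panic line packs the Xtensa exception state into three hex fields. A minimal C sketch for pulling them out and naming the bus-error causes follows; the helper names are hypothetical, and the cause names are taken from the EXCCAUSE table of the Xtensa ISA (only codes 12 to 15 are decoded here).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Xtensa EXCCAUSE values for PIF (processor interface, i.e. bus) errors.
 * Other causes are deliberately not decoded in this sketch. */
static const char *xtensa_exccause_name(uint32_t cause)
{
    switch (cause) {
    case 12: return "InstrPIFDataErrorCause";     /* bus error on instruction fetch */
    case 13: return "LoadStorePIFDataErrorCause"; /* bus error on a data load/store */
    case 14: return "InstrPIFAddrErrorCause";     /* address error on instruction fetch */
    case 15: return "LoadStorePIFAddrErrorCause"; /* address error on a data load/store */
    default: return "not decoded";
    }
}

/* Extract the fields from an ath9k_htc "firmware panic!" line such as
 * "exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038."
 * Returns 1 on success, 0 if the fields are not found. */
static int parse_fw_panic(const char *line, uint32_t *cause,
                          uint32_t *pc, uint32_t *badvaddr)
{
    const char *p = strstr(line, "exccause:");
    if (p == NULL)
        return 0;
    /* %x accepts the optional 0x prefix printed by the driver. */
    return sscanf(p, "exccause: %" SCNx32 "; pc: %" SCNx32 "; badvaddr: %" SCNx32,
                  cause, pc, badvaddr) == 3;
}
```

With the values from the log, exccause 0x000d maps to LoadStorePIFDataErrorCause, which fits the disassembly below: the faulting instruction is a plain l32i.n load through an address fetched by l32r.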

>memdmp 50ae78 50ae88

50ae78: 6c10 0412 6aa2 0c02 0088 20c0 2008 1940  l...j..........@

[...copy to bin...]
$ bin/objdump -b binary -m xtensa  -D /tmp/memdump.bin 
[..]
   0:   6c1004          entry   a1, 32
   3:   126aa2          l32r    a2, 0xfffdaa8c
   6:   0c0200          memw
   9:   8820            l32i.n  a8, a2, 0       <-- exception cause (PC still points at the load)
   b:   c020            movi.n  a2, 0
   d:   081940          extui   a9, a8, 1, 1

Judging from that, it should be fairly simple to at least implement
some sort of retry, possibly after triggering a PCIe link retrain?

There are some related PCIe root complex registers that may point to
what exactly failed if they were dumped.

The root complex registers live at 0x00040000 and I think match the
registers described for the root complex in the AR9344 datasheet.

PCIE_INT_MASK would map to 0x40050 and has a bit for SYS_ERR:
"A system error. The RC Core asserts CFG_SYS_ERR_RC if any device in
the hierarchy reports any of the following errors and the associated
enable bit is set in the Root Control register: ERR_COR, ERR_FATAL,
ERR_NONFATAL."

AFAICS link retrain can be done by setting bit 3 (INIT_RST,
"Application request to initiate a training reset") in
PCIE_APP (0x40000).

See sboot/magpie_1_1/sboot/cmnos/eeprom/src/cmnos_eeprom.c (which
flips some bits in the RC to enable the PCIe bus for reading the
EEPROM).
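The retry idea combined with the INIT_RST bit could look roughly like the following C sketch. Everything in it is an assumption: PCIE_APP at 0x40000 (the root complex base) and bit 3 come from the AR9344-based reading above rather than AR7010 documentation, the struct pcie_ops hooks are hypothetical, and on the real firmware a failed PCIe load raises an exception, so try_read() stands in for a recovery path that an exception handler would have to provide. A small simulated backend is included so the control flow can be tried off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed offsets, based on the AR9344 root complex description. */
#define PCIE_APP          0x40000u
#define PCIE_APP_INIT_RST (1u << 3) /* "request to initiate a training reset" */

/* Hypothetical hooks standing in for MMIO accessors and a bus read that
 * can fail. */
struct pcie_ops {
    uint32_t (*reg_read)(uint32_t addr);
    void (*reg_write)(uint32_t addr, uint32_t val);
    bool (*try_read)(uint32_t addr, uint32_t *out); /* false on bus error */
};

/* Read-modify-write of PCIE_APP to request a link retrain. Whether the bit
 * self-clears, or how long to wait for link-up afterwards, is unknown. */
static void pcie_request_retrain(const struct pcie_ops *ops)
{
    ops->reg_write(PCIE_APP, ops->reg_read(PCIE_APP) | PCIE_APP_INIT_RST);
}

/* Retry a read up to max_tries times, retraining after each failure,
 * instead of panicking on the first bus error. */
static bool pcie_read_with_retry(const struct pcie_ops *ops, uint32_t addr,
                                 uint32_t *out, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (ops->try_read(addr, out))
            return true;
        pcie_request_retrain(ops);
    }
    return false;
}

/* --- Simulated backend so the logic can be exercised off-target --- */
static uint32_t sim_app;      /* fake PCIE_APP contents */
static int sim_failures_left; /* number of reads that will still fail */
static int sim_retrains;      /* retrain requests seen */

static uint32_t sim_reg_read(uint32_t addr) { (void)addr; return sim_app; }

static void sim_reg_write(uint32_t addr, uint32_t val)
{
    (void)addr;
    if (val & PCIE_APP_INIT_RST) {
        sim_retrains++;
        val &= ~PCIE_APP_INIT_RST; /* modelled as self-clearing (assumption) */
    }
    sim_app = val;
}

static bool sim_try_read(uint32_t addr, uint32_t *out)
{
    if (sim_failures_left > 0) {
        sim_failures_left--;
        return false;
    }
    *out = ~addr; /* arbitrary but deterministic data */
    return true;
}

static const struct pcie_ops sim_ops = {
    .reg_read = sim_reg_read,
    .reg_write = sim_reg_write,
    .try_read = sim_try_read,
};
```

Retraining after every failed attempt (including the last one) keeps the loop simple; real firmware would also have to wait for the link to come back up before retrying, which this sketch does not model.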

The root complex PCI configuration space is at 0x20000, which could
have further error details:
>memdmp 20000 20200

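020000: a02a 168c 0010 0006 0000 0001 0001 0000  .*..............
020010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020030: 0000 0000 0000 0040 0000 0000 0000 01ff  .......@........
020040: 5bc3 5001 0000 0000 0000 0000 0000 0000  [.P.............
020050: 0080 7005 0000 0000 0000 0000 0000 0000  ..p.............
020060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020070: 0042 0010 0000 8701 0000 2010 0013 4411  .B............D.
020080: 3011 0000 0000 0000 00c0 03c0 0000 0000  0...............
020090: 0000 0000 0000 0010 0000 0000 0000 0000  ................
0200a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0200b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0200c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0200d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0200e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0200f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020100: 1401 0001 0000 0000 0000 0000 0006 2030  ...............0
020110: 0000 0000 0000 2000 0000 00a0 0000 0000  ................
020120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020140: 0001 0002 0000 0000 0000 0000 0000 0000  ................
020150: 0000 0000 8000 00ff 0000 0000 0000 0000  ................
020160: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020170: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
020190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0201a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0201b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0201c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0201d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0201e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0201f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................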

Transformed into something suitable for feeding into lspci -F:

00:00.0 Description filled in by lspci
00: 8c 16 2a a0 06 00 10 00 01 00 00 00 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 01 50 c3 5b 00 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 00 42 00 01 87 00 00 10 20 00 00 11 44 13 00
80: 00 00 11 30 00 00 00 00 c0 03 c0 00 00 00 00 00
90: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
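The conversion from the memdmp words to lspci -F input can be done mechanically: memdmp prints each 32-bit word as two 16-bit halves, high half first ("a02a 168c" is 0xa02a168c), while PCI configuration space is little-endian, so the word becomes the bytes 8c 16 2a a0. A minimal C sketch of that, plus a walk of the standard capability list starting at offset 0x34, follows; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Split 32-bit words (as printed by memdmp) into little-endian config bytes. */
static void words_to_cfg(const uint32_t *words, size_t nwords, uint8_t *cfg)
{
    for (size_t i = 0; i < nwords; i++) {
        cfg[4 * i + 0] = (uint8_t)(words[i] & 0xff);
        cfg[4 * i + 1] = (uint8_t)((words[i] >> 8) & 0xff);
        cfg[4 * i + 2] = (uint8_t)((words[i] >> 16) & 0xff);
        cfg[4 * i + 3] = (uint8_t)(words[i] >> 24);
    }
}

/* Print the bytes in the "xx: b0 ... b15" layout accepted by lspci -F. */
static void print_lspci_f(FILE *f, const uint8_t *cfg, size_t len)
{
    fputs("00:00.0 Description filled in by lspci\n", f);
    for (size_t off = 0; off < len; off += 16) {
        fprintf(f, "%02zx:", off);
        for (size_t j = 0; j < 16 && off + j < len; j++)
            fprintf(f, " %02x", cfg[off + j]);
        fputc('\n', f);
    }
}

/* Follow the capability list (pointer at 0x34, each entry starts with
 * <cap ID, next pointer>). Returns how many entries were recorded; the
 * max bound also protects against a looping list. */
static size_t walk_caps(const uint8_t *cfg, size_t len,
                        uint8_t *ids, uint8_t *offs, size_t max)
{
    size_t n = 0;
    size_t ptr = cfg[0x34] & 0xfc;

    while (ptr != 0 && ptr + 1 < len && n < max) {
        ids[n] = cfg[ptr];
        offs[n] = (uint8_t)ptr;
        n++;
        ptr = cfg[ptr + 1] & 0xfc;
    }
    return n;
}
```

Fed with the non-zero words from the dump, the walk finds capability IDs 0x01 (power management) at 0x40, 0x05 (MSI) at 0x50 and 0x10 (PCI Express) at 0x70, which is where the device and link status registers that might hold error details live.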

Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-03 Thread Nathan Royce
On Sat, Jun 3, 2017 at 2:57 AM, Oleksij Rempel wrote:
> Hm... this function and file:
> linux/drivers/net/wireless/ath/ath9k/common-beacon.c
> haven't changed since 2015, so it should be something different.
> Can you run git bisect to find the exact patch that caused this regression?
>
That was the first time I experienced the division-by-zero issue, and I
don't know how I would reproduce it.

> Yes, this is a "normal" problem. The firmware has no error handler for PCI
> bus related exceptions. So if we fail to read the PCI bus the first time, we
> have the choice to Oops and stall or Oops and reboot ASAP. So we reboot
> and provide a kernel "firmware panic!" message.
> Anyone who can or wants to fix this is welcome.
>
Thanks for that explanation. I'm not sure it's something I could
tackle, though. My best bet in the meantime is to coax systemd into
restarting the services when the device reappears. However, every attempt
has failed so far.

> It is possible. If the adapter is used in AP mode, then lots of WiFi noise
> is dumped over this interface. I assume the reproducibility depends on the
> external environment, not the internal one.
>
I find this quite believable. I have 2.4 GHz activity from the
TP-Link, the ZTE Mobley, Bluetooth, Logitech Unifying and USB 3.0. All
4 devices go through the USB 2.0 port, the TP-Link and the Mobley have
long USB cables in the attic, and the hub connects through a 2 m USB
extension. So yeah, I've got a lot of variables in play.


Re: ath9k_htc - Division by zero in kernel (as well as firmware panic)

2017-06-03 Thread Oleksij Rempel
Hi,

On 03.06.2017 at 00:02, Nathan Royce wrote:
> ODroid XU4
> 
> $ uname -a
> Linux computer 4.12.0-rc3-dirty #1 SMP Wed May 31 15:02:05 CDT 2017
> armv7l GNU/Linux
> 
> $ lsusb
> ...
> Bus 001 Device 002: ID 2109:2813 VIA Labs, Inc.
> Bus 001 Device 010: ID 0cf3:7015 Qualcomm Atheros Communications
> TP-Link TL-WN821N v3 / TL-WN822N v2 802.11n [Atheros AR7010+AR9287]
> ...
> 
> *
> Jun 02 16:20:11 computer hostapd[14954]: vwlan0: interface state
> COUNTRY_UPDATE->HT_SCAN
> Jun 02 16:20:17 computer hostapd[14954]: 20/40 MHz operation not
> permitted on channel pri=7 sec=3 based on overlapping BSSes
> Jun 02 16:20:18 computer kernel: Division by zero in kernel.
> Jun 02 16:20:18 computer kernel: CPU: 1 PID: 14507 Comm: kworker/u16:2
> Tainted: GW   4.12.0-rc3-dirty #1
> Jun 02 16:20:18 computer kernel: Hardware name: SAMSUNG EXYNOS
> (Flattened Device Tree)
> Jun 02 16:20:18 computer kernel: Workqueue: phy5 ieee80211_scan_work 
> [mac80211]
> Jun 02 16:20:18 computer kernel: [] (unwind_backtrace) from
> [] (show_stack+0x10/0x14)
> Jun 02 16:20:18 computer kernel: [] (show_stack) from
> [] (dump_stack+0x88/0x9c)
> Jun 02 16:20:18 computer kernel: [] (dump_stack) from
> [] (Ldiv0_64+0x8/0x18)
> Jun 02 16:20:18 computer kernel: [] (Ldiv0_64) from
> [] (ath9k_get_next_tbtt+0x58/0x5c [ath9k_common])

Hm... this function and file:
linux/drivers/net/wireless/ath/ath9k/common-beacon.c
haven't changed since 2015, so it should be something different.
Can you run git bisect to find the exact patch that caused this regression?
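The backtrace places the trap in ath9k_get_next_tbtt(), reached from the AP beacon configuration after a scan. Assuming that helper rounds the TSF up to the next multiple of the beacon interval (an assumption, not confirmed in this thread), a beacon interval of 0 reaching it would be one plausible trigger for the 64-bit division-by-zero path (Ldiv0_64). The C sketch below shows that arithmetic with a guard; next_tbtt() is hypothetical and not the driver's code.

```c
#include <stdint.h>

#define TU_USEC 1024u /* one 802.11 time unit (TU) is 1024 microseconds */

/* Hypothetical next-TBTT helper: round a TSF value (microseconds) up to the
 * next multiple of the beacon interval (TUs). The modulo is the division
 * that traps when the interval is 0, so the interval is checked first. */
static int next_tbtt(uint64_t tsf_usec, unsigned int interval_tu,
                     uint64_t *tbtt_usec)
{
    if (interval_tu == 0)
        return -1; /* would otherwise divide by zero */

    uint64_t divisor = (uint64_t)interval_tu * TU_USEC;
    uint64_t offset = tsf_usec % divisor;

    *tbtt_usec = tsf_usec + divisor - offset;
    return 0;
}
```

For example, a TSF of 250000 us with a 100 TU interval (102400 us) yields the next boundary at 307200 us.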

> Jun 02 16:20:18 computer kernel: [] (ath9k_get_next_tbtt
> [ath9k_common]) from [] (ath9k_cmn_beacon_config
> Jun 02 16:20:18 computer kernel: []
> (ath9k_cmn_beacon_config_ap [ath9k_common]) from []
> (ath9k_htc_beacon
> Jun 02 16:20:18 computer kernel: []
> (ath9k_htc_beacon_config_ap [ath9k_htc]) from []
> (ath9k_htc_vif_recon
> Jun 02 16:20:18 computer kernel: [] (ath9k_htc_vif_reconfig
> [ath9k_htc]) from [] (ath9k_htc_sw_scan_compl
> Jun 02 16:20:18 computer kernel: []
> (ath9k_htc_sw_scan_complete [ath9k_htc]) from []
> (__ieee80211_scan_co
> Jun 02 16:20:18 computer kernel: []
> (__ieee80211_scan_completed [mac80211]) from []
> (ieee80211_scan_work+
> Jun 02 16:20:18 computer kernel: [] (ieee80211_scan_work
> [mac80211]) from [] (process_one_work+0x1d8/0x40
> Jun 02 16:20:18 computer kernel: [] (process_one_work) from
> [] (worker_thread+0x4c/0x564)
> Jun 02 16:20:18 computer kernel: [] (worker_thread) from
> [] (kthread+0x14c/0x154)
> Jun 02 16:20:18 computer kernel: [] (kthread) from
> [] (ret_from_fork+0x14/0x3c)
> Jun 02 16:20:18 computer hostapd[14954]: Using interface wlan0 with
> hwaddr  and ssid ""
> Jun 02 16:20:18 computer kernel: IPv6: ADDRCONF(NETDEV_CHANGE):
> vwlan0: link becomes ready
> *
> This is a new one on me.
> 
> The "normal" problem (a search shows it to be a very old issue) that I
> consistently encounter (daily or multiple times a day) is:

Yes, this is a "normal" problem. The firmware has no error handler for PCI
bus related exceptions. So if we fail to read the PCI bus the first time, we
have the choice to Oops and stall or Oops and reboot ASAP. So we reboot
and provide a kernel "firmware panic!" message.
Anyone who can or wants to fix this is welcome.

> *
> Jun 02 14:55:30 computer kernel: usb 1-1.1: ath: firmware panic!
> exccause: 0x000d; pc: 0x0090ae81; badvaddr: 0x10ff4038.
> Jun 02 14:55:30 computer kernel: usb 1-1.1: USB disconnect, device number 9
> Jun 02 14:55:30 computer systemd-networkd[11959]: vwlan0: Lost carrier
> Jun 02 14:55:30 computer kernel: br0: port 2(vwlan0) entered disabled state
> Jun 02 14:55:30 computer kernel: wlan0: deauthenticating from
>  by local choice (Reason: 3=DEAUTH_LEAVING)
> Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us
> Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us
> Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us
> Jun 02 14:55:30 computer kernel: ath: phy4: Failed to wakeup in 500us
> Jun 02 14:55:30 computer systemd-networkd[11959]: wlan0: Lost carrier
> Jun 02 14:55:30 computer systemd[1]: Stopping A simple WPA encrypted
> wireless connection using a static IP...
> -- Subject: Unit netctl@wlan0.service has begun shutting down
> -- Defined-By: systemd
> -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
> --
> -- Unit netctl@wlan0.service has begun shutting down.
> Jun 02 14:55:30 computer kernel: device vwlan0 left promiscuous mode
> Jun 02 14:55:30 computer kernel: br0: port 2(vwlan0) entered disabled state
> Jun 02 14:55:30 computer audit: ANOM_PROMISCUOUS dev=vwlan0 prom=0
> old_prom=256 auid=4294967295 uid=0 gid=0 ses=4294967295
> Jun 02 14:55:30 computer hostapd[13218]: vwlan0: AP-STA-DISCONNECTED 
> 
> Jun 02 14:55:30 computer hostapd[13218]: Failed to set beacon parameters
> Jun 02 14:55:30 computer hostapd[13218]: vwlan0: INTERFACE-DISABLED
> Jun 02 14:55:30 computer kernel: usb 1-1.1: ath9k_htc: USB layer