Re: OpenBSD 6.7 crashes on APU2C4 with LTE modem Huawei E3372s-153 HiLink

2020-06-12 Thread Łukasz Lejtkowski
Good news - no more kernel panics on USB 3.0 (xHCI), that is fixed.

Bad news - after 2-3 h the LTE modem loses its local network connection over
USB 3.0 (cdce0). I have to unplug the modem and plug it back into the USB
port; the local network connection between OpenBSD and the modem then comes
back for 2-3 h, sometimes only 30-40 min. It looks related to the kernel
panic problem, but this time the network connection over USB 3.0 (xhci) is
lost instead.

root@master[~]ping 192.168.8.1
PING 192.168.8.1 (192.168.8.1): 56 data bytes
ping: sendmsg: Network is down
ping: wrote 192.168.8.1 64 chars, ret=-1
ping: sendmsg: Network is down
ping: wrote 192.168.8.1 64 chars, ret=-1

192.168.8.1 is the default static IP of the LTE modem.

Your changes in if_cdce.c 1.77 do not completely fix the problem.


> On 11 Jun 2020, at 11:13, Łukasz Lejtkowski  wrote:
> 
> Hi Gerhard,
> 
> Today I added Your patches to 6.7-stable and moved back LTE modem to USB 3.0. 
> So, just waiting for… nothing or kernel panic. I’ll let you know. 
> 
>> On 8 Jun 2020, at 19:13, Patrick Wildt wrote:
>> 
>> On Mon, Jun 08, 2020 at 05:31:44PM +0200, Gerhard Roth wrote:
>>> On 2020-05-25 13:19, Martin Pieuchot wrote:
>>>> On 25/05/20(Mon) 12:56, Gerhard Roth wrote:
>>>>> On 5/22/20 9:05 PM, Mark Kettenis wrote:
>>>>>>> From: Łukasz Lejtkowski <emig...@gmail.com>
>>>>>>> Date: Fri, 22 May 2020 20:51:57 +0200
>>>>>>>
>>>>>>> Probably power supply 12 V is broken. Showing 16,87 V(Fluke 179) -
>>>>>>> too high. Should be 12,25-12,50 V. I replaced to the new one.
>>>>>>
>>>>>> That might be why the device stops responding.  The fact that cleaning
>>>>>> up from a failed USB transaction leads to this panic is a bug though.
>>>>>>
>>>>>> And somebody just posted a very similar panic with ure(4).  Something
>>>>>> in the network stack is holding a mutex when it shouldn't.
>>>>>
>>>>> I think that holding the mutex is ok. The bug is calling the stop
>>>>> routine in case of errors.
>>>>>
>>>>> This is what common foo_start() does:
>>>>>
>>>>>   m_head = ifq_deq_begin(&ifp->if_snd);
>>>>>   if (foo_encap(sc, m_head, 0)) {
>>>>>           ifq_deq_rollback(&ifp->if_snd, m_head);
>>>>>           ...
>>>>>           return;
>>>>>   }
>>>>>   ifq_deq_commit(&ifp->if_snd, m_head);
>>>>>
>>>>> Here, ifq_deq_begin() grabs a mutex and it is held while
>>>>> calling foo_encap().
>>>>>
>>>>> For USB network interfaces foo_encap() mostly does this:
>>>>>
>>>>>   err = usbd_transfer(sc->sc_xfer);
>>>>>   if (err != USBD_IN_PROGRESS) {
>>>>>           foo_stop(sc);
>>>>>           return EIO;
>>>>>   }
>>>>>
>>>>> And foo_stop() calls usbd_abort_pipe() -> xhci_command_submit(),
>>>>> which might sleep.
>>>>>
>>>>> How to fix? We could do the foo_encap() after the ifq_deq_commit(),
>>>>> possibly dropping the current mbuf if encap fails (who cares
>>>>> for the packets after foo_stop() anyway).
>>>>
>>>> That's the approach taken by drivers using ifq_dequeue(9) instead of
>>>> ifq_deq_begin/commit().
>>>>
>>>>> Or change all the drivers to follow the path that if_aue.c takes:
>>>>>
>>>>>   err = usbd_transfer(c->aue_xfer);
>>>>>   if (err != USBD_IN_PROGRESS) {
>>>>>           ...
>>>>>           /* Stop the interface from process context. */
>>>>>           usb_add_task(sc->aue_udev, &sc->aue_stop_task);
>>>>>           return (EIO);
>>>>>   }
>>>>
>>>> That's just trading the current problem for another one with higher
>>>> complexity.
>>>>
>>>>> Any ideas, what's better? Or alternative proposals?
>>>>
>>>> Using ifq_dequeue(9) would have the advantage of unifying the code base.
>>>> It introduces a behavior change.  A simpler fix would be to call
>>>> foo_stop() in the error path after ifq_deq_rollback().
>>>>
>>> 
>>> Hi,
>>> 
>>> two weeks passed and nobody objected to Martin's proposal. So I thought,
>>> we could try to move on this way.
>>> 
>>> Gerhard
>>> 
>> 
>> From what I remember from various discussions, the goal should be to
>> check if there's a buffer free in the ring, then dequeue and send, and
>> if it can't be sent out, then drop it.  With USB apparently those
>> drivers "always" have an open buffer, so we can just dequeue and send,
>> like you do in this diff.  And if it gets dropped, that's fine.
>> 
>> That said, I think IFQ_DEQUEUE() is old compat code, and we actually
>> nowadays prefer:
>> 
>> m_head = ifq_dequeue(&ifp->if_snd);
>> 
>> If you look at the define for IFQ_DEQUEUE() you'll see it's marked
>> as compat code.  If you look at a new driver, like ixl(4), you'll
>> see that it also uses ifq_dequeue().
>> 
>> Sorry to give you some work, but with that fixed: ok patrick@
>> 
>> Patrick
>> 
>>> 
>>> Index: sys/dev/usb/if_axe.c
>>> ===================================================================
>>> RCS file: /cvs/src/sys/dev/usb/if_axe.c,v
>>> retrieving revision 1.139
>>> diff -u -p -u -p -r1.139 if_axe.c
>>> --- sys/dev/usb/if_axe.c	7 Jul 2019 06:40:10 -0000	1.139
>>> +++ sys/dev/usb/if_axe.c	8 Jun 2020 15:13:25 -0000
>>> @@ -1223,6 +1223,7 @@ 
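
For readers following the quoted discussion, here is a minimal userland
sketch of why the ordering matters. It is not kernel code: every toy_*
name is hypothetical, the "mutex" is just a flag, and "sleeping" is
modelled by recording whether the stop routine ran while that flag was
set. It only illustrates the two orderings described in the thread
(begin/encap/commit versus dequeue first, then encap).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QLEN 8

/* Toy stand-in for the send queue; not the real OpenBSD ifqueue. */
struct toy_ifq {
	int    pkts[TOY_QLEN];
	size_t head, len;
	bool   mtx_held;          /* models the mutex taken by ifq_deq_begin() */
};

/* Toy stand-in for a USB network driver softc. */
struct toy_softc {
	struct toy_ifq *ifq;
	bool xfer_fails;          /* force the simulated transfer to fail */
	bool stopped;             /* set once the stop routine has run */
	bool slept_with_mutex;    /* set if stop ran while the mutex was held */
};

/* Models foo_stop(): the real one may sleep, which is the problem. */
static void toy_stop(struct toy_softc *sc)
{
	if (sc->ifq->mtx_held)
		sc->slept_with_mutex = true;
	sc->stopped = true;
}

/* Models foo_encap(): on transfer failure it calls the stop routine. */
static int toy_encap(struct toy_softc *sc, int pkt)
{
	(void)pkt;
	if (sc->xfer_fails) {
		toy_stop(sc);
		return -1;
	}
	return 0;
}

/* Ordering from the thread: begin (lock), encap, then rollback or commit. */
static void toy_start_begin_commit(struct toy_softc *sc)
{
	assert(sc != NULL && sc->ifq != NULL);
	struct toy_ifq *q = sc->ifq;
	if (q->len == 0)
		return;
	q->mtx_held = true;                       /* like ifq_deq_begin() */
	int pkt = q->pkts[q->head];
	if (toy_encap(sc, pkt) != 0) {
		q->mtx_held = false;              /* like ifq_deq_rollback() */
		return;
	}
	q->head = (q->head + 1) % TOY_QLEN;       /* like ifq_deq_commit() */
	q->len--;
	q->mtx_held = false;
}

/* Dequeue first: the lock is released before encap can call stop. */
static void toy_start_dequeue(struct toy_softc *sc)
{
	assert(sc != NULL && sc->ifq != NULL);
	struct toy_ifq *q = sc->ifq;
	if (q->len == 0)
		return;
	q->mtx_held = true;
	int pkt = q->pkts[q->head];
	q->head = (q->head + 1) % TOY_QLEN;
	q->len--;
	q->mtx_held = false;
	(void)toy_encap(sc, pkt);                 /* on failure the packet is dropped */
}
```

With a failing transfer, toy_start_begin_commit() ends up with
slept_with_mutex set and the packet still queued, while
toy_start_dequeue() stops the interface without the flag set and simply
drops the packet, which matches the trade-off discussed above.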

Re: Interfaces errors and latency spikes with Intel 82583V

2020-06-12 Thread Gabri Tofano

Flow control is off and the pause frame counters show 0 packets:

GigabitEthernet0/23 is up, line protocol is up (connected)
  Hardware is Gigabit Ethernet
  Description: FRW-FW1-OUT
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,
 reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
  input flow-control is off, output flow-control is unsupported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 00:00:01, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 3000 bits/sec, 5 packets/sec
  5 minute output rate 115000 bits/sec, 5 packets/sec
 974201764 packets input, 216703921981 bytes, 0 no buffer
 Received 12293 broadcasts (114 multicasts)
 0 runts, 0 giants, 0 throttles
 0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
 0 watchdog, 114 multicast, 0 pause input
 0 input packets with dribble condition detected
 2218244753 packets output, 2712046308343 bytes, 0 underruns
 0 output errors, 0 collisions, 1 interface resets
 0 unknown protocol drops
 0 babbles, 0 late collision, 0 deferred
 0 lost carrier, 0 no carrier, 0 pause output
 0 output buffer failures, 0 output buffers swapped out


On 2020-06-11 22:52, David Gwynne wrote:

are there any config options on the switch side relating to flow
control you can try turning off? are there any counters for pause
frames on the switch side too?

dlg


On 12 Jun 2020, at 12:16 pm, Gabri Tofano  wrote:

Apparently it is not:

#ifconfig em0 hwfeatures
em0: flags=808843 mtu 1500
        hwfeatures=36 hardmtu 9216
        lladdr XX:XX:XX:XX:XX:XX
        index 1 priority 0 llprio 3
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet XX:XX:XX:XX netmask 0xff00 broadcast XX:XX:XX:XX


On 2020-06-11 21:57, David Gwynne wrote:

Is flow control enabled? Can you try disabling rxpause and txpause?

On 12 Jun 2020, at 10:36 am, Gabri Tofano wrote:

Yes, this is today without resetting the interface:

#netstat -ie
Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
em0     1500  <Link>      XX:XX:XX:XX:XX:XX  5351463  1868  3016695     0     0
em0     1500  XX:XX:XX:XX XX:XX:XX:XX:XX:XX  5351463  1868  3016695     0     0
em1     1500  <Link>      XX:XX:XX:XX:XX:XX  2839738     0  5147702     0     0
em1     1500  172.16.200. XX:XX:XX:XX:XX:XX  2839738     0  5147702     0     0
em2     1500  <Link>      XX:XX:XX:XX:XX:XX    46977     0    44135     0     0
em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    46977     0    44135     0     0
em3*    1500  <Link>      00:e0:67:10:9d:97        0     0        0     0     0
enc0*   0     <Link>                               0     0        0     0     0
pflog0  33136 <Link>                               0     0   128982     0     0

On 2020-06-11 20:29, David Gwynne wrote:

Is it consistently Ierrs?
dlg
On 11 Jun 2020, at 10:14 pm, Gabri Tofano wrote:

#netstat -id
Name    Mtu   Network     Address              Ipkts Idrop    Opkts Odrop Colls
em0     1500  <Link>      XX:XX:XX:XX:XX:XX   266894     0   202813     0     0
em0     1500  XX.XX.XX.XX XX:XX:XX:XX:XX:XX   266894     0   202813     0     0
em1     1500  <Link>      XX:XX:XX:XX:XX:XX   170280     0   230226     1     0
em1     1500  172.16.200. XX:XX:XX:XX:XX:XX   170280     0   230226     1     0
em2     1500  <Link>      XX:XX:XX:XX:XX:XX    15788     0    13249     2     0
em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    15788     0    13249     2     0
em3*    1500  <Link>      XX:XX:XX:XX:XX:XX        0     0        0     0     0
enc0*   0     <Link>                               0     0        0     0     0
pflog0  33136 <Link>                               0     0    29771     0     0

#netstat -ie
Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
em0     1500  <Link>      XX:XX:XX:XX:XX:XX   269713    72   205469     0     0
em0     1500  XX.XX.XX.XX XX:XX:XX:XX:XX:XX   269713    72   205469     0     0
em1     1500  <Link>      XX:XX:XX:XX:XX:XX   172137     0   232148     0     0
em1     1500  172.16.200. XX:XX:XX:XX:XX:XX   172137     0   232148     0     0
em2     1500  <Link>      XX:XX:XX:XX:XX:XX    15892     0    13316     0     0
em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    15892     0    13316     0     0
em3*    1500  <Link>      XX:XX:XX:XX:XX:XX        0     0        0     0     0
enc0*   0     <Link>                               0     0        0     0     0
pflog0  33136 <Link>                               0     0    30174     0     0

#systat queues
QUEUE          BW/FL SCH      PKTS     BYTES  DROP_P  DROP_B QLEN BORROW SUSPEN P/S B/S
main on em0     120M fifo        0         0       0       0    0
defq

Re: athn0 works in 6.6, fails in 6.7

2020-06-12 Thread Stefan Sperling
On Thu, Jun 11, 2020 at 06:08:31PM -0500, Tim Chase wrote:
> and it works fine there.  The big distinction is that after 
> 
>   sending msg 4/4 of the 4-way handshake
> 
> my `ifconfig athn0 debug` output is giving me these two lines in the
> 6.6 bsd.rd:
> 
>   received msg 1/2 of the group key handshake from [MAC]
>   sending msg 2/2 of the group key handshake to [same MAC]
> 
> that never happen in the 6.7 (both -RELEASE and -CURRENT snap) output.

Can you please boot into 6.7, let it fail to connect, and then get the
output of the following command and show it to me?

netstat -W athn0



Re: Interfaces errors and latency spikes with Intel 82583V

2020-06-12 Thread Gabri Tofano

Apparently it is not:

#ifconfig em0 hwfeatures
em0: flags=808843 mtu 1500
        hwfeatures=36 hardmtu 9216
        lladdr XX:XX:XX:XX:XX:XX
        index 1 priority 0 llprio 3
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet XX:XX:XX:XX netmask 0xff00 broadcast XX:XX:XX:XX


On 2020-06-11 21:57, David Gwynne wrote:

Is flow control enabled? Can you try disabling rxpause and txpause?


On 12 Jun 2020, at 10:36 am, Gabri Tofano  wrote:

Yes, this is today without resetting the interface:

#netstat -ie
Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
em0     1500  <Link>      XX:XX:XX:XX:XX:XX  5351463  1868  3016695     0     0
em0     1500  XX:XX:XX:XX XX:XX:XX:XX:XX:XX  5351463  1868  3016695     0     0
em1     1500  <Link>      XX:XX:XX:XX:XX:XX  2839738     0  5147702     0     0
em1     1500  172.16.200. XX:XX:XX:XX:XX:XX  2839738     0  5147702     0     0
em2     1500  <Link>      XX:XX:XX:XX:XX:XX    46977     0    44135     0     0
em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    46977     0    44135     0     0
em3*    1500  <Link>      00:e0:67:10:9d:97        0     0        0     0     0
enc0*   0     <Link>                               0     0        0     0     0
pflog0  33136 <Link>                               0     0   128982     0     0
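
To put the em0 counters above in proportion, a small helper like the
following could be used. It is only a sketch: the struct and function
names are made up for illustration, and it assumes that Ipkts and Ierrs
are monotonic counters and that dividing Ierrs by Ipkts is a reasonable
approximation of the input error rate (whether Ipkts already includes
the errored frames is not confirmed in this thread).

```c
#include <assert.h>
#include <stdint.h>

/* One interface row as printed by netstat -ie (hypothetical helper type). */
struct if_counters {
	uint64_t ipkts;   /* Ipkts column */
	uint64_t ierrs;   /* Ierrs column */
};

/* Input errors as a fraction of received packets; 0.0 when nothing was received. */
static double ierr_ratio(struct if_counters c)
{
	if (c.ipkts == 0)
		return 0.0;
	return (double)c.ierrs / (double)c.ipkts;
}

/* Input errors accumulated between two snapshots of the same interface. */
static uint64_t ierr_delta(struct if_counters before, struct if_counters after)
{
	assert(after.ierrs >= before.ierrs);   /* counters assumed monotonic */
	return after.ierrs - before.ierrs;
}
```

For the em0 row (5351463 Ipkts, 1868 Ierrs) this gives a ratio of
roughly 0.035%, so the errors are rare per packet; comparing
ierr_delta() between two snapshots taken around a latency spike would
show whether they arrive in bursts.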



On 2020-06-11 20:29, David Gwynne wrote:

Is it consistently Ierrs?
dlg

On 11 Jun 2020, at 10:14 pm, Gabri Tofano wrote:

#netstat -id
Name    Mtu   Network     Address              Ipkts Idrop    Opkts Odrop Colls
em0     1500  <Link>      XX:XX:XX:XX:XX:XX   266894     0   202813     0     0
em0     1500  XX.XX.XX.XX XX:XX:XX:XX:XX:XX   266894     0   202813     0     0
em1     1500  <Link>      XX:XX:XX:XX:XX:XX   170280     0   230226     1     0
em1     1500  172.16.200. XX:XX:XX:XX:XX:XX   170280     0   230226     1     0
em2     1500  <Link>      XX:XX:XX:XX:XX:XX    15788     0    13249     2     0
em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    15788     0    13249     2     0
em3*    1500  <Link>      XX:XX:XX:XX:XX:XX        0     0        0     0     0
enc0*   0     <Link>                               0     0        0     0     0
pflog0  33136 <Link>                               0     0    29771     0     0

#netstat -ie
Name    Mtu   Network     Address              Ipkts Ierrs    Opkts Oerrs Colls
em0     1500  <Link>      XX:XX:XX:XX:XX:XX   269713    72   205469     0     0
em0     1500  XX.XX.XX.XX XX:XX:XX:XX:XX:XX   269713    72   205469     0     0
em1     1500  <Link>      XX:XX:XX:XX:XX:XX   172137     0   232148     0     0
em1     1500  172.16.200. XX:XX:XX:XX:XX:XX   172137     0   232148     0     0
em2     1500  <Link>      XX:XX:XX:XX:XX:XX    15892     0    13316     0     0
em2     1500  172.16.103/ XX:XX:XX:XX:XX:XX    15892     0    13316     0     0
em3*    1500  <Link>      XX:XX:XX:XX:XX:XX        0     0        0     0     0
enc0*   0     <Link>                               0     0        0     0     0
pflog0  33136 <Link>                               0     0    30174     0     0

#systat queues
QUEUE          BW/FL SCH      PKTS     BYTES  DROP_P  DROP_B QLEN BORROW SUSPEN P/S B/S
main on em0     120M fifo        0         0       0       0    0
defq            100M fifo   139394  21574411       0       0    0
voip             10M fifo    34699   4949635       0       0    0
games            10M fifo    32277   2460807       0       0    0

Thank you!
Gabri