Re: TSO em(4) problem

2024-01-26 Thread Hrvoje Popovski
On 26.1.2024. 22:47, Alexander Bluhm wrote:
> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
>> I've managed to reproduce the TSO em problem on another setup, unfortunately
>> in production.
> What helped debugging a similar issue with ixl(4) and TSO was to
> remove all TSO specific code from the driver.  Then only this part
> remains from the original em(4) TSO diff.
> 
> error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE,
>   EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1),
>   EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);
> 
> The parameters that changed when adding TSO are:
> 
> bus_size_t size:  MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SIZE 65535
> bus_size_t maxsegsz:  MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SEG_SIZE 4096
> 
> I suspect that this is the cause for the regression as disabling
> TSO did not help.  Would it be possible to run the diff below?  I
> expect that the problem will still be there.  But then we know it
> must be the change of one of the bus_dmamap_create() arguments.
> 
> bluhm

Hi,

with this diff em0 seems happy and em watchdog is gone.

bcbnfw1# uptime
 8:06AM  up 44 mins, 2 users, load averages: 0.00, 0.00, 0.00

bcbnfw1# ifconfig em0 hwfeatures
em0: flags=8b43 mtu 1500

hwfeatures=1b7
hardmtu 9216
lladdr 0c:c4:7a:da:cd:5a
index 3 priority 0 llprio 3
groups: egress
media: Ethernet autoselect (1000baseT full-duplex,master,rxpause)
status: active
inet 10.10.155.234 netmask 0xfff8 broadcast 10.10.155.239


This morning, without the diff:
bcbnfw1# cat /var/log/messages | grep watchdog
Jan 27 07:12:03 bcbnfw1 /bsd: em0: watchdog: head 50 tail 114 TDH 114 TDT 50
Jan 27 07:15:29 bcbnfw1 /bsd: em0: watchdog: head 370 tail 434 TDH 434 TDT 370
Jan 27 07:15:43 bcbnfw1 /bsd: em0: watchdog: head 219 tail 283 TDH 283 TDT 219
Jan 27 07:15:54 bcbnfw1 /bsd: em0: watchdog: head 322 tail 386 TDH 386 TDT 322
Jan 27 07:16:08 bcbnfw1 /bsd: em0: watchdog: head 115 tail 179 TDH 179 TDT 115
Jan 27 07:16:21 bcbnfw1 /bsd: em0: watchdog: head 364 tail 428 TDH 428 TDT 364
Jan 27 07:16:35 bcbnfw1 /bsd: em0: watchdog: head 473 tail 26 TDH 26 TDT 473





Re: TSO em(4) problem

2024-01-26 Thread Hrvoje Popovski
On 26.1.2024. 21:56, Marcus Glocker wrote:
> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
> 
>> I've managed to reproduce the TSO em problem on another setup, unfortunately
>> in production.
>>
>> Setup is very simple
>>
>> em0 - carp <- uplink
>> em1 - pfsync
>> ix1 - vlans - carp
> 
> Would it be possible that you also share an "ifconfig -a hwfeatures" of
> that box?  You can mask the IPs if it's too sensitive.
> 
> I'm still trying to reproduce the issue here, and so far I can't.
> Maybe in your full ifconfig output I can see some specifics about your
> configuration, which makes it more likely to reproduce the issue here.
> 

Hi,

here's the ifconfig output from a second setup, where the watchdog is
triggered much faster. Originally the uplink in this setup was ix0. I changed
it to em0 to see whether the problem would be the same as in the other setup,
and it is. That's good, because this is a pfsync setup for students and I can
do whatever I want with it :)



bcbnfw1# ifconfig -a hwfeatures
lo0: flags=2008049 mtu 32768

hwfeatures=7187
index 6 priority 0 llprio 3
groups: lo
inet 127.0.0.1 netmask 0xff00
ix0: flags=2008802 mtu 1500

hwfeatures=71b7
hardmtu 9198
lladdr 90:e2:ba:d7:1b:f4
index 1 priority 0 llprio 3
media: Ethernet autoselect (10GbaseSR full-duplex)
status: active
ix1: flags=2008b43 mtu 1500

hwfeatures=71b7
hardmtu 9198
lladdr 90:e2:ba:d7:1b:f5
index 2 priority 0 llprio 3
media: Ethernet autoselect (10GbaseSR full-duplex,rxpause,txpause)
status: active
em0: flags=8b43 mtu 1500

hwfeatures=31b7
hardmtu 9216
lladdr 0c:c4:7a:da:cd:5a
index 3 priority 0 llprio 3
groups: egress
media: Ethernet autoselect (1000baseT full-duplex,rxpause)
status: active
inet 10.10.155.234 netmask 0xfff8 broadcast 10.10.155.239
em1: flags=8843 mtu 1500

hwfeatures=31b7
hardmtu 9216
lladdr 0c:c4:7a:da:cd:5b
index 4 priority 0 llprio 3
media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
status: active
inet 192.168.0.77 netmask 0xfffc broadcast 192.168.0.79
enc0: flags=0<>
hwfeatures=0<>
index 5 priority 0 llprio 3
groups: enc
status: active
carp0: flags=8843 mtu 1500

hwfeatures=3187
hardmtu 1500
lladdr 00:00:5e:00:01:01
index 7 priority 15 llprio 3
carp: MASTER carpdev em0 vhid 1 advbase 1 advskew 10
groups: carp
status: master
inet 10.10.155.236 netmask 0x
carp1100: flags=8843 mtu 1500

hwfeatures=3187
hardmtu 1500
lladdr 00:00:5e:00:01:12
index 8 priority 15 llprio 3
carp: MASTER carpdev vlan1100 vhid 18 advbase 1 advskew 10
groups: carp
status: master
inet 10.30.16.1 netmask 0x
carp1101: flags=8843 mtu 1500

hwfeatures=3187
hardmtu 1500
lladdr 00:00:5e:00:01:16
index 9 priority 15 llprio 3
carp: MASTER carpdev vlan1101 vhid 22 advbase 1 advskew 10
groups: carp
status: master
inet 10.31.16.1 netmask 0x
carp1102: flags=8843 mtu 1500

hwfeatures=3187
hardmtu 1500
lladdr 00:00:5e:00:01:19
index 10 priority 15 llprio 3
carp: MASTER carpdev vlan1102 vhid 25 advbase 1 advskew 10
groups: carp
status: master
inet 10.32.16.1 netmask 0x
carp1103: flags=8843 mtu 1500

hwfeatures=3187
hardmtu 1500
lladdr 00:00:5e:00:01:1c
index 11 priority 15 llprio 3
carp: MASTER carpdev vlan1103 vhid 28 advbase 1 advskew 10
groups: carp
status: master
inet 10.33.16.1 netmask 0x
carp1130: flags=8843 mtu 1500

hwfeatures=3187
hardmtu 1500
lladdr 00:00:5e:00:01:13
index 12 priority 15 llprio 3
carp: MASTER carpdev vlan1130 vhid 19 advbase 1 advskew 10
groups: carp
status: master
inet 10.30.0.1 netmask 0x
carp1131: flags=8843 mtu 1500

hwfeatures=3187
hardmtu 1500
lladdr 00:00:5e:00:01:17
index 13 priority 15 llprio 3
carp: MASTER carpdev vlan1131 vhid 23 advbase 1 advskew 10
groups: carp
status: master
inet 10.31.0.1 netmask 0x
carp1132: flags=8843 mtu 1500

hwfeatures=3187
hardmtu 1500
lladdr 00:00:5e:00:01:1a
index 14 priority 15 llprio 3
carp: MASTER carpdev vlan1132 vhid 26 advbase 1 advskew 10
groups: carp
status: master
inet 10.32.0.1 netmask 0x
carp1133: flags=8843 mtu 1500

hwfeatures=3187
hardmtu 1500
lladdr 00:00:5e:00:01:1d
index 15 priority 15 llprio 3
carp: MASTER carpdev vlan1133 vhid 29 advbase 1 advskew 10
groups: carp
status: master
inet 10.33.0.1 netmask 0x
carp1150: flags=8843 mtu 1500

hwfeatures=3187
hardmtu 1500
lladdr 00:00:5e:00:01:14
index 16 priority 15 llprio 3
carp: MASTER carpdev vlan1150 

Re: TSO em(4) problem

2024-01-26 Thread Alexander Bluhm
On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
> I've managed to reproduce the TSO em problem on another setup, unfortunately
> in production.

What helped debugging a similar issue with ixl(4) and TSO was to
remove all TSO specific code from the driver.  Then only this part
remains from the original em(4) TSO diff.

error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE,
    EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1),
    EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);

The parameters that changed when adding TSO are:

bus_size_t size:      MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SIZE 65535
bus_size_t maxsegsz:  MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SEG_SIZE 4096

I suspect that this is the cause for the regression as disabling
TSO did not help.  Would it be possible to run the diff below?  I
expect that the problem will still be there.  But then we know it
must be the change of one of the bus_dmamap_create() arguments.

bluhm
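
To put the two changed arguments side by side, here is a small sketch; it
is not driver code. The three size constants are the values quoted above.
EM_MAX_SCATTER_ASSUMED is a placeholder of 64 that is not stated in this
thread and should be checked against if_em.h, and min_segments() and
max_segments() are hypothetical helpers that ignore alignment and
page-boundary splits.

```c
#include <stddef.h>

/* Values quoted in this mail. */
#define MAX_JUMBO_FRAME_SIZE	16128
#define EM_TSO_SIZE		65535
#define EM_TSO_SEG_SIZE		4096

/* Assumption: not stated in this thread, check if_em.h for the real value. */
#define EM_MAX_SCATTER_ASSUMED	64

/*
 * Hypothetical helper: fewest DMA segments that can cover `size` bytes
 * when no segment may exceed `maxsegsz`.  Ignores alignment and
 * page-boundary splits, so a real mapping may need more.
 */
static size_t
min_segments(size_t size, size_t maxsegsz)
{
	return (size + maxsegsz - 1) / maxsegsz;
}

/* Segment limit passed as nsegments, mirroring the expression above. */
static size_t
max_segments(int pcix_82544)
{
	return EM_MAX_SCATTER_ASSUMED / (pcix_82544 ? 2 : 1);
}
```

min_segments(MAX_JUMBO_FRAME_SIZE, MAX_JUMBO_FRAME_SIZE) is 1, while
min_segments(EM_TSO_SIZE, EM_TSO_SEG_SIZE) is 16, so a full-size mapping
that could previously sit in one segment now needs at least 16 of the
segments that max_segments() allows.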

Index: dev/pci/if_em.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_em.c,v
diff -u -p -r1.370 if_em.c
--- dev/pci/if_em.c 31 Dec 2023 08:42:33 -  1.370
+++ dev/pci/if_em.c 26 Jan 2024 21:32:08 -
@@ -291,8 +291,6 @@ void em_receive_checksum(struct em_softc
 struct mbuf *);
 u_int  em_transmit_checksum_setup(struct em_queue *, struct mbuf *, u_int,
u_int32_t *, u_int32_t *);
-u_int  em_tso_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *,
-   u_int32_t *);
 u_int  em_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *,
u_int32_t *);
 void em_iff(struct em_softc *);
@@ -1238,15 +1236,7 @@ em_encap(struct em_queue *que, struct mb
}
 
if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
-   if (ISSET(m->m_pkthdr.csum_flags, M_TCP_TSO)) {
-   used += em_tso_setup(que, m, head, &txd_upper,
-   &txd_lower);
-   if (!used)
-   return (used);
-   } else {
-   used += em_tx_ctx_setup(que, m, head, &txd_upper,
-   &txd_lower);
-   }
+   used += em_tx_ctx_setup(que, m, head, &txd_upper, &txd_lower);
} else if (sc->hw.mac_type >= em_82543) {
used += em_transmit_checksum_setup(que, m, head,
&txd_upper, &txd_lower);
@@ -1579,21 +1569,6 @@ em_update_link_status(struct em_softc *s
ifp->if_link_state = link_state;
if_link_state_change(ifp);
}
-
-   /* Disable TSO for 10/100 speeds to avoid some hardware issues */
-   switch (sc->link_speed) {
-   case SPEED_10:
-   case SPEED_100:
-   if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
-   ifp->if_capabilities &= ~IFCAP_TSOv4;
-   ifp->if_capabilities &= ~IFCAP_TSOv6;
-   }
-   break;
-   case SPEED_1000:
-   if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210)
-   ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
-   break;
-   }
 }
 
 /*
@@ -2013,7 +1988,6 @@ em_setup_interface(struct em_softc *sc)
if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
ifp->if_capabilities |= IFCAP_CSUM_IPv4;
ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6;
-   ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
}
 
/* 
@@ -2429,81 +2403,6 @@ em_free_transmit_structures(struct em_so
0, que->tx.sc_tx_dma.dma_map->dm_mapsize,
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
}
-}
-
-u_int
-em_tso_setup(struct em_queue *que, struct mbuf *mp, u_int head,
-u_int32_t *olinfo_status, u_int32_t *cmd_type_len)
-{
-   struct ether_extracted ext;
-   struct e1000_adv_tx_context_desc *TD;
-   uint32_t vlan_macip_lens = 0, type_tucmd_mlhl = 0, mss_l4len_idx = 0;
-   uint32_t paylen = 0;
-   uint8_t iphlen = 0;
-
-   *olinfo_status = 0;
-   *cmd_type_len = 0;
-   TD = (struct e1000_adv_tx_context_desc *)&que->tx.sc_tx_desc_ring[head];
-
-#if NVLAN > 0
-   if (ISSET(mp->m_flags, M_VLANTAG)) {
-   uint32_t vtag = mp->m_pkthdr.ether_vtag;
-   vlan_macip_lens |= vtag << E1000_ADVTXD_VLAN_SHIFT;
-   *cmd_type_len |= E1000_ADVTXD_DCMD_VLE;
-   }
-#endif
-
-   ether_extract_headers(mp, &ext);
-   if (ext.tcp == NULL)
-   goto out;
-
-   vlan_macip_lens |= (sizeof(*ext.eh) << E1000_ADVTXD_MACLEN_SHIFT);
-
-   if (ext.ip4) {
-   iphlen = ext.ip4->ip_hl << 2;
-
-   type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV4;
-   *olinfo_status |= 

Re: TSO em(4) problem

2024-01-26 Thread Marcus Glocker
On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:

> I've managed to reproduce the TSO em problem on another setup, unfortunately
> in production.
> 
> Setup is very simple
> 
> em0 - carp <- uplink
> em1 - pfsync
> ix1 - vlans - carp

Would it be possible that you also share an "ifconfig -a hwfeatures" of
that box?  You can mask the IPs if it's too sensitive.

I'm still trying to reproduce the issue here, and so far I can't.
Maybe in your full ifconfig output I can see some specifics about your
configuration, which makes it more likely to reproduce the issue here.



Re: TSO em(4) problem

2024-01-26 Thread Hrvoje Popovski
I've managed to reproduce the TSO em problem on another setup, unfortunately
in production.

Setup is very simple

em0 - carp <- uplink
em1 - pfsync
ix1 - vlans - carp




Jan 26 11:19:23 bcbnfw1 /bsd: em0: watchdog: head 34 tail 98 TDH 98 TDT 34
Jan 26 11:19:33 bcbnfw1 /bsd: em0: watchdog: head 345 tail 409 TDH 409 TDT 345
Jan 26 11:19:54 bcbnfw1 /bsd: em0: watchdog: head 259 tail 323 TDH 323 TDT 259
Jan 26 11:20:08 bcbnfw1 /bsd: em0: watchdog: head 343 tail 407 TDH 407 TDT 343
Jan 26 11:20:24 bcbnfw1 /bsd: em0: watchdog: head 20 tail 85 TDH 85 TDT 20
Jan 26 11:20:47 bcbnfw1 /bsd: em0: watchdog: head 388 tail 452 TDH 452 TDT 388
Jan 26 11:21:09 bcbnfw1 /bsd: em0: watchdog: head 25 tail 89 TDH 89 TDT 25
Jan 26 11:21:32 bcbnfw1 /bsd: em0: watchdog: head 105 tail 169 TDH 169 TDT 105
Jan 26 11:21:52 bcbnfw1 /bsd: em0: watchdog: head 23 tail 88 TDH 88 TDT 23
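
A quick way to compare these watchdog lines is to compute the distance
from head to tail around the TX ring. The sketch below is only an
illustration and not part of the driver: parse_watchdog() and
ring_distance() are hypothetical helpers, and TX_RING_SIZE_ASSUMED = 512
is an assumption (it only matters for lines where the ring index wraps).

```c
#include <stdio.h>
#include <string.h>

/* Assumption: ring size is not stated in this mail. */
#define TX_RING_SIZE_ASSUMED	512

/*
 * Hypothetical helper: pull the software head and tail indexes out of
 * one "watchdog: head H tail T ..." log line.
 * Returns 0 on success, -1 if the line does not match.
 */
static int
parse_watchdog(const char *line, unsigned *head, unsigned *tail)
{
	const char *p = strstr(line, "watchdog: head ");

	if (p == NULL)
		return -1;
	if (sscanf(p, "watchdog: head %u tail %u", head, tail) != 2)
		return -1;
	return 0;
}

/* Number of slots when walking forward from `from` to `to` on the ring. */
static unsigned
ring_distance(unsigned from, unsigned to, unsigned ring_size)
{
	return (to + ring_size - from) % ring_size;
}
```

Feeding the lines above through parse_watchdog() and then
ring_distance(head, tail, TX_RING_SIZE_ASSUMED) gives 64 for most of them
and 65 for the "head 20 tail 85" and "head 23 tail 88" lines.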




647470: Jan 26 11:19:25: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down
647474: Jan 26 11:19:29: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up

647478: Jan 26 11:19:35: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down
647483: Jan 26 11:19:39: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up

647487: Jan 26 11:19:56: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down
647491: Jan 26 11:19:59: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up

647495: Jan 26 11:20:10: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down
647499: Jan 26 11:20:13: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up

647504: Jan 26 11:20:26: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down
647508: Jan 26 11:20:29: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up

647512: Jan 26 11:20:49: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down
647516: Jan 26 11:20:52: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up

647520: Jan 26 11:21:11: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down
647524: Jan 26 11:21:14: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up

647528: Jan 26 11:21:34: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down
647532: Jan 26 11:21:36: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up

647536: Jan 26 11:21:54: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down
647540: Jan 26 11:21:56: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up




bcbnfw1# kstat em0:::
em0:0:em-stats:0
 rx crc errs: 0 packets
   rx align errs: 0 packets
   rx align errs: 0 packets
 rx errs: 0 packets
   rx missed: 0 packets
  tx single coll: 0 packets
  tx excess coll: 0 packets
   tx multi coll: 0 packets
tx late coll: 0 packets
 tx coll: 0
   tx defers: 0
   tx no CRS: 0 packets
seq errs: 0
   carr ext errs: 0 packets
 rx len errs: 0 packets
  rx xon: 0 packets
  tx xon: 0 packets
 rx xoff: 0 packets
 tx xoff: 0 packets
  FC unsupported: 0 packets
  rx 64B: 6555 packets
  rx 65-127B: 11144 packets
 rx 128-255B: 6264 packets
 rx 256-511B: 2390 packets
rx 512-1023B: 3706 packets
rx 1024-maxB: 87987 packets
 rx good: 118046 packets
rx bcast: 3 packets
rx mcast: 82 packets
 tx good: 56796 packets
 rx good: 132532686 bytes
 tx good: 13691390 bytes
   rx no buffers: 0 packets
rx undersize: 0 packets
rx fragments: 0 packets
 rx oversize: 0 packets
  rx jabbers: 0 packets
 rx mgmt: 0 packets
   rx mgmt drops: 0 packets
 tx mgmt: 0 packets
rx total: 132532686 bytes
tx total: 13691390 bytes
rx total: 118046 packets
tx total: 56796 packets
  tx 64B: 11861 packets
  tx 65-127B: 28718 packets
 tx 128-255B: 7202 packets
 tx 256-511B: 1834 packets
tx 512-1023B: 2059 packets
tx 1024-maxB: 5122 packets
tx mcast: 18 packets
tx bcast: 2 packets
em0:0:rxq:0
 packets: 1009629 packets
   bytes: 1172569417 bytes
  fdrops: 0 packets
  qdrops: 0 packets
  errors: 0 packets
qlen: 0 packets
enqueues: 348100
dequeues: 348031
em0:0:txq:0
 packets: 465709 packets
   bytes: 103430590 bytes
  qdrops: 53674 packets
  errors: 0 packets
qlen: 0 packets
 maxqlen: 511 packets
 oactive: false
oactives: 9




OpenBSD 7.4-current (GENERIC.MP) #1626: Thu Jan 25 20:05:01 MST 2024
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34224844800 (32639MB)
avail mem = 33166430208 (31629MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xec9b0 (62