Re: TSO em(4) problem
On 1.2.2024. 18:42, Alexander Bluhm wrote:
> On Tue, Jan 30, 2024 at 02:32:24PM +0100, Hrvoje Popovski wrote:
>> yes, and forwarding only without pf.
>> I'm sending traffic from host connected to vlan/ix0 and forward through
>> em5 to other host.
>> I'm sending 1Gbps of traffic with cisco t-rex
>
> I cannot reproduce.
>
> ix0 at pci6 dev 0 function 0 "Intel 82599" rev 0x01, msix, 8 queues, address 90:e2:ba:d6:23:68
> em1 at pci7 dev 0 function 1 "Intel I350" rev 0x01: msi, address a0:36:9f:0a:4a:c5
>
> root@ot42:.../~# ifconfig ix0 hwfeatures
> ix0: flags=2008843 mtu 1500
> 	hwfeatures=71b7 hardmtu 9198
> 	lladdr 90:e2:ba:d6:23:68
> 	description: Intel 82599
> 	index 5 priority 0 llprio 3
> 	media: Ethernet autoselect (10GSFP+Cu full-duplex,rxpause,txpause)
> 	status: active
>
> root@ot42:.../~# ifconfig em1 hwfeatures
> em1: flags=8c43 mtu 1500
> 	hwfeatures=31b7 hardmtu 9216
> 	lladdr a0:36:9f:0a:4a:c5
> 	description: Intel I350
> 	index 8 priority 0 llprio 3
> 	media: Ethernet autoselect (1000baseT full-duplex,master)
> 	status: active
> 	inet 10.10.22.3 netmask 0xff00 broadcast 10.10.22.255
>
> root@ot42:.../~# ifconfig vlan0 hwfeatures
> vlan0: flags=8843 mtu 1500
> 	hwfeatures=3187 hardmtu 9198
> 	lladdr 90:e2:ba:d6:23:68
> 	index 24 priority 0 llprio 3
> 	encap: vnetid 221 parent ix0 txprio packet rxprio outer
> 	groups: vlan
> 	media: Ethernet autoselect (10GSFP+Cu full-duplex,rxpause,txpause)
> 	status: active
> 	inet 10.10.21.2 netmask 0xff00 broadcast 10.10.21.255
>
> root@ot42:.../~# pfctl -si
> Status: Disabled for 0 days 00:03:42           Debug: err
>
> Running tcpbench -n100 from Linux via OpenBSD forwarding to Linux.
> Simultaneous udpbench to create traffic mixture.
>
> root@ot42:.../~# netstat -ss | egrep 'TSO|LRO'
> 	1188 output TSO packets software chopped
> 	33086906 output TSO packets hardware processed
> 	265855748 output TSO packets generated
> 	31090975 input LRO generated packets from hardware
> 	176482178 input LRO coalesced packets by network device
>
> Lot of LRO and TSO.  Running diff below, which reverts em TSO backout
> and adds sparc64 fix.
>
> Hrvoje: What is different in your lab?

I think I found it. It's lldp.
If I enable lldpd I'm getting watchdog on em, when disabled only one
watchdog at the beginning of testing.
Re: TSO em(4) problem
On Tue, Jan 30, 2024 at 02:32:24PM +0100, Hrvoje Popovski wrote:
> yes, and forwarding only without pf.
> I'm sending traffic from host connected to vlan/ix0 and forward through
> em5 to other host.
> I'm sending 1Gbps of traffic with cisco t-rex

I cannot reproduce.

ix0 at pci6 dev 0 function 0 "Intel 82599" rev 0x01, msix, 8 queues, address 90:e2:ba:d6:23:68
em1 at pci7 dev 0 function 1 "Intel I350" rev 0x01: msi, address a0:36:9f:0a:4a:c5

root@ot42:.../~# ifconfig ix0 hwfeatures
ix0: flags=2008843 mtu 1500
	hwfeatures=71b7 hardmtu 9198
	lladdr 90:e2:ba:d6:23:68
	description: Intel 82599
	index 5 priority 0 llprio 3
	media: Ethernet autoselect (10GSFP+Cu full-duplex,rxpause,txpause)
	status: active

root@ot42:.../~# ifconfig em1 hwfeatures
em1: flags=8c43 mtu 1500
	hwfeatures=31b7 hardmtu 9216
	lladdr a0:36:9f:0a:4a:c5
	description: Intel I350
	index 8 priority 0 llprio 3
	media: Ethernet autoselect (1000baseT full-duplex,master)
	status: active
	inet 10.10.22.3 netmask 0xff00 broadcast 10.10.22.255

root@ot42:.../~# ifconfig vlan0 hwfeatures
vlan0: flags=8843 mtu 1500
	hwfeatures=3187 hardmtu 9198
	lladdr 90:e2:ba:d6:23:68
	index 24 priority 0 llprio 3
	encap: vnetid 221 parent ix0 txprio packet rxprio outer
	groups: vlan
	media: Ethernet autoselect (10GSFP+Cu full-duplex,rxpause,txpause)
	status: active
	inet 10.10.21.2 netmask 0xff00 broadcast 10.10.21.255

root@ot42:.../~# pfctl -si
Status: Disabled for 0 days 00:03:42           Debug: err

Running tcpbench -n100 from Linux via OpenBSD forwarding to Linux.
Simultaneous udpbench to create traffic mixture.

root@ot42:.../~# netstat -ss | egrep 'TSO|LRO'
	1188 output TSO packets software chopped
	33086906 output TSO packets hardware processed
	265855748 output TSO packets generated
	31090975 input LRO generated packets from hardware
	176482178 input LRO coalesced packets by network device

Lot of LRO and TSO.  Running diff below, which reverts em TSO backout
and adds sparc64 fix.

Hrvoje: What is different in your lab?
bluhm Index: dev/pci/if_em.c === RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_em.c,v diff -u -p -r1.371 if_em.c --- dev/pci/if_em.c 28 Jan 2024 18:42:58 - 1.371 +++ dev/pci/if_em.c 29 Jan 2024 14:37:36 - @@ -291,6 +291,8 @@ void em_receive_checksum(struct em_softc struct mbuf *); u_int em_transmit_checksum_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, u_int32_t *); +u_int em_tso_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, + u_int32_t *); u_int em_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, u_int32_t *); void em_iff(struct em_softc *); @@ -1188,7 +1190,7 @@ em_flowstatus(struct em_softc *sc) * * This routine maps the mbufs to tx descriptors. * - * return 0 on success, positive on failure + * return 0 on failure, positive on success **/ u_int em_encap(struct em_queue *que, struct mbuf *m) @@ -1236,7 +1238,15 @@ em_encap(struct em_queue *que, struct mb } if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { - used += em_tx_ctx_setup(que, m, head, _upper, _lower); + if (ISSET(m->m_pkthdr.csum_flags, M_TCP_TSO)) { + used += em_tso_setup(que, m, head, _upper, + _lower); + if (!used) + return (used); + } else { + used += em_tx_ctx_setup(que, m, head, _upper, + _lower); + } } else if (sc->hw.mac_type >= em_82543) { used += em_transmit_checksum_setup(que, m, head, _upper, _lower); @@ -1569,6 +1579,21 @@ em_update_link_status(struct em_softc *s ifp->if_link_state = link_state; if_link_state_change(ifp); } + + /* Disable TSO for 10/100 speeds to avoid some hardware issues */ + switch (sc->link_speed) { + case SPEED_10: + case SPEED_100: + if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { + ifp->if_capabilities &= ~IFCAP_TSOv4; + ifp->if_capabilities &= ~IFCAP_TSOv6; + } + break; + case SPEED_1000: + if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) + ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6; + break; + } } /* @@ -1988,6 +2013,7 @@
Re: TSO em(4) problem
On 30.1.2024. 13:33, Alexander Bluhm wrote:
> On Tue, Jan 30, 2024 at 12:07:08PM +0100, Hrvoje Popovski wrote:
>> On 30.1.2024. 9:27, Hrvoje Popovski wrote:
>>> I will prepare one box for this kind of traffic and will contact you and
>>> marcus
>>>
>>>> In theory when going through vlan interface it should remove
>>>> M_VLANTAG.  But something must be wrong and I wonder what.
>>>>
>>>> bluhm
>>
>> Hi,
>>
>> I've managed to trigger watchdog in lab. It couldn't be possible without
>> bluhm@ information about ix vlan, thank you.
>
> Great, now we can debug the details.
>
> I have to know how ix and em are connected.
>
> Do you have any bridge or veb?  Where are your vlan trunks?
> Any aggr, trunk, carp?

no, only vlan on ix0.

> Is my understanding of your setup correct?
>
> ix -> vlan -> forward -> em

yes, and forwarding only without pf.
I'm sending traffic from host connected to vlan/ix0 and forward through
em5 to other host.
I'm sending 1Gbps of traffic with cisco t-rex

> Can something more happen, like
>
> ix -> forward -> em
>

In setup without vlan on ix I've got only one watchdog at the beginning
of testing and that's it.
With vlan I'm getting around 6 or 7 watchdogs per minute which means 6
or 7 links going up/down.

without vlan
smc4# netstat -sp tcp | grep TSO
	0 output TSO packets software chopped
	268 output TSO packets hardware processed
	0 output TSO packets generated
	0 output TSO packets dropped
smc4# netstat -sp tcp | grep LRO
	0 input LRO packets passed through pseudo device
	7666573 input LRO generated packets from hardware
	21667579 input LRO coalesced packets by network device
	0 input bad LRO packets dropped
Re: TSO em(4) problem
On Tue, Jan 30, 2024 at 12:07:08PM +0100, Hrvoje Popovski wrote: > On 30.1.2024. 9:27, Hrvoje Popovski wrote: > > I will prepare one box for this kind of traffic and will contact you and > > marcus > > > >> In theory when going through vlan interface it should remove > >> M_VLANTAG. But something must be wrong and I wonder what. > >> > >> bluhm > > Hi, > > I've managed to trigger watchdog in lab. It couldn't be possible without > bluhm@ information about ix vlan, thank you. Great, now we can debug the details. I have to know how ix and em are connected. Do you have any bridge or veb? Where are your vlan trunks? Any aggr, trunk, carp? Is my understanding of your setup corect? ix -> vlan -> forward -> em Can something more happen, like ix -> forward -> em bluhm > Jan 30 12:01:09 smc4 /bsd: em5: watchdog: head 123 tail 187 TDH 187 TDT 123 > Jan 30 12:01:18 smc4 /bsd: em5: watchdog: head 243 tail 307 TDH 307 TDT 243 > Jan 30 12:01:28 smc4 /bsd: em5: watchdog: head 463 tail 15 TDH 15 TDT 463 > Jan 30 12:01:37 smc4 /bsd: em5: watchdog: head 413 tail 477 TDH 477 TDT 413 > Jan 30 12:01:46 smc4 /bsd: em5: watchdog: head 195 tail 259 TDH 259 TDT 195 > Jan 30 12:01:55 smc4 /bsd: em5: watchdog: head 259 tail 323 TDH 323 TDT 259 > Jan 30 12:02:05 smc4 /bsd: em5: watchdog: head 333 tail 397 TDH 397 TDT 333 > Jan 30 12:02:14 smc4 /bsd: em5: watchdog: head 33 tail 97 TDH 97 TDT 33 > Jan 30 12:02:24 smc4 /bsd: em5: watchdog: head 459 tail 11 TDH 11 TDT 459 > Jan 30 12:02:33 smc4 /bsd: em5: watchdog: head 447 tail 511 TDH 511 TDT 447 > > > em0 at pci7 dev 0 function 0 "Intel 82576" rev 0x01: msi, address > 00:1b:21:61:8a:94 > em1 at pci7 dev 0 function 1 "Intel 82576" rev 0x01: msi, address > 00:1b:21:61:8a:95 > em2 at pci8 dev 0 function 0 "Intel I210" rev 0x03: msi, address > 00:25:90:5d:c9:98 > em3 at pci9 dev 0 function 0 "Intel I210" rev 0x03: msi, address > 00:25:90:5d:c9:99 > em4 at pci12 dev 0 function 0 "Intel I350" rev 0x01: msi, address > 00:25:90:5d:c9:9a > em5 at pci12 
dev 0 function 1 "Intel I350" rev 0x01: msi, address > 00:25:90:5d:c9:9b > em6 at pci12 dev 0 function 2 "Intel I350" rev 0x01: msi, address > 00:25:90:5d:c9:9c > em7 at pci12 dev 0 function 3 "Intel I350" rev 0x01: msi, address > 00:25:90:5d:c9:9d > > > smc4# netstat -sp tcp | grep LRO > 0 input LRO packets passed through pseudo device > 4696315 input LRO generated packets from hardware > 13205047 input LRO coalesced packets by network device > 0 input bad LRO packets dropped > smc4# netstat -sp tcp | grep TSO > 0 output TSO packets software chopped > 3672 output TSO packets hardware processed > 0 output TSO packets generated > 0 output TSO packets dropped > > > > > smc4# ifconfig em5 hwfeatures > em5: flags=8c43 mtu 1500 > > hwfeatures=31b7 > hardmtu 9216 > lladdr 00:25:90:5d:c9:9b > index 8 priority 0 llprio 3 > media: Ethernet autoselect (1000baseT > full-duplex,master,rxpause,txpause) > status: active > inet 192.168.20.1 netmask 0xff00 broadcast 192.168.20.255 >
Re: TSO em(4) problem
On 30.1.2024. 9:27, Hrvoje Popovski wrote:
> I will prepare one box for this kind of traffic and will contact you and
> marcus
>
>> In theory when going through vlan interface it should remove
>> M_VLANTAG.  But something must be wrong and I wonder what.
>>
>> bluhm

Hi,

I've managed to trigger watchdog in lab. It couldn't be possible without
bluhm@ information about ix vlan, thank you.

Jan 30 12:01:09 smc4 /bsd: em5: watchdog: head 123 tail 187 TDH 187 TDT 123
Jan 30 12:01:18 smc4 /bsd: em5: watchdog: head 243 tail 307 TDH 307 TDT 243
Jan 30 12:01:28 smc4 /bsd: em5: watchdog: head 463 tail 15 TDH 15 TDT 463
Jan 30 12:01:37 smc4 /bsd: em5: watchdog: head 413 tail 477 TDH 477 TDT 413
Jan 30 12:01:46 smc4 /bsd: em5: watchdog: head 195 tail 259 TDH 259 TDT 195
Jan 30 12:01:55 smc4 /bsd: em5: watchdog: head 259 tail 323 TDH 323 TDT 259
Jan 30 12:02:05 smc4 /bsd: em5: watchdog: head 333 tail 397 TDH 397 TDT 333
Jan 30 12:02:14 smc4 /bsd: em5: watchdog: head 33 tail 97 TDH 97 TDT 33
Jan 30 12:02:24 smc4 /bsd: em5: watchdog: head 459 tail 11 TDH 11 TDT 459
Jan 30 12:02:33 smc4 /bsd: em5: watchdog: head 447 tail 511 TDH 511 TDT 447

em0 at pci7 dev 0 function 0 "Intel 82576" rev 0x01: msi, address 00:1b:21:61:8a:94
em1 at pci7 dev 0 function 1 "Intel 82576" rev 0x01: msi, address 00:1b:21:61:8a:95
em2 at pci8 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:25:90:5d:c9:98
em3 at pci9 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:25:90:5d:c9:99
em4 at pci12 dev 0 function 0 "Intel I350" rev 0x01: msi, address 00:25:90:5d:c9:9a
em5 at pci12 dev 0 function 1 "Intel I350" rev 0x01: msi, address 00:25:90:5d:c9:9b
em6 at pci12 dev 0 function 2 "Intel I350" rev 0x01: msi, address 00:25:90:5d:c9:9c
em7 at pci12 dev 0 function 3 "Intel I350" rev 0x01: msi, address 00:25:90:5d:c9:9d

smc4# netstat -sp tcp | grep LRO
	0 input LRO packets passed through pseudo device
	4696315 input LRO generated packets from hardware
	13205047 input LRO coalesced packets by network device
	0 input bad LRO packets dropped
smc4# netstat -sp tcp | grep TSO
	0 output TSO packets software chopped
	3672 output TSO packets hardware processed
	0 output TSO packets generated
	0 output TSO packets dropped

smc4# ifconfig em5 hwfeatures
em5: flags=8c43 mtu 1500
	hwfeatures=31b7 hardmtu 9216
	lladdr 00:25:90:5d:c9:9b
	index 8 priority 0 llprio 3
	media: Ethernet autoselect (1000baseT full-duplex,master,rxpause,txpause)
	status: active
	inet 192.168.20.1 netmask 0xff00 broadcast 192.168.20.255
Re: TSO em(4) problem
On 29.1.2024. 15:29, Alexander Bluhm wrote:
> On Sat, Jan 27, 2024 at 08:08:35AM +0100, Hrvoje Popovski wrote:
>> On 26.1.2024. 22:47, Alexander Bluhm wrote:
>>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
>>>> I've managed to reproduce TSO em problem on another setup, unfortunately
>>>> production.
>>> What helped debugging a similar issue with ixl(4) and TSO was to
>>> remove all TSO specific code from the driver.  Then only this part
>>> remains from the original em(4) TSO diff.
>>>
>>> 	error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE,
>>> 	    EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1),
>>> 	    EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);
>>>
>>> The parameters that changed when adding TSO are:
>>>
>>> bus_size_t size:	MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SIZE 65535
>>> bus_size_t maxsegsz:	MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SEG_SIZE 4096
>>>
>>> I suspect that this is the cause for the regression as disabling
>>> TSO did not help.  Would it be possible to run the diff below?  I
>>> expect that the problem will still be there.  But then we know it
>>> must be the change of one of the bus_dmamap_create() arguments.
>>>
>>> bluhm
>>
>> Hi,
>>
>> with this diff em0 seems happy and em watchdog is gone.
>
> This is very interesting.  That means that the bus_dmamap_create()
> argument does not cause the regression.
>
> Did you see "output TSO packets hardware processed" anywhere in
> netstat -s?  In some iteration of testing you turned TSO off with
> sysctl net.inet.tcp.tso=0, but it did not help.  So no TSO packets
> from the stack.
>
> In another mail you mentioned
>
>> Setup is very simple
>> em0 - carp <- uplink
>> em1 - pfsync
>> ix1 - vlans - carp
>
> ix supports LRO.  If you forward from ix1 to em0 the LRO packets
> from ix hardware are split by TSO on em hardware.  And the ix does
> vlan offloading + LRO, so em must do vlan offloading properly with
> TSO.  Or do you use a vlan interface?
>
> Does it help to disable LRO, ifconfig ix1 -tcplro ?

Yes, it helps... Thank you

uplink
em0: flags=8b43 mtu 1500
	hwfeatures=31b7 hardmtu 9216
	lladdr 0c:c4:7a:da:cd:5a
	index 3 priority 0 llprio 3
	groups: egress
	media: Ethernet autoselect (1000baseT full-duplex,master,rxpause)
	status: active

vlans are on ix1 - I've disabled LRO
ix1: flags=8b43 mtu 1500
	lladdr 90:e2:ba:d7:1b:f5
	index 2 priority 0 llprio 3
	media: Ethernet autoselect (10GbaseSR full-duplex,rxpause,txpause)
	status: active

Before I disabled LRO on ix1 I got a lot of watchdogs on em0

bcbnfw1# uptime
 9:25AM  up 8 mins, 1 user, load averages: 0.14, 0.13, 0.06
bcbnfw1# cat /var/log/messages| grep watchdog
Jan 30 09:18:51 bcbnfw1 /bsd: em0: watchdog: head 148 tail 213 TDH 213 TDT 148
Jan 30 09:19:01 bcbnfw1 /bsd: em0: watchdog: head 160 tail 224 TDH 224 TDT 160
Jan 30 09:19:12 bcbnfw1 /bsd: em0: watchdog: head 163 tail 228 TDH 228 TDT 163
Jan 30 09:19:22 bcbnfw1 /bsd: em0: watchdog: head 128 tail 192 TDH 192 TDT 128
Jan 30 09:19:32 bcbnfw1 /bsd: em0: watchdog: head 309 tail 373 TDH 373 TDT 309
Jan 30 09:19:41 bcbnfw1 /bsd: em0: watchdog: head 113 tail 177 TDH 177 TDT 113
Jan 30 09:19:51 bcbnfw1 /bsd: em0: watchdog: head 402 tail 466 TDH 466 TDT 402
Jan 30 09:20:01 bcbnfw1 /bsd: em0: watchdog: head 114 tail 178 TDH 178 TDT 114
Jan 30 09:20:16 bcbnfw1 /bsd: em0: watchdog: head 111 tail 175 TDH 175 TDT 111
Jan 30 09:20:26 bcbnfw1 /bsd: em0: watchdog: head 199 tail 263 TDH 263 TDT 199

without LRO on ix1 everything seems to work just fine ...

>
> I see this vlan code with mac_type checks.  Can we end in a
> configuration where we enable TSO but cannot do VLAN offloading?
>
> #if NVLAN > 0
> 	/* Find out if we are in VLAN mode */
> 	if (m->m_flags & M_VLANTAG && (sc->hw.mac_type < em_82575 ||
> 	    sc->hw.mac_type > em_i210)) {
> 		/* Set the VLAN id */
> 		desc->upper.fields.special = htole16(m->m_pkthdr.ether_vtag);
>
> 		/* Tell hardware to add tag */
> 		desc->lower.data |= htole32(E1000_TXD_CMD_VLE);
> 	}
> #endif
>
> Hrvoje, I know you do great tests in your lab.  Did you try this
> setup:
>
> Send bulk TCP traffic in vlan that will trigger LRO.
> Do VLAN + LRO offloading in ix.
> Forward it to em with TSO.
>

I will prepare one box for this kind of traffic and will contact you and
marcus

> In theory when going through vlan interface it should remove
> M_VLANTAG.  But something must be wrong and I wonder what.
>
> bluhm
>
Re: TSO em(4) problem
On Sun, Jan 28, 2024 at 07:46:29PM +0100, Marcus Glocker wrote: > Anyway, the TSO support just has been backed out. Thanks again for all > your testing! I am still interested to get em with TSO working if possible. Most use cases work fine. If there is a bug in our driver, we may fix it. If it is hardware bug, we should identitfy the broken chip revisions. Here is the backed out em TSO diff together with the TCP header diff for sparc64. Kurt, could you still test this in your next sparc64 build? bluhm Index: dev/pci/if_em.c === RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_em.c,v diff -u -p -r1.371 if_em.c --- dev/pci/if_em.c 28 Jan 2024 18:42:58 - 1.371 +++ dev/pci/if_em.c 29 Jan 2024 14:37:36 - @@ -291,6 +291,8 @@ void em_receive_checksum(struct em_softc struct mbuf *); u_int em_transmit_checksum_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, u_int32_t *); +u_int em_tso_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, + u_int32_t *); u_int em_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, u_int32_t *); void em_iff(struct em_softc *); @@ -1188,7 +1190,7 @@ em_flowstatus(struct em_softc *sc) * * This routine maps the mbufs to tx descriptors. 
* - * return 0 on success, positive on failure + * return 0 on failure, positive on success **/ u_int em_encap(struct em_queue *que, struct mbuf *m) @@ -1236,7 +1238,15 @@ em_encap(struct em_queue *que, struct mb } if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { - used += em_tx_ctx_setup(que, m, head, _upper, _lower); + if (ISSET(m->m_pkthdr.csum_flags, M_TCP_TSO)) { + used += em_tso_setup(que, m, head, _upper, + _lower); + if (!used) + return (used); + } else { + used += em_tx_ctx_setup(que, m, head, _upper, + _lower); + } } else if (sc->hw.mac_type >= em_82543) { used += em_transmit_checksum_setup(que, m, head, _upper, _lower); @@ -1569,6 +1579,21 @@ em_update_link_status(struct em_softc *s ifp->if_link_state = link_state; if_link_state_change(ifp); } + + /* Disable TSO for 10/100 speeds to avoid some hardware issues */ + switch (sc->link_speed) { + case SPEED_10: + case SPEED_100: + if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { + ifp->if_capabilities &= ~IFCAP_TSOv4; + ifp->if_capabilities &= ~IFCAP_TSOv6; + } + break; + case SPEED_1000: + if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) + ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6; + break; + } } /* @@ -1988,6 +2013,7 @@ em_setup_interface(struct em_softc *sc) if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { ifp->if_capabilities |= IFCAP_CSUM_IPv4; ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6; + ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6; } /* @@ -2231,9 +2257,9 @@ em_setup_transmit_structures(struct em_s for (i = 0; i < sc->sc_tx_slots; i++) { pkt = >tx.sc_tx_pkts_ring[i]; - error = bus_dmamap_create(sc->sc_dmat, MAX_JUMBO_FRAME_SIZE, + error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE, EM_MAX_SCATTER / (sc->pcix_82544 ? 
2 : 1), - MAX_JUMBO_FRAME_SIZE, 0, BUS_DMA_NOWAIT, >pkt_map); + EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, >pkt_map); if (error != 0) { printf("%s: Unable to create TX DMA map\n", DEVNAME(sc)); @@ -2403,6 +2429,81 @@ em_free_transmit_structures(struct em_so 0, que->tx.sc_tx_dma.dma_map->dm_mapsize, BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); } +} + +u_int +em_tso_setup(struct em_queue *que, struct mbuf *mp, u_int head, +u_int32_t *olinfo_status, u_int32_t *cmd_type_len) +{ + struct ether_extracted ext; + struct e1000_adv_tx_context_desc *TD; + uint32_t vlan_macip_lens = 0, type_tucmd_mlhl = 0, mss_l4len_idx = 0; + uint32_t paylen = 0; + uint8_t iphlen = 0; + + *olinfo_status = 0; + *cmd_type_len = 0; + TD = (struct e1000_adv_tx_context_desc
Re: TSO em(4) problem
On Sat, Jan 27, 2024 at 08:08:35AM +0100, Hrvoje Popovski wrote:
> On 26.1.2024. 22:47, Alexander Bluhm wrote:
> > On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
> >> I've managed to reproduce TSO em problem on another setup, unfortunately
> >> production.
> > What helped debugging a similar issue with ixl(4) and TSO was to
> > remove all TSO specific code from the driver.  Then only this part
> > remains from the original em(4) TSO diff.
> >
> > 	error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE,
> > 	    EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1),
> > 	    EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);
> >
> > The parameters that changed when adding TSO are:
> >
> > bus_size_t size:	MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SIZE 65535
> > bus_size_t maxsegsz:	MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SEG_SIZE 4096
> >
> > I suspect that this is the cause for the regression as disabling
> > TSO did not help.  Would it be possible to run the diff below?  I
> > expect that the problem will still be there.  But then we know it
> > must be the change of one of the bus_dmamap_create() arguments.
> >
> > bluhm
>
> Hi,
>
> with this diff em0 seems happy and em watchdog is gone.

This is very interesting.  That means that the bus_dmamap_create()
argument does not cause the regression.

Did you see "output TSO packets hardware processed" anywhere in
netstat -s?  In some iteration of testing you turned TSO off with
sysctl net.inet.tcp.tso=0, but it did not help.  So no TSO packets
from the stack.

In another mail you mentioned

> Setup is very simple
> em0 - carp <- uplink
> em1 - pfsync
> ix1 - vlans - carp

ix supports LRO.  If you forward from ix1 to em0 the LRO packets
from ix hardware are split by TSO on em hardware.  And the ix does
vlan offloading + LRO, so em must do vlan offloading properly with
TSO.  Or do you use a vlan interface?

Does it help to disable LRO, ifconfig ix1 -tcplro ?

I see this vlan code with mac_type checks.  Can we end in a
configuration where we enable TSO but cannot do VLAN offloading?

#if NVLAN > 0
	/* Find out if we are in VLAN mode */
	if (m->m_flags & M_VLANTAG && (sc->hw.mac_type < em_82575 ||
	    sc->hw.mac_type > em_i210)) {
		/* Set the VLAN id */
		desc->upper.fields.special = htole16(m->m_pkthdr.ether_vtag);

		/* Tell hardware to add tag */
		desc->lower.data |= htole32(E1000_TXD_CMD_VLE);
	}
#endif

Hrvoje, I know you do great tests in your lab.  Did you try this
setup:

Send bulk TCP traffic in vlan that will trigger LRO.
Do VLAN + LRO offloading in ix.
Forward it to em with TSO.

In theory when going through vlan interface it should remove
M_VLANTAG.  But something must be wrong and I wonder what.

bluhm
Re: TSO em(4) problem
On Sun, Jan 28, 2024 at 02:01:29PM +0100, Hrvoje Popovski wrote:

> Hi,
>
> with this diff I still see TSOv4 and TSOv6 on i350 is this ok ?

Nope, the intention was to disable TSO support for the i350 ...
Anyway, the TSO support has just been backed out.  Thanks again for all
your testing!
Re: TSO em(4) problem
On Sun, Jan 28, 2024 at 02:11:37PM +0100, Mark Kettenis wrote:

> > Date: Sun, 28 Jan 2024 10:44:25 +0100
> > From: Marcus Glocker
> >
> > On Sun, Jan 28, 2024 at 12:16:20AM +0100, Hrvoje Popovski wrote:
> >
> > > On 27.1.2024. 21:01, Marcus Glocker wrote:
> > > > On Sat, Jan 27, 2024 at 08:01:09AM +0100, Hrvoje Popovski wrote:
> > > >
> > > >> On 26.1.2024. 21:56, Marcus Glocker wrote:
> > > >>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
> > > >>>
> > > >>>> I've manage to reproduce TSO em problem on anoter setup, unfortunatly
> > > >>>> production.
> > > >>>>
> > > >>>> Setup is very simple
> > > >>>>
> > > >>>> em0 - carp <- uplink
> > > >>>> em1 - pfsync
> > > >>>> ix1 - vlans - carp
> > > >>> Would it be possible that you also share an "ifconfig -a hwfeatures" of
> > > >>> that box?  You can mask the IPs if it's too sensitive.
> > > >>>
> > > >>> I still try to reproduce the issue here, and for now I can't.
> > > >>> Maybe in your full ifconfig output I can see some specifics about your
> > > >>> configuration, which makes it more likely to reproduce the issue here.
> > > >>>
> > > >> Hi,
> > > >>
> > > >> here's ifconfig from second setup where watchdog is triggered much
> > > >> faster.
> > > >> Originally in this setup uplink is ix0, I've change that to em0 to see
> > > >> would the problem be same as in other setup and it is, and that's good
> > > >> because this is pfsync setup for students and I can do whatever I want
> > > >> with it :)
> > > > Thanks.
> > > >
> > > > But still, I can do whatever I want on my em(4) I210 box, carp(4),
> > > > vlan(4), creating a lot of traffic, I can't reproduce the watchdog which
> > > > you are seeing :-(  I'm not sure if this is something related to your
> > > > I350.
> > > >
> > > > Also, I can't understand why the watchdog still triggers when you disable
> > > > TSO by setting net.inet.tcp.tso=0.
> > > >
> > > > Just to rule out that you're receiving a MAXMCLBYTES (65536) packet,
> > > > while EM_TSO_SIZE (65535) is one byte less, can you please apply this
> > > > diff to -current and test it?  I doubt it will make a difference, but
> > > > I'm running a bit out of ideas here.
> > >
> > >
> > > Hi,
> > >
> > > with this diff I'm still getting em watchdog
> > >
> > > Jan 28 00:14:12 bcbnfw1 /bsd: em0: watchdog: head 120 tail 185 TDH 185 TDT 120
> >
> > Thanks for testing again.
> >
> > I think we might have a generic problem with TSO with the current em(4)
> > code and some chips.  Referring to this recent FreeBSD commit.
> >
> > e1000: disable TSO on lem(4) and em(4):
> > Disable TSO on lem(4) and em(4) until a ring stall can be debugged.
> > https://github.com/freebsd/freebsd-src/commit/797e480cba8834e584062092c098e60956d28180
> >
> > Can you try this diff to specifically disable TSO for I350 please?
> >
> > We will need to discuss internally which way to go.  I see those
> > options currently:
> >
> > - Entirely pull out the TSO diff.
> > - Leave the TSO code in but disable TSO for now (what FreeBSD did).
> > - Leave the TSO code in but disable TSO only for chips we see issues
> >   with (this diff).
>
> Frankly, I think it is time to just pull the diff.  Between this issue
> and the sparc64 unaligned access thing there is just too much breakage
> for relatively little gain (since this is only a gigabit Ethernet).
>
> Cheers,
>
> Mark

OK.  I'll pull it out a bit later today.
Re: TSO em(4) problem
> Date: Sun, 28 Jan 2024 10:44:25 +0100 > From: Marcus Glocker > > On Sun, Jan 28, 2024 at 12:16:20AM +0100, Hrvoje Popovski wrote: > > > On 27.1.2024. 21:01, Marcus Glocker wrote: > > > On Sat, Jan 27, 2024 at 08:01:09AM +0100, Hrvoje Popovski wrote: > > > > > >> On 26.1.2024. 21:56, Marcus Glocker wrote: > > >>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote: > > >>> > > I've manage to reproduce TSO em problem on anoter setup, unfortunatly > > production. > > > > Setup is very simple > > > > em0 - carp <- uplink > > em1 - pfsync > > ix1 - vlans - carp > > >>> Would it be possible that you also share an "ifconfig -a hwfeatures" of > > >>> that box? You can mask the IPs if it's too sensitive. > > >>> > > >>> I still try to reproduce the issue here, and for now I can't. > > >>> Maybe in your full ifconfig output I can see some specifics about your > > >>> configuration, which makes it more likely to reproduce the issue here. > > >>> > > >> Hi, > > >> > > >> here's ifconfig from second setup where watchdog is triggered much > > >> faster. > > >> Originally in this setup uplink is ix0, I've change that to em0 to see > > >> would the problem be same as in other setup and it is, and that's good > > >> because this is pfsync setup for students and I can do whatever I want > > >> with it :) > > > Thanks. > > > > > > But still, I can do whatever I want on my em(4) I210 box, carp(4), > > > vlan(4), creating a lot of traffic, I can't reproduce the watchdog which > > > you are seeing :-( I'm not sure if this is something related to your > > > I350. > > > > > > Also, I can't understand why the watchdog still triggers when you disable > > > TSO by setting net.inet.tcp.tso=0. > > > > > > Just to rule out that you're receiving a MAXMCLBYTES (65536) packet, > > > while EM_TSO_SIZE (65535) is one byte less, can you please apply this > > > diff to -current and test it? I doubt it will make a difference, but > > > I'm running a bit out of ideas here. 
> > > > > > Hi, > > > > with this diff I'm still getting em watchdog > > > > Jan 28 00:14:12 bcbnfw1 /bsd: em0: watchdog: head 120 tail 185 TDH 185 > > TDT 120 > > Thanks for testing again. > > I think we might have a generic problem with TSO with the current em(4) > code and some chips. Referring to this recent FreeBSD commit. > > e1000: disable TSO on lem(4) and em(4): > Disable TSO on lem(4) and em(4) until a ring stall can be debugged. > https://github.com/freebsd/freebsd-src/commit/797e480cba8834e584062092c098e60956d28180 > > Can you try this diff to specifically disable TSO for I350 please? > > We will need to discuss internally which way to go. I see those > options currently: > > - Entirely pull out the TSO diff. > - Leave the TSO code in but disable TSO for now (what FreeBSD did). > - Leave the TSO code in but disable TSO only for chips we see issues > with (this diff). Frankly, I think it is time to just pull the diff. Between this issue and the sparc64 unaligned access thing there is just too much breakage for relatively little gain (since this is only a gigabit Ethernet). Cheers, Mark > Index: if_em.c > === > RCS file: /cvs/src/sys/dev/pci/if_em.c,v > diff -u -p -u -p -r1.370 if_em.c > --- if_em.c 31 Dec 2023 08:42:33 - 1.370 > +++ if_em.c 28 Jan 2024 09:30:59 - > @@ -2013,7 +2013,9 @@ em_setup_interface(struct em_softc *sc) > if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { > ifp->if_capabilities |= IFCAP_CSUM_IPv4; > ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6; > - ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6; > + /* XXX: Enabling TSO on I350 causes watchdogs */ > + if (sc->hw.mac_type != em_i350) > + ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6; > } > > /* > >
Re: TSO em(4) problem
On 28.1.2024. 10:44, Marcus Glocker wrote:
> On Sun, Jan 28, 2024 at 12:16:20AM +0100, Hrvoje Popovski wrote:
>> On 27.1.2024. 21:01, Marcus Glocker wrote:
>>> On Sat, Jan 27, 2024 at 08:01:09AM +0100, Hrvoje Popovski wrote:
>>>> On 26.1.2024. 21:56, Marcus Glocker wrote:
>>>>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
>>>>>> I've managed to reproduce the TSO em problem on another setup,
>>>>>> unfortunately production.
>>>>>>
>>>>>> Setup is very simple
>>>>>>
>>>>>> em0 - carp <- uplink
>>>>>> em1 - pfsync
>>>>>> ix1 - vlans - carp
>>>>> Would it be possible that you also share an "ifconfig -a hwfeatures" of
>>>>> that box?  You can mask the IPs if it's too sensitive.
>>>>>
>>>>> I still try to reproduce the issue here, and for now I can't.
>>>>> Maybe in your full ifconfig output I can see some specifics about your
>>>>> configuration, which makes it more likely to reproduce the issue here.
>>>>>
>>>> Hi,
>>>>
>>>> here's ifconfig from second setup where watchdog is triggered much faster.
>>>> Originally in this setup uplink is ix0, I've changed that to em0 to see
>>>> would the problem be same as in other setup and it is, and that's good
>>>> because this is pfsync setup for students and I can do whatever I want
>>>> with it :)
>>> Thanks.
>>>
>>> But still, I can do whatever I want on my em(4) I210 box, carp(4),
>>> vlan(4), creating a lot of traffic, I can't reproduce the watchdog which
>>> you are seeing :-(  I'm not sure if this is something related to your
>>> I350.
>>>
>>> Also, I can't understand why the watchdog still triggers when you disable
>>> TSO by setting net.inet.tcp.tso=0.
>>>
>>> Just to rule out that you're receiving a MAXMCLBYTES (65536) packet,
>>> while EM_TSO_SIZE (65535) is one byte less, can you please apply this
>>> diff to -current and test it?  I doubt it will make a difference, but
>>> I'm running a bit out of ideas here.
>>
>> Hi,
>>
>> with this diff I'm still getting em watchdog
>>
>> Jan 28 00:14:12 bcbnfw1 /bsd: em0: watchdog: head 120 tail 185 TDH 185
>> TDT 120
>
> Thanks for testing again.
>
> I think we might have a generic problem with TSO with the current em(4)
> code and some chips.  Referring to this recent FreeBSD commit.
>
> e1000: disable TSO on lem(4) and em(4):
> Disable TSO on lem(4) and em(4) until a ring stall can be debugged.
> https://github.com/freebsd/freebsd-src/commit/797e480cba8834e584062092c098e60956d28180
>
> Can you try this diff to specifically disable TSO for I350 please?
>
> We will need to discuss internally which way to go.  I see those
> options currently:
>
> - Entirely pull out the TSO diff.
> - Leave the TSO code in but disable TSO for now (what FreeBSD did).
> - Leave the TSO code in but disable TSO only for chips we see issues
>   with (this diff).
>

Hi,

with this diff I still see TSOv4 and TSOv6 on the I350. Is this OK?
The em0 watchdog is triggered with net.inet.tcp.tso set to either 1 or 0.

em0: flags=8b43 mtu 1500
	hwfeatures=31b7 hardmtu 9216
	lladdr 0c:c4:7a:da:cd:5a
	index 3 priority 0 llprio 3
	groups: egress
	media: Ethernet autoselect (1000baseT full-duplex,master,rxpause)
	status: active

em0 at pci7 dev 0 function 0 "Intel I350" rev 0x01: msi, address

Jan 28 13:18:45 bcbnfw1 /bsd: em0: watchdog: head 89 tail 153 TDH 153 TDT 89
Jan 28 13:41:19 bcbnfw1 /bsd: em0: watchdog: head 336 tail 400 TDH 400 TDT 336
Jan 28 13:58:13 bcbnfw1 /bsd: em0: watchdog: head 172 tail 236 TDH 236 TDT 172

>
> Index: if_em.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/if_em.c,v
> diff -u -p -u -p -r1.370 if_em.c
> --- if_em.c	31 Dec 2023 08:42:33 -0000	1.370
> +++ if_em.c	28 Jan 2024 09:30:59 -0000
> @@ -2013,7 +2013,9 @@ em_setup_interface(struct em_softc *sc)
>  	if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
>  		ifp->if_capabilities |= IFCAP_CSUM_IPv4;
>  		ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6;
> -		ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
> +		/* XXX: Enabling TSO on I350 causes watchdogs */
> +		if (sc->hw.mac_type != em_i350)
> +			ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
>  	}
>
>  	/*
>
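The hwfeatures values in this thread (31b7 with the TSO diff, 1b7 with the TSO code removed, 71b7 on ix) can be checked by hand. Below is a minimal sketch of such a decoder. The bit values are an assumption, written from memory of OpenBSD's sys/net/if.h, and the CAP_* names and both helper functions are made up for this example; please verify against your own source tree before relying on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed capability bits (recalled from sys/net/if.h, not from this thread). */
#define CAP_CSUM_IPv4		0x00000001u
#define CAP_CSUM_TCPv4		0x00000002u
#define CAP_CSUM_UDPv4		0x00000004u
#define CAP_VLAN_MTU		0x00000010u
#define CAP_VLAN_HWTAGGING	0x00000020u
#define CAP_CSUM_TCPv6		0x00000080u
#define CAP_CSUM_UDPv6		0x00000100u
#define CAP_TSOv4		0x00001000u
#define CAP_TSOv6		0x00002000u
#define CAP_LRO			0x00004000u

struct cap_name {
	uint32_t	 bit;
	const char	*name;
};

static const struct cap_name cap_names[] = {
	{ CAP_CSUM_IPv4,	"CSUM_IPv4" },
	{ CAP_CSUM_TCPv4,	"CSUM_TCPv4" },
	{ CAP_CSUM_UDPv4,	"CSUM_UDPv4" },
	{ CAP_VLAN_MTU,		"VLAN_MTU" },
	{ CAP_VLAN_HWTAGGING,	"VLAN_HWTAGGING" },
	{ CAP_CSUM_TCPv6,	"CSUM_TCPv6" },
	{ CAP_CSUM_UDPv6,	"CSUM_UDPv6" },
	{ CAP_TSOv4,		"TSOv4" },
	{ CAP_TSOv6,		"TSOv6" },
	{ CAP_LRO,		"LRO" },
};

/* Nonzero when either TSO capability bit is advertised. */
static int
caps_have_tso(uint32_t caps)
{
	return (caps & (CAP_TSOv4 | CAP_TSOv6)) != 0;
}

/*
 * Write a comma-separated list of the known capability names set in caps
 * into buf and return how many names were written.
 */
static int
caps_to_string(uint32_t caps, char *buf, size_t len)
{
	size_t i, used = 0;
	int n, count = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (i = 0; i < sizeof(cap_names) / sizeof(cap_names[0]); i++) {
		if ((caps & cap_names[i].bit) == 0)
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
		    used ? "," : "", cap_names[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;		/* buffer too small, stop appending */
		used += (size_t)n;
		count++;
	}
	return count;
}
```

With these assumed values, caps_have_tso(0x31b7) is true and caps_have_tso(0x1b7) is false, which matches the report above that TSOv4 and TSOv6 are still advertised on the I350 after applying the diff.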
Re: TSO em(4) problem
On Sun, Jan 28, 2024 at 12:16:20AM +0100, Hrvoje Popovski wrote:
> On 27.1.2024. 21:01, Marcus Glocker wrote:
> > On Sat, Jan 27, 2024 at 08:01:09AM +0100, Hrvoje Popovski wrote:
> >
> >> On 26.1.2024. 21:56, Marcus Glocker wrote:
> >>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
> >>>
> >>>> I've managed to reproduce the TSO em problem on another setup,
> >>>> unfortunately production.
> >>>>
> >>>> Setup is very simple
> >>>>
> >>>> em0 - carp <- uplink
> >>>> em1 - pfsync
> >>>> ix1 - vlans - carp
> >>> Would it be possible that you also share an "ifconfig -a hwfeatures" of
> >>> that box?  You can mask the IPs if it's too sensitive.
> >>>
> >>> I still try to reproduce the issue here, and for now I can't.
> >>> Maybe in your full ifconfig output I can see some specifics about your
> >>> configuration, which makes it more likely to reproduce the issue here.
> >>>
> >> Hi,
> >>
> >> here's ifconfig from second setup where watchdog is triggered much faster.
> >> Originally in this setup uplink is ix0, I've changed that to em0 to see
> >> would the problem be same as in other setup and it is, and that's good
> >> because this is pfsync setup for students and I can do whatever I want
> >> with it :)
> > Thanks.
> >
> > But still, I can do whatever I want on my em(4) I210 box, carp(4),
> > vlan(4), creating a lot of traffic, I can't reproduce the watchdog which
> > you are seeing :-(  I'm not sure if this is something related to your
> > I350.
> >
> > Also, I can't understand why the watchdog still triggers when you disable
> > TSO by setting net.inet.tcp.tso=0.
> >
> > Just to rule out that you're receiving a MAXMCLBYTES (65536) packet,
> > while EM_TSO_SIZE (65535) is one byte less, can you please apply this
> > diff to -current and test it?  I doubt it will make a difference, but
> > I'm running a bit out of ideas here.
>
> Hi,
>
> with this diff I'm still getting em watchdog
>
> Jan 28 00:14:12 bcbnfw1 /bsd: em0: watchdog: head 120 tail 185 TDH 185
> TDT 120

Thanks for testing again.

I think we might have a generic problem with TSO with the current em(4)
code and some chips.  See this recent FreeBSD commit:

e1000: disable TSO on lem(4) and em(4):
Disable TSO on lem(4) and em(4) until a ring stall can be debugged.
https://github.com/freebsd/freebsd-src/commit/797e480cba8834e584062092c098e60956d28180

Can you try this diff to specifically disable TSO for I350 please?

We will need to discuss internally which way to go.  I currently see
these options:

- Entirely pull out the TSO diff.
- Leave the TSO code in but disable TSO for now (what FreeBSD did).
- Leave the TSO code in but disable TSO only for chips we see issues
  with (this diff).

Index: if_em.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_em.c,v
diff -u -p -u -p -r1.370 if_em.c
--- if_em.c	31 Dec 2023 08:42:33 -0000	1.370
+++ if_em.c	28 Jan 2024 09:30:59 -0000
@@ -2013,7 +2013,9 @@ em_setup_interface(struct em_softc *sc)
 	if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) {
 		ifp->if_capabilities |= IFCAP_CSUM_IPv4;
 		ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6;
-		ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
+		/* XXX: Enabling TSO on I350 causes watchdogs */
+		if (sc->hw.mac_type != em_i350)
+			ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6;
 	}
 
 	/*
Re: TSO em(4) problem
On 27.1.2024. 21:01, Marcus Glocker wrote:
> On Sat, Jan 27, 2024 at 08:01:09AM +0100, Hrvoje Popovski wrote:
>
>> On 26.1.2024. 21:56, Marcus Glocker wrote:
>>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
>>>
>>>> I've managed to reproduce the TSO em problem on another setup,
>>>> unfortunately production.
>>>>
>>>> Setup is very simple
>>>>
>>>> em0 - carp <- uplink
>>>> em1 - pfsync
>>>> ix1 - vlans - carp
>>> Would it be possible that you also share an "ifconfig -a hwfeatures" of
>>> that box?  You can mask the IPs if it's too sensitive.
>>>
>>> I still try to reproduce the issue here, and for now I can't.
>>> Maybe in your full ifconfig output I can see some specifics about your
>>> configuration, which makes it more likely to reproduce the issue here.
>>>
>> Hi,
>>
>> here's ifconfig from second setup where watchdog is triggered much faster.
>> Originally in this setup uplink is ix0, I've changed that to em0 to see
>> would the problem be same as in other setup and it is, and that's good
>> because this is pfsync setup for students and I can do whatever I want
>> with it :)
> Thanks.
>
> But still, I can do whatever I want on my em(4) I210 box, carp(4),
> vlan(4), creating a lot of traffic, I can't reproduce the watchdog which
> you are seeing :-(  I'm not sure if this is something related to your
> I350.
>
> Also, I can't understand why the watchdog still triggers when you disable
> TSO by setting net.inet.tcp.tso=0.
>
> Just to rule out that you're receiving a MAXMCLBYTES (65536) packet,
> while EM_TSO_SIZE (65535) is one byte less, can you please apply this
> diff to -current and test it?  I doubt it will make a difference, but
> I'm running a bit out of ideas here.

Hi,

with this diff I'm still getting the em watchdog

Jan 28 00:14:12 bcbnfw1 /bsd: em0: watchdog: head 120 tail 185 TDH 185 TDT 120
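The watchdog lines can be read as ring positions. Below is a minimal sketch that parses one such line and computes how many descriptors sit between the hardware head (TDH) and tail (TDT). The ring size of 512 slots is an assumption (it is not stated anywhere in this thread), and struct wd_sample, wd_parse(), ring_pending() and ring_free() are names made up for this example, not driver code.

```c
#include <stdio.h>
#include <string.h>

/* One parsed "watchdog:" log line. */
struct wd_sample {
	unsigned int head;	/* software producer index */
	unsigned int tail;	/* software consumer index */
	unsigned int tdh;	/* hardware transmit descriptor head */
	unsigned int tdt;	/* hardware transmit descriptor tail */
};

/* Parse "... watchdog: head H tail T TDH X TDT Y"; returns 0 on success. */
static int
wd_parse(const char *line, struct wd_sample *s)
{
	const char *p = strstr(line, "watchdog: head");

	if (p == NULL)
		return -1;
	if (sscanf(p, "watchdog: head %u tail %u TDH %u TDT %u",
	    &s->head, &s->tail, &s->tdh, &s->tdt) != 4)
		return -1;
	return 0;
}

/* Descriptors handed to the hardware but not yet completed. */
static unsigned int
ring_pending(unsigned int tdh, unsigned int tdt, unsigned int slots)
{
	return (tdt + slots - tdh) % slots;
}

/* Descriptors still available to the driver. */
static unsigned int
ring_free(unsigned int tdh, unsigned int tdt, unsigned int slots)
{
	return slots - ring_pending(tdh, tdt, slots);
}
```

Under the 512-slot assumption, the line above (TDH 185, TDT 120) gives 447 pending and 65 free descriptors. The other watchdog lines in this thread all work out to 64 or 65 free descriptors as well, which would be consistent with a ring that is almost full while the hardware has stopped draining it. That reading is only a guess from the numbers, not something confirmed in the thread.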
Re: TSO em(4) problem
On Sat, Jan 27, 2024 at 08:01:09AM +0100, Hrvoje Popovski wrote:
> On 26.1.2024. 21:56, Marcus Glocker wrote:
> > On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
> >
> >> I've managed to reproduce the TSO em problem on another setup,
> >> unfortunately production.
> >>
> >> Setup is very simple
> >>
> >> em0 - carp <- uplink
> >> em1 - pfsync
> >> ix1 - vlans - carp
> >
> > Would it be possible that you also share an "ifconfig -a hwfeatures" of
> > that box?  You can mask the IPs if it's too sensitive.
> >
> > I still try to reproduce the issue here, and for now I can't.
> > Maybe in your full ifconfig output I can see some specifics about your
> > configuration, which makes it more likely to reproduce the issue here.
> >
>
> Hi,
>
> here's ifconfig from second setup where watchdog is triggered much faster.
> Originally in this setup uplink is ix0, I've changed that to em0 to see
> would the problem be same as in other setup and it is, and that's good
> because this is pfsync setup for students and I can do whatever I want
> with it :)

Thanks.

But still, whatever I do on my em(4) I210 box with carp(4) and vlan(4),
creating a lot of traffic, I can't reproduce the watchdog which you are
seeing :-(  I'm not sure if this is something related to your I350.

Also, I can't understand why the watchdog still triggers when you disable
TSO by setting net.inet.tcp.tso=0.

Just to rule out that you're receiving a MAXMCLBYTES (65536) packet,
while EM_TSO_SIZE (65535) is one byte less, can you please apply this
diff to -current and test it?  I doubt it will make a difference, but
I'm running a bit out of ideas here.

Index: if_em.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_em.c,v
diff -u -p -u -p -r1.370 if_em.c
--- if_em.c	31 Dec 2023 08:42:33 -0000	1.370
+++ if_em.c	20 Jan 2024 21:16:57 -0000
@@ -2257,7 +2257,7 @@ em_setup_transmit_structures(struct em_s
 
 	for (i = 0; i < sc->sc_tx_slots; i++) {
 		pkt = &que->tx.sc_tx_pkts_ring[i];
-		error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE,
+		error = bus_dmamap_create(sc->sc_dmat, MAXMCLBYTES,
 		    EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1),
 		    EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);
 		if (error != 0) {
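The size limits discussed here can be sanity-checked with plain arithmetic. The sketch below uses only the numbers quoted in this thread (MAXMCLBYTES 65536, EM_TSO_SIZE 65535, EM_TSO_SEG_SIZE 4096, MAX_JUMBO_FRAME_SIZE 16128). It does not call bus_dma(9); the THREAD_* macros, min_segments() and fits_map() are illustrative names invented for this example.

```c
#include <stddef.h>

/* Values as quoted in the thread. */
#define THREAD_MAXMCLBYTES		65536u
#define THREAD_EM_TSO_SIZE		65535u
#define THREAD_EM_TSO_SEG_SIZE		4096u
#define THREAD_MAX_JUMBO_FRAME_SIZE	16128u

/*
 * Smallest number of DMA segments that can cover size bytes when no
 * single segment may exceed maxsegsz bytes (ceiling division).
 */
static size_t
min_segments(size_t size, size_t maxsegsz)
{
	return (size + maxsegsz - 1) / maxsegsz;
}

/* Nonzero when a packet of len bytes fits a map created with mapsize. */
static int
fits_map(size_t len, size_t mapsize)
{
	return len <= mapsize;
}
```

With these numbers, a 65536-byte packet does not fit a map created with EM_TSO_SIZE but does fit one created with MAXMCLBYTES, which is the one-byte case the diff rules out. The segment math also shows the other change that came with TSO: a full-size buffer needed at least 1 segment with the old 16128-byte maxsegsz, and needs at least 16 segments with the 4096-byte one.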
Re: TSO em(4) problem
On 26.1.2024. 22:47, Alexander Bluhm wrote:
> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
>> I've managed to reproduce the TSO em problem on another setup,
>> unfortunately production.
> What helped debugging a similar issue with ixl(4) and TSO was to
> remove all TSO specific code from the driver.  Then only this part
> remains from the original em(4) TSO diff.
>
> 	error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE,
> 	    EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1),
> 	    EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);
>
> The parameters that changed when adding TSO are:
>
> bus_size_t size:     MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SIZE 65535
> bus_size_t maxsegsz: MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SEG_SIZE 4096
>
> I suspect that this is the cause for the regression as disabling
> TSO did not help.  Would it be possible to run the diff below?  I
> expect that the problem will still be there.  But then we know it
> must be the change of one of the bus_dmamap_create() arguments.
>
> bluhm

Hi,

with this diff em0 seems happy and em watchdog is gone.

bcbnfw1# uptime
 8:06AM up 44 mins, 2 users, load averages: 0.00, 0.00, 0.00
bcbnfw1# ifconfig em0 hwfeatures
em0: flags=8b43 mtu 1500
	hwfeatures=1b7 hardmtu 9216
	lladdr 0c:c4:7a:da:cd:5a
	index 3 priority 0 llprio 3
	groups: egress
	media: Ethernet autoselect (1000baseT full-duplex,master,rxpause)
	status: active
	inet 10.10.155.234 netmask 0xfff8 broadcast 10.10.155.239

This morning without diff

bcbnfw1# cat /var/log/messages | grep watchdog
Jan 27 07:12:03 bcbnfw1 /bsd: em0: watchdog: head 50 tail 114 TDH 114 TDT 50
Jan 27 07:15:29 bcbnfw1 /bsd: em0: watchdog: head 370 tail 434 TDH 434 TDT 370
Jan 27 07:15:43 bcbnfw1 /bsd: em0: watchdog: head 219 tail 283 TDH 283 TDT 219
Jan 27 07:15:54 bcbnfw1 /bsd: em0: watchdog: head 322 tail 386 TDH 386 TDT 322
Jan 27 07:16:08 bcbnfw1 /bsd: em0: watchdog: head 115 tail 179 TDH 179 TDT 115
Jan 27 07:16:21 bcbnfw1 /bsd: em0: watchdog: head 364 tail 428 TDH 428 TDT 364
Jan 27 07:16:35 bcbnfw1 /bsd: em0: watchdog: head 473 tail 26 TDH 26 TDT 473
Re: TSO em(4) problem
On 26.1.2024. 21:56, Marcus Glocker wrote: > On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote: > >> I've manage to reproduce TSO em problem on anoter setup, unfortunatly >> production. >> >> Setup is very simple >> >> em0 - carp <- uplink >> em1 - pfsync >> ix1 - vlans - carp > > Would it be possible that you also share an "ifconfig -a hwfeatures" of > that box? You can mask the IPs if it's too sensitive. > > I still try to reproduce the issue here, and for now I can't. > Maybe in your full ifconfig output I can see some specifics about your > configuration, which makes it more likely to reproduce the issue here. > Hi, here's ifconfig from second setup where watchdog is triggered much faster. Originally in this setup uplink is ix0, I've change that to em0 to see would the problem be same as in other setup and it is, and that's good because this is pfsync setup for students and I can do whatever I want with it :) bcbnfw1# ifconfig -a hwfeatures lo0: flags=2008049 mtu 32768 hwfeatures=7187 index 6 priority 0 llprio 3 groups: lo inet 127.0.0.1 netmask 0xff00 ix0: flags=2008802 mtu 1500 hwfeatures=71b7 hardmtu 9198 lladdr 90:e2:ba:d7:1b:f4 index 1 priority 0 llprio 3 media: Ethernet autoselect (10GbaseSR full-duplex) status: active ix1: flags=2008b43 mtu 1500 hwfeatures=71b7 hardmtu 9198 lladdr 90:e2:ba:d7:1b:f5 index 2 priority 0 llprio 3 media: Ethernet autoselect (10GbaseSR full-duplex,rxpause,txpause) status: active em0: flags=8b43 mtu 1500 hwfeatures=31b7 hardmtu 9216 lladdr 0c:c4:7a:da:cd:5a index 3 priority 0 llprio 3 groups: egress media: Ethernet autoselect (1000baseT full-duplex,rxpause) status: active inet 10.10.155.234 netmask 0xfff8 broadcast 10.10.155.239 em1: flags=8843 mtu 1500 hwfeatures=31b7 hardmtu 9216 lladdr 0c:c4:7a:da:cd:5b index 4 priority 0 llprio 3 media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause) status: active inet 192.168.0.77 netmask 0xfffc broadcast 192.168.0.79 enc0: flags=0<> hwfeatures=0<> index 5 
priority 0 llprio 3 groups: enc status: active carp0: flags=8843 mtu 1500 hwfeatures=3187 hardmtu 1500 lladdr 00:00:5e:00:01:01 index 7 priority 15 llprio 3 carp: MASTER carpdev em0 vhid 1 advbase 1 advskew 10 groups: carp status: master inet 10.10.155.236 netmask 0x carp1100: flags=8843 mtu 1500 hwfeatures=3187 hardmtu 1500 lladdr 00:00:5e:00:01:12 index 8 priority 15 llprio 3 carp: MASTER carpdev vlan1100 vhid 18 advbase 1 advskew 10 groups: carp status: master inet 10.30.16.1 netmask 0x carp1101: flags=8843 mtu 1500 hwfeatures=3187 hardmtu 1500 lladdr 00:00:5e:00:01:16 index 9 priority 15 llprio 3 carp: MASTER carpdev vlan1101 vhid 22 advbase 1 advskew 10 groups: carp status: master inet 10.31.16.1 netmask 0x carp1102: flags=8843 mtu 1500 hwfeatures=3187 hardmtu 1500 lladdr 00:00:5e:00:01:19 index 10 priority 15 llprio 3 carp: MASTER carpdev vlan1102 vhid 25 advbase 1 advskew 10 groups: carp status: master inet 10.32.16.1 netmask 0x carp1103: flags=8843 mtu 1500 hwfeatures=3187 hardmtu 1500 lladdr 00:00:5e:00:01:1c index 11 priority 15 llprio 3 carp: MASTER carpdev vlan1103 vhid 28 advbase 1 advskew 10 groups: carp status: master inet 10.33.16.1 netmask 0x carp1130: flags=8843 mtu 1500 hwfeatures=3187 hardmtu 1500 lladdr 00:00:5e:00:01:13 index 12 priority 15 llprio 3 carp: MASTER carpdev vlan1130 vhid 19 advbase 1 advskew 10 groups: carp status: master inet 10.30.0.1 netmask 0x carp1131: flags=8843 mtu 1500 hwfeatures=3187 hardmtu 1500 lladdr 00:00:5e:00:01:17 index 13 priority 15 llprio 3 carp: MASTER carpdev vlan1131 vhid 23 advbase 1 advskew 10 groups: carp status: master inet 10.31.0.1 netmask 0x carp1132: flags=8843 mtu 1500 hwfeatures=3187 hardmtu 1500 lladdr 00:00:5e:00:01:1a index 14 priority 15 llprio 3 carp: MASTER carpdev vlan1132 vhid 26 advbase 1 advskew 10 groups: carp status: master inet 10.32.0.1 netmask 0x carp1133: flags=8843 mtu 1500 hwfeatures=3187 hardmtu 1500 lladdr 00:00:5e:00:01:1d index 15 priority 15 llprio 3 carp: MASTER carpdev 
vlan1133 vhid 29 advbase 1 advskew 10 groups: carp status: master inet 10.33.0.1 netmask 0x carp1150: flags=8843 mtu 1500 hwfeatures=3187 hardmtu 1500 lladdr 00:00:5e:00:01:14 index 16 priority 15 llprio 3 carp: MASTER carpdev vlan1150
Re: TSO em(4) problem
On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote: > I've manage to reproduce TSO em problem on anoter setup, unfortunatly > production. What helped debugging a similar issue with ixl(4) and TSO was to remove all TSO specific code from the driver. Then only this part remains from the original em(4) TSO diff. error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE, EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1), EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, >pkt_map); The parameters that changed when adding TSO are: bus_size_t size:MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SIZE 65535 bus_size_t maxsegsz:MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SEG_SIZE 4096 I suspect that this is the cause for the regression as disabling TSO did not help. Would it be possible to run the diff below? I expect that the problem will still be there. But then we know it must be the change of one of the bus_dmamap_create() arguments. bluhm Index: dev/pci/if_em.c === RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_em.c,v diff -u -p -r1.370 if_em.c --- dev/pci/if_em.c 31 Dec 2023 08:42:33 - 1.370 +++ dev/pci/if_em.c 26 Jan 2024 21:32:08 - @@ -291,8 +291,6 @@ void em_receive_checksum(struct em_softc struct mbuf *); u_int em_transmit_checksum_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, u_int32_t *); -u_int em_tso_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, - u_int32_t *); u_int em_tx_ctx_setup(struct em_queue *, struct mbuf *, u_int, u_int32_t *, u_int32_t *); void em_iff(struct em_softc *); @@ -1238,15 +1236,7 @@ em_encap(struct em_queue *que, struct mb } if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { - if (ISSET(m->m_pkthdr.csum_flags, M_TCP_TSO)) { - used += em_tso_setup(que, m, head, _upper, - _lower); - if (!used) - return (used); - } else { - used += em_tx_ctx_setup(que, m, head, _upper, - _lower); - } + used += em_tx_ctx_setup(que, m, head, _upper, _lower); } else if (sc->hw.mac_type >= em_82543) { used += em_transmit_checksum_setup(que, m, head, 
_upper, _lower); @@ -1579,21 +1569,6 @@ em_update_link_status(struct em_softc *s ifp->if_link_state = link_state; if_link_state_change(ifp); } - - /* Disable TSO for 10/100 speeds to avoid some hardware issues */ - switch (sc->link_speed) { - case SPEED_10: - case SPEED_100: - if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { - ifp->if_capabilities &= ~IFCAP_TSOv4; - ifp->if_capabilities &= ~IFCAP_TSOv6; - } - break; - case SPEED_1000: - if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) - ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6; - break; - } } /* @@ -2013,7 +1988,6 @@ em_setup_interface(struct em_softc *sc) if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { ifp->if_capabilities |= IFCAP_CSUM_IPv4; ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6; - ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6; } /* @@ -2429,81 +2403,6 @@ em_free_transmit_structures(struct em_so 0, que->tx.sc_tx_dma.dma_map->dm_mapsize, BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); } -} - -u_int -em_tso_setup(struct em_queue *que, struct mbuf *mp, u_int head, -u_int32_t *olinfo_status, u_int32_t *cmd_type_len) -{ - struct ether_extracted ext; - struct e1000_adv_tx_context_desc *TD; - uint32_t vlan_macip_lens = 0, type_tucmd_mlhl = 0, mss_l4len_idx = 0; - uint32_t paylen = 0; - uint8_t iphlen = 0; - - *olinfo_status = 0; - *cmd_type_len = 0; - TD = (struct e1000_adv_tx_context_desc *)>tx.sc_tx_desc_ring[head]; - -#if NVLAN > 0 - if (ISSET(mp->m_flags, M_VLANTAG)) { - uint32_t vtag = mp->m_pkthdr.ether_vtag; - vlan_macip_lens |= vtag << E1000_ADVTXD_VLAN_SHIFT; - *cmd_type_len |= E1000_ADVTXD_DCMD_VLE; - } -#endif - - ether_extract_headers(mp, ); - if (ext.tcp == NULL) - goto out; - - vlan_macip_lens |= (sizeof(*ext.eh) << E1000_ADVTXD_MACLEN_SHIFT); - - if (ext.ip4) { - iphlen = ext.ip4->ip_hl << 2; - - type_tucmd_mlhl |= E1000_ADVTXD_TUCMD_IPV4; - *olinfo_status |=
Re: TSO em(4) problem
On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
> I've managed to reproduce the TSO em problem on another setup,
> unfortunately production.
>
> Setup is very simple
>
> em0 - carp <- uplink
> em1 - pfsync
> ix1 - vlans - carp

Would it be possible that you also share an "ifconfig -a hwfeatures" of
that box?  You can mask the IPs if it's too sensitive.

I'm still trying to reproduce the issue here, and for now I can't.
Maybe in your full ifconfig output I can see some specifics about your
configuration, which would make it more likely to reproduce the issue here.
Re: TSO em(4) problem
I've manage to reproduce TSO em problem on anoter setup, unfortunatly production. Setup is very simple em0 - carp <- uplink em1 - pfsync ix1 - vlans - carp Jan 26 11:19:23 bcbnfw1 /bsd: em0: watchdog: head 34 tail 98 TDH 98 TDT 34 Jan 26 11:19:33 bcbnfw1 /bsd: em0: watchdog: head 345 tail 409 TDH 409 TDT 345 Jan 26 11:19:54 bcbnfw1 /bsd: em0: watchdog: head 259 tail 323 TDH 323 TDT 259 Jan 26 11:20:08 bcbnfw1 /bsd: em0: watchdog: head 343 tail 407 TDH 407 TDT 343 Jan 26 11:20:24 bcbnfw1 /bsd: em0: watchdog: head 20 tail 85 TDH 85 TDT 20 Jan 26 11:20:47 bcbnfw1 /bsd: em0: watchdog: head 388 tail 452 TDH 452 TDT 388 Jan 26 11:21:09 bcbnfw1 /bsd: em0: watchdog: head 25 tail 89 TDH 89 TDT 25 Jan 26 11:21:32 bcbnfw1 /bsd: em0: watchdog: head 105 tail 169 TDH 169 TDT 105 Jan 26 11:21:52 bcbnfw1 /bsd: em0: watchdog: head 23 tail 88 TDH 88 TDT 23 647470: Jan 26 11:19:25: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down 647474: Jan 26 11:19:29: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up 647478: Jan 26 11:19:35: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down 647483: Jan 26 11:19:39: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up 647487: Jan 26 11:19:56: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down 647491: Jan 26 11:19:59: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up 647495: Jan 26 11:20:10: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down 647499: Jan 26 11:20:13: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up 647504: Jan 26 11:20:26: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down 647508: Jan 26 11:20:29: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up 647512: Jan 26 11:20:49: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down 647516: Jan 26 11:20:52: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to 
up 647520: Jan 26 11:21:11: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down 647524: Jan 26 11:21:14: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up 647528: Jan 26 11:21:34: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down 647532: Jan 26 11:21:36: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up 647536: Jan 26 11:21:54: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to down 647540: Jan 26 11:21:56: %LINK-SP-3-UPDOWN: Interface GigabitEthernet4/48, changed state to up bcbnfw1# kstat em0::: em0:0:em-stats:0 rx crc errs: 0 packets rx align errs: 0 packets rx align errs: 0 packets rx errs: 0 packets rx missed: 0 packets tx single coll: 0 packets tx excess coll: 0 packets tx multi coll: 0 packets tx late coll: 0 packets tx coll: 0 tx defers: 0 tx no CRS: 0 packets seq errs: 0 carr ext errs: 0 packets rx len errs: 0 packets rx xon: 0 packets tx xon: 0 packets rx xoff: 0 packets tx xoff: 0 packets FC unsupported: 0 packets rx 64B: 6555 packets rx 65-127B: 11144 packets rx 128-255B: 6264 packets rx 256-511B: 2390 packets rx 512-1023B: 3706 packets rx 1024-maxB: 87987 packets rx good: 118046 packets rx bcast: 3 packets rx mcast: 82 packets tx good: 56796 packets rx good: 132532686 bytes tx good: 13691390 bytes rx no buffers: 0 packets rx undersize: 0 packets rx fragments: 0 packets rx oversize: 0 packets rx jabbers: 0 packets rx mgmt: 0 packets rx mgmt drops: 0 packets tx mgmt: 0 packets rx total: 132532686 bytes tx total: 13691390 bytes rx total: 118046 packets tx total: 56796 packets tx 64B: 11861 packets tx 65-127B: 28718 packets tx 128-255B: 7202 packets tx 256-511B: 1834 packets tx 512-1023B: 2059 packets tx 1024-maxB: 5122 packets tx mcast: 18 packets tx bcast: 2 packets em0:0:rxq:0 packets: 1009629 packets bytes: 1172569417 bytes fdrops: 0 packets qdrops: 0 packets errors: 0 packets qlen: 0 packets enqueues: 348100 dequeues: 348031 em0:0:txq:0 packets: 465709 
packets bytes: 103430590 bytes qdrops: 53674 packets errors: 0 packets qlen: 0 packets maxqlen: 511 packets oactive: false oactives: 9 OpenBSD 7.4-current (GENERIC.MP) #1626: Thu Jan 25 20:05:01 MST 2024 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 34224844800 (32639MB) avail mem = 33166430208 (31629MB) random: good seed from bootblocks mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 3.0 @ 0xec9b0 (62
TSO em(4) problem
Hi all,

in production I have a simple carp/pfsync setup with

em0 - carp <- uplink
em1 - pfsync
ix0 - vlan - carp <- internal networks
ix1 - not used

and for VPN I have wireguard, and people connect to the em0 carp address.
There are no bridges, tunnels or any exotic pf features in this setup.

Until this snapshot
OpenBSD 7.4-current (GENERIC.MP) #1587: Sat Dec 30 22:44:51 MST 2023
everything was fine, but with and after
OpenBSD 7.4-current (GENERIC.MP) #1588: Thu Jan 4 20:58:35 MST 2024
em0 starts to go up/down spontaneously and em0 watchdog logs start to
appear in messages

em0: watchdog: head 113 tail 178 TDH 178 TDT 113
carp1: state transition: BACKUP -> MASTER

even with net.inet.tcp.tso=0

When reverting the em TSO diffs (if_em.c to r1.369 and if_em.h to r1.80),
the firewall starts to work normally and em0 is fine.

After rebooting the firewall and promoting it to carp master, I started to
collect kstat em0::: after each em0 watchdog log

1) Jan 22 08:01:01 fw2 /bsd: em0: watchdog: head 473 tail 25 TDH 25 TDT 473
   kstat em0::: - em0-1.txt
2) Jan 22 08:07:11 fw2 /bsd: em0: watchdog: head 114 tail 178 TDH 178 TDT 114
3) Jan 22 08:08:16 fw2 /bsd: em0: watchdog: head 61 tail 126 TDH 126 TDT 61
   kstat em0::: - em0-3.txt
4) Jan 22 08:21:23 fw2 /bsd: em0: watchdog: head 452 tail 5 TDH 5 TDT 452
5) Jan 22 08:33:48 fw2 /bsd: em0: watchdog: head 352 tail 416 TDH 416 TDT 352
6) Jan 22 08:36:20 fw2 /bsd: em0: watchdog: head 446 tail 510 TDH 510 TDT 446
   kstat em0::: - em0-6.txt
7) Jan 22 08:42:16 fw1 /bsd: em0: watchdog: head 385 tail 450 TDH 450 TDT 385
   kstat em0::: - em0-7.txt

In the attachment you can find the em0 kstat txt output and kstat-all.txt,
which is the kstat of all interfaces with the TSO diff after the 7th em0
watchdog log.

From the logs it seems that the em0:0:txq:0 oactives counter, the em0
watchdog and em0 going up/down are somehow connected, because every time I
see an em0 watchdog, the oactives counter is increased by one.

log on switch
I 01/22/24 08:01:01 00077 ports: port 2 is now off-line
I 01/22/24 08:01:05 00076 ports: port 2 is now on-line
I 01/22/24 08:07:11 00077 ports: port 2 is now off-line
I 01/22/24 08:07:14 00076 ports: port 2 is now on-line
I 01/22/24 08:08:16 00077 ports: port 2 is now off-line
I 01/22/24 08:08:20 00076 ports: port 2 is now on-line
I 01/22/24 08:21:23 00077 ports: port 2 is now off-line
I 01/22/24 08:21:26 00076 ports: port 2 is now on-line
I 01/22/24 08:33:47 00077 ports: port 2 is now off-line
I 01/22/24 08:33:51 00076 ports: port 2 is now on-line
I 01/22/24 08:36:20 00077 ports: port 2 is now off-line
I 01/22/24 08:36:24 00076 ports: port 2 is now on-line
I 01/22/24 08:42:16 00077 ports: port 2 is now off-line
I 01/22/24 08:42:20 00076 ports: port 2 is now on-line

em0 is connected to port 2
ix0 is connected to port 6 and it's up the whole time...

There needs to be packet processing and a little pressure on em0 to
trigger the em0 watchdog, and only the carp master is affected. Overnight
there are 2 or 3 em0 watchdogs. The firewalls are more than underutilized:
about 5k states and under 100Mbps.

To rule out an em hardware problem, I've updated the second firewall and
the problem was the same as on the first one.

I am willing to debug this further, but I don't know what to look at any
more ...
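To put a number on the link flaps in the switch log above, the time between an off-line and the following on-line event can be computed from the HH:MM:SS stamps. This is only a small sketch; hms_to_sec() and flap_seconds() are hypothetical helper names, and it assumes both events fall on the same day.

```c
#include <stdio.h>

/* Convert "HH:MM:SS" to seconds since midnight; returns -1 on parse error. */
static long
hms_to_sec(const char *hms)
{
	unsigned int h, m, s;

	if (sscanf(hms, "%2u:%2u:%2u", &h, &m, &s) != 3)
		return -1;
	if (h > 23 || m > 59 || s > 59)
		return -1;
	return (long)h * 3600 + (long)m * 60 + (long)s;
}

/*
 * Seconds between an off-line stamp and the following on-line stamp.
 * Returns -1 if either stamp is malformed or the order is reversed.
 */
static long
flap_seconds(const char *down, const char *up)
{
	long d = hms_to_sec(down);
	long u = hms_to_sec(up);

	if (d < 0 || u < 0 || u < d)
		return -1;
	return u - d;
}
```

For the first pair in the log, flap_seconds("08:01:01", "08:01:05") gives 4 seconds, and the remaining pairs come out at 3 or 4 seconds each.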
And of course, thank you guys for carp and pfsync, without it this would be a problem but it's not :) kstat em0::: after day without TSO diffs fw2# uptime 12:36AM up 1 day, 13:38, 2 users, load averages: 0.35, 0.23, 0.23 fw2# kstat em0::: em0:0:em-stats:0 rx crc errs: 0 packets rx align errs: 0 packets rx align errs: 0 packets rx errs: 0 packets rx missed: 0 packets tx single coll: 0 packets tx excess coll: 0 packets tx multi coll: 0 packets tx late coll: 0 packets tx coll: 0 tx defers: 0 tx no CRS: 0 packets seq errs: 0 carr ext errs: 0 packets rx len errs: 0 packets rx xon: 0 packets tx xon: 0 packets rx xoff: 0 packets tx xoff: 0 packets FC unsupported: 0 packets rx 64B: 6361422 packets rx 65-127B: 19106140 packets rx 128-255B: 4430154 packets rx 256-511B: 5116503 packets rx 512-1023B: 7665843 packets rx 1024-maxB: 86778341 packets rx good: 129458403 packets rx bcast: 147 packets rx mcast: 112979 packets tx good: 67968976 packets rx good: 134077827262 bytes tx good: 32314161469 bytes rx no buffers: 4 packets rx undersize: 0 packets rx fragments: 0 packets rx oversize: 0 packets rx jabbers: 0 packets rx mgmt: 0 packets rx mgmt drops: 0 packets tx mgmt: 0 packets rx total: 134077827262 bytes tx total: 32314161469 bytes rx total: 129458403 packets tx total: 67968976 packets tx 64B: 8932448 packets tx 65-127B: 31092764 packets tx 128-255B: 3930861 packets tx 256-511B: 2126737 packets tx 512-1023B: 4009214