Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On 06/19/2018 03:32 PM, Andreas Schwab wrote: > On Jun 19 2018, Eric Dumazet wrote: > >> diff --git a/drivers/net/ethernet/sun/sungem.c >> b/drivers/net/ethernet/sun/sungem.c >> index >> 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..672d6748ab44f0890e92d5ca55d6ff6834c20dc9 >> 100644 >> --- a/drivers/net/ethernet/sun/sungem.c >> +++ b/drivers/net/ethernet/sun/sungem.c >> @@ -60,8 +60,7 @@ >> #include >> #include "sungem.h" >> >> -/* Stripping FCS is causing problems, disabled for now */ >> -#undef STRIP_FCS >> +#define STRIP_FCS >> >> #define DEFAULT_MSG(NETIF_MSG_DRV | \ >> NETIF_MSG_PROBE| \ >> @@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp) >> writel(desc_dma & 0x, gp->regs + RXDMA_DBLOW); >> writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK); >> val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | >> - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); >> + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); >> writel(val, gp->regs + RXDMA_CFG); >> if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN) >> writel(((5 & RXDMA_BLANK_IPKTS) | >> @@ -857,6 +856,14 @@ static int gem_rx(struct gem *gp, int work_to_do) >> >> csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ >> 0x); >> skb->csum = csum_unfold(csum); >> + { >> + __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - >> ETH_HLEN, 0); >> + if (csum != csum_fold(rsum) && net_ratelimit()) >> + pr_err("sungem wrong csum : %x/%x, len %u bytes\n", >> + csum, csum_fold(rsum), len); >> + print_hex_dump(KERN_ERR, "raw data: ", >> DUMP_PREFIX_OFFSET, >> + 16, 1, skb->data, len, true); >> + } >> skb->ip_summed = CHECKSUM_COMPLETE; >> skb->protocol = eth_type_trans(skb, gp->dev); >> >> @@ -1761,7 +1768,7 @@ static void gem_init_dma(struct gem *gp) >> writel(0, gp->regs + TXDMA_KICK); >> >> val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | >> - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); >> + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); >> writel(val, gp->regs + RXDMA_CFG); >> >> writel(desc_dma >> 32, gp->regs + RXDMA_DBHI); > > With that patch I still get the wrong csum messages, but no longer the > hw csum failure messages (tested on a PowerMac G5). > > [ 662.659767] sungem: sungem wrong csum : 8359/7ca6, len 86 bytes, > c001fee9cc02 > [ 662.659775] raw data: : 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd > 61 01 ...C.b.=~LH...a. > [ 662.659778] raw data: 0010: 1c 1e 00 20 06 40 20 01 0a 62 17 11 88 01 > 00 00 ... .@ ..b.. > [ 662.659780] raw data: 0020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 > 00 00 .8 ..b.. > [ 662.659783] raw data: 0030: 00 00 00 00 00 07 9a 18 00 16 c1 9a 7e ea > ea 44 ~..D > [ 662.659785] raw data: 0040: fb 4a 80 10 05 93 44 08 00 00 01 01 08 0a > 59 68 .JD...Yh > [ 662.659788] raw data: 0050: ba e2 0e bb ac ae > .. > > Andreas. > Note that 8359 and 7ca6 are the same really (a missing ~ to invert csum_partial()) So the bug was that : 1) Driver programmed a wrong start offset for the csum (7 bytes instead of 14 bytes to skip Ethernet Header) 2) FCS was not stripped. Basically CHECKSUM_COMPLETE support never worked, but this was hidden by the fact that linux stack had to throw away this CHECKSUM_COMPLETE because the FCS had to be removed. Thanks !
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On Jun 19 2018, Eric Dumazet wrote: > diff --git a/drivers/net/ethernet/sun/sungem.c > b/drivers/net/ethernet/sun/sungem.c > index > 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..672d6748ab44f0890e92d5ca55d6ff6834c20dc9 > 100644 > --- a/drivers/net/ethernet/sun/sungem.c > +++ b/drivers/net/ethernet/sun/sungem.c > @@ -60,8 +60,7 @@ > #include > #include "sungem.h" > > -/* Stripping FCS is causing problems, disabled for now */ > -#undef STRIP_FCS > +#define STRIP_FCS > > #define DEFAULT_MSG(NETIF_MSG_DRV | \ > NETIF_MSG_PROBE| \ > @@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp) > writel(desc_dma & 0x, gp->regs + RXDMA_DBLOW); > writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK); > val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | > - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); > + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); > writel(val, gp->regs + RXDMA_CFG); > if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN) > writel(((5 & RXDMA_BLANK_IPKTS) | > @@ -857,6 +856,14 @@ static int gem_rx(struct gem *gp, int work_to_do) > > csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ > 0x); > skb->csum = csum_unfold(csum); > + { > + __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - > ETH_HLEN, 0); > + if (csum != csum_fold(rsum) && net_ratelimit()) > + pr_err("sungem wrong csum : %x/%x, len %u bytes\n", > + csum, csum_fold(rsum), len); > + print_hex_dump(KERN_ERR, "raw data: ", > DUMP_PREFIX_OFFSET, > + 16, 1, skb->data, len, true); > + } > skb->ip_summed = CHECKSUM_COMPLETE; > skb->protocol = eth_type_trans(skb, gp->dev); > > @@ -1761,7 +1768,7 @@ static void gem_init_dma(struct gem *gp) > writel(0, gp->regs + TXDMA_KICK); > > val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | > - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); > + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); > writel(val, gp->regs + RXDMA_CFG); > > writel(desc_dma >> 32, gp->regs + RXDMA_DBHI); With that patch I still get the wrong csum messages, but no longer the hw csum failure messages (tested on a PowerMac G5). [ 662.659767] sungem: sungem wrong csum : 8359/7ca6, len 86 bytes, c001fee9cc02 [ 662.659775] raw data: : 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01 ...C.b.=~LH...a. [ 662.659778] raw data: 0010: 1c 1e 00 20 06 40 20 01 0a 62 17 11 88 01 00 00 ... .@ ..b.. [ 662.659780] raw data: 0020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00 .8 ..b.. [ 662.659783] raw data: 0030: 00 00 00 00 00 07 9a 18 00 16 c1 9a 7e ea ea 44 ~..D [ 662.659785] raw data: 0040: fb 4a 80 10 05 93 44 08 00 00 01 01 08 0a 59 68 .JD...Yh [ 662.659788] raw data: 0050: ba e2 0e bb ac ae .. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On 06/19/2018 01:10 PM, Eric Dumazet wrote: > > > On 06/19/2018 12:10 PM, Andreas Schwab wrote: >> On Jun 18 2018, Eric Dumazet wrote: >> >>> DUMP_PREFIX_ADDRESS might give us more information (say alignment problem, >>> or crossing page boundaries) >> >> DUMP_PREFIX_ADDRESS is useless for that purpose. >> >> Here are some samples of broken csums: >> >> [ 853.849225] sungem: sungem wrong csum : 9886/07be, len 94 bytes, >> c001fa187e02 >> [ 853.849232] raw data: : 00 0d 93 43 81 62 18 d6 c7 51 b8 1c 08 00 >> 45 10 ...C.b...QE. >> [ 853.849235] raw data: 0010: 00 4c cb a0 40 00 40 11 d9 97 c0 a8 0a 01 >> c0 a8 .L..@.@. >> [ 853.849237] raw data: 0020: 0a 07 00 7b 00 7b 00 38 69 e1 1c 03 0c f7 >> 00 00 ...{.{.8i... >> [ 853.849240] raw data: 0030: 08 f0 00 00 15 f0 c0 35 67 67 de d3 ca c9 >> d9 5b ...5gg.[ >> [ 853.849242] raw data: 0040: 1f ff de d3 d2 86 8f 67 fa f2 de d3 d2 86 >> 8f 38 ...g...8 >> [ 853.849244] raw data: 0050: 2f ff de d3 d2 86 8f 3b ff ff d1 93 bc 50 >>/..;.P > > Thanks. > > 4 bytes in excess. > > Might be the FCS, and it does not look like provided csum has a relation with > it. > > For some reason FCS stripping was disabled by : > > commit 3e32011d4da6424b3bc65b1e1a047e30ac9882c7 > Author: Benjamin Herrenschmidt > Date: Mon May 19 09:39:11 2003 -0700 > > [SUNGEM]: Updates from PowerPC people. > > Support more chips and split out all the complex PHY > handling into a seperate file. > > > Since this NIC never had CHECKSUM_COMPLETE support (since we have to trim > each skb, > thus were forcing ip_summed to CHECKSUM_NONE) we probably should remove it > and be happy. > > Unless you guys find a way to let the NIC strip the FCS, and double check the > csum is a real csum ;) > > diff --git a/drivers/net/ethernet/sun/sungem.c > b/drivers/net/ethernet/sun/sungem.c > index > 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..85439308375e95c3854e4a1561697d69ec85399b > 100644 > --- a/drivers/net/ethernet/sun/sungem.c > +++ b/drivers/net/ethernet/sun/sungem.c > @@ -760,7 +760,6 @@ static int gem_rx(struct gem *gp, int work_to_do) > struct net_device *dev = gp->dev; > int entry, drops, work_done = 0; > u32 done; > - __sum16 csum; > > if (netif_msg_rx_status(gp)) > printk(KERN_DEBUG "%s: rx interrupt, done: %d, rx_new: %d\n", > @@ -855,9 +854,6 @@ static int gem_rx(struct gem *gp, int work_to_do) > skb = copy_skb; > } > > - csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ > 0x); > - skb->csum = csum_unfold(csum); > - skb->ip_summed = CHECKSUM_COMPLETE; > skb->protocol = eth_type_trans(skb, gp->dev); > > napi_gro_receive(>napi, skb); > If you have time, you also could check if changing the suspect (14 / 2) to ETH_HLEN would help (and also enabling STRIP_FCS) diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..672d6748ab44f0890e92d5ca55d6ff6834c20dc9 100644 --- a/drivers/net/ethernet/sun/sungem.c +++ b/drivers/net/ethernet/sun/sungem.c @@ -60,8 +60,7 @@ #include #include "sungem.h" -/* Stripping FCS is causing problems, disabled for now */ -#undef STRIP_FCS +#define STRIP_FCS #define DEFAULT_MSG(NETIF_MSG_DRV | \ NETIF_MSG_PROBE| \ @@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp) writel(desc_dma & 0x, gp->regs + RXDMA_DBLOW); writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK); val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); writel(val, gp->regs + RXDMA_CFG); if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN) writel(((5 & RXDMA_BLANK_IPKTS) | @@ -857,6 +856,14 @@ static int gem_rx(struct gem *gp, int work_to_do) csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0x); skb->csum = csum_unfold(csum); + { + __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0); + if (csum != csum_fold(rsum) && net_ratelimit()) + pr_err("sungem wrong csum : %x/%x, len %u bytes\n", + csum, csum_fold(rsum), len); + print_hex_dump(KERN_ERR, "raw data: ", DUMP_PREFIX_OFFSET, + 16, 1, skb->data, len, true); + } skb->ip_summed = CHECKSUM_COMPLETE; skb->protocol = eth_type_trans(skb, gp->dev); @@ -1761,7 +1768,7 @@ static void gem_init_dma(struct gem *gp) writel(0, gp->regs + TXDMA_KICK); val =
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On 06/19/2018 12:10 PM, Andreas Schwab wrote: > On Jun 18 2018, Eric Dumazet wrote: > >> DUMP_PREFIX_ADDRESS might give us more information (say alignment problem, >> or crossing page boundaries) > > DUMP_PREFIX_ADDRESS is useless for that purpose. > > Here are some samples of broken csums: > > [ 853.849225] sungem: sungem wrong csum : 9886/07be, len 94 bytes, > c001fa187e02 > [ 853.849232] raw data: : 00 0d 93 43 81 62 18 d6 c7 51 b8 1c 08 00 > 45 10 ...C.b...QE. > [ 853.849235] raw data: 0010: 00 4c cb a0 40 00 40 11 d9 97 c0 a8 0a 01 > c0 a8 .L..@.@. > [ 853.849237] raw data: 0020: 0a 07 00 7b 00 7b 00 38 69 e1 1c 03 0c f7 > 00 00 ...{.{.8i... > [ 853.849240] raw data: 0030: 08 f0 00 00 15 f0 c0 35 67 67 de d3 ca c9 > d9 5b ...5gg.[ > [ 853.849242] raw data: 0040: 1f ff de d3 d2 86 8f 67 fa f2 de d3 d2 86 > 8f 38 ...g...8 > [ 853.849244] raw data: 0050: 2f ff de d3 d2 86 8f 3b ff ff d1 93 bc 50 > /..;.P Thanks. 4 bytes in excess. Might be the FCS, and it does not look like provided csum has a relation with it. For some reason FCS stripping was disabled by : commit 3e32011d4da6424b3bc65b1e1a047e30ac9882c7 Author: Benjamin Herrenschmidt Date: Mon May 19 09:39:11 2003 -0700 [SUNGEM]: Updates from PowerPC people. Support more chips and split out all the complex PHY handling into a seperate file. Since this NIC never had CHECKSUM_COMPLETE support (since we have to trim each skb, thus were forcing ip_summed to CHECKSUM_NONE) we probably should remove it and be happy. Unless you guys find a way to let the NIC strip the FCS, and double check the csum is a real csum ;) diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..85439308375e95c3854e4a1561697d69ec85399b 100644 --- a/drivers/net/ethernet/sun/sungem.c +++ b/drivers/net/ethernet/sun/sungem.c @@ -760,7 +760,6 @@ static int gem_rx(struct gem *gp, int work_to_do) struct net_device *dev = gp->dev; int entry, drops, work_done = 0; u32 done; - __sum16 csum; if (netif_msg_rx_status(gp)) printk(KERN_DEBUG "%s: rx interrupt, done: %d, rx_new: %d\n", @@ -855,9 +854,6 @@ static int gem_rx(struct gem *gp, int work_to_do) skb = copy_skb; } - csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0x); - skb->csum = csum_unfold(csum); - skb->ip_summed = CHECKSUM_COMPLETE; skb->protocol = eth_type_trans(skb, gp->dev); napi_gro_receive(>napi, skb);
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On Jun 18 2018, Eric Dumazet wrote: > DUMP_PREFIX_ADDRESS might give us more information (say alignment problem, or > crossing page boundaries) DUMP_PREFIX_ADDRESS is useless for that purpose. Here are some samples of broken csums: [ 853.849225] sungem: sungem wrong csum : 9886/07be, len 94 bytes, c001fa187e02 [ 853.849232] raw data: : 00 0d 93 43 81 62 18 d6 c7 51 b8 1c 08 00 45 10 ...C.b...QE. [ 853.849235] raw data: 0010: 00 4c cb a0 40 00 40 11 d9 97 c0 a8 0a 01 c0 a8 .L..@.@. [ 853.849237] raw data: 0020: 0a 07 00 7b 00 7b 00 38 69 e1 1c 03 0c f7 00 00 ...{.{.8i... [ 853.849240] raw data: 0030: 08 f0 00 00 15 f0 c0 35 67 67 de d3 ca c9 d9 5b ...5gg.[ [ 853.849242] raw data: 0040: 1f ff de d3 d2 86 8f 67 fa f2 de d3 d2 86 8f 38 ...g...8 [ 853.849244] raw data: 0050: 2f ff de d3 d2 86 8f 3b ff ff d1 93 bc 50 /..;.P [ 857.883052] sungem: sungem wrong csum : dbb4/c48f, len 94 bytes, c001fa185882 [ 857.883058] raw data: : 00 0d 93 43 81 62 18 d6 c7 51 b8 1c 08 00 45 00 ...C.b...QE. [ 857.883070] raw data: 0010: 00 4c a1 97 40 00 3a 11 ce ed d9 5b 2c 11 c0 a8 .L..@.:[,... [ 857.883080] raw data: 0020: 0a 07 00 7b 00 7b 00 38 14 4b 24 02 06 ea 00 00 ...{.{.8.K$. [ 857.883085] raw data: 0030: 00 0b 00 00 02 99 c0 a8 64 09 de d3 d2 5a 36 e4 dZ6. [ 857.883090] raw data: 0040: bc f5 de d3 d2 8a 8f 2c 17 44 de d3 d2 8a 93 8b ...,.D.. [ 857.883094] raw data: 0050: d7 b7 de d3 d2 8a 93 97 69 6e 39 7b d2 5a in9{.Z [ 858.124689] sungem: sungem wrong csum : 1f4f/02d0, len 118 bytes, c001fa185602 [ 858.124700] raw data: : 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01 ...C.b.=~LH...a. [ 858.124705] raw data: 0010: 1e b1 00 3c 06 40 20 01 0a 62 17 11 88 01 00 00 ...<.@ ..b.. [ 858.124709] raw data: 0020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00 .8 ..b.. [ 858.124714] raw data: 0030: 00 00 00 00 00 07 94 b4 00 16 86 f5 29 e8 36 cb ).6. [ 858.124718] raw data: 0040: 50 49 80 18 05 93 9a 53 00 00 01 01 08 0a 58 b2 PI.S..X. [ 858.124723] raw data: 0050: de 54 61 5f 2f 3c 00 00 00 10 cc 08 55 f7 da 21 .Ta_/<..U..! [ 858.124727] raw data: 0060: f4 60 0a 6b 3c aa b9 b3 7e 61 10 b8 c2 be 9a 0b .`.k<...~a.. [ 858.124731] raw data: 0070: c7 e9 5b 97 1b ac ..[... [ 858.126522] sungem: sungem wrong csum : 0836/19e9, len 90 bytes, c001fa185382 [ 858.126530] raw data: : 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01 ...C.b.=~LH...a. [ 858.126532] raw data: 0010: 1e b1 00 20 06 40 20 01 0a 62 17 11 88 01 00 00 ... .@ ..b.. [ 858.126535] raw data: 0020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00 .8 ..b.. [ 858.126537] raw data: 0030: 00 00 00 00 00 07 94 b4 00 16 86 f5 2a 04 36 cb *.6. [ 858.126540] raw data: 0040: 50 65 80 10 05 93 3e 56 00 00 01 01 08 0a 58 b2 Pe>V..X. [ 858.126542] raw data: 0050: de 56 61 5f 30 4d 1d 58 42 d2 .Va_0M.XB. [ 858.131559] sungem: sungem wrong csum : 5891/c98d, len 90 bytes, c001fa185102 [ 858.131567] raw data: : 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01 ...C.b.=~LH...a. [ 858.131570] raw data: 0010: 1e b1 00 20 06 40 20 01 0a 62 17 11 88 01 00 00 ... .@ ..b.. [ 858.131572] raw data: 0020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00 .8 ..b.. [ 858.131574] raw data: 0030: 00 00 00 00 00 07 94 b4 00 16 86 f5 2a 04 36 cb *.6. [ 858.131577] raw data: 0040: 50 a1 80 10 05 93 3e 10 00 00 01 01 08 0a 58 b2 P.>...X. [ 858.131579] raw data: 0050: de 5b 61 5f 30 52 3f ea 70 9b .[a_0R?.p. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On 06/18/2018 04:29 PM, Eric Dumazet wrote: > > > On 06/18/2018 11:45 AM, Mathieu Malaterre wrote: > >> >> Here is what I get on my side >> >> [ 53.628847] sungem: sungem wrong csum : 4e04/f97, len 64 bytes >> [ 53.667063] sungem: sungem wrong csum : eea8/6eec, len 149 bytes >> [ 58.648952] sungem: sungem wrong csum : 2095/3d06, len 64 bytes >> [ 58.669414] sungem: sungem wrong csum : 5245/b50, len 149 bytes >> [ 63.674451] sungem: sungem wrong csum : 2d8/5abd, len 149 bytes >> [ 68.678233] sungem: sungem wrong csum : b8fc/a498, len 149 bytes >> [ 73.685771] sungem: sungem wrong csum : 374/5a21, len 149 bytes >> [ 78.689089] sungem: sungem wrong csum : d81/5014, len 149 bytes >> [ 83.683261] sungem: sungem wrong csum : 4e04/f97, len 64 bytes >> [ 83.690193] sungem: sungem wrong csum : c2f7/9a9d, len 149 bytes >> [ 88.692511] sungem: sungem wrong csum : f4d8/68bc, len 149 bytes >> [ 93.699915] sungem: sungem wrong csum : 1370/4a25, len 149 bytes >> [ 98.703136] sungem: sungem wrong csum : e1b5/7bdf, len 149 bytes >> [ 103.704230] sungem: sungem wrong csum : 5321/a74, len 149 bytes >> [ 108.688912] sungem: sungem wrong csum : 2095/3d06, len 64 bytes >> [ 108.706559] sungem: sungem wrong csum : ddbc/7fd8, len 149 bytes >> [ 113.713189] sungem: sungem wrong csum : 5a65/330, len 149 bytes >> [ 113.891697] sungem: sungem wrong csum : 4e04/f97, len 64 bytes >> [ 118.717151] sungem: sungem wrong csum : f7c8/65cc, len 149 bytes >> [ 123.722680] sungem: sungem wrong csum : 3d7a/201b, len 149 bytes >> [ 128.726524] sungem: sungem wrong csum : c8fd/9497, len 149 bytes >> [ 133.732045] sungem: sungem wrong csum : de0d/7f87, len 149 bytes >> [ 135.529163] sungem: sungem wrong csum : 3089/b6dd, len 96 bytes >> [ 135.529208] eth0: hw csum failure >> [ 135.529220] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #7 >> [ 135.529226] Call Trace: >> [ 135.529243] [dffedbe0] [c069ddac] >> __skb_checksum_complete+0xf0/0x108 (unreliable) > > Thanks, then I guess next step would be to dump the content of the frames > having a wrong checksum, hoping we find an easy way to discard the > CHECKSUM_COMPLETE > in a selective way. > > Otherwise, we will need to remove CHECKSUM_COMPLETE setting in this driver. > > diff --git a/drivers/net/ethernet/sun/sungem.c > b/drivers/net/ethernet/sun/sungem.c > index > 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..77a761f95be788bb86c8d917f613c9084818f826 > 100644 > --- a/drivers/net/ethernet/sun/sungem.c > +++ b/drivers/net/ethernet/sun/sungem.c > @@ -857,6 +857,14 @@ static int gem_rx(struct gem *gp, int work_to_do) > > csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ > 0x); > skb->csum = csum_unfold(csum); > + { > + __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - > ETH_HLEN, 0); > + if (csum != csum_fold(rsum) && net_ratelimit()) > + pr_err("sungem wrong csum : %04x/%04x, len %u > bytes\n", > + csum, csum_fold(rsum), len); > + print_hex_dump(KERN_ERR, "raw data: ", > DUMP_PREFIX_OFFSET, DUMP_PREFIX_ADDRESS might give us more information (say alignment problem, or crossing page boundaries) > + 16, 1, skb->data, len, true); > + } > skb->ip_summed = CHECKSUM_COMPLETE; > skb->protocol = eth_type_trans(skb, gp->dev); > >
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On 06/18/2018 11:45 AM, Mathieu Malaterre wrote: > > Here is what I get on my side > > [ 53.628847] sungem: sungem wrong csum : 4e04/f97, len 64 bytes > [ 53.667063] sungem: sungem wrong csum : eea8/6eec, len 149 bytes > [ 58.648952] sungem: sungem wrong csum : 2095/3d06, len 64 bytes > [ 58.669414] sungem: sungem wrong csum : 5245/b50, len 149 bytes > [ 63.674451] sungem: sungem wrong csum : 2d8/5abd, len 149 bytes > [ 68.678233] sungem: sungem wrong csum : b8fc/a498, len 149 bytes > [ 73.685771] sungem: sungem wrong csum : 374/5a21, len 149 bytes > [ 78.689089] sungem: sungem wrong csum : d81/5014, len 149 bytes > [ 83.683261] sungem: sungem wrong csum : 4e04/f97, len 64 bytes > [ 83.690193] sungem: sungem wrong csum : c2f7/9a9d, len 149 bytes > [ 88.692511] sungem: sungem wrong csum : f4d8/68bc, len 149 bytes > [ 93.699915] sungem: sungem wrong csum : 1370/4a25, len 149 bytes > [ 98.703136] sungem: sungem wrong csum : e1b5/7bdf, len 149 bytes > [ 103.704230] sungem: sungem wrong csum : 5321/a74, len 149 bytes > [ 108.688912] sungem: sungem wrong csum : 2095/3d06, len 64 bytes > [ 108.706559] sungem: sungem wrong csum : ddbc/7fd8, len 149 bytes > [ 113.713189] sungem: sungem wrong csum : 5a65/330, len 149 bytes > [ 113.891697] sungem: sungem wrong csum : 4e04/f97, len 64 bytes > [ 118.717151] sungem: sungem wrong csum : f7c8/65cc, len 149 bytes > [ 123.722680] sungem: sungem wrong csum : 3d7a/201b, len 149 bytes > [ 128.726524] sungem: sungem wrong csum : c8fd/9497, len 149 bytes > [ 133.732045] sungem: sungem wrong csum : de0d/7f87, len 149 bytes > [ 135.529163] sungem: sungem wrong csum : 3089/b6dd, len 96 bytes > [ 135.529208] eth0: hw csum failure > [ 135.529220] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #7 > [ 135.529226] Call Trace: > [ 135.529243] [dffedbe0] [c069ddac] > __skb_checksum_complete+0xf0/0x108 (unreliable) Thanks, then I guess next step would be to dump the content of the frames having a wrong checksum, hoping we find an easy way to discard the CHECKSUM_COMPLETE in a selective way. Otherwise, we will need to remove CHECKSUM_COMPLETE setting in this driver. diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..77a761f95be788bb86c8d917f613c9084818f826 100644 --- a/drivers/net/ethernet/sun/sungem.c +++ b/drivers/net/ethernet/sun/sungem.c @@ -857,6 +857,14 @@ static int gem_rx(struct gem *gp, int work_to_do) csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0x); skb->csum = csum_unfold(csum); + { + __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0); + if (csum != csum_fold(rsum) && net_ratelimit()) + pr_err("sungem wrong csum : %04x/%04x, len %u bytes\n", + csum, csum_fold(rsum), len); + print_hex_dump(KERN_ERR, "raw data: ", DUMP_PREFIX_OFFSET, + 16, 1, skb->data, len, true); + } skb->ip_summed = CHECKSUM_COMPLETE; skb->protocol = eth_type_trans(skb, gp->dev);
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On Mon, Jun 18, 2018 at 8:18 PM Eric Dumazet wrote: > > > > On 06/18/2018 10:54 AM, Andreas Schwab wrote: > > On Jun 17 2018, Eric Dumazet wrote: > > > >> Oh this is silly, please try : > >> > >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c > >> index > >> c642304f178ce0a4e1358d59e45032a39f76fb3f..54dd9c18ecad817812898d6f335e1794a07dabbe > >> 100644 > >> --- a/net/core/skbuff.c > >> +++ b/net/core/skbuff.c > >> @@ -1845,10 +1845,9 @@ EXPORT_SYMBOL(___pskb_trim); > >> int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len) > >> { > >> if (skb->ip_summed == CHECKSUM_COMPLETE) { > >> - int delta = skb->len - len; > >> + __wsum csumdiff = skb_checksum(skb, len, skb->len - len, > >> 0); > >> > >> - skb->csum = csum_sub(skb->csum, > >> -skb_checksum(skb, len, delta, 0)); > >> + skb->csum = csum_block_sub(skb->csum, csumdiff, len); > >> } > >> return __pskb_trim(skb, len); > >> } > > > > That doesn't help either. > > > > Andreas. > > > > Then maybe NIC provided csum is not correct. > > It does not compute a checksum on all the frame, but part of it. > > You could use this patch to double check. > > diff --git a/drivers/net/ethernet/sun/sungem.c > b/drivers/net/ethernet/sun/sungem.c > index > 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..277859ea73e35271a10b02011120fca248ec8e71 > 100644 > --- a/drivers/net/ethernet/sun/sungem.c > +++ b/drivers/net/ethernet/sun/sungem.c > @@ -857,6 +857,11 @@ static int gem_rx(struct gem *gp, int work_to_do) > > csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ > 0x); > skb->csum = csum_unfold(csum); > + { > + __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - > ETH_HLEN, 0); > + if (csum != csum_fold(rsum)) > + pr_err_ratelimited("sungem wrong csum : %x/%x, len %u > bytes\n", csum, csum_fold(rsum), len); > + } > skb->ip_summed = CHECKSUM_COMPLETE; > skb->protocol = eth_type_trans(skb, gp->dev); > > Here is what I get on my side [ 53.628847] sungem: sungem wrong csum : 4e04/f97, len 64 bytes [ 53.667063] sungem: sungem wrong csum : eea8/6eec, len 149 bytes [ 58.648952] sungem: sungem wrong csum : 2095/3d06, len 64 bytes [ 58.669414] sungem: sungem wrong csum : 5245/b50, len 149 bytes [ 63.674451] sungem: sungem wrong csum : 2d8/5abd, len 149 bytes [ 68.678233] sungem: sungem wrong csum : b8fc/a498, len 149 bytes [ 73.685771] sungem: sungem wrong csum : 374/5a21, len 149 bytes [ 78.689089] sungem: sungem wrong csum : d81/5014, len 149 bytes [ 83.683261] sungem: sungem wrong csum : 4e04/f97, len 64 bytes [ 83.690193] sungem: sungem wrong csum : c2f7/9a9d, len 149 bytes [ 88.692511] sungem: sungem wrong csum : f4d8/68bc, len 149 bytes [ 93.699915] sungem: sungem wrong csum : 1370/4a25, len 149 bytes [ 98.703136] sungem: sungem wrong csum : e1b5/7bdf, len 149 bytes [ 103.704230] sungem: sungem wrong csum : 5321/a74, len 149 bytes [ 108.688912] sungem: sungem wrong csum : 2095/3d06, len 64 bytes [ 108.706559] sungem: sungem wrong csum : ddbc/7fd8, len 149 bytes [ 113.713189] sungem: sungem wrong csum : 5a65/330, len 149 bytes [ 113.891697] sungem: sungem wrong csum : 4e04/f97, len 64 bytes [ 118.717151] sungem: sungem wrong csum : f7c8/65cc, len 149 bytes [ 123.722680] sungem: sungem wrong csum : 3d7a/201b, len 149 bytes [ 128.726524] sungem: sungem wrong csum : c8fd/9497, len 149 bytes [ 133.732045] sungem: sungem wrong csum : de0d/7f87, len 149 bytes [ 135.529163] sungem: sungem wrong csum : 3089/b6dd, len 96 bytes [ 135.529208] eth0: hw csum failure [ 135.529220] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #7 [ 135.529226] Call Trace: [ 135.529243] [dffedbe0] [c069ddac] __skb_checksum_complete+0xf0/0x108 (unreliable) > > >
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On Mon, Jun 18, 2018 at 7:54 PM Andreas Schwab wrote: > > On Jun 17 2018, Eric Dumazet wrote: > > > Oh this is silly, please try : > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index > > c642304f178ce0a4e1358d59e45032a39f76fb3f..54dd9c18ecad817812898d6f335e1794a07dabbe > > 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -1845,10 +1845,9 @@ EXPORT_SYMBOL(___pskb_trim); > > int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len) > > { > > if (skb->ip_summed == CHECKSUM_COMPLETE) { > > - int delta = skb->len - len; > > + __wsum csumdiff = skb_checksum(skb, len, skb->len - len, 0); > > > > - skb->csum = csum_sub(skb->csum, > > -skb_checksum(skb, len, delta, 0)); > > + skb->csum = csum_block_sub(skb->csum, csumdiff, len); > > } > > return __pskb_trim(skb, len); > > } > > That doesn't help either. seconded (setup g4+sungem): [ 100.272676] eth0: hw csum failure [ 100.272710] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #6 [ 100.272716] Call Trace: [ 100.272733] [dffedbd0] [c069ddb8] __skb_checksum_complete+0xf0/0x108 (unreliable) [ 100.272752] [dffedbf0] [c078ea28] __udp4_lib_rcv+0x238/0xf98 [ 100.272767] [dffedc70] [c0731630] ip_local_deliver_finish+0xa8/0x3c4 [ 100.272777] [dffedcb0] [c073243c] ip_local_deliver+0xf0/0x154 [ 100.272786] [dffedcf0] [c07328e8] ip_rcv+0x448/0x774 [ 100.272800] [dffedd50] [c06aeaec] __netif_receive_skb_core+0x5e8/0x1184 [ 100.272811] [dffedde0] [c06bba2c] napi_gro_receive+0x160/0x22c [ 100.272828] [dffede10] [e1571590] gem_poll+0x7fc/0x1ac0 [sungem] [ 100.272837] [dffedee0] [c06bacfc] net_rx_action+0x34c/0x618 [ 100.272849] [dffedf60] [c07fd28c] __do_softirq+0x16c/0x5f0 [ 100.272863] [dffedfd0] [c0064c7c] irq_exit+0x110/0x1a8 [ 100.272877] [dffedff0] [c0016170] call_do_irq+0x24/0x3c [ 100.272890] [c0cf7e80] [c0009a84] do_IRQ+0x98/0x1a0 [ 100.272900] [c0cf7eb0] [c001b474] ret_from_except+0x0/0x14 [ 100.272911] --- interrupt: 501 at arch_cpu_idle+0x30/0x78 LR = arch_cpu_idle+0x30/0x78 [ 100.272920] [c0cf7f70] [c0cf6000] 0xc0cf6000 (unreliable) [ 100.272935] [c0cf7f80] [c00a3868] do_idle+0xc4/0x158 [ 100.272944] [c0cf7fb0] [c00a3ab4] cpu_startup_entry+0x24/0x28 [ 100.272955] [c0cf7fc0] [c0998820] start_kernel+0x47c/0x490 [ 100.272963] [c0cf7ff0] [3444] 0x3444 > Andreas. > > -- > Andreas Schwab, sch...@linux-m68k.org > GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 > "And now for something completely different."
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On 06/18/2018 10:54 AM, Andreas Schwab wrote: > On Jun 17 2018, Eric Dumazet wrote: > >> Oh this is silly, please try : >> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >> index >> c642304f178ce0a4e1358d59e45032a39f76fb3f..54dd9c18ecad817812898d6f335e1794a07dabbe >> 100644 >> --- a/net/core/skbuff.c >> +++ b/net/core/skbuff.c >> @@ -1845,10 +1845,9 @@ EXPORT_SYMBOL(___pskb_trim); >> int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len) >> { >> if (skb->ip_summed == CHECKSUM_COMPLETE) { >> - int delta = skb->len - len; >> + __wsum csumdiff = skb_checksum(skb, len, skb->len - len, 0); >> >> - skb->csum = csum_sub(skb->csum, >> -skb_checksum(skb, len, delta, 0)); >> + skb->csum = csum_block_sub(skb->csum, csumdiff, len); >> } >> return __pskb_trim(skb, len); >> } > > That doesn't help either. > > Andreas. > Then maybe NIC provided csum is not correct. It does not compute a checksum on all the frame, but part of it. You could use this patch to double check. diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..277859ea73e35271a10b02011120fca248ec8e71 100644 --- a/drivers/net/ethernet/sun/sungem.c +++ b/drivers/net/ethernet/sun/sungem.c @@ -857,6 +857,11 @@ static int gem_rx(struct gem *gp, int work_to_do) csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0x); skb->csum = csum_unfold(csum); + { + __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0); + if (csum != csum_fold(rsum)) + pr_err_ratelimited("sungem wrong csum : %x/%x, len %u bytes\n", csum, csum_fold(rsum), len); + } skb->ip_summed = CHECKSUM_COMPLETE; skb->protocol = eth_type_trans(skb, gp->dev);
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On Jun 17 2018, Eric Dumazet wrote: > Oh this is silly, please try : > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index > c642304f178ce0a4e1358d59e45032a39f76fb3f..54dd9c18ecad817812898d6f335e1794a07dabbe > 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -1845,10 +1845,9 @@ EXPORT_SYMBOL(___pskb_trim); > int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len) > { > if (skb->ip_summed == CHECKSUM_COMPLETE) { > - int delta = skb->len - len; > + __wsum csumdiff = skb_checksum(skb, len, skb->len - len, 0); > > - skb->csum = csum_sub(skb->csum, > -skb_checksum(skb, len, delta, 0)); > + skb->csum = csum_block_sub(skb->csum, csumdiff, len); > } > return __pskb_trim(skb, len); > } That doesn't help either. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On 06/17/2018 03:27 AM, Andreas Schwab wrote: > > That doesn't change anything. OK, thanks ! Oh this is silly, please try : diff --git a/net/core/skbuff.c b/net/core/skbuff.c index c642304f178ce0a4e1358d59e45032a39f76fb3f..54dd9c18ecad817812898d6f335e1794a07dabbe 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1845,10 +1845,9 @@ EXPORT_SYMBOL(___pskb_trim); int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len) { if (skb->ip_summed == CHECKSUM_COMPLETE) { - int delta = skb->len - len; + __wsum csumdiff = skb_checksum(skb, len, skb->len - len, 0); - skb->csum = csum_sub(skb->csum, -skb_checksum(skb, len, delta, 0)); + skb->csum = csum_block_sub(skb->csum, csumdiff, len); } return __pskb_trim(skb, len); }
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On Jun 16 2018, Eric Dumazet wrote: > I would try something like : > > Basically do not bother using CHECKSUM_COMPLETE for small frames that might > have been padded. > > Since we need to bring one cache line in eth_type_trans() and further header > processing, > computing the checksum in software will be almost free anyway. > > diff --git a/drivers/net/ethernet/sun/sungem.c > b/drivers/net/ethernet/sun/sungem.c > index > 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..071039f211a8a33153e888bd4014314ba5e686a4 > 100644 > --- a/drivers/net/ethernet/sun/sungem.c > +++ b/drivers/net/ethernet/sun/sungem.c > @@ -855,9 +855,11 @@ static int gem_rx(struct gem *gp, int work_to_do) > skb = copy_skb; > } > > - csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ > 0x); > - skb->csum = csum_unfold(csum); > - skb->ip_summed = CHECKSUM_COMPLETE; > + if (len > ETH_ZLEN) { > + csum = (__force __sum16)htons((status & > RXDCTRL_TCPCSUM) ^ 0x); > + skb->csum = csum_unfold(csum); > + skb->ip_summed = CHECKSUM_COMPLETE; > + } > skb->protocol = eth_type_trans(skb, gp->dev); > > napi_gro_receive(>napi, skb); That doesn't change anything. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On Jun 16 2018, Mathieu Malaterre wrote: > That's odd since it seems to only affect g4+sungem user. None of the > ppc64 seems to be having it. I'm also seeing it on a PowerMac G5. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different."
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
On 06/16/2018 12:14 AM, Mathieu Malaterre wrote: > Hi Eric, > > On Fri, Jun 15, 2018 at 9:14 PM Eric Dumazet wrote: >> >> >> >> On 06/15/2018 11:56 AM, Mathieu Malaterre wrote: >>> This reverts commit 88078d98d1bb085d72af8437707279e203524fa5. >>> >>> It causes regressions for people using chips driven by the sungem >>> driver. Suspicion is that the skb->csum value isn't being adjusted >>> properly. >>> >>> Symptoms as seen on G4+sungem are: >>> >>> [ 34.023281] eth0: hw csum failure >>> [ 34.023438] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #2 >>> [ 34.023618] Call Trace: >>> [ 34.023707] [dffedbd0] [c069ddac] __skb_checksum_complete+0xf0/0x108 >>> (unreliable) >>> [ 34.023948] [dffedbf0] [c0777a70] tcp_v4_rcv+0x604/0xe00 >>> [ 34.024118] [dffedc70] [c0731624] ip_local_deliver_finish+0xa8/0x3c4 >>> [ 34.024315] [dffedcb0] [c0732430] ip_local_deliver+0xf0/0x154 >>> [ 34.024493] [dffedcf0] [c07328dc] ip_rcv+0x448/0x774 >>> [ 34.024653] [dffedd50] [c06aeae0] __netif_receive_skb_core+0x5e8/0x1184 >>> [ 34.024857] [dffedde0] [c06bba20] napi_gro_receive+0x160/0x22c >>> [ 34.025044] [dffede10] [e14b2590] gem_poll+0x7fc/0x1ac0 [sungem] >>> [ 34.025228] [dffedee0] [c06bacf0] net_rx_action+0x34c/0x618 >>> [ 34.025402] [dffedf60] [c07fd27c] __do_softirq+0x16c/0x5f0 >>> [ 34.025575] [dffedfd0] [c0064c7c] irq_exit+0x110/0x1a8 >>> [ 34.025738] [dffedff0] [c0016170] call_do_irq+0x24/0x3c >>> [ 34.025903] [c0cf7e80] [c0009a84] do_IRQ+0x98/0x1a0 >>> [ 34.026055] [c0cf7eb0] [c001b474] ret_from_except+0x0/0x14 >>> [ 34.026225] --- interrupt: 501 at arch_cpu_idle+0x30/0x78 >>>LR = arch_cpu_idle+0x30/0x78 >>> [ 34.026510] [c0cf7f70] [c0cf6000] 0xc0cf6000 (unreliable) >>> [ 34.026682] [c0cf7f80] [c00a3868] do_idle+0xc4/0x158 >>> [ 34.026835] [c0cf7fb0] [c00a3ab0] cpu_startup_entry+0x20/0x28 >>> [ 34.027013] [c0cf7fc0] [c0998820] start_kernel+0x47c/0x490 >>> [ 34.027181] [c0cf7ff0] [3444] 0x3444 >>> >>> See commit 7ce5a27f2ef8 ("Revert "net: Handle CHECKSUM_COMPLETE more >>> adequately in pskb_trim_rcsum()."") for previous reference. >> >> This fix seems to hide a bug in csum functions on this architecture. > > That's odd since it seems to only affect g4+sungem user. None of the > ppc64 seems to be having it. And some ppc32 users are not even seeing > it. > >> Or a bug on this NIC when receiving a small packet (less than 60 bytes). >> Maybe the padding bytes are not included in NIC provided csum, and not 0. > > Ok in that case the bug is located in > ./drivers/net/ethernet/sun/sungem.c that seems more likely. I'll try > to understand that code, then. I would try something like : Basically do not bother using CHECKSUM_COMPLETE for small frames that might have been padded. Since we need to bring one cache line in eth_type_trans() and further header processing, computing the checksum in software will be almost free anyway. diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..071039f211a8a33153e888bd4014314ba5e686a4 100644 --- a/drivers/net/ethernet/sun/sungem.c +++ b/drivers/net/ethernet/sun/sungem.c @@ -855,9 +855,11 @@ static int gem_rx(struct gem *gp, int work_to_do) skb = copy_skb; } - csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0x); - skb->csum = csum_unfold(csum); - skb->ip_summed = CHECKSUM_COMPLETE; + if (len > ETH_ZLEN) { + csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0x); + skb->csum = csum_unfold(csum); + skb->ip_summed = CHECKSUM_COMPLETE; + } skb->protocol = eth_type_trans(skb, gp->dev); napi_gro_receive(>napi, skb);
Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
Hi Eric, On Fri, Jun 15, 2018 at 9:14 PM Eric Dumazet wrote: > > > > On 06/15/2018 11:56 AM, Mathieu Malaterre wrote: > > This reverts commit 88078d98d1bb085d72af8437707279e203524fa5. > > > > It causes regressions for people using chips driven by the sungem > > driver. Suspicion is that the skb->csum value isn't being adjusted > > properly. > > > > Symptoms as seen on G4+sungem are: > > > > [ 34.023281] eth0: hw csum failure > > [ 34.023438] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #2 > > [ 34.023618] Call Trace: > > [ 34.023707] [dffedbd0] [c069ddac] __skb_checksum_complete+0xf0/0x108 > > (unreliable) > > [ 34.023948] [dffedbf0] [c0777a70] tcp_v4_rcv+0x604/0xe00 > > [ 34.024118] [dffedc70] [c0731624] ip_local_deliver_finish+0xa8/0x3c4 > > [ 34.024315] [dffedcb0] [c0732430] ip_local_deliver+0xf0/0x154 > > [ 34.024493] [dffedcf0] [c07328dc] ip_rcv+0x448/0x774 > > [ 34.024653] [dffedd50] [c06aeae0] __netif_receive_skb_core+0x5e8/0x1184 > > [ 34.024857] [dffedde0] [c06bba20] napi_gro_receive+0x160/0x22c > > [ 34.025044] [dffede10] [e14b2590] gem_poll+0x7fc/0x1ac0 [sungem] > > [ 34.025228] [dffedee0] [c06bacf0] net_rx_action+0x34c/0x618 > > [ 34.025402] [dffedf60] [c07fd27c] __do_softirq+0x16c/0x5f0 > > [ 34.025575] [dffedfd0] [c0064c7c] irq_exit+0x110/0x1a8 > > [ 34.025738] [dffedff0] [c0016170] call_do_irq+0x24/0x3c > > [ 34.025903] [c0cf7e80] [c0009a84] do_IRQ+0x98/0x1a0 > > [ 34.026055] [c0cf7eb0] [c001b474] ret_from_except+0x0/0x14 > > [ 34.026225] --- interrupt: 501 at arch_cpu_idle+0x30/0x78 > >LR = arch_cpu_idle+0x30/0x78 > > [ 34.026510] [c0cf7f70] [c0cf6000] 0xc0cf6000 (unreliable) > > [ 34.026682] [c0cf7f80] [c00a3868] do_idle+0xc4/0x158 > > [ 34.026835] [c0cf7fb0] [c00a3ab0] cpu_startup_entry+0x20/0x28 > > [ 34.027013] [c0cf7fc0] [c0998820] start_kernel+0x47c/0x490 > > [ 34.027181] [c0cf7ff0] [3444] 0x3444 > > > > See commit 7ce5a27f2ef8 ("Revert "net: Handle CHECKSUM_COMPLETE more > > adequately in pskb_trim_rcsum()."") for previous reference. > > This fix seems to hide a bug in csum functions on this architecture. That's odd since it seems to only affect g4+sungem user. None of the ppc64 seems to be having it. And some ppc32 users are not even seeing it. > Or a bug on this NIC when receiving a small packet (less than 60 bytes). > Maybe the padding bytes are not included in NIC provided csum, and not 0. Ok in that case the bug is located in ./drivers/net/ethernet/sun/sungem.c that seems more likely. I'll try to understand that code, then. Thanks > > >