Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 31.10.2018 05:08, Andre Tomt wrote:
On 30.10.2018 12:04, Andre Tomt wrote:
On 30.10.2018 11:58, Andre Tomt wrote:
On 27.10.2018 23:41, Andre Tomt wrote:
On 26.10.2018 13:45, Andre Tomt wrote:
On 25.10.2018 19:38, Eric Dumazet wrote:
On 10/24/2018 12:41 PM, Andre Tomt wrote:
It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture. It is running a torrent client serving up various linux distributions.
Have you also applied this fix? https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913
No. I've applied it now to 4.19 and will report back if anything shows up.
Just hit it on the simpler server; no VRF, no tunnels, no nat/conntrack. Only a basic stateless nftables ruleset and a vlan netdev (unlikely to be the one triggering this, I guess; it has only v4 traffic).
I'm currently testing 4.19 with the recommended commit added, plus these to sort out some GRO issues (on a hunch, unsure if related):
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=a8305bff685252e80b7c60f4f5e7dd2e63e38218
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=992cba7e276d438ac8b0a8c17b147b37c8c286f7
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=ece23711dd956cd5053c9cb03e9fe0668f9c8894
and I *think* it is behaving better now? It's not conclusive, as it could take a while to trip in this environment, but some of the test servers have not shown anything bad in almost 24h.
Sorry, s/some of the/none of the
I think it is fairly safe to say 4.19 + mlx4 + these 4 commits is OK, at least for my workload. Servers are now 51-61 hours in, no splats. I also added ntp pool traffic to one of them to make things a little more exciting. Not sure what is needed for 4.18; I don't have the mental bandwidth to test that right now. Also no idea about the similar-looking mlx5 splats reported elsewhere.
As expected, conntrack/nat + vlan + forwarding still splats. sch_cake, IFB and VRF were removed from this setup. Here is a conntrack splat without IFB/VRF/Cake interference:
[34458.506346] wanib: hw csum failure
[34458.506371] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.0-1 #1
[34458.506374] Hardware name: Supermicro Super Server/X10SDV-4C-TLN2F, BIOS 2.0 06/13/2018
[34458.506377] Call Trace:
[34458.506381]
[34458.506388] dump_stack+0x5c/0x80
[34458.506392] __skb_checksum_complete+0xac/0xc0
[34458.506402] icmp_error+0x1c8/0x1f0 [nf_conntrack]
[34458.506406] ? skb_copy_bits+0x13d/0x220
[34458.506411] nf_conntrack_in+0xd8/0x390 [nf_conntrack]
[34458.506416] ? ___pskb_trim+0x192/0x330
[34458.506421] nf_hook_slow+0x43/0xc0
[34458.506426] ip_rcv+0x90/0xb0
[34458.506430] ? ip_rcv_finish_core.isra.0+0x310/0x310
[34458.506435] __netif_receive_skb_one_core+0x42/0x50
[34458.506438] netif_receive_skb_internal+0x24/0xb0
[34458.506441] napi_gro_frags+0x177/0x210
[34458.506446] mlx4_en_process_rx_cq+0x8df/0xb50 [mlx4_en]
[34458.506459] ? mlx4_eq_int+0x38f/0xcb0 [mlx4_core]
[34458.506463] mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
[34458.506466] net_rx_action+0xe1/0x2c0
[34458.506469] __do_softirq+0xe7/0x2d3
[34458.506475] irq_exit+0x96/0xd0
[34458.506478] do_IRQ+0x85/0xd0
[34458.506483] common_interrupt+0xf/0xf
[34458.506486]
[34458.506491] RIP: 0010:cpuidle_enter_state+0xb9/0x320
[34458.506495] Code: e8 3c 16 bc ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3b 02 00 00 31 ff e8 5e fb c0 ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 ba ff ff ff 7f 48 39 c3
[34458.506497] RSP: 0018:978d41943ea8 EFLAGS: 0246 ORIG_RAX: ffdb
[34458.506500] RAX: 8d8f6fa60fc0 RBX: 1f56ff07af28 RCX: 001f
[34458.506501] RDX: 1f56ff07af28 RSI: 3a2e90d6 RDI:
[34458.506503] RBP: 8d8f6fa698c0 R08: 0002 R09: 00020840
[34458.506504] R10: 0004ea58f2899595 R11: 8d8f6fa601e8 R12: 0001
[34458.506505] R13: 8a0ac638 R14: 0001 R15:
[34458.506509] ? cpuidle_enter_state+0x94/0x320
[34458.506512] do_idle+0x1e4/0x220
[34458.506515] cpu_startup_entry+0x5f/0x70
[34458.506519] start_secondary+0x185/0x1a0
[34458.506521] secondary_startup_64+0xa4/0xb0
The stateless-filtered, non-forwarding host still looks like it has been fixed (the udp6_gro_* splats are all still gone). It also seems fine when moving the traffic over a vlan device. These fixes went into 4.19.1-rc1 (the checksum_complete fix and the "unlink GRO packets on overflow" fixes).
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 30.10.2018 12:04, Andre Tomt wrote:
On 30.10.2018 11:58, Andre Tomt wrote:
On 27.10.2018 23:41, Andre Tomt wrote:
On 26.10.2018 13:45, Andre Tomt wrote:
On 25.10.2018 19:38, Eric Dumazet wrote:
On 10/24/2018 12:41 PM, Andre Tomt wrote:
It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture. It is running a torrent client serving up various linux distributions.
Have you also applied this fix? https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913
No. I've applied it now to 4.19 and will report back if anything shows up.
Just hit it on the simpler server; no VRF, no tunnels, no nat/conntrack. Only a basic stateless nftables ruleset and a vlan netdev (unlikely to be the one triggering this, I guess; it has only v4 traffic).
I'm currently testing 4.19 with the recommended commit added, plus these to sort out some GRO issues (on a hunch, unsure if related):
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=a8305bff685252e80b7c60f4f5e7dd2e63e38218
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=992cba7e276d438ac8b0a8c17b147b37c8c286f7
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=ece23711dd956cd5053c9cb03e9fe0668f9c8894
and I *think* it is behaving better now? It's not conclusive, as it could take a while to trip in this environment, but some of the test servers have not shown anything bad in almost 24h.
Sorry, s/some of the/none of the
I think it is fairly safe to say 4.19 + mlx4 + these 4 commits is OK, at least for my workload. Servers are now 51-61 hours in, no splats. I also added ntp pool traffic to one of them to make things a little more exciting. Not sure what is needed for 4.18; I don't have the mental bandwidth to test that right now. Also no idea about the similar-looking mlx5 splats reported elsewhere.
Re: Fw: [Bug 201423] New: eth0: hw csum failure
> On 10/16/2018 06:00 AM, Eric Dumazet wrote: > > On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt wrote: > >> > >> On 15.10.2018 17:41, Eric Dumazet wrote: > >>> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger > Something changed between 4.17.12 and 4.18, after bisecting the > problem I > got the following first bad commit: > > commit 88078d98d1bb085d72af8437707279e203524fa5 > Author: Eric Dumazet > Date: Wed Apr 18 11:43:15 2018 -0700 > > net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends > > After working on IP defragmentation lately, I found that some large > packets defeat CHECKSUM_COMPLETE optimization because of NIC adding > zero paddings on the last (small) fragment. > > While removing the padding with pskb_trim_rcsum(), we set > skb->ip_summed > to CHECKSUM_NONE, forcing a full csum validation, even if all prior > fragments had CHECKSUM_COMPLETE set. > > We can instead compute the checksum of the part we are trimming, > usually smaller than the part we keep. > > Signed-off-by: Eric Dumazet > Signed-off-by: David S. Miller > > >>> > >>> Thanks for bisecting! > >>> > >>> This commit is known to expose some NIC/driver bugs. > >>> > >>> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f > >>> ("net: sungem: fix rx checksum support") for one driver needing a fix. > >>> > >>> I assume SKY2_HW_NEW_LE is not set on your NIC? > >>> > >> > >> I've seen similar on several systems with mlx4 cards when using 4.18.x - > >> that is hw csum failure followed by some backtrace. > >> > >> Only seems to happen on systems dealing with quite a bit of UDP. > >> > > > > Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE, > > but CHECKSUM_UNNECESSARY > > > > It would be nice to track this a bit further, maybe by providing the > > full packet content.
> > > >> Example from 4.18.10: > >>> [635607.740574] p0xe0: hw csum failure > >>> [635607.740598] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1 > >>> [635607.740599] Hardware name: Supermicro Super Server/X10SRL-F, BIOS > >>> 2.0b 05/02/2017 > >>> [635607.740599] Call Trace: > >>> [635607.740602] > >>> [635607.740611] dump_stack+0x5c/0x7b > >>> [635607.740617] __skb_gro_checksum_complete+0x9a/0xa0 > >>> [635607.740621] udp6_gro_receive+0x211/0x290 > >>> [635607.740624] ipv6_gro_receive+0x1a8/0x390 > >>> [635607.740627] dev_gro_receive+0x33e/0x550 > >>> [635607.740628] napi_gro_frags+0xa2/0x210 > >>> [635607.740635] mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en] > >>> [635607.740648] ? mlx4_cq_completion+0x23/0x70 [mlx4_core] > >>> [635607.740654] ? mlx4_eq_int+0x373/0xc80 [mlx4_core] > >>> [635607.740657] mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en] > >>> [635607.740658] net_rx_action+0xe0/0x2e0 > >>> [635607.740662] __do_softirq+0xd8/0x2e5 > >>> [635607.740666] irq_exit+0xb4/0xc0 > >>> [635607.740667] do_IRQ+0x85/0xd0 > >>> [635607.740670] common_interrupt+0xf/0xf > >>> [635607.740671] > >>> [635607.740675] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0 > >>> [635607.740675] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 > >>> 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 > >>> 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 > >>> [635607.740701] RSP: 0018:a5c206353ea8 EFLAGS: 0246 ORIG_RAX: > >>> ffd9 > >>> [635607.740703] RAX: 8d72ffd20f00 RBX: 00024214f597c5b0 RCX: > >>> 001f > >>> [635607.740703] RDX: 00024214f597c5b0 RSI: 00020780 RDI: > >>> > >>> [635607.740704] RBP: 0004 R08: 002542bfbefa99fa R09: > >>> > >>> [635607.740705] R10: a5c206353e88 R11: 00c5 R12: > >>> af0aaf78 > >>> [635607.740706] R13: 8d72ffd297d8 R14: R15: > >>> 00024214f58c2ed5 > >>> [635607.740709] ? 
cpuidle_enter_state+0x91/0x2a0 > >>> [635607.740712] do_idle+0x1d0/0x240 > >>> [635607.740715] cpu_startup_entry+0x5f/0x70 > >>> [635607.740719] start_secondary+0x185/0x1a0 > >>> [635607.740722] secondary_startup_64+0xa5/0xb0 > >>> [635607.740731] p0xe0: hw csum failure > >>> [635607.740745] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1 > >>> [635607.740746] Hardware name: Supermicro Super Server/X10SRL-F, BIOS > >>> 2.0b 05/02/2017 > >>> [635607.740746] Call Trace: > >>> [635607.740747] > >>> [635607.740750] dump_stack+0x5c/0x7b > >>> [635607.740755] __skb_checksum_complete+0xb8/0xd0 > >>> [635607.740760] __udp6_lib_rcv+0xa6b/0xa70 > >>> [635607.740767] ? nft_do_chain_inet+0x7a/0xd0 [nf_tables] > >>> [635607.740770] ? nft_do_chain_inet+0x7a/0xd0 [nf_tables] > >>> [635607.740774] ip6_input_finish+0xc0/0x460 > >>> [635607.740776] ip6_input+0x2b/0x90 > >>> [635607.740778] ? ip6_rcv_finish+0x110/0x110 > >>> [635607.740780] ipv6_rcv+0x2cd/0x4b0 > >>>
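The commit message quoted above describes the idea behind making pskb_trim_rcsum() keep CHECKSUM_COMPLETE: rather than falling back to CHECKSUM_NONE, subtract the checksum of the trimmed part from the checksum of the whole packet. Below is a minimal userspace sketch of that arithmetic, assuming an RFC 1071-style 16-bit one's-complement sum over a flat byte buffer. The helpers csum16(), csum_sub16() and trim_csum_even() are hypothetical illustrations, not the kernel's skb_checksum()/csum_sub() API (which operate on 32-bit partial sums over skb fragments), and the sketch deliberately restricts itself to a trim point at an even offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement (RFC 1071 style) sum of buf, folded to 16 bits.
 * Bytes are paired big-endian; an odd trailing byte is zero-padded. */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)                       /* end-around carry */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's-complement subtraction: a - b is computed as a + ~b. */
static uint16_t csum_sub16(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + (uint16_t)~b;
    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Hypothetical helper: derive the checksum of the first `keep` bytes
 * from the checksum of the whole buffer by subtracting the checksum of
 * the trimmed tail, instead of re-summing the (usually larger) part we
 * keep. Assumes `keep` is even, so the tail starts on a 16-bit word. */
static uint16_t trim_csum_even(const uint8_t *buf, size_t len, size_t keep,
                               uint16_t full)
{
    assert(keep <= len && (keep & 1) == 0);
    return csum_sub16(full, csum16(buf + keep, len - keep));
}
```

For example, with the bytes 01 02 03 04 05 06 07 the full sum is 0x100c; trimming the last three bytes (keep = 4) by subtraction yields 0x0406, the same value as summing the first four bytes directly. Trimming zero padding subtracts nothing, which is why the padding case from the commit message is cheap.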
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 30.10.2018 11:58, Andre Tomt wrote:
On 27.10.2018 23:41, Andre Tomt wrote:
On 26.10.2018 13:45, Andre Tomt wrote:
On 25.10.2018 19:38, Eric Dumazet wrote:
On 10/24/2018 12:41 PM, Andre Tomt wrote:
It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture. It is running a torrent client serving up various linux distributions.
Have you also applied this fix? https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913
No. I've applied it now to 4.19 and will report back if anything shows up.
Just hit it on the simpler server; no VRF, no tunnels, no nat/conntrack. Only a basic stateless nftables ruleset and a vlan netdev (unlikely to be the one triggering this, I guess; it has only v4 traffic).
I'm currently testing 4.19 with the recommended commit added, plus these to sort out some GRO issues (on a hunch, unsure if related):
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=a8305bff685252e80b7c60f4f5e7dd2e63e38218
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=992cba7e276d438ac8b0a8c17b147b37c8c286f7
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=ece23711dd956cd5053c9cb03e9fe0668f9c8894
and I *think* it is behaving better now? It's not conclusive, as it could take a while to trip in this environment, but some of the test servers have not shown anything bad in almost 24h.
Sorry, s/some of the/none of the
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 27.10.2018 23:41, Andre Tomt wrote:
On 26.10.2018 13:45, Andre Tomt wrote:
On 25.10.2018 19:38, Eric Dumazet wrote:
On 10/24/2018 12:41 PM, Andre Tomt wrote:
It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture. It is running a torrent client serving up various linux distributions.
Have you also applied this fix? https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913
No. I've applied it now to 4.19 and will report back if anything shows up.
Just hit it on the simpler server; no VRF, no tunnels, no nat/conntrack. Only a basic stateless nftables ruleset and a vlan netdev (unlikely to be the one triggering this, I guess; it has only v4 traffic).
I'm currently testing 4.19 with the recommended commit added, plus these to sort out some GRO issues (on a hunch, unsure if related):
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=a8305bff685252e80b7c60f4f5e7dd2e63e38218
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=992cba7e276d438ac8b0a8c17b147b37c8c286f7
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=ece23711dd956cd5053c9cb03e9fe0668f9c8894
and I *think* it is behaving better now? It's not conclusive, as it could take a while to trip in this environment, but some of the test servers have not shown anything bad in almost 24h.
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 26.10.2018 13:45, Andre Tomt wrote:
On 25.10.2018 19:38, Eric Dumazet wrote:
On 10/24/2018 12:41 PM, Andre Tomt wrote:
It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture. It is running a torrent client serving up various linux distributions.
Have you also applied this fix? https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913
No. I've applied it now to 4.19 and will report back if anything shows up.
Just hit it on the simpler server; no VRF, no tunnels, no nat/conntrack. Only a basic stateless nftables ruleset and a vlan netdev (unlikely to be the one triggering this, I guess; it has only v4 traffic).
On 4.19 + above commit:
[158269.360271] p0xe0: hw csum failure
[158269.360286] CPU: 3 PID: 0 Comm: swapper/3 Tainted: P O 4.19.0-1 #1
[158269.360287] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
[158269.360288] Call Trace:
[158269.360290]
[158269.360295] dump_stack+0x5c/0x7b
[158269.360299] __skb_gro_checksum_complete+0x9a/0xa0
[158269.360301] udp6_gro_receive+0x211/0x290
[158269.360303] ipv6_gro_receive+0x1b1/0x3a0
[158269.360306] ? ip_sublist_rcv_finish+0x70/0x70
[158269.360307] dev_gro_receive+0x3a0/0x620
[158269.360309] ? __build_skb+0x25/0xe0
[158269.360310] napi_gro_frags+0xa8/0x220
[158269.360314] mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
[158269.360322] ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
[158269.360325] ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
[158269.360327] mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
[158269.360329] net_rx_action+0xe0/0x2e0
[158269.360330] __do_softirq+0xd8/0x2ff
[158269.360333] irq_exit+0xbd/0xd0
[158269.360334] do_IRQ+0x85/0xd0
[158269.360336] common_interrupt+0xf/0xf
[158269.360337]
[158269.360339] RIP: 0010:cpuidle_enter_state+0xb3/0x310
[158269.360340] Code: 31 ff e8 e0 e0 bb ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3f 02 00 00 31 ff e8 64 cc c0 ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
[158269.360341] RSP: 0018:af28c634bea8 EFLAGS: 0246 ORIG_RAX: ffd9
[158269.360342] RAX: 9a9f7fae0fc0 RBX: 8ff1f4ff622a RCX: 001f
[158269.360343] RDX: 8ff1f4ff622a RSI: 22983893 RDI:
[158269.360343] RBP: 0001 R08: 0002 R09: 00020840
[158269.360344] R10: af28c634be88 R11: 0036 R12: 9a9f7fae9aa8
[158269.360344] R13: aa0ac638 R14: R15: 8ff1f4f09d43
[158269.360347] ? cpuidle_enter_state+0x90/0x310
[158269.360349] do_idle+0x1d0/0x240
[158269.360351] cpu_startup_entry+0x5f/0x70
[158269.360352] start_secondary+0x185/0x1a0
[158269.360354] secondary_startup_64+0xa4/0xb0
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 26.10.2018 14:59, Eric Dumazet wrote:
On Fri, Oct 26, 2018 at 5:38 AM Andre Tomt wrote:
And it tripped again with that commit; however, on another box with a much more complicated setup (VRFs, sch_cake, ifb, conntrack/nat, 6in4 tunnel, VF device on mlx4):
[ 8197.348260] wanib: hw csum failure
[ 8197.348288] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-1 #1
[ 8197.348289] Hardware name: Supermicro SYS-5018D-FN8T/X10SDV-TP8F, BIOS 1.3 03/19/2018
[ 8197.348290] Call Trace:
[ 8197.348296]
[ 8197.348304] dump_stack+0x5c/0x80
[ 8197.348308] __skb_checksum_complete+0xac/0xc0
[ 8197.348318] icmp_error+0x1c8/0x1f0 [nf_conntrack]
[ 8197.348325] ? ip_output+0x61/0xc0
[ 8197.348328] ? skb_copy_bits+0x13d/0x220
[ 8197.348334] nf_conntrack_in+0xd8/0x390 [nf_conntrack]
[ 8197.348339] ? ___pskb_trim+0x192/0x330
[ 8197.348343] nf_hook_slow+0x43/0xc0
[ 8197.348346] ip_rcv+0x90/0xb0
[ 8197.348349] ? ip_rcv_finish_core.isra.0+0x310/0x310
[ 8197.348354] __netif_receive_skb_one_core+0x42/0x50
[ 8197.348357] netif_receive_skb_internal+0x24/0xb0
[ 8197.348361] ifb_ri_tasklet+0x167/0x260 [ifb]
[ 8197.348365] tasklet_action_common.isra.3+0x49/0xb0
[ 8197.348369] __do_softirq+0xe7/0x2d3
[ 8197.348372] irq_exit+0x96/0xd0
[ 8197.348375] do_IRQ+0x85/0xd0
[ 8197.348378] common_interrupt+0xf/0xf
[ 8197.348379]
[ 8197.348382] RIP: 0010:cpuidle_enter_state+0xb9/0x320
[ 8197.348384] Code: e8 1c 16 bc ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3b 02 00 00 31 ff e8 3e fb c0 ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 ba ff ff ff 7f 48 39 c3
[ 8197.348386] RSP: 0018:9f0441953ea8 EFLAGS: 0246 ORIG_RAX: ffd5
[ 8197.348388] RAX: 9759efae0fc0 RBX: 07749807d911 RCX: 001f
[ 8197.348390] RDX: 07749807d911 RSI: 3a2e8670 RDI:
[ 8197.348393] RBP: 9759efae98a8 R08: 0002 R09: 00020840
[ 8197.348396] R10: 00626b4810384abc R11: 9759efae01e8 R12: 0001
[ 8197.348398] R13: 8d0ac638 R14: 0001 R15:
[ 8197.348402] ? cpuidle_enter_state+0x94/0x320
[ 8197.348407] do_idle+0x1e4/0x220
[ 8197.348411] cpu_startup_entry+0x5f/0x70
[ 8197.348415] start_secondary+0x185/0x1a0
[ 8197.348417] secondary_startup_64+0xa4/0xb0
Very different trace, yet another bug to track. If you can, try to remove some components from this setup.
Will do. I just remembered that I took out the VF stuff a few days ago, and that netdev is now just a normal vlan device. Going to eliminate VRF and cake/ifb as well.
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On Fri, Oct 26, 2018 at 5:38 AM Andre Tomt wrote: > > On 26.10.2018 13:45, Andre Tomt wrote: > > On 25.10.2018 19:38, Eric Dumazet wrote: > >> > >> > >> On 10/24/2018 12:41 PM, Andre Tomt wrote: > >>> > >>> It eventually showed up again with mlx4, on 4.18.16 + fix and also on > >>> 4.19. I still do not have a useful packet capture. > >>> > >>> It is running a torrent client serving up various linux distributions. > >>> > >> > >> Have you also applied this fix ? > >> > >> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913 > >> > >> > > > > No. I've applied it now to 4.19 and will report back if anything shows up. > > And it tripped again with that commit; however on another box with a > much more complicated setup (VRFs, sch_cake, ifb, conntrack/nat, 6in4 > tunnel, VF device on mlx4) > > > [ 8197.348260] wanib: hw csum failure > > [ 8197.348288] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-1 #1 > > [ 8197.348289] Hardware name: Supermicro SYS-5018D-FN8T/X10SDV-TP8F, BIOS > > 1.3 03/19/2018 > > [ 8197.348290] Call Trace: > > [ 8197.348296] > > [ 8197.348304] dump_stack+0x5c/0x80 > > [ 8197.348308] __skb_checksum_complete+0xac/0xc0 > > [ 8197.348318] icmp_error+0x1c8/0x1f0 [nf_conntrack] > > [ 8197.348325] ? ip_output+0x61/0xc0 > > [ 8197.348328] ? skb_copy_bits+0x13d/0x220 > > [ 8197.348334] nf_conntrack_in+0xd8/0x390 [nf_conntrack] > > [ 8197.348339] ? ___pskb_trim+0x192/0x330 > > [ 8197.348343] nf_hook_slow+0x43/0xc0 > > [ 8197.348346] ip_rcv+0x90/0xb0 > > [ 8197.348349] ? 
ip_rcv_finish_core.isra.0+0x310/0x310 > > [ 8197.348354] __netif_receive_skb_one_core+0x42/0x50 > > [ 8197.348357] netif_receive_skb_internal+0x24/0xb0 > > [ 8197.348361] ifb_ri_tasklet+0x167/0x260 [ifb] > > [ 8197.348365] tasklet_action_common.isra.3+0x49/0xb0 > > [ 8197.348369] __do_softirq+0xe7/0x2d3 > > [ 8197.348372] irq_exit+0x96/0xd0 > > [ 8197.348375] do_IRQ+0x85/0xd0 > > [ 8197.348378] common_interrupt+0xf/0xf > > [ 8197.348379] > > [ 8197.348382] RIP: 0010:cpuidle_enter_state+0xb9/0x320 > > [ 8197.348384] Code: e8 1c 16 bc ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 > > 00 f6 c4 02 0f 85 3b 02 00 00 31 ff e8 3e fb c0 ff fb 66 0f 1f 44 00 00 > > <48> b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 ba ff ff ff 7f 48 39 c3 > > [ 8197.348386] RSP: 0018:9f0441953ea8 EFLAGS: 0246 ORIG_RAX: > > ffd5 > > [ 8197.348388] RAX: 9759efae0fc0 RBX: 07749807d911 RCX: > > 001f > > [ 8197.348390] RDX: 07749807d911 RSI: 3a2e8670 RDI: > > > > [ 8197.348393] RBP: 9759efae98a8 R08: 0002 R09: > > 00020840 > > [ 8197.348396] R10: 00626b4810384abc R11: 9759efae01e8 R12: > > 0001 > > [ 8197.348398] R13: 8d0ac638 R14: 0001 R15: > > > > [ 8197.348402] ? cpuidle_enter_state+0x94/0x320 > > [ 8197.348407] do_idle+0x1e4/0x220 > > [ 8197.348411] cpu_startup_entry+0x5f/0x70 > > [ 8197.348415] start_secondary+0x185/0x1a0 > > [ 8197.348417] secondary_startup_64+0xa4/0xb0 Very different trace, yet another bug to track. If you can, try to remove some components from this setup.
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 26.10.2018 13:45, Andre Tomt wrote:
On 25.10.2018 19:38, Eric Dumazet wrote:
On 10/24/2018 12:41 PM, Andre Tomt wrote:
It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture. It is running a torrent client serving up various linux distributions.
Have you also applied this fix? https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913
No. I've applied it now to 4.19 and will report back if anything shows up.
And it tripped again with that commit; however, on another box with a much more complicated setup (VRFs, sch_cake, ifb, conntrack/nat, 6in4 tunnel, VF device on mlx4):
[ 8197.348260] wanib: hw csum failure
[ 8197.348288] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-1 #1
[ 8197.348289] Hardware name: Supermicro SYS-5018D-FN8T/X10SDV-TP8F, BIOS 1.3 03/19/2018
[ 8197.348290] Call Trace:
[ 8197.348296]
[ 8197.348304] dump_stack+0x5c/0x80
[ 8197.348308] __skb_checksum_complete+0xac/0xc0
[ 8197.348318] icmp_error+0x1c8/0x1f0 [nf_conntrack]
[ 8197.348325] ? ip_output+0x61/0xc0
[ 8197.348328] ? skb_copy_bits+0x13d/0x220
[ 8197.348334] nf_conntrack_in+0xd8/0x390 [nf_conntrack]
[ 8197.348339] ? ___pskb_trim+0x192/0x330
[ 8197.348343] nf_hook_slow+0x43/0xc0
[ 8197.348346] ip_rcv+0x90/0xb0
[ 8197.348349] ? ip_rcv_finish_core.isra.0+0x310/0x310
[ 8197.348354] __netif_receive_skb_one_core+0x42/0x50
[ 8197.348357] netif_receive_skb_internal+0x24/0xb0
[ 8197.348361] ifb_ri_tasklet+0x167/0x260 [ifb]
[ 8197.348365] tasklet_action_common.isra.3+0x49/0xb0
[ 8197.348369] __do_softirq+0xe7/0x2d3
[ 8197.348372] irq_exit+0x96/0xd0
[ 8197.348375] do_IRQ+0x85/0xd0
[ 8197.348378] common_interrupt+0xf/0xf
[ 8197.348379]
[ 8197.348382] RIP: 0010:cpuidle_enter_state+0xb9/0x320
[ 8197.348384] Code: e8 1c 16 bc ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3b 02 00 00 31 ff e8 3e fb c0 ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 ba ff ff ff 7f 48 39 c3
[ 8197.348386] RSP: 0018:9f0441953ea8 EFLAGS: 0246 ORIG_RAX: ffd5
[ 8197.348388] RAX: 9759efae0fc0 RBX: 07749807d911 RCX: 001f
[ 8197.348390] RDX: 07749807d911 RSI: 3a2e8670 RDI:
[ 8197.348393] RBP: 9759efae98a8 R08: 0002 R09: 00020840
[ 8197.348396] R10: 00626b4810384abc R11: 9759efae01e8 R12: 0001
[ 8197.348398] R13: 8d0ac638 R14: 0001 R15:
[ 8197.348402] ? cpuidle_enter_state+0x94/0x320
[ 8197.348407] do_idle+0x1e4/0x220
[ 8197.348411] cpu_startup_entry+0x5f/0x70
[ 8197.348415] start_secondary+0x185/0x1a0
[ 8197.348417] secondary_startup_64+0xa4/0xb0
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 25.10.2018 19:38, Eric Dumazet wrote:
On 10/24/2018 12:41 PM, Andre Tomt wrote:
It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture. It is running a torrent client serving up various linux distributions.
Have you also applied this fix? https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913
No. I've applied it now to 4.19 and will report back if anything shows up.
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 10/24/2018 12:41 PM, Andre Tomt wrote:
>
> It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I
> still do not have a useful packet capture.
>
> It is running a torrent client serving up various linux distributions.
>
Have you also applied this fix?
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=db4f1be3ca9b0ef7330763d07bf4ace83ad6f913
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 21.10.2018 15:34, Andre Tomt wrote:
On 20.10.2018 00:25, Eric Dumazet wrote:
On 10/19/2018 02:58 PM, Eric Dumazet wrote:
On 10/16/2018 06:00 AM, Eric Dumazet wrote:
On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt wrote:
I've seen similar on several systems with mlx4 cards when using 4.18.x - that is hw csum failure followed by some backtrace. Only seems to happen on systems dealing with quite a bit of UDP.
Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE, but CHECKSUM_UNNECESSARY. It would be nice to track this a bit further, maybe by providing the full packet content.
As a matter of fact, Dimitris found the issue in the patch and is working on a fix involving csum_block_sub(). The problem comes from trimming an odd number of bytes; more exactly, from trimming bytes starting at an odd offset.
No hw csum failures here since I deployed Dimitris' fix on top of 4.18.16 32 hours ago.
Thanks
It eventually showed up again with mlx4, on 4.18.16 + fix and also on 4.19. I still do not have a useful packet capture. It is running a torrent client serving up various linux distributions.
[116116.994519] p0xe0: hw csum failure
[116116.994550] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.19.0-1 #1
[116116.994551] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
[116116.994555] Call Trace:
[116116.994558]
[116116.994567] dump_stack+0x5c/0x7b
[116116.994574] __skb_gro_checksum_complete+0x9a/0xa0
[116116.994580] udp6_gro_receive+0x211/0x290
[116116.994585] ipv6_gro_receive+0x1b1/0x3a0
[116116.994588] dev_gro_receive+0x3a0/0x620
[116116.994590] ? __build_skb+0x25/0xe0
[116116.994592] napi_gro_frags+0xa8/0x220
[116116.994598] mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
[116116.994611] ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
[116116.994621] ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
[116116.994629] mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
[116116.994635] net_rx_action+0xe0/0x2e0
[116116.994641] __do_softirq+0xd8/0x2ff
[116116.994646] irq_exit+0xbd/0xd0
[116116.994650] do_IRQ+0x85/0xd0
[116116.994656] common_interrupt+0xf/0xf
[116116.994659]
[116116.994665] RIP: 0010:cpuidle_enter_state+0xb3/0x310
[116116.994668] Code: 31 ff e8 e0 e0 bb ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3f 02 00 00 31 ff e8 64 cc c0 ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
[116116.994669] RSP: 0018:924a0635bea8 EFLAGS: 0246 ORIG_RAX: ffda
[116116.994671] RAX: 9016ffb60fc0 RBX: 699b9835d616 RCX: 001f
[116116.994673] RDX: 699b9835d616 RSI: 229837f7 RDI:
[116116.994674] RBP: 0001 R08: 0002 R09: 00020840
[116116.994675] R10: 924a0635be88 R11: 0367 R12: 9016ffb69aa8
[116116.994676] R13: a50ac638 R14: R15: 699b981c63b9
[116116.994680] ? cpuidle_enter_state+0x90/0x310
[116116.994685] do_idle+0x1d0/0x240
[116116.994687] cpu_startup_entry+0x5f/0x70
[116116.994690] start_secondary+0x185/0x1a0
[116116.994693] secondary_startup_64+0xa4/0xb0
[116116.994709] p0xe0: hw csum failure
[116116.994739] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.19.0-1 #1
[116116.994740] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
[116116.994741] Call Trace:
[116116.994743]
[116116.994746] dump_stack+0x5c/0x7b
[116116.994751] __skb_checksum_complete+0xb8/0xd0
[116116.994755] __udp6_lib_rcv+0xa0e/0xa20
[116116.994764] ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
[116116.994768] ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
[116116.994771] ip6_input_finish+0xc0/0x460
[116116.994774] ip6_input+0x2b/0x90
[116116.994776] ? ip6_make_skb+0x1b0/0x1b0
[116116.994778] ipv6_rcv+0x54/0xb0
[116116.994781] __netif_receive_skb_one_core+0x42/0x50
[116116.994784] netif_receive_skb_internal+0x24/0xb0
[116116.994786] napi_gro_frags+0x171/0x220
[116116.994790] mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
[116116.994798] ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
[116116.994803] ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
[116116.994806] mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
[116116.994808] net_rx_action+0xe0/0x2e0
[116116.994810] __do_softirq+0xd8/0x2ff
[116116.994812] irq_exit+0xbd/0xd0
[116116.994814] do_IRQ+0x85/0xd0
[116116.994816] common_interrupt+0xf/0xf
[116116.994818]
[116116.994821] RIP: 0010:cpuidle_enter_state+0xb3/0x310
[116116.994823] Code: 31 ff e8 e0 e0 bb ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3f 02 00 00 31 ff e8 64 cc c0 ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
[116116.994824] RSP: 0018:924a0635bea8 EFLAGS: 0246 ORIG_RAX: ffda
[116116.994825] RAX: 9016ffb60fc0 RBX: 699b9835d616 RCX: 001f
[116116.994826] RDX: 699b9835d616 RSI: 229837f7 RDI:
[116116.994827] RBP: 0001 R08: 0002 R09:
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 20.10.2018 00:25, Eric Dumazet wrote:
On 10/19/2018 02:58 PM, Eric Dumazet wrote:
On 10/16/2018 06:00 AM, Eric Dumazet wrote:
On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt wrote:
I've seen similar on several systems with mlx4 cards when using 4.18.x - that is hw csum failure followed by some backtrace. Only seems to happen on systems dealing with quite a bit of UDP.
Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE, but CHECKSUM_UNNECESSARY. It would be nice to track this a bit further, maybe by providing the full packet content.
As a matter of fact, Dimitris found the issue in the patch and is working on a fix involving csum_block_sub(). The problem comes from trimming an odd number of bytes; more exactly, from trimming bytes starting at an odd offset.
No hw csum failures here since I deployed Dimitris' fix on top of 4.18.16 32 hours ago.
Thanks
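Why an odd trim offset matters can be shown with a small sketch, again assuming a plain RFC 1071-style 16-bit one's-complement sum over a flat buffer. When the trimmed tail starts at an odd offset, each tail byte sits in the opposite half of a 16-bit word compared with summing the tail on its own, so the tail's standalone checksum has to be byte-swapped before it is subtracted. This mirrors, in simplified form, the 8-bit rotation that the kernel's csum_block_sub() applies for odd offsets; it is not the kernel code or the actual fix, and csum16(), csum_sub16(), swab16(), trim_csum_naive() and trim_csum_block() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement (RFC 1071 style) sum of buf, folded to 16 bits.
 * Bytes are paired big-endian; an odd trailing byte is zero-padded. */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)                       /* end-around carry */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* One's-complement subtraction: a - b is computed as a + ~b. */
static uint16_t csum_sub16(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + (uint16_t)~b;
    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Swap the two bytes of a 16-bit value. */
static uint16_t swab16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Naive trim: assumes the tail always starts on a 16-bit boundary.
 * Gives a wrong result when `keep` is odd. */
static uint16_t trim_csum_naive(const uint8_t *buf, size_t len, size_t keep,
                                uint16_t full)
{
    assert(keep <= len);
    return csum_sub16(full, csum16(buf + keep, len - keep));
}

/* Offset-aware trim: for an odd `keep`, the tail's bytes occupy swapped
 * positions inside the words of the full sum, so its standalone checksum
 * is byte-swapped before subtracting. */
static uint16_t trim_csum_block(const uint8_t *buf, size_t len, size_t keep,
                                uint16_t full)
{
    uint16_t tail;

    assert(keep <= len);
    tail = csum16(buf + keep, len - keep);
    if (keep & 1)
        tail = swab16(tail);
    return csum_sub16(full, tail);
}
```

With the bytes 01 02 03 04 05 and keep = 3, summing the kept bytes directly gives 0x0402; the naive subtraction returns 0x0501, while the offset-aware version returns 0x0402. In a real receive path, a mismatch like the naive one would leave a stale CHECKSUM_COMPLETE value that later fails software verification.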
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 10/19/2018 02:58 PM, Eric Dumazet wrote: > > > On 10/16/2018 06:00 AM, Eric Dumazet wrote: >> On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt wrote: >>> >>> On 15.10.2018 17:41, Eric Dumazet wrote: On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger > Something changed between 4.17.12 and 4.18, after bisecting the > problem I > got the following first bad commit: > > commit 88078d98d1bb085d72af8437707279e203524fa5 > Author: Eric Dumazet > Date: Wed Apr 18 11:43:15 2018 -0700 > > net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends > > After working on IP defragmentation lately, I found that some large > packets defeat CHECKSUM_COMPLETE optimization because of NIC adding > zero paddings on the last (small) fragment. > > While removing the padding with pskb_trim_rcsum(), we set > skb->ip_summed > to CHECKSUM_NONE, forcing a full csum validation, even if all prior > fragments had CHECKSUM_COMPLETE set. > > We can instead compute the checksum of the part we are trimming, > usually smaller than the part we keep. > > Signed-off-by: Eric Dumazet > Signed-off-by: David S. Miller > Thanks for bisecting! This commit is known to expose some NIC/driver bugs. Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f ("net: sungem: fix rx checksum support") for one driver needing a fix. I assume SKY2_HW_NEW_LE is not set on your NIC? >>> >>> I've seen similar on several systems with mlx4 cards when using 4.18.x - >>> that is hw csum failure followed by some backtrace. >>> >>> Only seems to happen on systems dealing with quite a bit of UDP. >>> >> >> Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE, >> but CHECKSUM_UNNECESSARY >> >> It would be nice to track this a bit further, maybe by providing the >> full packet content.
>> >>> Example from 4.18.10: [635607.740574] p0xe0: hw csum failure [635607.740598] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1 [635607.740599] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017 [635607.740599] Call Trace: [635607.740602] [635607.740611] dump_stack+0x5c/0x7b [635607.740617] __skb_gro_checksum_complete+0x9a/0xa0 [635607.740621] udp6_gro_receive+0x211/0x290 [635607.740624] ipv6_gro_receive+0x1a8/0x390 [635607.740627] dev_gro_receive+0x33e/0x550 [635607.740628] napi_gro_frags+0xa2/0x210 [635607.740635] mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en] [635607.740648] ? mlx4_cq_completion+0x23/0x70 [mlx4_core] [635607.740654] ? mlx4_eq_int+0x373/0xc80 [mlx4_core] [635607.740657] mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en] [635607.740658] net_rx_action+0xe0/0x2e0 [635607.740662] __do_softirq+0xd8/0x2e5 [635607.740666] irq_exit+0xb4/0xc0 [635607.740667] do_IRQ+0x85/0xd0 [635607.740670] common_interrupt+0xf/0xf [635607.740671] [635607.740675] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0 [635607.740675] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 [635607.740701] RSP: 0018:a5c206353ea8 EFLAGS: 0246 ORIG_RAX: ffd9 [635607.740703] RAX: 8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 001f [635607.740703] RDX: 00024214f597c5b0 RSI: 00020780 RDI: [635607.740704] RBP: 0004 R08: 002542bfbefa99fa R09: [635607.740705] R10: a5c206353e88 R11: 00c5 R12: af0aaf78 [635607.740706] R13: 8d72ffd297d8 R14: R15: 00024214f58c2ed5 [635607.740709] ? 
cpuidle_enter_state+0x91/0x2a0 [635607.740712] do_idle+0x1d0/0x240 [635607.740715] cpu_startup_entry+0x5f/0x70 [635607.740719] start_secondary+0x185/0x1a0 [635607.740722] secondary_startup_64+0xa5/0xb0 [635607.740731] p0xe0: hw csum failure [635607.740745] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1 [635607.740746] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017 [635607.740746] Call Trace: [635607.740747] [635607.740750] dump_stack+0x5c/0x7b [635607.740755] __skb_checksum_complete+0xb8/0xd0 [635607.740760] __udp6_lib_rcv+0xa6b/0xa70 [635607.740767] ? nft_do_chain_inet+0x7a/0xd0 [nf_tables] [635607.740770] ? nft_do_chain_inet+0x7a/0xd0 [nf_tables] [635607.740774] ip6_input_finish+0xc0/0x460 [635607.740776] ip6_input+0x2b/0x90 [635607.740778] ? ip6_rcv_finish+0x110/0x110 [635607.740780] ipv6_rcv+0x2cd/0x4b0 [635607.740783] ? udp6_lib_lookup_skb+0x59/0x80
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 10/16/2018 06:00 AM, Eric Dumazet wrote: > On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt wrote: >> >> On 15.10.2018 17:41, Eric Dumazet wrote: >>> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger Something changed between 4.17.12 and 4.18, after bisecting the problem I got the following first bad commit: commit 88078d98d1bb085d72af8437707279e203524fa5 Author: Eric Dumazet Date: Wed Apr 18 11:43:15 2018 -0700 net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends After working on IP defragmentation lately, I found that some large packets defeat CHECKSUM_COMPLETE optimization because of NIC adding zero paddings on the last (small) fragment. While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed to CHECKSUM_NONE, forcing a full csum validation, even if all prior fragments had CHECKSUM_COMPLETE set. We can instead compute the checksum of the part we are trimming, usually smaller than the part we keep. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller >>> >>> Thanks for bisecting ! >>> >>> This commit is known to expose some NIC/driver bugs. >>> >>> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f >>> ("net: sungem: fix rx checksum support") for one driver needing a fix. >>> >>> I assume SKY2_HW_NEW_LE is not set on your NIC ? >>> >> >> I've seen similar on several systems with mlx4 cards when using 4.18.x - >> that is, hw csum failure followed by some backtrace. >> >> Only seems to happen on systems dealing with quite a bit of UDP. >> > > Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE, > but CHECKSUM_UNNECESSARY. > > It would be nice to track this a bit further, maybe by providing the > full packet content.
> >> Example from 4.18.10: >>> [635607.740574] p0xe0: hw csum failure >>> [635607.740598] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1 >>> [635607.740599] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b >>> 05/02/2017 >>> [635607.740599] Call Trace: >>> [635607.740602] >>> [635607.740611] dump_stack+0x5c/0x7b >>> [635607.740617] __skb_gro_checksum_complete+0x9a/0xa0 >>> [635607.740621] udp6_gro_receive+0x211/0x290 >>> [635607.740624] ipv6_gro_receive+0x1a8/0x390 >>> [635607.740627] dev_gro_receive+0x33e/0x550 >>> [635607.740628] napi_gro_frags+0xa2/0x210 >>> [635607.740635] mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en] >>> [635607.740648] ? mlx4_cq_completion+0x23/0x70 [mlx4_core] >>> [635607.740654] ? mlx4_eq_int+0x373/0xc80 [mlx4_core] >>> [635607.740657] mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en] >>> [635607.740658] net_rx_action+0xe0/0x2e0 >>> [635607.740662] __do_softirq+0xd8/0x2e5 >>> [635607.740666] irq_exit+0xb4/0xc0 >>> [635607.740667] do_IRQ+0x85/0xd0 >>> [635607.740670] common_interrupt+0xf/0xf >>> [635607.740671] >>> [635607.740675] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0 >>> [635607.740675] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 >>> 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 >>> <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 >>> [635607.740701] RSP: 0018:a5c206353ea8 EFLAGS: 0246 ORIG_RAX: >>> ffd9 >>> [635607.740703] RAX: 8d72ffd20f00 RBX: 00024214f597c5b0 RCX: >>> 001f >>> [635607.740703] RDX: 00024214f597c5b0 RSI: 00020780 RDI: >>> >>> [635607.740704] RBP: 0004 R08: 002542bfbefa99fa R09: >>> >>> [635607.740705] R10: a5c206353e88 R11: 00c5 R12: >>> af0aaf78 >>> [635607.740706] R13: 8d72ffd297d8 R14: R15: >>> 00024214f58c2ed5 >>> [635607.740709] ? 
cpuidle_enter_state+0x91/0x2a0 >>> [635607.740712] do_idle+0x1d0/0x240 >>> [635607.740715] cpu_startup_entry+0x5f/0x70 >>> [635607.740719] start_secondary+0x185/0x1a0 >>> [635607.740722] secondary_startup_64+0xa5/0xb0 >>> [635607.740731] p0xe0: hw csum failure >>> [635607.740745] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1 >>> [635607.740746] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b >>> 05/02/2017 >>> [635607.740746] Call Trace: >>> [635607.740747] >>> [635607.740750] dump_stack+0x5c/0x7b >>> [635607.740755] __skb_checksum_complete+0xb8/0xd0 >>> [635607.740760] __udp6_lib_rcv+0xa6b/0xa70 >>> [635607.740767] ? nft_do_chain_inet+0x7a/0xd0 [nf_tables] >>> [635607.740770] ? nft_do_chain_inet+0x7a/0xd0 [nf_tables] >>> [635607.740774] ip6_input_finish+0xc0/0x460 >>> [635607.740776] ip6_input+0x2b/0x90 >>> [635607.740778] ? ip6_rcv_finish+0x110/0x110 >>> [635607.740780] ipv6_rcv+0x2cd/0x4b0 >>> [635607.740783] ? udp6_lib_lookup_skb+0x59/0x80 >>> [635607.740785] __netif_receive_skb_core+0x455/0xb30 >>> [635607.740788] ? ipv6_gro_receive+0x1a8/0x390 >>> [635607.740790] ? netif_receive_skb_internal+0x24/0xb0 >>>
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt wrote: > > On 15.10.2018 17:41, Eric Dumazet wrote: > > On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger > >> Something changed between 4.17.12 and 4.18, after bisecting the problem > >> I > >> got the following first bad commit: > >> > >> commit 88078d98d1bb085d72af8437707279e203524fa5 > >> Author: Eric Dumazet > >> Date: Wed Apr 18 11:43:15 2018 -0700 > >> > >> net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends > >> > >> After working on IP defragmentation lately, I found that some large > >> packets defeat CHECKSUM_COMPLETE optimization because of NIC adding > >> zero paddings on the last (small) fragment. > >> > >> While removing the padding with pskb_trim_rcsum(), we set > >> skb->ip_summed > >> to CHECKSUM_NONE, forcing a full csum validation, even if all prior > >> fragments had CHECKSUM_COMPLETE set. > >> > >> We can instead compute the checksum of the part we are trimming, > >> usually smaller than the part we keep. > >> > >> Signed-off-by: Eric Dumazet > >> Signed-off-by: David S. Miller > >> > > > > Thanks for bisecting ! > > > > This commit is known to expose some NIC/driver bugs. > > > > Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f > > ("net: sungem: fix rx checksum support") for one driver needing a fix. > > > > I assume SKY2_HW_NEW_LE is not set on your NIC ? > > > > I've seen similar on several systems with mlx4 cards when using 4.18.x - > that is, hw csum failure followed by some backtrace. > > Only seems to happen on systems dealing with quite a bit of UDP. > Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE, but CHECKSUM_UNNECESSARY. It would be nice to track this a bit further, maybe by providing the full packet content.
> Example from 4.18.10: > > [635607.740574] p0xe0: hw csum failure > > [635607.740598] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1 > > [635607.740599] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b > > 05/02/2017 > > [635607.740599] Call Trace: > > [635607.740602] > > [635607.740611] dump_stack+0x5c/0x7b > > [635607.740617] __skb_gro_checksum_complete+0x9a/0xa0 > > [635607.740621] udp6_gro_receive+0x211/0x290 > > [635607.740624] ipv6_gro_receive+0x1a8/0x390 > > [635607.740627] dev_gro_receive+0x33e/0x550 > > [635607.740628] napi_gro_frags+0xa2/0x210 > > [635607.740635] mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en] > > [635607.740648] ? mlx4_cq_completion+0x23/0x70 [mlx4_core] > > [635607.740654] ? mlx4_eq_int+0x373/0xc80 [mlx4_core] > > [635607.740657] mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en] > > [635607.740658] net_rx_action+0xe0/0x2e0 > > [635607.740662] __do_softirq+0xd8/0x2e5 > > [635607.740666] irq_exit+0xb4/0xc0 > > [635607.740667] do_IRQ+0x85/0xd0 > > [635607.740670] common_interrupt+0xf/0xf > > [635607.740671] > > [635607.740675] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0 > > [635607.740675] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 > > 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 > > <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 > > [635607.740701] RSP: 0018:a5c206353ea8 EFLAGS: 0246 ORIG_RAX: > > ffd9 > > [635607.740703] RAX: 8d72ffd20f00 RBX: 00024214f597c5b0 RCX: > > 001f > > [635607.740703] RDX: 00024214f597c5b0 RSI: 00020780 RDI: > > > > [635607.740704] RBP: 0004 R08: 002542bfbefa99fa R09: > > > > [635607.740705] R10: a5c206353e88 R11: 00c5 R12: > > af0aaf78 > > [635607.740706] R13: 8d72ffd297d8 R14: R15: > > 00024214f58c2ed5 > > [635607.740709] ? 
cpuidle_enter_state+0x91/0x2a0 > > [635607.740712] do_idle+0x1d0/0x240 > > [635607.740715] cpu_startup_entry+0x5f/0x70 > > [635607.740719] start_secondary+0x185/0x1a0 > > [635607.740722] secondary_startup_64+0xa5/0xb0 > > [635607.740731] p0xe0: hw csum failure > > [635607.740745] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1 > > [635607.740746] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b > > 05/02/2017 > > [635607.740746] Call Trace: > > [635607.740747] > > [635607.740750] dump_stack+0x5c/0x7b > > [635607.740755] __skb_checksum_complete+0xb8/0xd0 > > [635607.740760] __udp6_lib_rcv+0xa6b/0xa70 > > [635607.740767] ? nft_do_chain_inet+0x7a/0xd0 [nf_tables] > > [635607.740770] ? nft_do_chain_inet+0x7a/0xd0 [nf_tables] > > [635607.740774] ip6_input_finish+0xc0/0x460 > > [635607.740776] ip6_input+0x2b/0x90 > > [635607.740778] ? ip6_rcv_finish+0x110/0x110 > > [635607.740780] ipv6_rcv+0x2cd/0x4b0 > > [635607.740783] ? udp6_lib_lookup_skb+0x59/0x80 > > [635607.740785] __netif_receive_skb_core+0x455/0xb30 > > [635607.740788] ? ipv6_gro_receive+0x1a8/0x390 > > [635607.740790] ? netif_receive_skb_internal+0x24/0xb0 > > [635607.740792] netif_receive_skb_internal+0x24/0xb0 > > [635607.740793]
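Both traces in this report pass through a checksum-completion helper (`__skb_gro_checksum_complete()` on the GRO path, `__skb_checksum_complete()` from `__udp6_lib_rcv()`). Roughly, for a packet marked CHECKSUM_COMPLETE, the stack first checks the device-supplied sum; if that fails, it recomputes the sum in software, and if the software result says the packet is fine, it reports the mismatch ("hw csum failure") and accepts the packet. A corrupted `skb->csum`, such as one left behind by a wrong trim adjustment, would produce exactly this pattern. The model below is a simplified sketch: `validate()` and its `(packet_ok, hw_fault)` return value are hypothetical, and the IPv4/IPv6 pseudo-header is reduced to a single precomputed 16-bit sum.

```python
def csum16(data: bytes) -> int:
    """16-bit ones' complement sum of data, pairing bytes big-endian."""
    total = 0
    for i in range(0, len(data) - 1, 2):
        total += (data[i] << 8) | data[i + 1]
    if len(data) % 2:
        total += data[-1] << 8
    while total >> 16:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def add16(a: int, b: int) -> int:
    """Ones' complement addition of two 16-bit values."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def validate(hw_sum: int, data: bytes, pseudo_sum: int = 0):
    """Return (packet_ok, hw_fault) for a CHECKSUM_COMPLETE-style packet.

    hw_sum plays the role of the device-supplied sum over data; pseudo_sum
    stands in for the pseudo-header sum. A correct packet, including its
    checksum field, sums to 0xFFFF (ones' complement negative zero).
    """
    if add16(hw_sum, pseudo_sum) == 0xFFFF:
        return True, False        # fast path: hardware sum accepted
    sw_sum = csum16(data)         # slow path: recompute in software
    if add16(sw_sum, pseudo_sum) == 0xFFFF:
        return True, True         # data is fine, so the hardware sum was wrong
    return False, False           # genuinely corrupt packet


good = bytes([0x12, 0x34, 0x56, 0x78, 0x97, 0x53])  # last word is the checksum
bad = bytes([0x12, 0x34, 0x56, 0x78, 0x00, 0x00])
print(validate(csum16(good), good))  # (True, False)
print(validate(0x1234, good))        # (True, True): the "hw csum failure" case
print(validate(csum16(bad), bad))    # (False, False)
```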
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 15.10.2018 17:41, Eric Dumazet wrote: On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger Something changed between 4.17.12 and 4.18, after bisecting the problem I got the following first bad commit: commit 88078d98d1bb085d72af8437707279e203524fa5 Author: Eric Dumazet Date: Wed Apr 18 11:43:15 2018 -0700 net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends After working on IP defragmentation lately, I found that some large packets defeat CHECKSUM_COMPLETE optimization because of NIC adding zero paddings on the last (small) fragment. While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed to CHECKSUM_NONE, forcing a full csum validation, even if all prior fragments had CHECKSUM_COMPLETE set. We can instead compute the checksum of the part we are trimming, usually smaller than the part we keep. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Thanks for bisecting ! This commit is known to expose some NIC/driver bugs. Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f ("net: sungem: fix rx checksum support") for one driver needing a fix. I assume SKY2_HW_NEW_LE is not set on your NIC ? I've seen similar on several systems with mlx4 cards when using 4.18.x - that is, hw csum failure followed by some backtrace. Only seems to happen on systems dealing with quite a bit of UDP. Example from 4.18.10: [635607.740574] p0xe0: hw csum failure [635607.740598] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1 [635607.740599] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017 [635607.740599] Call Trace: [635607.740602] [635607.740611] dump_stack+0x5c/0x7b [635607.740617] __skb_gro_checksum_complete+0x9a/0xa0 [635607.740621] udp6_gro_receive+0x211/0x290 [635607.740624] ipv6_gro_receive+0x1a8/0x390 [635607.740627] dev_gro_receive+0x33e/0x550 [635607.740628] napi_gro_frags+0xa2/0x210 [635607.740635] mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en] [635607.740648] ? mlx4_cq_completion+0x23/0x70 [mlx4_core] [635607.740654] ?
mlx4_eq_int+0x373/0xc80 [mlx4_core] [635607.740657] mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en] [635607.740658] net_rx_action+0xe0/0x2e0 [635607.740662] __do_softirq+0xd8/0x2e5 [635607.740666] irq_exit+0xb4/0xc0 [635607.740667] do_IRQ+0x85/0xd0 [635607.740670] common_interrupt+0xf/0xf [635607.740671] [635607.740675] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0 [635607.740675] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7 [635607.740701] RSP: 0018:a5c206353ea8 EFLAGS: 0246 ORIG_RAX: ffd9 [635607.740703] RAX: 8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 001f [635607.740703] RDX: 00024214f597c5b0 RSI: 00020780 RDI: [635607.740704] RBP: 0004 R08: 002542bfbefa99fa R09: [635607.740705] R10: a5c206353e88 R11: 00c5 R12: af0aaf78 [635607.740706] R13: 8d72ffd297d8 R14: R15: 00024214f58c2ed5 [635607.740709] ? cpuidle_enter_state+0x91/0x2a0 [635607.740712] do_idle+0x1d0/0x240 [635607.740715] cpu_startup_entry+0x5f/0x70 [635607.740719] start_secondary+0x185/0x1a0 [635607.740722] secondary_startup_64+0xa5/0xb0 [635607.740731] p0xe0: hw csum failure [635607.740745] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1 [635607.740746] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017 [635607.740746] Call Trace: [635607.740747] [635607.740750] dump_stack+0x5c/0x7b [635607.740755] __skb_checksum_complete+0xb8/0xd0 [635607.740760] __udp6_lib_rcv+0xa6b/0xa70 [635607.740767] ? nft_do_chain_inet+0x7a/0xd0 [nf_tables] [635607.740770] ? nft_do_chain_inet+0x7a/0xd0 [nf_tables] [635607.740774] ip6_input_finish+0xc0/0x460 [635607.740776] ip6_input+0x2b/0x90 [635607.740778] ? ip6_rcv_finish+0x110/0x110 [635607.740780] ipv6_rcv+0x2cd/0x4b0 [635607.740783] ? udp6_lib_lookup_skb+0x59/0x80 [635607.740785] __netif_receive_skb_core+0x455/0xb30 [635607.740788] ? ipv6_gro_receive+0x1a8/0x390 [635607.740790] ? 
netif_receive_skb_internal+0x24/0xb0 [635607.740792] netif_receive_skb_internal+0x24/0xb0 [635607.740793] napi_gro_frags+0x165/0x210 [635607.740796] mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en] [635607.740802] ? mlx4_cq_completion+0x23/0x70 [mlx4_core] [635607.740807] ? mlx4_eq_int+0x373/0xc80 [mlx4_core] [635607.740810] mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en] [635607.740811] net_rx_action+0xe0/0x2e0 [635607.740813] __do_softirq+0xd8/0x2e5 [635607.740816] irq_exit+0xb4/0xc0 [635607.740817] do_IRQ+0x85/0xd0 [635607.740820] common_interrupt+0xf/0xf [635607.740821] [635607.740823] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0 [635607.740823] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66
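The commit message quoted in this thread replaces the CHECKSUM_NONE fallback with an adjustment: subtract the checksum of the trimmed bytes from the device-supplied sum, so only the (usually small) trimmed part has to be summed instead of the whole packet. The arithmetic can be sketched in user space as below. This is a model with hypothetical helper names (`csum16()`, `trim_rcsum_model()`) and big-endian byte pairing, not the kernel's `pskb_trim_rcsum()`; it deliberately covers only the case where the kept length is even.

```python
def csum16(data: bytes) -> int:
    """16-bit ones' complement sum of data, pairing bytes big-endian."""
    total = 0
    for i in range(0, len(data) - 1, 2):
        total += (data[i] << 8) | data[i + 1]
    if len(data) % 2:
        total += data[-1] << 8
    while total >> 16:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def sub16(a: int, b: int) -> int:
    """Ones' complement subtraction: add the complement of b, fold the carry."""
    s = a + (~b & 0xFFFF)
    return (s & 0xFFFF) + (s >> 16)


def trim_rcsum_model(full_sum: int, data: bytes, keep: int) -> int:
    """Hypothetical model: adjust a full-packet sum after dropping data[keep:].

    Only the trimmed bytes are summed. Correct as written only for an even
    `keep`; a tail starting at an odd offset would need its sum byte-rotated.
    """
    return sub16(full_sum, csum16(data[keep:]))


# 4 bytes of payload followed by 2 bytes of padding. The padding is non-zero
# here only so that the subtraction visibly changes the sum.
frame = bytes([0x01, 0x02, 0x03, 0x04, 0xAA, 0xBB])
full = csum16(frame)  # stands in for the CHECKSUM_COMPLETE value
print(hex(trim_rcsum_model(full, frame, 4)))  # 0x406
print(hex(csum16(frame[:4])))                 # 0x406
```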
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On 15 October 2018 17:41:47 CEST, Eric Dumazet wrote: >On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger > wrote: >> >> >> >> Begin forwarded message: >> >> Date: Sun, 14 Oct 2018 10:42:48 + >> From: bugzilla-dae...@bugzilla.kernel.org >> To: step...@networkplumber.org >> Subject: [Bug 201423] New: eth0: hw csum failure >> >> >> https://bugzilla.kernel.org/show_bug.cgi?id=201423 >> >> Bug ID: 201423 >>Summary: eth0: hw csum failure >>Product: Networking >>Version: 2.5 >> Kernel Version: 4.19.0-rc7 >> Hardware: Intel >> OS: Linux >> Tree: Mainline >> Status: NEW >> Severity: normal >> Priority: P1 >> Component: Other >> Assignee: step...@networkplumber.org >> Reporter: ross...@inwind.it >> Regression: No >> >> I have a P6T DELUXE V2 motherboard and am using the sky2 driver for the >ethernet >> ports. I get the following error message: >> >> [ 433.727397] eth0: hw csum failure >> [ 433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 >#19 >> [ 433.727406] Hardware name: System manufacturer System Product >Name/P6T >> DELUXE V2, BIOS 120212/22/2010 >> [ 433.727407] Call Trace: >> [ 433.727409] >> [ 433.727415] dump_stack+0x46/0x5b >> [ 433.727419] __skb_checksum_complete+0xb0/0xc0 >> [ 433.727423] tcp_v4_rcv+0x528/0xb60 >> [ 433.727426] ? ipt_do_table+0x2d0/0x400 >> [ 433.727429] ip_local_deliver_finish+0x5a/0x110 >> [ 433.727430] ip_local_deliver+0xe1/0xf0 >> [ 433.727431] ? ip_sublist_rcv_finish+0x60/0x60 >> [ 433.727432] ip_rcv+0xca/0xe0 >> [ 433.727434] ? ip_rcv_finish_core.isra.0+0x300/0x300 >> [ 433.727436] __netif_receive_skb_one_core+0x4b/0x70 >> [ 433.727438] netif_receive_skb_internal+0x4e/0x130 >> [ 433.727439] napi_gro_receive+0x6a/0x80 >> [ 433.727442] sky2_poll+0x707/0xd20 >> [ 433.727446] ?
rcu_check_callbacks+0x1b4/0x900 >> [ 433.727447] net_rx_action+0x237/0x380 >> [ 433.727449] __do_softirq+0xdc/0x1e0 >> [ 433.727452] irq_exit+0xa9/0xb0 >> [ 433.727453] do_IRQ+0x45/0xc0 >> [ 433.727455] common_interrupt+0xf/0xf >> [ 433.727456] >> [ 433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200 >> [ 433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e >e8 d1 8f >> ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 ><4c> 89 e1 >> 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48 >> [ 433.727462] RSP: :c90a3e98 EFLAGS: 0282 ORIG_RAX: >> ffde >> [ 433.727463] RAX: 880237b1f280 RBX: 0004 RCX: >> 001f >> [ 433.727464] RDX: 20c49ba5e353f7cf RSI: 2fe419c1 RDI: >> >> [ 433.727465] RBP: 880237b263a0 R08: 0714 R09: >> 00650512105d >> [ 433.727465] R10: R11: 0342 R12: >> 0064fc2a8b1c >> [ 433.727466] R13: 0064fc25b35f R14: 0004 R15: >> 8204af20 >> [ 433.727468] ? cpuidle_enter_state+0x119/0x200 >> [ 433.727471] do_idle+0x1bf/0x200 >> [ 433.727473] cpu_startup_entry+0x6a/0x70 >> [ 433.727475] start_secondary+0x17f/0x1c0 >> [ 433.727476] secondary_startup_64+0xa4/0xb0 >> [ 441.662954] eth0: hw csum failure >> [ 441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted >4.19.0-rc7 #19 >> [ 441.662960] Hardware name: System manufacturer System Product >Name/P6T >> DELUXE V2, BIOS 120212/22/2010 >> [ 441.662960] Call Trace: >> [ 441.662963] >> [ 441.662968] dump_stack+0x46/0x5b >> [ 441.662972] __skb_checksum_complete+0xb0/0xc0 >> [ 441.662975] tcp_v4_rcv+0x528/0xb60 >> [ 441.662979] ? ipt_do_table+0x2d0/0x400 >> [ 441.662981] ip_local_deliver_finish+0x5a/0x110 >> [ 441.662983] ip_local_deliver+0xe1/0xf0 >> [ 441.662985] ? ip_sublist_rcv_finish+0x60/0x60 >> [ 441.662986] ip_rcv+0xca/0xe0 >> [ 441.662988] ? 
ip_rcv_finish_core.isra.0+0x300/0x300 >> [ 441.662990] __netif_receive_skb_one_core+0x4b/0x70 >> [ 441.662993] netif_receive_skb_internal+0x4e/0x130 >> [ 441.662994] napi_gro_receive+0x6a/0x80 >> [ 441.662998] sky2_poll+0x707/0xd20 >> [ 441.663000] net_rx_action+0x237/0x380 >> [ 441.663002] __do_softirq+0xdc/0x1e0 >> [ 441.663005] irq_exit+0xa9/0xb0 >> [ 441.663007] do_IRQ+0x45/0xc0 >> [ 441.663009] common_interrupt+0xf/0xf >> [ 441.663010] >> [ 441.663012] RIP: 0010:merge+0x22/0xb0 >> [ 441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48 >89 d5 53 >> 48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 ><48> 85 c9 >> 74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48 >> [ 441.663015] RSP: 0018:c990b988 EFLAGS: 0246 ORIG_RAX: >> ffde >> [ 441.663017] RAX: RBX: 88021ab2d408 RCX: >> 88021ab2d408 >> [ 441.663018] RDX: 88021ab2d388 RSI: a021c440 RDI: >> >> [ 441.663019] RBP: 88021ab2d388 R08: 5ecf
Re: Fw: [Bug 201423] New: eth0: hw csum failure
Hi Eric. On Mon, 15 Oct 2018 at 16:42, Eric Dumazet wrote: > > On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger > wrote: > > > > > > > > Begin forwarded message: > > > > Date: Sun, 14 Oct 2018 10:42:48 + > > From: bugzilla-dae...@bugzilla.kernel.org > > To: step...@networkplumber.org > > Subject: [Bug 201423] New: eth0: hw csum failure > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=201423 > > > > Bug ID: 201423 > >Summary: eth0: hw csum failure > >Product: Networking > >Version: 2.5 > > Kernel Version: 4.19.0-rc7 > > Hardware: Intel > > OS: Linux > > Tree: Mainline > > Status: NEW > > Severity: normal > > Priority: P1 > > Component: Other > > Assignee: step...@networkplumber.org > > Reporter: ross...@inwind.it > > Regression: No > > > > I have a P6T DELUXE V2 motherboard and am using the sky2 driver for the > > ethernet > > ports. I get the following error message: > > > > [ 433.727397] eth0: hw csum failure > > [ 433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19 > > [ 433.727406] Hardware name: System manufacturer System Product Name/P6T > > DELUXE V2, BIOS 120212/22/2010 > > [ 433.727407] Call Trace: > > [ 433.727409] > > [ 433.727415] dump_stack+0x46/0x5b > > [ 433.727419] __skb_checksum_complete+0xb0/0xc0 > > [ 433.727423] tcp_v4_rcv+0x528/0xb60 > > [ 433.727426] ? ipt_do_table+0x2d0/0x400 > > [ 433.727429] ip_local_deliver_finish+0x5a/0x110 > > [ 433.727430] ip_local_deliver+0xe1/0xf0 > > [ 433.727431] ? ip_sublist_rcv_finish+0x60/0x60 > > [ 433.727432] ip_rcv+0xca/0xe0 > > [ 433.727434] ? ip_rcv_finish_core.isra.0+0x300/0x300 > > [ 433.727436] __netif_receive_skb_one_core+0x4b/0x70 > > [ 433.727438] netif_receive_skb_internal+0x4e/0x130 > > [ 433.727439] napi_gro_receive+0x6a/0x80 > > [ 433.727442] sky2_poll+0x707/0xd20 > > [ 433.727446] ?
rcu_check_callbacks+0x1b4/0x900 > > [ 433.727447] net_rx_action+0x237/0x380 > > [ 433.727449] __do_softirq+0xdc/0x1e0 > > [ 433.727452] irq_exit+0xa9/0xb0 > > [ 433.727453] do_IRQ+0x45/0xc0 > > [ 433.727455] common_interrupt+0xf/0xf > > [ 433.727456] > > [ 433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200 > > [ 433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 > > 8f > > ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> > > 89 e1 > > 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48 > > [ 433.727462] RSP: :c90a3e98 EFLAGS: 0282 ORIG_RAX: > > ffde > > [ 433.727463] RAX: 880237b1f280 RBX: 0004 RCX: > > 001f > > [ 433.727464] RDX: 20c49ba5e353f7cf RSI: 2fe419c1 RDI: > > > > [ 433.727465] RBP: 880237b263a0 R08: 0714 R09: > > 00650512105d > > [ 433.727465] R10: R11: 0342 R12: > > 0064fc2a8b1c > > [ 433.727466] R13: 0064fc25b35f R14: 0004 R15: > > 8204af20 > > [ 433.727468] ? cpuidle_enter_state+0x119/0x200 > > [ 433.727471] do_idle+0x1bf/0x200 > > [ 433.727473] cpu_startup_entry+0x6a/0x70 > > [ 433.727475] start_secondary+0x17f/0x1c0 > > [ 433.727476] secondary_startup_64+0xa4/0xb0 > > [ 441.662954] eth0: hw csum failure > > [ 441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted 4.19.0-rc7 #19 > > [ 441.662960] Hardware name: System manufacturer System Product Name/P6T > > DELUXE V2, BIOS 120212/22/2010 > > [ 441.662960] Call Trace: > > [ 441.662963] > > [ 441.662968] dump_stack+0x46/0x5b > > [ 441.662972] __skb_checksum_complete+0xb0/0xc0 > > [ 441.662975] tcp_v4_rcv+0x528/0xb60 > > [ 441.662979] ? ipt_do_table+0x2d0/0x400 > > [ 441.662981] ip_local_deliver_finish+0x5a/0x110 > > [ 441.662983] ip_local_deliver+0xe1/0xf0 > > [ 441.662985] ? ip_sublist_rcv_finish+0x60/0x60 > > [ 441.662986] ip_rcv+0xca/0xe0 > > [ 441.662988] ? 
ip_rcv_finish_core.isra.0+0x300/0x300 > > [ 441.662990] __netif_receive_skb_one_core+0x4b/0x70 > > [ 441.662993] netif_receive_skb_internal+0x4e/0x130 > > [ 441.662994] napi_gro_receive+0x6a/0x80 > > [ 441.662998] sky2_poll+0x707/0xd20 > > [ 441.663000] net_rx_action+0x237/0x380 > > [ 441.663002] __do_softirq+0xdc/0x1e0 > > [ 441.663005] irq_exit+0xa9/0xb0 > > [ 441.663007] do_IRQ+0x45/0xc0 > > [ 441.663009] common_interrupt+0xf/0xf > > [ 441.663010] > > [ 441.663012] RIP: 0010:merge+0x22/0xb0 > > [ 441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48 89 d5 > > 53 > > 48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 <48> > > 85 c9 > > 74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48 > > [ 441.663015] RSP: 0018:c990b988 EFLAGS: 0246 ORIG_RAX: > > ffde > > [ 441.663017] RAX: RBX: 88021ab2d408 RCX: > > 88021ab2d408 > > [
Re: Fw: [Bug 201423] New: eth0: hw csum failure
On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger wrote: > > > > Begin forwarded message: > > Date: Sun, 14 Oct 2018 10:42:48 + > From: bugzilla-dae...@bugzilla.kernel.org > To: step...@networkplumber.org > Subject: [Bug 201423] New: eth0: hw csum failure > > > https://bugzilla.kernel.org/show_bug.cgi?id=201423 > > Bug ID: 201423 >Summary: eth0: hw csum failure >Product: Networking >Version: 2.5 > Kernel Version: 4.19.0-rc7 > Hardware: Intel > OS: Linux > Tree: Mainline > Status: NEW > Severity: normal > Priority: P1 > Component: Other > Assignee: step...@networkplumber.org > Reporter: ross...@inwind.it > Regression: No > > I have a P6T DELUXE V2 motherboard and am using the sky2 driver for the ethernet > ports. I get the following error message: > > [ 433.727397] eth0: hw csum failure > [ 433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19 > [ 433.727406] Hardware name: System manufacturer System Product Name/P6T > DELUXE V2, BIOS 120212/22/2010 > [ 433.727407] Call Trace: > [ 433.727409] > [ 433.727415] dump_stack+0x46/0x5b > [ 433.727419] __skb_checksum_complete+0xb0/0xc0 > [ 433.727423] tcp_v4_rcv+0x528/0xb60 > [ 433.727426] ? ipt_do_table+0x2d0/0x400 > [ 433.727429] ip_local_deliver_finish+0x5a/0x110 > [ 433.727430] ip_local_deliver+0xe1/0xf0 > [ 433.727431] ? ip_sublist_rcv_finish+0x60/0x60 > [ 433.727432] ip_rcv+0xca/0xe0 > [ 433.727434] ? ip_rcv_finish_core.isra.0+0x300/0x300 > [ 433.727436] __netif_receive_skb_one_core+0x4b/0x70 > [ 433.727438] netif_receive_skb_internal+0x4e/0x130 > [ 433.727439] napi_gro_receive+0x6a/0x80 > [ 433.727442] sky2_poll+0x707/0xd20 > [ 433.727446] ?
rcu_check_callbacks+0x1b4/0x900 > [ 433.727447] net_rx_action+0x237/0x380 > [ 433.727449] __do_softirq+0xdc/0x1e0 > [ 433.727452] irq_exit+0xa9/0xb0 > [ 433.727453] do_IRQ+0x45/0xc0 > [ 433.727455] common_interrupt+0xf/0xf > [ 433.727456] > [ 433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200 > [ 433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 8f > ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 89 > e1 > 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48 > [ 433.727462] RSP: :c90a3e98 EFLAGS: 0282 ORIG_RAX: > ffde > [ 433.727463] RAX: 880237b1f280 RBX: 0004 RCX: > 001f > [ 433.727464] RDX: 20c49ba5e353f7cf RSI: 2fe419c1 RDI: > > [ 433.727465] RBP: 880237b263a0 R08: 0714 R09: > 00650512105d > [ 433.727465] R10: R11: 0342 R12: > 0064fc2a8b1c > [ 433.727466] R13: 0064fc25b35f R14: 0004 R15: > 8204af20 > [ 433.727468] ? cpuidle_enter_state+0x119/0x200 > [ 433.727471] do_idle+0x1bf/0x200 > [ 433.727473] cpu_startup_entry+0x6a/0x70 > [ 433.727475] start_secondary+0x17f/0x1c0 > [ 433.727476] secondary_startup_64+0xa4/0xb0 > [ 441.662954] eth0: hw csum failure > [ 441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted 4.19.0-rc7 #19 > [ 441.662960] Hardware name: System manufacturer System Product Name/P6T > DELUXE V2, BIOS 120212/22/2010 > [ 441.662960] Call Trace: > [ 441.662963] > [ 441.662968] dump_stack+0x46/0x5b > [ 441.662972] __skb_checksum_complete+0xb0/0xc0 > [ 441.662975] tcp_v4_rcv+0x528/0xb60 > [ 441.662979] ? ipt_do_table+0x2d0/0x400 > [ 441.662981] ip_local_deliver_finish+0x5a/0x110 > [ 441.662983] ip_local_deliver+0xe1/0xf0 > [ 441.662985] ? ip_sublist_rcv_finish+0x60/0x60 > [ 441.662986] ip_rcv+0xca/0xe0 > [ 441.662988] ? 
ip_rcv_finish_core.isra.0+0x300/0x300 > [ 441.662990] __netif_receive_skb_one_core+0x4b/0x70 > [ 441.662993] netif_receive_skb_internal+0x4e/0x130 > [ 441.662994] napi_gro_receive+0x6a/0x80 > [ 441.662998] sky2_poll+0x707/0xd20 > [ 441.663000] net_rx_action+0x237/0x380 > [ 441.663002] __do_softirq+0xdc/0x1e0 > [ 441.663005] irq_exit+0xa9/0xb0 > [ 441.663007] do_IRQ+0x45/0xc0 > [ 441.663009] common_interrupt+0xf/0xf > [ 441.663010] > [ 441.663012] RIP: 0010:merge+0x22/0xb0 > [ 441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48 89 d5 53 > 48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 <48> 85 > c9 > 74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48 > [ 441.663015] RSP: 0018:c990b988 EFLAGS: 0246 ORIG_RAX: > ffde > [ 441.663017] RAX: RBX: 88021ab2d408 RCX: > 88021ab2d408 > [ 441.663018] RDX: 88021ab2d388 RSI: a021c440 RDI: > > [ 441.663019] RBP: 88021ab2d388 R08: 5ecf R09: > 8500 > [ 441.663020] R10: ea000877ec00 R11: 880236803500 R12: > a021c440 > [ 441.663021] R13: 88021ab2d448 R14: 0004 R15: >
Fw: [Bug 201423] New: eth0: hw csum failure
Begin forwarded message: Date: Sun, 14 Oct 2018 10:42:48 + From: bugzilla-dae...@bugzilla.kernel.org To: step...@networkplumber.org Subject: [Bug 201423] New: eth0: hw csum failure https://bugzilla.kernel.org/show_bug.cgi?id=201423 Bug ID: 201423 Summary: eth0: hw csum failure Product: Networking Version: 2.5 Kernel Version: 4.19.0-rc7 Hardware: Intel OS: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: Other Assignee: step...@networkplumber.org Reporter: ross...@inwind.it Regression: No I have a P6T DELUXE V2 motherboard and am using the sky2 driver for the ethernet ports. I get the following error message: [ 433.727397] eth0: hw csum failure [ 433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19 [ 433.727406] Hardware name: System manufacturer System Product Name/P6T DELUXE V2, BIOS 120212/22/2010 [ 433.727407] Call Trace: [ 433.727409] [ 433.727415] dump_stack+0x46/0x5b [ 433.727419] __skb_checksum_complete+0xb0/0xc0 [ 433.727423] tcp_v4_rcv+0x528/0xb60 [ 433.727426] ? ipt_do_table+0x2d0/0x400 [ 433.727429] ip_local_deliver_finish+0x5a/0x110 [ 433.727430] ip_local_deliver+0xe1/0xf0 [ 433.727431] ? ip_sublist_rcv_finish+0x60/0x60 [ 433.727432] ip_rcv+0xca/0xe0 [ 433.727434] ? ip_rcv_finish_core.isra.0+0x300/0x300 [ 433.727436] __netif_receive_skb_one_core+0x4b/0x70 [ 433.727438] netif_receive_skb_internal+0x4e/0x130 [ 433.727439] napi_gro_receive+0x6a/0x80 [ 433.727442] sky2_poll+0x707/0xd20 [ 433.727446] ?
rcu_check_callbacks+0x1b4/0x900 [ 433.727447] net_rx_action+0x237/0x380 [ 433.727449] __do_softirq+0xdc/0x1e0 [ 433.727452] irq_exit+0xa9/0xb0 [ 433.727453] do_IRQ+0x45/0xc0 [ 433.727455] common_interrupt+0xf/0xf [ 433.727456] [ 433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200 [ 433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 8f ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 89 e1 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48 [ 433.727462] RSP: :c90a3e98 EFLAGS: 0282 ORIG_RAX: ffde [ 433.727463] RAX: 880237b1f280 RBX: 0004 RCX: 001f [ 433.727464] RDX: 20c49ba5e353f7cf RSI: 2fe419c1 RDI: [ 433.727465] RBP: 880237b263a0 R08: 0714 R09: 00650512105d [ 433.727465] R10: R11: 0342 R12: 0064fc2a8b1c [ 433.727466] R13: 0064fc25b35f R14: 0004 R15: 8204af20 [ 433.727468] ? cpuidle_enter_state+0x119/0x200 [ 433.727471] do_idle+0x1bf/0x200 [ 433.727473] cpu_startup_entry+0x6a/0x70 [ 433.727475] start_secondary+0x17f/0x1c0 [ 433.727476] secondary_startup_64+0xa4/0xb0 [ 441.662954] eth0: hw csum failure [ 441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted 4.19.0-rc7 #19 [ 441.662960] Hardware name: System manufacturer System Product Name/P6T DELUXE V2, BIOS 120212/22/2010 [ 441.662960] Call Trace: [ 441.662963] [ 441.662968] dump_stack+0x46/0x5b [ 441.662972] __skb_checksum_complete+0xb0/0xc0 [ 441.662975] tcp_v4_rcv+0x528/0xb60 [ 441.662979] ? ipt_do_table+0x2d0/0x400 [ 441.662981] ip_local_deliver_finish+0x5a/0x110 [ 441.662983] ip_local_deliver+0xe1/0xf0 [ 441.662985] ? ip_sublist_rcv_finish+0x60/0x60 [ 441.662986] ip_rcv+0xca/0xe0 [ 441.662988] ? 
ip_rcv_finish_core.isra.0+0x300/0x300 [ 441.662990] __netif_receive_skb_one_core+0x4b/0x70 [ 441.662993] netif_receive_skb_internal+0x4e/0x130 [ 441.662994] napi_gro_receive+0x6a/0x80 [ 441.662998] sky2_poll+0x707/0xd20 [ 441.663000] net_rx_action+0x237/0x380 [ 441.663002] __do_softirq+0xdc/0x1e0 [ 441.663005] irq_exit+0xa9/0xb0 [ 441.663007] do_IRQ+0x45/0xc0 [ 441.663009] common_interrupt+0xf/0xf [ 441.663010] [ 441.663012] RIP: 0010:merge+0x22/0xb0 [ 441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48 89 d5 53 48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 <48> 85 c9 74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48 [ 441.663015] RSP: 0018:c990b988 EFLAGS: 0246 ORIG_RAX: ffde [ 441.663017] RAX: RBX: 88021ab2d408 RCX: 88021ab2d408 [ 441.663018] RDX: 88021ab2d388 RSI: a021c440 RDI: [ 441.663019] RBP: 88021ab2d388 R08: 5ecf R09: 8500 [ 441.663020] R10: ea000877ec00 R11: 880236803500 R12: a021c440 [ 441.663021] R13: 88021ab2d448 R14: 0004 R15: c990b9e0 [ 441.663048] ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon] [ 441.663063] ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon] [ 441.663065] ? merge+0x57/0xb0 [ 441.663080] ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon] [ 441.663082]