Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-30 Thread Fabio Rossi
> On 10/16/2018 06:00 AM, Eric Dumazet wrote:
> > On Mon, Oct 15, 2018 at 11:30 PM Andre Tomt wrote:
> >>
> >> On 15.10.2018 17:41, Eric Dumazet wrote:
> >>> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
>  Something changed between 4.17.12 and 4.18; after bisecting the problem I
>  got the following first bad commit:
> 
>  commit 88078d98d1bb085d72af8437707279e203524fa5
>  Author: Eric Dumazet 
>  Date:   Wed Apr 18 11:43:15 2018 -0700
> 
>   net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
> 
>   After working on IP defragmentation lately, I found that some large
>   packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
>   zero paddings on the last (small) fragment.
> 
>   While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
>   to CHECKSUM_NONE, forcing a full csum validation, even if all prior
>   fragments had CHECKSUM_COMPLETE set.
> 
>   We can instead compute the checksum of the part we are trimming,
>   usually smaller than the part we keep.
> 
>   Signed-off-by: Eric Dumazet 
>   Signed-off-by: David S. Miller 
> 
> >>>
> >>> Thanks for bisecting !
> >>>
> >>> This commit is known to expose some NIC/driver bugs.
> >>>
> >>> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
> >>> ("net: sungem: fix rx checksum support")  for one driver needing a fix.
> >>>
> >>> I assume SKY2_HW_NEW_LE is not set on your NIC ?
> >>>
> >>
> >> I've seen similar on several systems with mlx4 cards when using 4.18.x -
> >> that is hw csum failure followed by some backtrace.
> >>
> >> Only seems to happen on systems dealing with quite a bit of UDP.
> >>
> > 
> > Strange, because mlx4 on IPv6+UDP should not use CHECKSUM_COMPLETE,
> > but CHECKSUM_UNNECESSARY
> > 
> > It would be nice to track this a bit further, maybe by providing the
> > full packet content.
> > 
> >> Example from 4.18.10:
> >>> [635607.740574] p0xe0: hw csum failure
> >>> [635607.740598] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
> >>> [635607.740599] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> >>> [635607.740599] Call Trace:
> >>> [635607.740602]  
> >>> [635607.740611]  dump_stack+0x5c/0x7b
> >>> [635607.740617]  __skb_gro_checksum_complete+0x9a/0xa0
> >>> [635607.740621]  udp6_gro_receive+0x211/0x290
> >>> [635607.740624]  ipv6_gro_receive+0x1a8/0x390
> >>> [635607.740627]  dev_gro_receive+0x33e/0x550
> >>> [635607.740628]  napi_gro_frags+0xa2/0x210
> >>> [635607.740635]  mlx4_en_process_rx_cq+0xa01/0xb40 [mlx4_en]
> >>> [635607.740648]  ? mlx4_cq_completion+0x23/0x70 [mlx4_core]
> >>> [635607.740654]  ? mlx4_eq_int+0x373/0xc80 [mlx4_core]
> >>> [635607.740657]  mlx4_en_poll_rx_cq+0x55/0xf0 [mlx4_en]
> >>> [635607.740658]  net_rx_action+0xe0/0x2e0
> >>> [635607.740662]  __do_softirq+0xd8/0x2e5
> >>> [635607.740666]  irq_exit+0xb4/0xc0
> >>> [635607.740667]  do_IRQ+0x85/0xd0
> >>> [635607.740670]  common_interrupt+0xf/0xf
> >>> [635607.740671]  
> >>> [635607.740675] RIP: 0010:cpuidle_enter_state+0xb4/0x2a0
> >>> [635607.740675] Code: 31 ff e8 df a6 ba ff 45 84 f6 74 17 9c 58 0f 1f 44 
> >>> 00 00 f6 c4 02 0f 85 d8 01 00 00 31 ff e8 13 81 bf ff fb 66 0f 1f 44 00 
> >>> 00 <4c> 29 fb 48 ba cf f7 53 e3 a5 9b c4 20 48 89 d8 48 c1 fb 3f 48 f7
> >>> [635607.740701] RSP: 0018:a5c206353ea8 EFLAGS: 0246 ORIG_RAX: 
> >>> ffd9
> >>> [635607.740703] RAX: 8d72ffd20f00 RBX: 00024214f597c5b0 RCX: 
> >>> 001f
> >>> [635607.740703] RDX: 00024214f597c5b0 RSI: 00020780 RDI: 
> >>> 
> >>> [635607.740704] RBP: 0004 R08: 002542bfbefa99fa R09: 
> >>> 
> >>> [635607.740705] R10: a5c206353e88 R11: 00c5 R12: 
> >>> af0aaf78
> >>> [635607.740706] R13: 8d72ffd297d8 R14:  R15: 
> >>> 00024214f58c2ed5
> >>> [635607.740709]  ? cpuidle_enter_state+0x91/0x2a0
> >>> [635607.740712]  do_idle+0x1d0/0x240
> >>> [635607.740715]  cpu_startup_entry+0x5f/0x70
> >>> [635607.740719]  start_secondary+0x185/0x1a0
> >>> [635607.740722]  secondary_startup_64+0xa5/0xb0
> >>> [635607.740731] p0xe0: hw csum failure
> >>> [635607.740745] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.18.0-1 #1
> >>> [635607.740746] Hardware name: Supermicro Super Server/X10SRL-F, BIOS 2.0b 05/02/2017
> >>> [635607.740746] Call Trace:
> >>> [635607.740747]  
> >>> [635607.740750]  dump_stack+0x5c/0x7b
> >>> [635607.740755]  __skb_checksum_complete+0xb8/0xd0
> >>> [635607.740760]  __udp6_lib_rcv+0xa6b/0xa70
> >>> [635607.740767]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> >>> [635607.740770]  ? nft_do_chain_inet+0x7a/0xd0 [nf_tables]
> >>> [635607.740774]  ip6_input_finish+0xc0/0x460
> >>> [635607.740776]  ip6_input+0x2b/0x90
> >>> [635607.740778]  ? ip6_rcv_finish+0x110/0x110
> >>> [635607.740780]  ipv6_rcv+0x2cd/0x4b0
> >>> 
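
The optimization described in the quoted commit message (checksum only the
bytes being trimmed and subtract them from the checksum already computed over
the whole frame) can be illustrated outside the kernel. What follows is a
minimal Python sketch of the arithmetic only, not the kernel implementation.
The function names are chosen for illustration; they echo kernel helper names
but none of this code comes from the thread. It assumes big-endian 16-bit
words and a ones' complement sum as in the Internet checksum.

```python
def fold(total: int) -> int:
    """Fold carries back into 16 bits (end-around carry)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_partial(data: bytes) -> int:
    """Ones' complement sum of big-endian 16-bit words.

    An odd trailing byte is treated as the high byte of a zero-padded word.
    """
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return fold(total)


def csum_sub(a: int, b: int) -> int:
    """Ones' complement subtraction: a - b is a + (bitwise NOT of b)."""
    return fold(a + (~b & 0xFFFF))


def swap16(x: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((x & 0xFF) << 8) | (x >> 8)


def trim_csum(full_csum: int, trimmed: bytes, offset: int) -> int:
    """Checksum of frame[:offset], given the checksum of the whole frame.

    Only the trimmed tail is summed. If the tail starts at an odd offset,
    its bytes sit in the opposite byte lanes of the full sum, so the
    partial sum is byte-swapped before it is subtracted.
    """
    part = csum_partial(trimmed)
    if offset % 2:
        part = swap16(part)
    return csum_sub(full_csum, part)
```

With a hardware-supplied sum over the whole frame, trim_csum() touches only
the padding bytes, which is the cost saving the commit message refers to. If
the hardware sum does not actually cover the bytes the stack assumes it
covers, the subtraction yields a wrong value, which is consistent with Eric's
remark that the commit exposes some NIC/driver bugs.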

Re: Fw: [Bug 201423] New: eth0: hw csum failure

2018-10-15 Thread Fabio Rossi



On 15 October 2018 17:41:47 CEST, Eric Dumazet wrote:
>On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
> wrote:
>>
>>
>>
>> Begin forwarded message:
>>
>> Date: Sun, 14 Oct 2018 10:42:48 +
>> From: bugzilla-dae...@bugzilla.kernel.org
>> To: step...@networkplumber.org
>> Subject: [Bug 201423] New: eth0: hw csum failure
>>
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=201423
>>
>> Bug ID: 201423
>>Summary: eth0: hw csum failure
>>Product: Networking
>>Version: 2.5
>> Kernel Version: 4.19.0-rc7
>>   Hardware: Intel
>> OS: Linux
>>   Tree: Mainline
>> Status: NEW
>>   Severity: normal
>>   Priority: P1
>>  Component: Other
>>   Assignee: step...@networkplumber.org
>>   Reporter: ross...@inwind.it
>> Regression: No
>>
>> I have a P6T DELUXE V2 motherboard and am using the sky2 driver for the
>> ethernet ports. I get the following error message:
>>
>> [  433.727397] eth0: hw csum failure
>> [  433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
>> [  433.727406] Hardware name: System manufacturer System Product Name/P6T DELUXE V2, BIOS 1202 12/22/2010
>> [  433.727407] Call Trace:
>> [  433.727409]  
>> [  433.727415]  dump_stack+0x46/0x5b
>> [  433.727419]  __skb_checksum_complete+0xb0/0xc0
>> [  433.727423]  tcp_v4_rcv+0x528/0xb60
>> [  433.727426]  ? ipt_do_table+0x2d0/0x400
>> [  433.727429]  ip_local_deliver_finish+0x5a/0x110
>> [  433.727430]  ip_local_deliver+0xe1/0xf0
>> [  433.727431]  ? ip_sublist_rcv_finish+0x60/0x60
>> [  433.727432]  ip_rcv+0xca/0xe0
>> [  433.727434]  ? ip_rcv_finish_core.isra.0+0x300/0x300
>> [  433.727436]  __netif_receive_skb_one_core+0x4b/0x70
>> [  433.727438]  netif_receive_skb_internal+0x4e/0x130
>> [  433.727439]  napi_gro_receive+0x6a/0x80
>> [  433.727442]  sky2_poll+0x707/0xd20
>> [  433.727446]  ? rcu_check_callbacks+0x1b4/0x900
>> [  433.727447]  net_rx_action+0x237/0x380
>> [  433.727449]  __do_softirq+0xdc/0x1e0
>> [  433.727452]  irq_exit+0xa9/0xb0
>> [  433.727453]  do_IRQ+0x45/0xc0
>> [  433.727455]  common_interrupt+0xf/0xf
>> [  433.727456]  
>> [  433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200
>> [  433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e
>e8 d1 8f
>> ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20
><4c> 89 e1
>> 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
>> [  433.727462] RSP: :c90a3e98 EFLAGS: 0282 ORIG_RAX:
>> ffde
>> [  433.727463] RAX: 880237b1f280 RBX: 0004 RCX:
>> 001f
>> [  433.727464] RDX: 20c49ba5e353f7cf RSI: 2fe419c1 RDI:
>> 
>> [  433.727465] RBP: 880237b263a0 R08: 0714 R09:
>> 00650512105d
>> [  433.727465] R10:  R11: 0342 R12:
>> 0064fc2a8b1c
>> [  433.727466] R13: 0064fc25b35f R14: 0004 R15:
>> 8204af20
>> [  433.727468]  ? cpuidle_enter_state+0x119/0x200
>> [  433.727471]  do_idle+0x1bf/0x200
>> [  433.727473]  cpu_startup_entry+0x6a/0x70
>> [  433.727475]  start_secondary+0x17f/0x1c0
>> [  433.727476]  secondary_startup_64+0xa4/0xb0
>> [  441.662954] eth0: hw csum failure
>> [  441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted 4.19.0-rc7 #19
>> [  441.662960] Hardware name: System manufacturer System Product Name/P6T DELUXE V2, BIOS 1202 12/22/2010
>> [  441.662960] Call Trace:
>> [  441.662963]  
>> [  441.662968]  dump_stack+0x46/0x5b
>> [  441.662972]  __skb_checksum_complete+0xb0/0xc0
>> [  441.662975]  tcp_v4_rcv+0x528/0xb60
>> [  441.662979]  ? ipt_do_table+0x2d0/0x400
>> [  441.662981]  ip_local_deliver_finish+0x5a/0x110
>> [  441.662983]  ip_local_deliver+0xe1/0xf0
>> [  441.662985]  ? ip_sublist_rcv_finish+0x60/0x60
>> [  441.662986]  ip_rcv+0xca/0xe0
>> [  441.662988]  ? ip_rcv_finish_core.isra.0+0x300/0x300
>> [  441.662990]  __netif_receive_skb_one_core+0x4b/0x70
>> [  441.662993]  netif_receive_skb_internal+0x4e/0x130
>> [  441.662994]  napi_gro_receive+0x6a/0x80
>> [  441.662998]  sky2_poll+0x707/0xd20
>> [  441.663000]  net_rx_action+0x237/0x380
>> [  441.663002]  __do_softirq+0xdc/0x1e0
>> [  441.663005]  irq_exit+0xa9/0xb0
>> [  441.663007]  do_IRQ+0x45/0xc0
>> [  441.663009]  common_interrupt+0xf/0xf
>> [  441.663010]  
>> [  441.663012] RIP: 0010:merge+0x22/0xb0
>> [  441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48
>89 d5 53
>> 48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0
><48> 85 c9
>> 74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48
>> [  441.663015] RSP: 0018:c990b988 EFLAGS: 0246 ORIG_RAX:
>> ffde
>> [  441.663017] RAX:  RBX: 88021ab2d408 RCX:
>> 88021ab2d408
>> [  441.663018] RDX: 88021ab2d388 RSI: a021c440 RDI:
>> 
>> [  441.663019] RBP: 88021ab2d388 R08: 5ecf