Re: Kernel BUG_ON in stable 4.8

2016-11-22 Thread Francis Deslauriers
This patch fixes our issue on v4.8.10. Thank you!

2016-11-22 14:22 GMT-05:00 Eric Dumazet :
> On Tue, Nov 22, 2016 at 9:57 AM, Eric Dumazet  wrote:
>> On Tue, Nov 22, 2016 at 9:44 AM, Mathieu Desnoyers
>>  wrote:
>>> - On Nov 22, 2016, at 12:01 PM, Francis Deslauriers 
>>> francis.deslauri...@efficios.com wrote:
>>>
>>>> Hi Mathieu,
>>>>
>>>> Here is a description of the kernel BUG_ON I have encountered. This bug was
>>>> triggered by our continuous integration system tracking the stable branches. I
>>>> was only able to reproduce it on our Lava baremetal x86_64 worker. Running the
>>>> same kernel image on our KVM worker did not trigger the bug.
>>>>
>>>> The bug occurs at boot time and I believe it's during the configuration of one
>>>> of the network cards.
>>>>
>>>> See attached the .config used.
>>>> See attached the dmesg output of the crash containing the kernel panic
>>>> output(around line 885 of dmesg.txt)
>>>
>>> Hi guys,
>>>
>>> Upstream commit 34fad54c2 "net: __skb_flow_dissect() must cap its return 
>>> value"
>>> triggers this BUG_ON at boot up on our testing machines. We have observed 
>>> the
>>> BUG_ON on v4.9-rc6, as well as stable kernels v4.8.10 and v4.4.34.
>>>
>>> Relevant bits:
>>>
>>> [   16.841793] kernel BUG at ./include/linux/skbuff.h:1927!
>>> [   16.847101] invalid opcode:  [#1] SMP
>>> [   16.851101] Modules linked in:
>>> [   16.854172] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.10 #1
>>> [   16.860081] Hardware name: Supermicro SYS-6018R-TDW/X10DDW-i, BIOS 1.0a 
>>> 01/14/2015
>>> [   16.867637] task: 8220d500 task.stack: 8220
>>> [   16.873550] RIP: 0010:[]  [] 
>>> eth_type_trans+0xc9/0x110
>>> [   16.881826] RSP: :88103fc03df0  EFLAGS: 00010297
>>> [   16.887130] RAX: 0148 RBX: 881037b2b5c0 RCX: 
>>> 1073
>>> [   16.894254] RDX: 88103221fdc0 RSI: 881038358000 RDI: 
>>> 881037c5bb00
>>> [   16.901377] RBP: 88103fc03df8 R08: 0001 R09: 
>>> 0800
>>> [   16.908503] R10: 88103221fec0 R11: ea0040de7780 R12: 
>>> 881037c5bb00
>>> [   16.915625] R13: 881037458000 R14: 0156 R15: 
>>> 881038358000
>>> [   16.922751] FS:  () GS:88103fc0() 
>>> knlGS:
>>> [   16.930829] CS:  0010 DS:  ES:  CR0: 80050033
>>> [   16.936565] CR2: 88207000 CR3: 02206000 CR4: 
>>> 001406f0
>>> [   16.943689] Stack:
>>> [   16.945702]  815c098f 88103fc03e90 8181c651 
>>> 88103fc0dc20
>>> [   16.953153]   0020 0002 
>>> 88103fc03e38
>>> [   16.960607]  ea0040de7780 ea08 0001 
>>> ea0040de7780
>>> [   16.968061] Call Trace:
>>> [   16.970507]  
>>> [   16.972432]  [] ? 
>>> swiotlb_sync_single_for_device+0xf/0x20
>>> [   16.979575]  [] igb_poll+0x691/0xe60
>>> [   16.984714]  [] net_rx_action+0x1bb/0x2f0
>>> [   16.990291]  [] __do_softirq+0xf6/0x280
>>> [   16.995689]  [] irq_exit+0xdc/0xf0
>>> [   17.000648]  [] do_IRQ+0x54/0xd0
>>> [   17.005433]  [] common_interrupt+0x8c/0x8c
>>> [   17.011083]  
>>> [   17.013014]  [] ? mwait_idle+0x76/0x170
>>> [   17.018596]  [] arch_cpu_idle+0xf/0x20
>>> [   17.023907]  [] default_idle_call+0x2a/0x40
>>> [   17.029648]  [] cpu_startup_entry+0x29a/0x300
>>> [   17.035561]  [] rest_init+0x77/0x80
>>> [   17.040608]  [] start_kernel+0x40b/0x418
>>> [   17.046091]  [] ? early_idt_handler_array+0x120/0x120
>>> [   17.052702]  [] x86_64_start_reservations+0x2a/0x2c
>>> [   17.059132]  [] x86_64_start_kernel+0x13b/0x14a
>>> [   17.065215] Code: 00 04 00 00 c9 c3 48 33 86 58 03 00 00 48 c1 e0 10 48 
>>> 85 c0 0f b6 87 80 00 00 00 75 25 83 e0 f8 83 c8 01 88 87 80 00 00 00 eb 9c 
>>> <0f> 0b 0f b6 87 80 00 00 00 83 e0 f8 83 c8 03 88 87 80 00 00 00
>>> [   17.085165] RIP  [] eth_type_trans+0xc9/0x110
>>> [   17.091094]  RSP 
>>> [   17.094586] ---[ end trace e233c88f3b369632 ]---
>>> [   17.101432] Kernel panic - not syncing: Fatal exception in interrupt
>>> [   17.107790] Kernel Offset: disabled
>>> [   17.113484] ---[ end Kernel panic - not syncing: Fatal exception in 
>>> interrupt
>>>
>>> Do you have clues on what is going on ?
>>>
>>> Thanks,
>>>
>>> Mathieu
>>>
>>>
>>
>> Under investigation. Look around other threads. Thanks.
>
> Probable fix is : https://patchwork.ozlabs.org/patch/697891/



-- 
Francis Deslauriers
Software developer
EfficiOS inc.


Re: Kernel BUG_ON in stable 4.8

2016-11-22 Thread Eric Dumazet
On Tue, Nov 22, 2016 at 9:57 AM, Eric Dumazet  wrote:
> On Tue, Nov 22, 2016 at 9:44 AM, Mathieu Desnoyers
>  wrote:
>> - On Nov 22, 2016, at 12:01 PM, Francis Deslauriers 
>> francis.deslauri...@efficios.com wrote:
>>
>>> Hi Mathieu,
>>>
>>> Here is a description of the kernel BUG_ON I have encountered. This bug was
>>> triggered by our continuous integration system tracking the stable 
>>> branches. I
>>> was only able to reproduce it on our Lava baremetal x86_64 worker. Running 
>>> the
>>> same kernel image on our KVM worker did not trigger the bug.
>>>
>>> The bug occurs at boot time and I believe it's during the configuration of 
>>> one
>>> of the network cards.
>>>
>>> See attached the .config used.
>>> See attached the dmesg output of the crash containing the kernel panic
>>> output(around line 885 of dmesg.txt)
>>
>> Hi guys,
>>
>> Upstream commit 34fad54c2 "net: __skb_flow_dissect() must cap its return 
>> value"
>> triggers this BUG_ON at boot up on our testing machines. We have observed the
>> BUG_ON on v4.9-rc6, as well as stable kernels v4.8.10 and v4.4.34.
>>
>> Relevant bits:
>>
>> [   16.841793] kernel BUG at ./include/linux/skbuff.h:1927!
>> [   16.847101] invalid opcode:  [#1] SMP
>> [   16.851101] Modules linked in:
>> [   16.854172] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.10 #1
>> [   16.860081] Hardware name: Supermicro SYS-6018R-TDW/X10DDW-i, BIOS 1.0a 
>> 01/14/2015
>> [   16.867637] task: 8220d500 task.stack: 8220
>> [   16.873550] RIP: 0010:[]  [] 
>> eth_type_trans+0xc9/0x110
>> [   16.881826] RSP: :88103fc03df0  EFLAGS: 00010297
>> [   16.887130] RAX: 0148 RBX: 881037b2b5c0 RCX: 
>> 1073
>> [   16.894254] RDX: 88103221fdc0 RSI: 881038358000 RDI: 
>> 881037c5bb00
>> [   16.901377] RBP: 88103fc03df8 R08: 0001 R09: 
>> 0800
>> [   16.908503] R10: 88103221fec0 R11: ea0040de7780 R12: 
>> 881037c5bb00
>> [   16.915625] R13: 881037458000 R14: 0156 R15: 
>> 881038358000
>> [   16.922751] FS:  () GS:88103fc0() 
>> knlGS:
>> [   16.930829] CS:  0010 DS:  ES:  CR0: 80050033
>> [   16.936565] CR2: 88207000 CR3: 02206000 CR4: 
>> 001406f0
>> [   16.943689] Stack:
>> [   16.945702]  815c098f 88103fc03e90 8181c651 
>> 88103fc0dc20
>> [   16.953153]   0020 0002 
>> 88103fc03e38
>> [   16.960607]  ea0040de7780 ea08 0001 
>> ea0040de7780
>> [   16.968061] Call Trace:
>> [   16.970507]  
>> [   16.972432]  [] ? 
>> swiotlb_sync_single_for_device+0xf/0x20
>> [   16.979575]  [] igb_poll+0x691/0xe60
>> [   16.984714]  [] net_rx_action+0x1bb/0x2f0
>> [   16.990291]  [] __do_softirq+0xf6/0x280
>> [   16.995689]  [] irq_exit+0xdc/0xf0
>> [   17.000648]  [] do_IRQ+0x54/0xd0
>> [   17.005433]  [] common_interrupt+0x8c/0x8c
>> [   17.011083]  
>> [   17.013014]  [] ? mwait_idle+0x76/0x170
>> [   17.018596]  [] arch_cpu_idle+0xf/0x20
>> [   17.023907]  [] default_idle_call+0x2a/0x40
>> [   17.029648]  [] cpu_startup_entry+0x29a/0x300
>> [   17.035561]  [] rest_init+0x77/0x80
>> [   17.040608]  [] start_kernel+0x40b/0x418
>> [   17.046091]  [] ? early_idt_handler_array+0x120/0x120
>> [   17.052702]  [] x86_64_start_reservations+0x2a/0x2c
>> [   17.059132]  [] x86_64_start_kernel+0x13b/0x14a
>> [   17.065215] Code: 00 04 00 00 c9 c3 48 33 86 58 03 00 00 48 c1 e0 10 48 
>> 85 c0 0f b6 87 80 00 00 00 75 25 83 e0 f8 83 c8 01 88 87 80 00 00 00 eb 9c 
>> <0f> 0b 0f b6 87 80 00 00 00 83 e0 f8 83 c8 03 88 87 80 00 00 00
>> [   17.085165] RIP  [] eth_type_trans+0xc9/0x110
>> [   17.091094]  RSP 
>> [   17.094586] ---[ end trace e233c88f3b369632 ]---
>> [   17.101432] Kernel panic - not syncing: Fatal exception in interrupt
>> [   17.107790] Kernel Offset: disabled
>> [   17.113484] ---[ end Kernel panic - not syncing: Fatal exception in 
>> interrupt
>>
>> Do you have clues on what is going on ?
>>
>> Thanks,
>>
>> Mathieu
>>
>>
>
> Under investigation. Look around other threads. Thanks.

Probable fix is : https://patchwork.ozlabs.org/patch/697891/


Re: Kernel BUG_ON in stable 4.8

2016-11-22 Thread Eric Dumazet
On Tue, Nov 22, 2016 at 9:44 AM, Mathieu Desnoyers
 wrote:
> - On Nov 22, 2016, at 12:01 PM, Francis Deslauriers 
> francis.deslauri...@efficios.com wrote:
>
>> Hi Mathieu,
>>
>> Here is a description of the kernel BUG_ON I have encountered. This bug was
>> triggered by our continuous integration system tracking the stable branches. 
>> I
>> was only able to reproduce it on our Lava baremetal x86_64 worker. Running 
>> the
>> same kernel image on our KVM worker did not trigger the bug.
>>
>> The bug occurs at boot time and I believe it's during the configuration of 
>> one
>> of the network cards.
>>
>> See attached the .config used.
>> See attached the dmesg output of the crash containing the kernel panic
>> output(around line 885 of dmesg.txt)
>
> Hi guys,
>
> Upstream commit 34fad54c2 "net: __skb_flow_dissect() must cap its return 
> value"
> triggers this BUG_ON at boot up on our testing machines. We have observed the
> BUG_ON on v4.9-rc6, as well as stable kernels v4.8.10 and v4.4.34.
>
> Relevant bits:
>
> [   16.841793] kernel BUG at ./include/linux/skbuff.h:1927!
> [   16.847101] invalid opcode:  [#1] SMP
> [   16.851101] Modules linked in:
> [   16.854172] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.10 #1
> [   16.860081] Hardware name: Supermicro SYS-6018R-TDW/X10DDW-i, BIOS 1.0a 
> 01/14/2015
> [   16.867637] task: 8220d500 task.stack: 8220
> [   16.873550] RIP: 0010:[]  [] 
> eth_type_trans+0xc9/0x110
> [   16.881826] RSP: :88103fc03df0  EFLAGS: 00010297
> [   16.887130] RAX: 0148 RBX: 881037b2b5c0 RCX: 
> 1073
> [   16.894254] RDX: 88103221fdc0 RSI: 881038358000 RDI: 
> 881037c5bb00
> [   16.901377] RBP: 88103fc03df8 R08: 0001 R09: 
> 0800
> [   16.908503] R10: 88103221fec0 R11: ea0040de7780 R12: 
> 881037c5bb00
> [   16.915625] R13: 881037458000 R14: 0156 R15: 
> 881038358000
> [   16.922751] FS:  () GS:88103fc0() 
> knlGS:
> [   16.930829] CS:  0010 DS:  ES:  CR0: 80050033
> [   16.936565] CR2: 88207000 CR3: 02206000 CR4: 
> 001406f0
> [   16.943689] Stack:
> [   16.945702]  815c098f 88103fc03e90 8181c651 
> 88103fc0dc20
> [   16.953153]   0020 0002 
> 88103fc03e38
> [   16.960607]  ea0040de7780 ea08 0001 
> ea0040de7780
> [   16.968061] Call Trace:
> [   16.970507]  
> [   16.972432]  [] ? swiotlb_sync_single_for_device+0xf/0x20
> [   16.979575]  [] igb_poll+0x691/0xe60
> [   16.984714]  [] net_rx_action+0x1bb/0x2f0
> [   16.990291]  [] __do_softirq+0xf6/0x280
> [   16.995689]  [] irq_exit+0xdc/0xf0
> [   17.000648]  [] do_IRQ+0x54/0xd0
> [   17.005433]  [] common_interrupt+0x8c/0x8c
> [   17.011083]  
> [   17.013014]  [] ? mwait_idle+0x76/0x170
> [   17.018596]  [] arch_cpu_idle+0xf/0x20
> [   17.023907]  [] default_idle_call+0x2a/0x40
> [   17.029648]  [] cpu_startup_entry+0x29a/0x300
> [   17.035561]  [] rest_init+0x77/0x80
> [   17.040608]  [] start_kernel+0x40b/0x418
> [   17.046091]  [] ? early_idt_handler_array+0x120/0x120
> [   17.052702]  [] x86_64_start_reservations+0x2a/0x2c
> [   17.059132]  [] x86_64_start_kernel+0x13b/0x14a
> [   17.065215] Code: 00 04 00 00 c9 c3 48 33 86 58 03 00 00 48 c1 e0 10 48 85 
> c0 0f b6 87 80 00 00 00 75 25 83 e0 f8 83 c8 01 88 87 80 00 00 00 eb 9c <0f> 
> 0b 0f b6 87 80 00 00 00 83 e0 f8 83 c8 03 88 87 80 00 00 00
> [   17.085165] RIP  [] eth_type_trans+0xc9/0x110
> [   17.091094]  RSP 
> [   17.094586] ---[ end trace e233c88f3b369632 ]---
> [   17.101432] Kernel panic - not syncing: Fatal exception in interrupt
> [   17.107790] Kernel Offset: disabled
> [   17.113484] ---[ end Kernel panic - not syncing: Fatal exception in 
> interrupt
>
> Do you have clues on what is going on ?
>
> Thanks,
>
> Mathieu
>
>

Under investigation. Look around other threads. Thanks.

