A preliminary analysis of the problem, based on a collected crash dump.

From dmesg, we have:

[28663.018356] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[28663.026266] IP: [<ffffffffc00ddb21>] ixgbe_xmit_frame_ring+0x81/0xf50 [ixgbe]

To map this address to a source line in the ixgbe driver, we used nm and addr2line:

# nm ixgbe.ko | grep "ixgbe_xmit_frame_ring"
000000000000aaa0 T ixgbe_xmit_frame_ring 

# printf "%0x\n" $((0xaaa0+0x81)) 
ab21 

# addr2line -fip -e ixgbe.ko -j .text ab21
ixgbe_xmit_frame_ring at [...]/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7403


Checking the source, line 7403 points to the inlined function ixgbe_maybe_stop_tx(),
called from ixgbe_xmit_frame_ring():

static inline int ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
{
    if (likely(ixgbe_desc_unused(tx_ring) >= size))
    [...]
}


Next, the inlined function ixgbe_desc_unused(), which reads two members of the ring:

static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring)
{
    u16 ntc = ring->next_to_clean;
    u16 ntu = ring->next_to_use;
    [...]
}


Using crash, we can check which member of struct ixgbe_ring sits at offset 0x58
(the faulting address of the NULL pointer dereference was 0000000000000058):

crash> struct -ox ixgbe_ring|grep -A1 58 
[0x58] u16 next_to_use; 
[0x5a] u16 next_to_clean; 

This matches the ixgbe_desc_unused() code: the ixgbe_ring pointer was NULL,
and the function faulted while reading ring->next_to_use, at address 0x0 + 0x58.

Although the C code reads ring->next_to_clean first (which would have faulted
at 0x5a), the compiler reordered the two loads: as the disassembly in crash
shows, next_to_use (0x58) is loaded first, and that load is the faulting instruction:

crash> disassemble ixgbe_xmit_frame_ring 
[...] 
0xffffffffc00ddab8 <+24>: mov %rdx,%rbx 
[...] 
0xffffffffc00ddb21 <+129>: movzwl 0x58(%rbx),%eax 
0xffffffffc00ddb25 <+133>: movzwl 0x5a(%rbx),%esi 
[...] 


Finally, the stack frame information in crash confirms that the ixgbe_ring
pointer is NULL:

crash> bt -f |grep ixgbe_xmit_frame_ring -A7 
[exception RIP: ixgbe_xmit_frame_ring+129] 
RIP: ffffffffc00ddb21 RSP: ffff88103f283d20 RFLAGS: 00010246 
RAX: 00000000000000c2 RBX: 0000000000000000 RCX: 0000000000000001 
RDX: 0000000000000000 RSI: ffff8800538c0840 RDI: ffff881034167ec0 
[...] 

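The x86-64 System V calling convention passes the first integer/pointer
arguments in registers RDI, RSI and RDX, in that order, so the 3rd parameter
(the ixgbe_ring pointer) arrives in RDX, which is NULL. This is consistent with
the disassembly above: at +24 the function copies RDX into RBX, and the
faulting load at +129 dereferences RBX, which is also 0.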

I'll continue the investigation to understand why this pointer was NULL
at this point.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1794877

Title:
  Crash in ixgbe, during tx packet xmit (while potentially changing
  queues number)

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs