A preliminary analysis of the problem, based on a collected crash dump.
From dmesg, we have:
[28663.018356] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[28663.026266] IP: [<ffffffffc00ddb21>] ixgbe_xmit_frame_ring+0x81/0xf50 [ixgbe]
Using addr2line to map the faulting address to a line in the ixgbe source, we get:
# nm ixgbe.ko | grep ixgbe_xmit_frame_ring
000000000000aaa0 T ixgbe_xmit_frame_ring
# printf "%0x\n" $((0xaaa0+0x81))
ab21
# addr2line -fip -e ixgbe.ko -j .text ab21
ixgbe_xmit_frame_ring at [...]/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7403
Checking the code at that line, we find the inlined function ixgbe_maybe_stop_tx(),
called from ixgbe_xmit_frame_ring():
static inline int ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
{
	if (likely(ixgbe_desc_unused(tx_ring) >= size))
	[...]
}
Looking next at the inlined function ixgbe_desc_unused():
static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring)
{
	u16 ntc = ring->next_to_clean;
	u16 ntu = ring->next_to_use;
	[...]
}
Using crash, we can check which field of struct ixgbe_ring sits at offset 0x58
(the faulting address of the NULL dereference, 0000000000000058):
crash> struct -ox ixgbe_ring|grep -A1 58
[0x58] u16 next_to_use;
[0x5a] u16 next_to_clean;
This matches the ixgbe_desc_unused() code: the struct ixgbe_ring pointer was
NULL, and the function tried to read next_to_use through it.
Although in the C code ring->next_to_clean is read first, and so would be
expected to trigger the fault, the compiler reordered the two loads, as
shown by the disassembly in crash:
crash> disassemble ixgbe_xmit_frame_ring
[...]
0xffffffffc00ddab8 <+24>: mov %rdx,%rbx
[...]
0xffffffffc00ddb21 <+129>: movzwl 0x58(%rbx),%eax
0xffffffffc00ddb25 <+133>: movzwl 0x5a(%rbx),%esi
[...]
Finally, the register values from the stack frame information in crash
double-check that the ixgbe_ring pointer is NULL:
crash> bt -f |grep ixgbe_xmit_frame_ring -A7
[exception RIP: ixgbe_xmit_frame_ring+129]
RIP: ffffffffc00ddb21 RSP: ffff88103f283d20 RFLAGS: 00010246
RAX: 00000000000000c2 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff8800538c0840 RDI: ffff881034167ec0
[...]
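Under the x86-64 System V calling convention, the first integer/pointer
arguments are passed in RDI, RSI, RDX (in that order), so the 3rd parameter
of ixgbe_xmit_frame_ring() (the ixgbe_ring pointer) is in RDX, which is 0.
RDX is caller-saved and could in principle have been reused before the
fault, but the disassembly copies it into RBX at +24 (mov %rdx,%rbx), and
RBX, the base register of the faulting load, is also 0.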
I'll now continue the investigation to understand why this pointer was NULL
at this point.
--
https://bugs.launchpad.net/bugs/1794877
Title: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)
--
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs