I was reading the assembly and comparing it with the source code to evaluate
the accuracy of the registers captured in the dump, and also to find the points
at which it could have failed.
In the ixgbe transmit function, ixgbe_xmit_frame(), we have the following:
tx_ring = ring ? ring : adapter->tx_ring[skb->queue_mapping];
The assembly confirms that ring is not used at this point, since it comes in as
NULL, so we get a NULL tx_ring because adapter->tx_ring[skb->queue_mapping] is
NULL.
The struct sk_buff passed in %rdi is odd: it does not seem to contain valid
data. Even so, its queue_mapping is 0x0, and in the ixgbe_adapter captured at
the moment of the crash, adapter->tx_ring[0x0] is valid and should not cause
the NULL pointer dereference.
I think a race may be happening: at the moment tx_ring is assigned in
ixgbe_xmit_frame(), the slot is NULL, but it is filled with a valid pointer
right afterwards by another function running concurrently.
I've noticed that a queue allocation function assigns this pointer. Also, one
interesting thing I've observed in dmesg is a series of successive
interface/queue re-initializations (it seems):
[ 6.628974] ixgbe 0000:04:00.1: Multiqueue Enabled: Rx Queue count = 20, Tx Queue count = 20
[...]
[ 1493.198280] ixgbe 0000:04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[...]
[ 4113.173315] ixgbe 0000:04:00.1: Multiqueue Enabled: Rx Queue count = 19, Tx Queue count = 19
[ 4113.365528] ixgbe 0000:04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[...]
[28662.834289] ixgbe 0000:04:00.1: Multiqueue Enabled: Rx Queue count = 18, Tx Queue count = 18
[28663.018209] ixgbe 0000:04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[28663.018356] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
So, the number of queues drops by 1 each time these messages appear in dmesg.
It seems to be triggered by "ethtool --set-channels" changing the number of
tx/rx queues of the interface.
Also, an oddity from the dump:
crash> ixgbe_adapter -x ffff8800538c0840
struct ixgbe_adapter {
active_vlans = {0x1, 0x0, [...] 0x0, 0x5500000000000, 0x0},
[...]
So, besides VLAN 0, there are more bits set in this bit field. I don't know
why; it does not seem expected, so I will study the code further.
--
https://bugs.launchpad.net/bugs/1794877
Title:
Crash in ixgbe, during tx packet xmit (while potentially changing queues number)