[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2020-07-14 Thread Guilherme G. Piccoli
We couldn't reproduce the bug and the reporter cannot provide more data, so
we're marking it as Invalid. If anybody ever reproduces it, please ping here
and reopen.
Thanks,


Guilherme

** Changed in: linux (Ubuntu)
   Status: In Progress => Invalid

** Changed in: linux (Ubuntu Xenial)
   Status: Confirmed => Invalid

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1794877

Title:
  Crash in ixgbe, during tx packet xmit (while potentially changing
  queues number)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1794877/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2019-07-24 Thread Brad Figg
** Tags added: cscc


[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-10-02 Thread Guilherme G. Piccoli
** Changed in: linux (Ubuntu Xenial)
   Status: New => Confirmed

** Changed in: linux (Ubuntu)
   Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Xenial)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)


[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-10-02 Thread Eric Desrochers
** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New


[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
I was reading the assembly and comparing it with the code to evaluate the
accuracy of the registers in the dump, and also to find points at which
it could have failed.

In the tx transmit function of ixgbe - ixgbe_xmit_frame(), we have the
following:

tx_ring = ring ? ring : adapter->tx_ring[skb->queue_mapping];

Checking the assembly, ring is effectively ignored at this point since it
comes in as NULL, so we are getting a NULL tx_ring because
adapter->tx_ring[skb->queue_mapping] is NULL.

The struct sk_buff passed in %rdi is odd; it seems to contain no valid data.
Even so, queue_mapping is 0x0, and checking ixgbe_adapter at the moment of the
crash, adapter->tx_ring[0x0] is valid and shouldn't cause the NULL pointer
dereference.

I think a race may be happening: at the moment tx_ring is assigned in
ixgbe_xmit_frame() the array entry is NULL, but it is filled with a valid
pointer right after by another function running concurrently.
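
The suspected interleaving can be sketched as a small deterministic model. This is only an illustration of the hypothesis, not driver code: the function name, the list standing in for adapter->tx_ring[] and the string placeholders for ring pointers are all hypothetical.

```python
def simulate_interleaving():
    """Model the suspected race with a fixed ordering of steps (not driver code)."""
    # Hypothetical stand-in for adapter->tx_ring[] with a single queue.
    tx_ring = ["old-ring"]

    # Step 1: the reconfiguration path tears down the queue.
    tx_ring[0] = None

    # Step 2: the transmit path samples the slot inside that window.
    snapshot = tx_ring[0]

    # Step 3: the reconfiguration path installs a freshly allocated ring.
    tx_ring[0] = "new-ring"

    # The transmit path keeps using its stale snapshot, while a dump taken
    # afterwards would show a valid pointer in the array.
    return snapshot, tx_ring[0]


seen_by_xmit, seen_in_dump = simulate_interleaving()
print(seen_by_xmit, seen_in_dump)
```

If something like this is what happens, it would explain why the dump shows a valid adapter->tx_ring[0x0] even though the transmit path dereferenced NULL.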

I've noticed that a queue allocation function assigns this pointer. Also, one
interesting thing I've observed in dmesg is what seems to be a series of
interface/queue re-initializations:

[ 6.628974] ixgbe :04:00.1: Multiqueue Enabled: Rx Queue count = 20, Tx Queue count = 20
[...]
[ 1493.198280] ixgbe :04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[...]
[ 4113.173315] ixgbe :04:00.1: Multiqueue Enabled: Rx Queue count = 19, Tx Queue count = 19
[ 4113.365528] ixgbe :04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[...]
[28662.834289] ixgbe :04:00.1: Multiqueue Enabled: Rx Queue count = 18, Tx Queue count = 18
[28663.018209] ixgbe :04:00.1 eth5: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[28663.018356] BUG: unable to handle kernel NULL pointer dereference at 0058

So, the number of queues is reduced by 1 each time we see these messages in
dmesg. It seems to be triggered by "ethtool --set-channels" changing the
number of tx/rx queues for the interface.
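
A minimal sketch of how the queue counts could be pulled out of a dmesg capture to confirm the pattern. The regular expression and the helper name are assumptions made for illustration; the sample lines are the "Multiqueue Enabled" messages quoted above, joined onto single lines.

```python
import re

# The three "Multiqueue Enabled" lines from the dmesg excerpt above.
DMESG = """\
[ 6.628974] ixgbe :04:00.1: Multiqueue Enabled: Rx Queue count = 20, Tx Queue count = 20
[ 4113.173315] ixgbe :04:00.1: Multiqueue Enabled: Rx Queue count = 19, Tx Queue count = 19
[28662.834289] ixgbe :04:00.1: Multiqueue Enabled: Rx Queue count = 18, Tx Queue count = 18
"""

# Assumed message format, taken from the lines above.
PATTERN = re.compile(r"Rx Queue count = (\d+), Tx Queue count = (\d+)")


def queue_counts(log):
    """Return (rx, tx) queue counts in the order they appear in the log."""
    return [(int(rx), int(tx)) for rx, tx in PATTERN.findall(log)]


counts = queue_counts(DMESG)
# Difference in Tx queue count between consecutive re-initializations.
steps = [later[1] - earlier[1] for earlier, later in zip(counts, counts[1:])]
print(counts, steps)
```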


Also, an oddity from the dump: 

crash> ixgbe_adapter -x 8800538c0840 
struct ixgbe_adapter { 
active_vlans = {0x1, 0x0, [...] 0x0, 0x55000, 0x0}, 
[...] 

So, besides VLAN 0, there are more bits set in this bit field. I don't know
why; it doesn't seem expected. I will study the code further.
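
To see which bits are set in a word of the bitmap, a small decoder like the one below could help. It is a sketch under assumptions: 64-bit words (x86-64) and hypothetical helper names. The index of the 0x55000 word within the array is not stated in the elided output above, so it is left as a parameter.

```python
BITS_PER_LONG = 64  # assumption: 64-bit unsigned long, as on x86-64


def set_bits(word):
    """Return the positions of the bits set in one bitmap word."""
    return [bit for bit in range(BITS_PER_LONG) if (word >> bit) & 1]


def vlan_ids(word_index, word):
    """Map the bits set in word number word_index to VLAN IDs."""
    return [word_index * BITS_PER_LONG + bit for bit in set_bits(word)]


print(vlan_ids(0, 0x1))   # first word of the dump: only VLAN 0
print(set_bits(0x55000))  # bit positions inside the unexpected word
```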


[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
** Attachment added: "lspci_-nnvv.txt"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1794877/+attachment/5193804/+files/lspci_-nnvv.txt


[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
A preliminary analysis of the problem, based on a collected crash dump.

From dmesg, we have:

[28663.018356] BUG: unable to handle kernel NULL pointer dereference at 0058
[28663.026266] IP: [] ixgbe_xmit_frame_ring+0x81/0xf50 [ixgbe]

Using addr2line to validate the line in the ixgbe code, we got:

# nm ixgbe.ko | grep "ixgbe_xmit_frame_ring"
aaa0 T ixgbe_xmit_frame_ring

# printf "%0x\n" $((0xaaa0+0x81))
ab21

# addr2line -fip -e ixgbe.ko -j .text ab21
ixgbe_xmit_frame_ring at [...]/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:7403
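
The offset arithmetic above can also be written as a small helper. This is just a sketch; the function name is hypothetical, and the inputs are the symbol value printed by nm and the +0x81 offset from the oops line.

```python
def text_offset(symbol_value_hex, fault_offset):
    """Add the faulting offset to the symbol value; return lowercase hex without 0x."""
    return format(int(symbol_value_hex, 16) + fault_offset, "x")


# Symbol value from nm, offset from "ixgbe_xmit_frame_ring+0x81".
print(text_offset("aaa0", 0x81))
```

The result is the value passed to addr2line with "-j .text".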


Checking the code, this points to the inlined function ixgbe_maybe_stop_tx(),
called from ixgbe_xmit_frame_ring():

static inline int ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
{
        if (likely(ixgbe_desc_unused(tx_ring) >= size))
        [...]
}

Now checking the inlined function ixgbe_desc_unused():

static inline u16 ixgbe_desc_unused(struct ixgbe_ring *ring)
{
        u16 ntc = ring->next_to_clean;
        u16 ntu = ring->next_to_use;
        [...]
}


Using crash, we can validate the offset 0x58 in the struct ixgbe_ring 
(from the null dereference at 0058): 

crash> struct -ox ixgbe_ring|grep -A1 58 
[0x58] u16 next_to_use; 
[0x5a] u16 next_to_clean; 

It matches what is expected given the ixgbe_desc_unused() code; struct 
ixgbe_ring was null and the function tried to get the value of next_to_use. 
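
The relation between the faulting address and the struct offset can be illustrated with a tiny sketch. The constant and function names are hypothetical; the offsets are the ones reported by "struct -ox ixgbe_ring" above.

```python
# Field offsets as reported by crash for struct ixgbe_ring.
NEXT_TO_USE_OFFSET = 0x58
NEXT_TO_CLEAN_OFFSET = 0x5A


def field_address(ring_ptr, field_offset):
    """Address the CPU touches when reading a field through ring_ptr."""
    return ring_ptr + field_offset


# With a NULL ring pointer, the faulting address is simply the field offset.
print(hex(field_address(0x0, NEXT_TO_USE_OFFSET)))
```

A read of next_to_use through a NULL pointer therefore faults at 0x58, which is the address reported in the oops.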

Although the C code reads "ring->next_to_clean" first, so that access should
have triggered the crash, the compiler reordered the loads, as shown by the
crash disassembly:

crash> disassemble ixgbe_xmit_frame_ring 
[...] 
0xc00ddab8 <+24>: mov %rdx,%rbx 
[...] 
0xc00ddb21 <+129>: movzwl 0x58(%rbx),%eax 
0xc00ddb25 <+133>: movzwl 0x5a(%rbx),%esi 
[...] 


Finally, from the stack frame information in crash, we can double-validate 
that ixgbe_ring is null: 

crash> bt -f |grep ixgbe_xmit_frame_ring -A7 
[exception RIP: ixgbe_xmit_frame_ring+129] 
RIP: c00ddb21 RSP: 88103f283d20 RFLAGS: 00010246 
RAX: 00c2 RBX: 0000 RCX: 0001
RDX: 0000 RSI: 8800538c0840 RDI: 881034167ec0
[...] 

Since the x86-64 ABI calling convention specifies that the parameters 
are passed in registers RDI, RSI, RDX (in that order), the 3rd parameter 
(ixgbe_ring) is in RDX, which is null. 
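
The mapping from parameter position to register can be sketched as below, assuming the System V AMD64 order for integer and pointer arguments. The helper name is hypothetical, and the parameter names (skb, adapter, tx_ring) are taken from the description in this analysis.

```python
# Integer/pointer argument registers in the System V AMD64 calling convention.
SYSV_INT_ARG_REGS = ("RDI", "RSI", "RDX", "RCX", "R8", "R9")


def arg_register(position):
    """Return the register holding the argument at the given 1-based position."""
    return SYSV_INT_ARG_REGS[position - 1]


# Parameters of ixgbe_xmit_frame_ring() as described in this analysis.
params = ("skb", "adapter", "tx_ring")
mapping = {name: arg_register(i) for i, name in enumerate(params, start=1)}
print(mapping)
```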

I'll continue the investigation now to understand why this value was null 
at this point.


[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
** Attachment added: "disassembly of relevant functions in crash"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1794877/+attachment/5193816/+files/ixgbe_xmit_ring.asm


[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
** Attachment added: "ethtool_-i_eth5.txt"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1794877/+attachment/5193805/+files/ethtool_-i_eth5.txt


[Bug 1794877] Re: Crash in ixgbe, during tx packet xmit (while potentially changing queues number)

2018-09-27 Thread Guilherme G. Piccoli
lspci -nn output for this adapter:

04:00.1 Ethernet controller [0200]: Intel Corporation 82599 10 Gigabit Dual Port Backplane Connection [8086:10f8] (rev 01)
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- [...]
