** Description changed:

- (This bug provides a place to track the progress of this issue upstream
- and then in to Ubuntu.)
+ SRU Justification
+ =================
  
  A ppc64le system runs as a guest under PowerVM. This guest has a bnx2x
  card attached, and uses openvswitch to bridge an ibmveth interface for
  traffic from other LPARs.
  
  We see the following crash sometimes when running netperf:
- May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f2)]MC assert! 
- May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_mc_assert:720(enP24p1s0f2)]XSTORM_ASSERT_LIST_INDEX 0x2 
- May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_mc_assert:736(enP24p1s0f2)]XSTORM_ASSERT_INDEX 0x0 = 0x00000000 0x25e42a7e 0x00462a38 0x00010052 
- May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_mc_assert:750(enP24p1s0f2)]Chip Revision: everest3, FW Version: 7_13_1 
- May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_attn_int_deasserted3:4329(enP24p1s0f2)]driver assert 
- May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_panic_dump:923(enP24p1s0f2)]begin crash dump ----------------- 
+ May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_attn_int_deasserted3:4323(enP24p1s0f2)]MC assert!
+ May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_mc_assert:720(enP24p1s0f2)]XSTORM_ASSERT_LIST_INDEX 0x2
+ May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_mc_assert:736(enP24p1s0f2)]XSTORM_ASSERT_INDEX 0x0 = 0x00000000 0x25e42a7e 0x00462a38 0x00010052
+ May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_mc_assert:750(enP24p1s0f2)]Chip Revision: everest3, FW Version: 7_13_1
+ May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_attn_int_deasserted3:4329(enP24p1s0f2)]driver assert
+ May 10 17:16:32 tuk6r1phn2 kernel: bnx2x: [bnx2x_panic_dump:923(enP24p1s0f2)]begin crash dump -----------------
  ... (dump of registers follows) ...
  
  Subsequent debugging reveals that the packets causing the issue come
  through the ibmveth interface - from the AIX LPAR. The veth protocol is
  'special' - communication between LPARs on the same chassis can use very
  large (64k) frames to reduce overhead. Normal networks cannot handle
  such large packets, so traditionally, the VIOS partition would signal to
  the AIX partitions that it was 'special', and AIX would send regular,
  ethernet-sized packets to VIOS, which VIOS would then send out.
  
  This signalling between VIOS and AIX is done in a way that is not
  standards-compliant, and so was never made part of Linux. Instead, the
  Linux driver has always understood large frames and passed them up the
  network stack.
  
  In some cases (e.g. with TCP), multiple TCP segments are coalesced into
  one large packet. In Linux, this goes through the generic receive
  offload code, using a similar mechanism to GSO. These segments can be
  very large, which presents as a very large MSS (maximum segment size) or
  gso_size.
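
As a rough illustration of the arithmetic, here is a minimal user-space sketch (not kernel code) of how the segment count of a coalesced packet relates to its gso_size. The helper name segs_for_payload is hypothetical, and the 64k payload in the usage note is only the frame size mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: number of TCP segments
 * needed to carry payload_len bytes when each segment carries at most
 * gso_size bytes (ceiling division).
 */
size_t segs_for_payload(size_t payload_len, size_t gso_size)
{
    assert(gso_size > 0); /* a zero segment size would divide by zero */
    return (payload_len + gso_size - 1) / gso_size;
}
```

For a 65536-byte payload, a typical Ethernet-sized gso_size of 1448 gives 46 segments, while a gso_size of 32768 gives only 2. A packet with few, very large segments is the kind described here.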
  
  Normally, the large packet is simply passed to whatever network
  application on Linux is going to consume it, and everything is OK.
  
  However, in this case, the packets go through Open vSwitch, and are then
  passed to the bnx2x driver. The bnx2x driver/hardware supports TSO and
  GSO, but with a restriction: the maximum segment size is limited to
- around 9700 bytes. Normally this is more than adequate as jumbo frames
- are limited to 9000 bytes. However, if a large packet with large (>9700
- byte) TCP segments arrives through ibmveth, and is passed to bnx2x, the
- hardware will panic.
+ around 9700 bytes. Normally this is more than adequate. However, if a
+ large packet with very large (>9700 byte) TCP segments arrives through
+ ibmveth, and is passed to bnx2x, the hardware will panic.
  
- Turning off TSO prevents the crash as the kernel resegments the data and
- assembles the packets in software. This has a performance cost.
+ Impact
+ ------
  
- Clearly at the very least, bnx2x should not crash in this case.
+ The bnx2x card panics, requiring a power cycle to restore functionality.
  
- One patch to do this was sent upstream:
- https://www.spinics.net/lists/netdev/msg452932.html
+ The workaround is turning off TSO, which prevents the crash as the
+ kernel resegments *all* packets in software, not just ones that are too
+ big. This has a performance cost.
+ 
+ 
+ Fix
+ ---
+ 
+ Test packet size in bnx2x feature check path.
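
The idea can be sketched in user-space C as follows. This is a simplified model under stated assumptions, not the actual driver patch: the MODEL_* feature bits, struct model_pkt, and model_features_check are hypothetical names, and the 9700-byte limit is just the approximate figure quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative feature bits; the kernel's real feature flags differ. */
#define MODEL_F_SG     (1u << 0)
#define MODEL_F_TSO    (1u << 1)
#define MODEL_F_TSO6   (1u << 2)
#define MODEL_GSO_MASK (MODEL_F_TSO | MODEL_F_TSO6)

/* Assumed limit, taken from the "around 9700 bytes" in the description. */
#define MODEL_MAX_GSO_SIZE 9700u

/* Stand-in for a packet; gso_size == 0 means "not a GSO packet". */
struct model_pkt {
    uint32_t gso_size;
};

/*
 * Per-packet feature check: if the segment size exceeds what the
 * hardware can handle, clear the segmentation-offload bits so the
 * stack would segment this one packet in software instead.
 */
uint32_t model_features_check(const struct model_pkt *pkt, uint32_t features)
{
    assert(pkt != NULL);
    if (pkt->gso_size > MODEL_MAX_GSO_SIZE)
        features &= ~MODEL_GSO_MASK;
    return features;
}
```

In this model, only packets whose gso_size exceeds the limit lose the offload bits; all other packets keep hardware segmentation, unlike the TSO-off workaround, which affects every packet.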
+ 
+ Regression Potential
+ --------------------
+ 
+ Limited to the bnx2x card driver.
+ The most likely failure case is a false positive on the size check, which
+ would lead to a performance regression only.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1715519

Title:
  bnx2x_attn_int_deasserted3:4323 MC assert!

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1715519/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
