Private bug reported:

For a customer OpenStack deployment we deploy infra nodes on Dell R630
servers. The servers have an onboard Broadcom NetXtreme II BCM57800 NIC
(quad port: 2x1G ports, 2x10G ports). For each port in the UP state, we
observe one CPU at 100% load, so in total we observe 4 CPUs at 100% load.

perf report shows the function bnx2x_ptp_task taking up much of the CPU
time: https://pastebin.canonical.com/p/kfrpd6Pwh5/

Also, /var/log/syslog contains the following outputs every few seconds:

[1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
[1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
[1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
[1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped

So the problem seems to involve a timestamped TX packet: the driver,
for some reason yet to be understood, gets an unexpected value from a
register and then, in that same function, reschedules itself to retry
the register read. The read returns a bad value again, and so on
indefinitely.

This shows up on the system as kthreads using 100% CPU. The message
"The device supports only a single outstanding packet to timestamp,
this packet will not be timestamped" appears because the driver can only
timestamp a single TX packet at a time, and since it is stuck retrying
for the first one, it cannot accept another packet into this "queue".
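
To illustrate the "single outstanding packet" behaviour described above,
here is a minimal user-space model in C. It is only a sketch: the struct
and function names (tx_tstamp_slot, tstamp_slot_claim,
tstamp_slot_complete) are hypothetical and are not taken from the bnx2x
driver. It shows why every later timestamp request is refused for as
long as the first one never completes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a one-entry TX timestamp "queue".
 * None of these names come from the bnx2x driver. */
struct tx_tstamp_slot {
	bool busy;             /* a packet is waiting for its HW timestamp */
	int pending_id;        /* identifier of the waiting packet */
	unsigned long refused; /* packets sent without a timestamp */
};

/* Try to claim the slot for packet `id`.
 * Returns true if the packet will be timestamped, false if the slot
 * is still occupied and the packet goes out without a timestamp. */
static bool tstamp_slot_claim(struct tx_tstamp_slot *s, int id)
{
	if (s->busy) {
		s->refused++;
		return false;
	}
	s->busy = true;
	s->pending_id = id;
	return true;
}

/* Release the slot once the pending timestamp has been delivered
 * (or abandoned). If this is never called, every later claim fails. */
static void tstamp_slot_complete(struct tx_tstamp_slot *s)
{
	s->busy = false;
}
```

In this model, the syslog message corresponds to tstamp_slot_claim()
returning false: the slot is never released because the pending request
never completes.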

The infinite loop appears to be:

static void bnx2x_ptp_task(struct work_struct *work)
{
	struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
	int port = BP_PORT(bp);
	u32 val_seq;
	u64 timestamp, ns;
	struct skb_shared_hwtstamps shhwtstamps;

	/* Read Tx timestamp registers */
	val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
			 NIG_REG_P0_TLLH_PTP_BUF_SEQID);
	if (val_seq & 0x10000) {
		[...]
	} else {
		DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
		/* Reschedule to keep checking for a valid timestamp value */
		schedule_work(&bp->ptp_task);
	}
}

It appears that val_seq & 0x10000 is never true, so the task constantly
reschedules itself immediately. Instrumenting the function shows that it
is being called in excess of 100,000 times per second. The REG_RD call
does appear to be expensive (as it's a register read from the device)
and shows high in the perf report, but that by itself doesn't appear to
be the root cause (i.e., it's not hanging forever in the REG_RD).

The cause appears to be that the driver is not prepared to deal with the
PTP request never being completed by the hardware. It's unclear why it
isn't completing, but regardless, the driver should not loop forever
here.
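
One way to avoid looping forever is to give the poll a fixed retry
budget and give up once it is exhausted. The following is a user-space
sketch of that idea in C, not a driver patch: poll_tx_timestamp,
read_seqid_fn, MAX_POLLS and the two fake register functions are
hypothetical names introduced for illustration, and the budget of 8 is
an arbitrary assumption. Only the 0x10000 mask is taken from the
excerpt above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEQID_VALID_BIT 0x10000u /* same mask the excerpt tests */
#define MAX_POLLS 8              /* hypothetical retry budget */

/* Hypothetical stand-in for the REG_RD() of the SEQID register. */
typedef uint32_t (*read_seqid_fn)(void *ctx);

enum poll_result { POLL_GOT_TIMESTAMP, POLL_GAVE_UP };

/* Poll the register at most MAX_POLLS times instead of rescheduling
 * without limit. *polls_used reports how many reads were made. */
static enum poll_result poll_tx_timestamp(read_seqid_fn read_seqid,
					  void *ctx, unsigned *polls_used)
{
	unsigned i;

	for (i = 0; i < MAX_POLLS; i++) {
		uint32_t val_seq = read_seqid(ctx);

		if (val_seq & SEQID_VALID_BIT) {
			*polls_used = i + 1;
			return POLL_GOT_TIMESTAMP;
		}
		/* A real driver would sleep or delay here, ideally with
		 * a growing interval, before reading again. */
	}
	*polls_used = MAX_POLLS;
	return POLL_GAVE_UP;
}

/* Fake register that never reports a valid timestamp (the bug case). */
static uint32_t reg_never_valid(void *ctx)
{
	(void)ctx;
	return 0;
}

/* Fake register that becomes valid on the third read.
 * ctx points to an unsigned read counter owned by the caller. */
static uint32_t reg_valid_on_third_read(void *ctx)
{
	unsigned *reads = ctx;

	return (++*reads >= 3) ? SEQID_VALID_BIT : 0;
}
```

With reg_never_valid the poll ends after MAX_POLLS reads instead of
spinning. On POLL_GAVE_UP a driver would presumably also need to drop
the pending timestamp request, so that later packets can be timestamped
again.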


Additional info: 


ubuntu@infra-1:~$ uname -a 
Linux infra-1 4.15.0-50-generic #54-Ubuntu SMP Mon May 6 18:46:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Lin


ubuntu@infra-1:~$ lspci | grep Broadcom 
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
01:00.2 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
01:00.3 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)


ubuntu@infra-1:~$ lspci -n | grep 01:00 
01:00.0 0200: 14e4:168a (rev 10) 
01:00.1 0200: 14e4:168a (rev 10) 
01:00.2 0200: 14e4:168a (rev 10) 
01:00.3 0200: 14e4:168a (rev 10) 


ubuntu@infra-1:~/deploy$ sudo lshw -c network 
  *-network:0
       description: Ethernet interface
       product: NetXtreme II BCM57800 1/10 Gigabit Ethernet
       vendor: Broadcom Inc. and subsidiaries
       physical id: 0
       bus info: pci@0000:01:00.0
       logical name: eno1
       version: 10
       serial: 42:39:92:e0:66:b6
       size: 10Gbit/s
       capacity: 10Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 100bt 100bt-fd 1000bt-fd 10000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 phy 1.45 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s
       resources: irq:79 memory:95000000-957fffff memory:95800000-95ffffff memory:96030000-9603ffff memory:91a00000-91a7ffff
  *-network:1
       description: Ethernet interface
       product: NetXtreme II BCM57800 1/10 Gigabit Ethernet
       vendor: Broadcom Inc. and subsidiaries
       physical id: 0.1
       bus info: pci@0000:01:00.1
       logical name: eno2
       version: 10
       serial: 42:39:92:e0:66:b6
       size: 10Gbit/s
       capacity: 10Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 100bt 100bt-fd 1000bt-fd 10000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 phy 1.45 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s
       resources: irq:90 memory:94000000-947fffff memory:94800000-94ffffff memory:96020000-9602ffff memory:91a80000-91afffff
  *-network:2
       description: Ethernet interface
       product: NetXtreme II BCM57800 1/10 Gigabit Ethernet
       vendor: Broadcom Inc. and subsidiaries
       physical id: 0.2
       bus info: pci@0000:01:00.2
       logical name: eno3
       version: 10
       serial: 52:f2:aa:63:a5:3c
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=1Gbit/s
       resources: irq:90 memory:93000000-937fffff memory:93800000-93ffffff memory:96010000-9601ffff memory:91b00000-91b7ffff
  *-network:3
       description: Ethernet interface
       product: NetXtreme II BCM57800 1/10 Gigabit Ethernet
       vendor: Broadcom Inc. and subsidiaries
       physical id: 0.3
       bus info: pci@0000:01:00.3
       logical name: eno4
       version: 10
       serial: 52:f2:aa:63:a5:3c
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=1Gbit/s
       resources: irq:111 memory:92000000-927fffff memory:92800000-92ffffff memory:96000000-9600ffff memory:91b80000-91bfffff
  *-network:0
       description: Ethernet interface
       physical id: 3
       logical name: bond1.1166
       serial: 42:39:92:e0:66:b6
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=802.1Q VLAN Support driverversion=1.8 duplex=full firmware=N/A link=yes multicast=yes
  *-network:1
       description: Ethernet interface
       physical id: 4
       logical name: bond1
       serial: 42:39:92:e0:66:b6
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=bonding driverversion=3.7.1 duplex=full firmware=2 link=yes master=yes multicast=yes
  *-network:2
       description: Ethernet interface
       physical id: 5
       logical name: broam
       serial: 36:76:ae:d3:1d:3b
       capabilities: ethernet physical
       configuration: broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=10.246.65.10 link=yes multicast=yes
  *-network:3
       description: Ethernet interface
       physical id: 6
       logical name: brinternal
       serial: ce:27:22:0d:8b:d1
       capabilities: ethernet physical
       configuration: broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=10.246.66.10 link=yes multicast=yes
  *-network:4
       description: Ethernet interface
       physical id: 7
       logical name: bond1.1171
       serial: 42:39:92:e0:66:b6
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=802.1Q VLAN Support driverversion=1.8 duplex=full firmware=N/A link=yes multicast=yes
  *-network:5
       description: Ethernet interface
       physical id: 8
       logical name: bond0
       serial: 52:f2:aa:63:a5:3c
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=bonding driverversion=3.7.1 duplex=full firmware=2 link=yes master=yes multicast=yes
  *-network:6
       description: Ethernet interface
       physical id: 9
       logical name: brexternal
       serial: 5e:e0:5c:1f:da:01
       capabilities: ethernet physical
       configuration: broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=10.246.71.10 link=yes multicast=yes


ubuntu@infra-1:~$ modinfo bnx2x 
filename: /lib/modules/4.15.0-50-generic/kernel/drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko
firmware: bnx2x/bnx2x-e2-7.13.1.0.fw 
firmware: bnx2x/bnx2x-e1h-7.13.1.0.fw 
firmware: bnx2x/bnx2x-e1-7.13.1.0.fw 
version: 1.712.30-0 
license: GPL 
description: QLogic BCM57710/57711/57711E/57712/57712_MF/57800/57800_MF/57810/57810_MF/57840/57840_MF Driver
author: Eliezer Tamir 
srcversion: 5338D57FE057310DCD66774 
alias: pci:v000014E4d0000163Fsv*sd*bc*sc*i* 
alias: pci:v000014E4d0000163Esv*sd*bc*sc*i* 
alias: pci:v000014E4d0000163Dsv*sd*bc*sc*i* 
alias: pci:v00001077d000016ADsv*sd*bc*sc*i* 
alias: pci:v000014E4d000016ADsv*sd*bc*sc*i* 
alias: pci:v00001077d000016A4sv*sd*bc*sc*i* 
alias: pci:v000014E4d000016A4sv*sd*bc*sc*i* 
alias: pci:v000014E4d000016ABsv*sd*bc*sc*i* 
alias: pci:v000014E4d000016AFsv*sd*bc*sc*i* 
alias: pci:v000014E4d000016A2sv*sd*bc*sc*i* 
alias: pci:v00001077d000016A1sv*sd*bc*sc*i* 
alias: pci:v000014E4d000016A1sv*sd*bc*sc*i* 
alias: pci:v000014E4d0000168Dsv*sd*bc*sc*i* 
alias: pci:v000014E4d000016AEsv*sd*bc*sc*i* 
alias: pci:v000014E4d0000168Esv*sd*bc*sc*i* 
alias: pci:v000014E4d000016A9sv*sd*bc*sc*i* 
alias: pci:v000014E4d000016A5sv*sd*bc*sc*i* 
alias: pci:v000014E4d0000168Asv*sd*bc*sc*i* 
alias: pci:v000014E4d0000166Fsv*sd*bc*sc*i* 
alias: pci:v000014E4d00001663sv*sd*bc*sc*i* 
alias: pci:v000014E4d00001662sv*sd*bc*sc*i* 
alias: pci:v000014E4d00001650sv*sd*bc*sc*i* 
alias: pci:v000014E4d0000164Fsv*sd*bc*sc*i* 
alias: pci:v000014E4d0000164Esv*sd*bc*sc*i* 
depends: mdio,libcrc32c,ptp 
retpoline: Y 
intree: Y 
name: bnx2x 
vermagic: 4.15.0-50-generic SMP mod_unload 
signat: PKCS#7 
signer: 
sig_key: 
sig_hashalgo: md4 
parm: num_queues: Set number of queues (default is as a number of CPUs) (int) 
parm: disable_tpa: Disable the TPA (LRO) feature (int) 
parm: int_mode: Force interrupt mode other than MSI-X (1 INT#x; 2 MSI) (int) 
parm: dropless_fc: Pause on exhausted host ring (int) 
parm: mrrs: Force Max Read Req Size (0..3) (for debug) (int) 
parm: debug: Default debug msglevel (int)

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Information type changed from Public to Private

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+subscriptions
