[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-08-13 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.4.0-159.187

---
linux (4.4.0-159.187) xenial; urgency=medium

  * CVE-2019-1125
- x86/cpufeatures: Carve out CQM features retrieval
- x86/cpufeatures: Combine word 11 and 12 into a new scattered features word
- x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
- x86/speculation: Enable Spectre v1 swapgs mitigations
- x86/entry/64: Use JMP instead of JMPQ
- x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS

linux (4.4.0-158.186) xenial; urgency=medium

  * xenial/linux: 4.4.0-158.186 -proposed tracker (LP: #1837609)

  * Packaging resync (LP: #1786013)
- [Packaging] resync git-ubuntu-log
- [Packaging] update helper scripts

  * ixgbe{vf} - Physical Function gets IRQ when VF checks link state
(LP: #1836760)
- ixgbevf: Use cached link state instead of re-reading the value for ethtool

  * CVE-2018-5383
- crypto: kpp - Key-agreement Protocol Primitives API (KPP)
- crypto: dh - Add DH software implementation
- crypto: ecdh - Add ECDH software support
- crypto: ecdh - make ecdh_shared_secret unique
- crypto: doc - add KPP documentation
- crypto: kpp, (ec)dh - fix typos
- crypto: ecc - remove unused function arguments
- crypto: ecc - remove unnecessary casts
- crypto: ecc - rename ecdh_make_pub_key()
- crypto: ecdh - add privkey generation support
- crypto: ecc - Fix NULL pointer deref. on no default_rng
- [Config] CRYPTO_ECDH=m
- Bluetooth: convert smp and selftest to crypto kpp API
- crypto: ecdh - add public key verification test

  * Xenial update: 4.4.185 upstream stable release (LP: #1836668)
- fs/binfmt_flat.c: make load_flat_shared_library() work
- scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck()
- tracing: Silence GCC 9 array bounds warning
- gcc-9: silence 'address-of-packed-member' warning
- usb: chipidea: udc: workaround for endpoint conflict issue
- Input: uinput - add compat ioctl number translation for UI_*_FF_UPLOAD
- apparmor: enforce nullbyte at end of tag string
- parport: Fix mem leak in parport_register_dev_model
- parisc: Fix compiler warnings in float emulation code
- IB/hfi1: Insure freeze_work work_struct is canceled on shutdown
- MIPS: uprobes: remove set but not used variable 'epc'
- net: hns: Fix loopback test failed at copper ports
- sparc: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD
- scripts/checkstack.pl: Fix arm64 wrong or unknown architecture
- scsi: ufs: Check that space was properly alloced in copy_query_response
- s390/qeth: fix VLAN attribute in bridge_hostnotify udev event
- hwmon: (pmbus/core) Treat parameters as paged if on multiple pages
- Btrfs: fix race between readahead and device replace/removal
- btrfs: start readahead also in seed devices
- can: flexcan: fix timeout when set small bitrate
- can: purge socket error queue on sock destruct
- ARM: imx: cpuidle-imx6sx: Restrict the SW2ISO increase to i.MX6SX
- Bluetooth: Align minimum encryption key size for LE and BR/EDR connections
- Bluetooth: Fix regression with minimum encryption key size alignment
- SMB3: retry on STATUS_INSUFFICIENT_RESOURCES instead of failing write
- cfg80211: fix memory leak of wiphy device name
- mac80211: drop robust management frames from unknown TA
- perf ui helpline: Use strlcpy() as a shorter form of strncpy() + explicit
  set nul
- perf help: Remove needless use of strncpy()
- 9p/rdma: do not disconnect on down_interruptible EAGAIN
- 9p: acl: fix uninitialized iattr access
- 9p/rdma: remove useless check in cm_event_handler
- 9p: p9dirent_read: check network-provided name length
- net/9p: include trans_common.h to fix missing prototype warning.
- ovl: modify ovl_permission() to do checks on two inodes
- x86/speculation: Allow guests to use SSBD even if host does not
- cpu/speculation: Warn on unsupported mitigations= parameter
- sctp: change to hold sk after auth shkey is created successfully
- tipc: change to use register_pernet_device
- tipc: check msg->req data len in tipc_nl_compat_bearer_disable
- team: Always enable vlan tx offload
- ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop
- bonding: Always enable vlan tx offload
- net: check before dereferencing netdev_ops during busy poll
- Bluetooth: Fix faulty expression for minimum encryption key size check
- um: Compile with modern headers
- ASoC : cs4265 : readable register too low
- spi: bitbang: Fix NULL pointer dereference in spi_unregister_master
- ASoC: max98090: remove 24-bit format support if RJ is 0
- usb: gadget: fusb300_udc: Fix memory leak of fusb300->ep[i]
- usb: gadget: udc: lpc32xx: allocate descriptor with GFP_ATOMIC
- scsi: hpsa: correct ioaccel2 chaining
- 

[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-08-13 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.0.0-25.26

---
linux (5.0.0-25.26) disco; urgency=medium

  * CVE-2019-1125
- x86/cpufeatures: Carve out CQM features retrieval
- x86/cpufeatures: Combine word 11 and 12 into a new scattered features word
- x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations
- x86/speculation: Enable Spectre v1 swapgs mitigations
- x86/entry/64: Use JMP instead of JMPQ
- x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS

 -- Kleber Sacilotto de Souza  Thu, 01 Aug 2019 12:04:35 +0200

** Changed in: linux (Ubuntu Disco)
   Status: Fix Committed => Fix Released

** CVE added: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2019-1125

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in FF-Series:
  Fix Released

Bug description:
  [Impact]

  * The PTP feature in the bnx2x driver is implemented in such a way that
  if the NIC firmware takes some time to perform the timestamping - which
  is observed as a bad register read in bnx2x_ptp_task() - then the PTP
  worker function reschedules itself indefinitely until the value read
  from the register is meaningful. With that behavior, if a userspace
  tool requests a badly configured RX filter from bnx2x (or if the NIC
  firmware has any other timestamping issue), bnx2x_ptp_task() is
  rescheduled forever and causes unbounded resource consumption. This
  manifests as a kworker thread consuming 100% of CPU.

  * The dmesg log will show the following message, indicating that other
  packets are being skipped by the timestamp routine because a packet is
  stuck in the timestamping "pipeline":

  "bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single
  outstanding packet to timestamp, this packet will not be timestamped"

  Also, using ftrace, the user can see that bnx2x_ptp_task() is being
  called very frequently, and by enabling the bnx2x PTP debugging log
  (ethtool -s  msglvl 16777216) it's possible to observe the
  following message flooding the kernel log:

  "bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp
  yet"

  * The patch proposed in this SRU request has been accepted upstream and
  is currently (2019-07-03) available in David Miller's linux-net tree:
  git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3c91f25c2f72
  Besides fixing the issue, it also adds an ethtool statistic that counts
  PTP errors and reduces message flooding in case of errors.
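
  To illustrate the difference between rescheduling forever and giving
  up after a bounded number of attempts, here is a minimal userspace
  sketch in Python. It is not the driver code and not the upstream
  patch: read_register, PtpStats, the limit of 10 tries and the doubling
  delay are all hypothetical names and values chosen for this example.

```python
def poll_tx_timestamp(read_register, max_tries=10, sleep=lambda ms: None):
    """Poll for a valid Tx timestamp, giving up after max_tries reads.

    read_register is a hypothetical stand-in for the register read; it
    must return a (valid, value) tuple. Returns (value, tries) on
    success or (None, max_tries) when every read was invalid.
    """
    for attempt in range(max_tries):
        valid, value = read_register()
        if valid:
            return value, attempt + 1
        # Back off before the next read: 1, 2, 4, ... (illustrative units).
        sleep(1 << attempt)
    return None, max_tries


class PtpStats:
    """Hypothetical counter, analogous in spirit to an error statistic."""

    def __init__(self):
        self.skipped_tx_timestamps = 0


def handle_tx_timestamp(read_register, stats, sleep=lambda ms: None):
    """Return the timestamp, or count one skipped packet and return None."""
    value, _tries = poll_tx_timestamp(read_register, sleep=sleep)
    if value is None:
        # Bounded failure: record it once instead of retrying forever.
        stats.skipped_tx_timestamps += 1
    return value
```

  With a register that never becomes valid, handle_tx_timestamp()
  returns after a fixed number of reads and increments the counter once,
  whereas an unconditional reschedule would keep a worker busy
  indefinitely.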


  [Test case]

  Reproducing the problem is not difficult; we've used chrony in Bionic
  to trigger the problem. The steps are:

  a) Install chrony on Bionic on a system with a working NIC managed by
  bnx2x;

  b) Edit the chrony configuration and add "hwtimestamp *" to the top of
  its conf file;

  c) Restart the chrony service.
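
  Step b) can be sketched as a small Python helper that works on the
  text of the configuration file. This is only an illustration:
  add_hwtimestamp is a hypothetical name, and the location of the conf
  file (commonly /etc/chrony/chrony.conf on Ubuntu) and the way the
  service is restarted are assumptions to be checked on the test system.

```python
def add_hwtimestamp(conf_text, directive="hwtimestamp *"):
    """Return conf_text with the directive as its first line.

    The text is returned unchanged if the directive is already present
    on a line of its own, so running the helper twice is harmless.
    """
    lines = conf_text.splitlines()
    if any(line.strip() == directive for line in lines):
        return conf_text
    return directive + "\n" + conf_text
```

  The helper only transforms a string; reading the file, writing it back
  with root privileges and restarting chrony are left to the tester.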

  Check dmesg for the "[...]single outstanding packet" message and the
  overall CPU workload using a tool like "top" to observe a kthread
  consuming 100% of CPU.
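
  The dmesg check can also be scripted. The sketch below, in Python,
  counts the two messages quoted in the [Impact] section in a captured
  kernel log. The function name and the way the log text is obtained are
  assumptions for illustration; only the message strings and the msglvl
  value come from this report.

```python
# Message fragments quoted in the bug description.
SINGLE_OUTSTANDING = (
    "The device supports only a single outstanding packet to timestamp"
)
NO_VALID_TS = "There is no valid Tx timestamp yet"

# msglvl value quoted in the description: 16777216 is bit 24 (0x1000000).
PTP_DEBUG_MSGLVL = 1 << 24


def count_ptp_symptoms(log_text):
    """Count bnx2x PTP symptom messages in kernel log text."""
    counts = {"single_outstanding": 0, "no_valid_tx_timestamp": 0}
    for line in log_text.splitlines():
        if "bnx2x" not in line:
            continue  # ignore messages from other subsystems
        if SINGLE_OUTSTANDING in line:
            counts["single_outstanding"] += 1
        elif NO_VALID_TS in line:
            counts["no_valid_tx_timestamp"] += 1
    return counts
```

  A steadily growing count between two captures, together with a kthread
  at 100% CPU in "top", matches the behavior described above.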

  
  [Regression potential]

  The patch scope is restricted to bnx2x ptp handler, and was validated
  by the driver maintainer. If there's any possibility of regressions,
  we believe the worst would be an issue affecting the packet
  timestamping, not messing with the regular xmit path for the driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-08-12 Thread Pedro Guimarães
Hi, has the fix for Bionic been backported to stable?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in FF-Series:
  Fix Released



[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-08-09 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.2.0-10.11

---
linux (5.2.0-10.11) eoan; urgency=medium

  * eoan/linux: 5.2.0-10.11 -proposed tracker (LP: #1838113)

  * Packaging resync (LP: #1786013)
- [Packaging] resync git-ubuntu-log

  * Eoan update: v5.2.4 upstream stable release (LP: #1838428)
- bnx2x: Prevent load reordering in tx completion processing
- caif-hsi: fix possible deadlock in cfhsi_exit_module()
- hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
- igmp: fix memory leak in igmpv3_del_delrec()
- ipv4: don't set IPv6 only flags to IPv4 addresses
- ipv6: rt6_check should return NULL if 'from' is NULL
- ipv6: Unlink sibling route in case of failure
- net: bcmgenet: use promisc for unsupported filters
- net: dsa: mv88e6xxx: wait after reset deactivation
- net: make skb_dst_force return true when dst is refcounted
- net: neigh: fix multiple neigh timer scheduling
- net: openvswitch: fix csum updates for MPLS actions
- net: phy: sfp: hwmon: Fix scaling of RX power
- net_sched: unset TCQ_F_CAN_BYPASS when adding filters
- net: stmmac: Re-work the queue selection for TSO packets
- net/tls: make sure offload also gets the keys wiped
- nfc: fix potential illegal memory access
- r8169: fix issue with confused RX unit after PHY power-down on RTL8411b
- rxrpc: Fix send on a connected, but unbound socket
- sctp: fix error handling on stream scheduler initialization
- sctp: not bind the socket in sctp_connect
- sky2: Disable MSI on ASUS P6T
- tcp: be more careful in tcp_fragment()
- tcp: fix tcp_set_congestion_control() use from bpf hook
- tcp: Reset bytes_acked and bytes_received when disconnecting
- vrf: make sure skb->data contains ip header to make routing
- net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn
- net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling
- net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query
- net: bridge: don't cache ether dest pointer on input
- net: bridge: stp: don't cache eth dest pointer before skb pull
- macsec: fix use-after-free of skb during RX
- macsec: fix checksumming after decryption
- netrom: fix a memory leak in nr_rx_frame()
- netrom: hold sock when setting skb->destructor
- selftests: txring_overwrite: fix incorrect test of mmap() return value
- net/tls: fix poll ignoring partially copied records
- net/tls: reject offload of TLS 1.3
- net/mlx5e: Fix port tunnel GRE entropy control
- net/mlx5e: Rx, Fix checksum calculation for new hardware
- net/mlx5e: Fix return value from timeout recover function
- net/mlx5e: Fix error flow in tx reporter diagnose
- bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.
- mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removed
- net/mlx5: E-Switch, Fix default encap mode
- mlxsw: spectrum: Do not process learned records with a dummy FID
- dma-buf: balance refcount inbalance
- dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc
- Revert "gpio/spi: Fix spi-gpio regression on active high CS"
- gpiolib: of: fix a memory leak in of_gpio_flags_quirks()
- gpio: davinci: silence error prints in case of EPROBE_DEFER
- MIPS: lb60: Fix pin mappings
- perf script: Assume native_arch for pipe mode
- perf/core: Fix exclusive events' grouping
- perf/core: Fix race between close() and fork()
- ext4: don't allow any modifications to an immutable file
- ext4: enforce the immutable flag on open files
- mm: add filemap_fdatawait_range_keep_errors()
- jbd2: introduce jbd2_inode dirty range scoping
- ext4: use jbd2_inode dirty range scoping
- ext4: allow directory holes
- KVM: nVMX: do not use dangling shadow VMCS after guest reset
- KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
- Revert "kvm: x86: Use task structs fpu field for user"
- sd_zbc: Fix report zones buffer allocation
- block: Limit zone array allocation size
- net: sched: verify that q!=NULL before setting q->flags
- Linux 5.2.4

  * linux hwe i386 kernel 5.0.0-21.22~18.04.1 crashes on Lenovo x220
(LP: #1838115)
- x86/mm: Check for pfn instead of page in vmalloc_sync_one()
- x86/mm: Sync also unmappings in vmalloc_sync_all()
- mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()

  * br_netfilter: namespace sysctl operations (LP: #1836910)
- netfilter: bridge: port sysctls to use brnf_net
- netfilter: bridge: namespace bridge netfilter sysctls
- netfilter: bridge: prevent UAF in brnf_exit_net()

  * Eoan update: v5.2.3 upstream stable release (LP: #1838089)
- ath10k: Check tx_stats before use it
- ath10k: htt: don't use txdone_fifo with SDIO
- ath10k: fix incorrect multicast/broadcast rate setting
- ath9k: Don't trust TX status TID 

[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-07-30 Thread Guilherme G. Piccoli
I've validated the -proposed kernels for Xenial (4.4.0-158), Bionic (4.15.0-56)
and Disco (5.0.0-22) using the test case mentioned in the description. All are
working fine; the issue is gone.
Also, the patch was released upstream in the 5.3.x series, so I'll mark
ff-series as Released.

Cheers,


Guilherme

** Changed in: linux (Ubuntu Ff-series)
   Status: Fix Committed => Fix Released

** Tags removed: verification-needed-bionic verification-needed-disco 
verification-needed-xenial
** Tags added: verification-done-bionic verification-done-disco 
verification-done-xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in FF-Series:
  Fix Released



[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-07-30 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-xenial' to 'verification-done-xenial'. If the
problem still exists, change the tag 'verification-needed-xenial' to
'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in FF-Series:
  Fix Committed



[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-07-25 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-bionic

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in FF-Series:
  Fix Committed



[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-07-25 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-disco' to 'verification-done-disco'. If the
problem still exists, change the tag 'verification-needed-disco' to
'verification-failed-disco'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-disco

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in FF-Series:
  Fix Committed

Bug description:
  [Impact]

  * The PTP feature in bnx2x driver is implemented in a way that if the
  NIC firmware takes some time to perform the timestamping - which is
  observed as a bad register read in bnx2x_ptp_task() - then the ptp
  worker function will reschedule itself indefinitely until the value
  read from the register is meaningful. With that behavior, if an
  userspace tool request a bad configured RX filter to bnx2x (or if NIC
  firmware has any other issue in timestamping), the function
  bnx2x_ptp_task() will be rescheduled forever and cause a unbound
  resource consumption. This manifests as a kworker thread consuming
  100% of CPU.

  
  * The dmesg log will show the following message regarding other packets being 
skipped on timestamp routine due to a packet getting stuck in the timestamping 
"pipeline":

  "bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single
  outstanding packet to timestamp, this packet will not be timestamped"

  Also, using ftrace, the user can see that bnx2x_ptp_task() is being
  called very frequently, and by enabling the bnx2x PTP debugging log
  (ethtool -s <interface> msglvl 16777216) it's possible to observe the
  following message flooding the kernel log:

  "bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp
  yet"

  
  * The patch proposed in this SRU request has been accepted upstream and
  is currently (2019-07-03) available in David Miller's linux-net tree:
  git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3c91f25c2f72
  Besides fixing the issue, it also adds an ethtool statistic that
  accounts for PTP errors and reduces message flooding when errors occur.
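To illustrate the difference between rescheduling forever and giving up after a bounded number of attempts, here is a minimal userspace sketch in C. It is not the upstream patch and not bnx2x code: poll_tx_timestamp(), struct ptp_poll_stats, the read callback, the retry budget and the doubling delay are all hypothetical names and choices made for illustration. The only details taken from this bug report are that bit 0 of the value read marks a valid timestamp and that failures are counted instead of being logged on every retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device register read; not driver code. */
typedef uint32_t (*read_seq_fn)(void *ctx);

struct ptp_poll_stats {
	unsigned int attempts;    /* register reads performed by the last poll */
	unsigned int total_delay; /* simulated delay of the last poll, in ms */
	unsigned int errors;      /* polls that gave up without a valid value */
};

/*
 * Poll for a valid Tx timestamp at most max_tries times, doubling a
 * simulated delay after each failed read. Returns true when bit 0 of the
 * value read is set, false when the retry budget is exhausted.
 */
bool poll_tx_timestamp(read_seq_fn read_seq, void *ctx,
		       unsigned int max_tries, struct ptp_poll_stats *stats)
{
	unsigned int i;

	assert(read_seq != NULL && stats != NULL);
	assert(max_tries < 32); /* keeps the shift below well defined */
	stats->attempts = 0;
	stats->total_delay = 0;

	for (i = 0; i < max_tries; i++) {
		stats->attempts++;
		if (read_seq(ctx) & 0x1)
			return true;
		/* A real driver would sleep here instead of adding up a number. */
		stats->total_delay += 1u << i;
	}

	/* Count the failure once instead of logging every retry. */
	stats->errors++;
	return false;
}

/* Test double: a register that never reports a valid timestamp. */
uint32_t never_valid(void *ctx)
{
	(void)ctx;
	return 0;
}

/* Test double: becomes valid on the n-th read, where n is stored in *ctx. */
uint32_t valid_on_nth_read(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining > 0)
		(*remaining)--;
	return *remaining == 0 ? 0x1 : 0x0;
}
```

With a register that never becomes valid and a budget of 10 tries, the function performs 10 reads, accumulates 1 + 2 + ... + 512 = 1023 ms of simulated delay, and records a single error, so the cost of a stuck timestamp is bounded.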


  [Test case]

  Reproducing the problem is not difficult; we've used chrony in Bionic
  to trigger the problem. The steps are:

  a) Install chrony on Bionic on a system with a working NIC managed by
  bnx2x;

  b) Edit the chrony configuration and add "hwtimestamp *" to the top of
  its conf file;

  c) Restart the chrony service.

  Check dmesg for the "[...]single outstanding packet" message and the
  overall CPU workload using a tool like "top" to observe a kthread
  consuming 100% of CPU.

  
  [Regression potential]

  The patch scope is restricted to the bnx2x PTP handler, and the patch
  was validated by the driver maintainer. If there is any possibility of
  regression, we believe the worst case would be an issue affecting
  packet timestamping, not the regular xmit path of the driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-07-24 Thread Brad Figg
** Tags added: cscc



[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-07-16 Thread Kleber Sacilotto de Souza
** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Disco)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Xenial)
   Status: In Progress => Fix Committed



[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-07-10 Thread Stefan Bader
** Changed in: linux (Ubuntu Cosmic)
   Status: In Progress => Won't Fix


Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  In Progress
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in FF-Series:
  Fix Committed



[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-07-03 Thread Guilherme G. Piccoli
SRU submitted to kernel mailing-list:
https://lists.ubuntu.com/archives/kernel-team/2019-July/101925.html

I've marked Xenial->Disco as "In Progress", because we need acceptance from
the kernel team. On the other hand, the devel series (Eoan/Ff) will get the
fix via regular rebase with Linus' tree, hence I've put "Fix Committed".

Thanks,


Guilherme

** Changed in: linux (Ubuntu Ff-series)
   Status: Confirmed => Fix Committed

** Changed in: linux (Ubuntu Eoan)
   Status: Confirmed => Fix Committed

** Changed in: linux (Ubuntu Disco)
   Status: Confirmed => Fix Committed

** Changed in: linux (Ubuntu Cosmic)
   Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Eoan)
   Status: Fix Committed => In Progress

** Changed in: linux (Ubuntu Eoan)
   Status: In Progress => Fix Committed

** Changed in: linux (Ubuntu Disco)
   Status: Fix Committed => In Progress

** Changed in: linux (Ubuntu Bionic)
   Status: Confirmed => In Progress

** Changed in: linux (Ubuntu Xenial)
   Status: Confirmed => In Progress


Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  In Progress
Status in linux source package in Disco:
  In Progress
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in FF-Series:
  Fix Committed



[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-07-03 Thread Guilherme G. Piccoli
** Description changed:

- For the customer OpenStack deployment we deploy infra nodes on Dell R630
- servers. The servers have onboard Broadcom's NetXtreme II BCM57800 NIC
- (quad port: 2x1G ports, 2x10G ports). For each port in UP state, we
- observe 100% CPU load. So in total, we observe 4 CPUs with 100% load.
+ [Impact]
  
- perf report shows function bnx2x_ptp_task taking up much of the CPUs
- time: https://pastebin.canonical.com/p/kfrpd6Pwh5/
+ * The PTP feature in bnx2x driver is implemented in a way that if the
+ NIC firmware takes some time to perform the timestamping - which is
+ observed as a bad register read in bnx2x_ptp_task() - then the ptp
+ worker function will reschedule itself indefinitely until the value read
+ from the register is meaningful. With that behavior, if an userspace
+ tool request a bad configured RX filter to bnx2x (or if NIC firmware has
+ any other issue in timestamping), the function bnx2x_ptp_task() will be
+ rescheduled forever and cause a unbound resource consumption. This
+ manifests as a kworker thread consuming 100% of CPU.
  
- Also, /var/log/syslog contains the following outputs every few seconds:
  
- [1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped
- [1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped
- [1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped
- [1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped
+ * The dmesg log will show the following message regarding other packets being 
skipped on timestamp routine due to a packet getting stuck in the timestamping 
"pipeline":
  
- So, the problem seems to be in a "timestampped" TX packet; the driver
- for some reason (to be yet understood) get an unexpected value from a
- register and then, it that same function, reschedule itself to try again
- this register read, read gets a bad value again, and so on infinitely.
+ "bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single
+ outstanding packet to timestamp, this packet will not be timestamped"
  
- This is showing in the system as the 100% CPU usage kthreads; the
- message "The device supports only a single outstanding packet to
- timestamp, this packet will not be timestamped" happens because the
- driver can only timestamp a single TX packet at a time, and given it's
- stuck trying, it cannot accept another packet in this "queue".
+ Also, by using ftrace user can notice that function bnx2x_ptp_task() is
+ being called a lot, and by enabling bnx2x PTP debugging log (ethtool -s
+ <interface> msglvl 16777216) it's possible to observe the following message
+ flooding the kernel log:
  
- The infinite loop appears to be:
+ "bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp yet"
  
- static void bnx2x_ptp_task(struct work_struct *work)
- {
- struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
- int port = BP_PORT(bp);
- u32 val_seq;
- u64 timestamp, ns;
- struct skb_shared_hwtstamps shhwtstamps;
  
- /* Read Tx timestamp registers */
- val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
- NIG_REG_P0_TLLH_PTP_BUF_SEQID);
- if (val_seq & 0x1) {
- [...]
- } else {
- DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
- /* Reschedule to keep checking for a valid timestamp value */
- schedule_work(&bp->ptp_task);
- }
+ * The  patch proposed in this SRU request is accepted upstream and is 
available currently (2019-07-03) in David Miller's linux-net tree:
+ git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3c91f25c2f72
+ Besides fixing the issue, it also adds an ethtool statistics for accounting 
the ptp errors and reduces message flooding in case of errors.
  
- It appears that val_seq & 0x1 is never true, so the task constantly
- reschedules itself immediately. Instrumenting the function shows that it
- is being called in excess of 100,000 times per second. The REG_RD call
- does appear to be expensive (as it's a register read from the device)
- and shows high in the perf report, but that by itself doesn't appear to
- be the root cause (i.e., it's not hanging forever in the REG_RD).
  
- The cause appears to be that the driver is not prepared to deal with the
- PTP request never being completed by the hardware. It's unclear why it
- isn't completing, but regardless, the driver should not loop forever
- here.
+ [Test case]
+ 
+ Reproducing the problem is not difficult; we've used chrony in Bionic to
+ trigger the problem. The steps are:
+ 
+ a) Install chrony on Bionic in a system with working NIC managed by
+ bnx2x;
+ 
+ b) Edit chrony configuration and add: "hwtimestamp *" to the top of 

[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-06-21 Thread Guilherme G. Piccoli
** Attachment added: "system_details.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+attachment/5272083/+files/system_details.txt


Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  Confirmed
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Cosmic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Confirmed
Status in linux source package in FF-Series:
  Confirmed

Bug description:
  For the customer OpenStack deployment, we deploy infra nodes on Dell
  R630 servers. The servers have an onboard Broadcom NetXtreme II
  BCM57800 NIC (quad port: 2x1G ports, 2x10G ports). For each port in the
  UP state, we observe 100% CPU load, so in total we observe 4 CPUs at
  100% load.

  perf report shows the function bnx2x_ptp_task taking up much of the CPU
  time: https://pastebin.canonical.com/p/kfrpd6Pwh5/

  Also, /var/log/syslog contains the following outputs every few
  seconds:

  [1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
  [1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
  [1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
  [1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped

  So, the problem seems to be in a timestamped TX packet; the driver,
  for some reason (yet to be understood), gets an unexpected value from a
  register and then, in that same function, reschedules itself to retry
  the register read. The read returns a bad value again, and so on
  indefinitely.

  This shows up in the system as kthreads with 100% CPU usage; the
  message "The device supports only a single outstanding packet to
  timestamp, this packet will not be timestamped" happens because the
  driver can only timestamp a single TX packet at a time and, given it's
  stuck trying, it cannot accept another packet in this "queue".

  The infinite loop appears to be:

  static void bnx2x_ptp_task(struct work_struct *work)
  {
          struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
          int port = BP_PORT(bp);
          u32 val_seq;
          u64 timestamp, ns;
          struct skb_shared_hwtstamps shhwtstamps;

          /* Read Tx timestamp registers */
          val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
                           NIG_REG_P0_TLLH_PTP_BUF_SEQID);
          if (val_seq & 0x1) {
                  [...]
          } else {
                  DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
                  /* Reschedule to keep checking for a valid timestamp value */
                  schedule_work(&bp->ptp_task);
          }
  }

  It appears that val_seq & 0x1 is never true, so the task
  constantly reschedules itself immediately. Instrumenting the function
  shows that it is being called in excess of 100,000 times per second.
  The REG_RD call does appear to be expensive (as it's a register read
  from the device) and shows high in the perf report, but that by itself
  doesn't appear to be the root cause (i.e., it's not hanging forever in
  the REG_RD).

  The cause appears to be that the driver is not prepared to deal with
  the PTP request never being completed by the hardware. It's unclear
  why it isn't completing, but regardless, the driver should not loop
  forever here.
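The behaviour described above can be modelled in userspace with a toy work queue. The sketch below is an assumption-based illustration, not kernel code: struct toy_work, toy_schedule_work(), toy_ptp_task() and toy_run_queue() are hypothetical names, seq_reg stands in for the value returned by REG_RD(), and the budget exists only so that the simulation terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a work item; every name here is hypothetical. */
struct toy_work {
	void (*fn)(struct toy_work *work);
	bool pending;        /* set when the item is queued to run again */
	uint32_t seq_reg;    /* stands in for the value REG_RD() returns */
	unsigned long calls; /* how many times fn has run */
};

static void toy_schedule_work(struct toy_work *work)
{
	work->pending = true;
}

/* Same shape as the excerpt: done if bit 0 is set, otherwise requeue. */
void toy_ptp_task(struct toy_work *work)
{
	work->calls++;
	if (work->seq_reg & 0x1)
		return;          /* timestamp ready, nothing to requeue */
	toy_schedule_work(work); /* no retry limit and no delay */
}

/*
 * Run queued work until the queue is empty or the budget is used up.
 * Returns true if the queue drained on its own.
 */
bool toy_run_queue(struct toy_work *work, unsigned long budget)
{
	assert(work != NULL && work->fn != NULL);

	while (work->pending && budget-- > 0) {
		work->pending = false;
		work->fn(work);
	}
	return !work->pending;
}
```

When seq_reg never has bit 0 set, every run requeues the item, so the number of calls is limited only by the budget given to toy_run_queue(); in the real system nothing plays the role of that budget, which matches the observation that the function runs more than 100,000 times per second.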



[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-06-21 Thread Guilherme G. Piccoli
Thanks Przemyslaw, good explanation in the bug description! I'm working
on this one and will update the status here when there is news.

Cheers,


Guilherme

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
   Status: Incomplete

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Ff-series)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Eoan)
   Status: Incomplete => Confirmed

** Changed in: linux (Ubuntu Ff-series)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Disco)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Cosmic)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Bionic)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Xenial)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Cosmic)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Disco)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Eoan)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Ff-series)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Xenial)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Bionic)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Cosmic)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Ff-series)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Eoan)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Disco)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Tags removed: bionic
** Tags added: bnx2x sts

** Description changed:

  For the customer OpenStack deployment we deploy infra nodes on Dell R630
  servers. The servers have onboard Broadcom's NetXtreme II BCM57800 NIC
  (quad port: 2x1G ports, 2x10G ports). For each port in UP state, we
  observe 100% CPU load. So in total, we observe 4 CPUs with 100% load.
  
  perf report shows function bnx2x_ptp_task taking up much of the CPUs
  time: https://pastebin.canonical.com/p/kfrpd6Pwh5/
  
  Also, /var/log/syslog contains the following outputs every few seconds:
  
- [1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped 
- [1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped 
- [1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped 
- [1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped 
+ [1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped
+ [1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped
+ [1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped
+ [1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only 
a single outstanding packet to timestamp, this packet will not be timestamped
  
  So, the problem seems to be in a "timestampped" TX packet; the driver
  for some reason (to be yet understood) get an unexpected value from a
  register and then, it that same function, reschedule itself to try again
  this register read, read gets a bad value again, and so on infinitely.
  
  This is showing in the system as the 100% CPU usage kthreads; the
  message "The device supports only a single outstanding packet to
  timestamp, this packet will not be timestamped" happens because the
  driver can only timestamp a single TX packet at a time, and given it's
  stuck trying, it cannot accept another packet in this "queue".
  
  The infinite loop appears to be:
  
- static void bnx2x_ptp_task(struct work_struct *work) 
- { 
- struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task); 
- int port = BP_PORT(bp); 
- u32 val_seq; 
- u64 timestamp, ns; 
- struct skb_shared_hwtstamps shhwtstamps; 
+ static void bnx2x_ptp_task(struct work_struct *work)
+ {
+ struct bnx2x *bp = container_of(work, struct bnx2x, 

[Kernel-packages] [Bug 1832082] Re: bnx2x driver causes 100% CPU load

2019-06-19 Thread Przemyslaw Hausman
** Information type changed from Private to Public


Title:
  bnx2x driver causes 100% CPU load

Status in linux package in Ubuntu:
  New

Bug description:
  For the customer OpenStack deployment we deploy infra nodes on Dell
  R630 servers. The servers have onboard Broadcom's NetXtreme II
  BCM57800 NIC (quad port: 2x1G ports, 2x10G ports). For each port in UP
  state, we observe 100% CPU load. So in total, we observe 4 CPUs with
  100% load.

  perf report shows the function bnx2x_ptp_task taking up much of the CPU
  time: https://pastebin.canonical.com/p/kfrpd6Pwh5/

  Also, /var/log/syslog contains the following outputs every few
  seconds:

  [1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
  [1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
  [1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
  [1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped

  So, the problem seems to be in a "timestamped" TX packet; the driver,
  for some reason (yet to be understood), gets an unexpected value from a
  register and then, in that same function, reschedules itself to retry
  the register read, which returns a bad value again, and so on
  indefinitely.

  This is showing in the system as the 100% CPU usage kthreads; the
  message "The device supports only a single outstanding packet to
  timestamp, this packet will not be timestamped" happens because the
  driver can only timestamp a single TX packet at a time, and given it's
  stuck trying, it cannot accept another packet in this "queue".
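
The single-slot behaviour described above can be sketched as a small userspace model. This is only an illustration, not the driver's code: struct ts_port, ts_request() and ts_complete() are hypothetical names made up for this sketch. While one packet is waiting for its timestamp, every further request is refused, which is the point at which the log message would be emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in bnx2x. */
struct pkt {
    int id;
};

struct ts_port {
    const struct pkt *pending; /* the one packet awaiting a TX timestamp */
    unsigned int skipped;      /* packets sent without a timestamp */
};

/* Returns true if the packet was accepted for timestamping. */
static bool ts_request(struct ts_port *port, const struct pkt *p)
{
    assert(port != NULL && p != NULL);

    if (port->pending != NULL) {
        /* Slot busy: this is where "supports only a single outstanding
         * packet to timestamp" would be logged. */
        port->skipped++;
        return false;
    }
    port->pending = p;
    return true;
}

/* Called once a valid timestamp has been read; frees the slot. */
static void ts_complete(struct ts_port *port)
{
    assert(port != NULL);
    port->pending = NULL;
}
```

If ts_complete() is never reached, as in the stuck case reported here, every later ts_request() on that port fails, which matches the message repeating for each interface.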

  The infinite loop appears to be:

  static void bnx2x_ptp_task(struct work_struct *work)
  {
      struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
      int port = BP_PORT(bp);
      u32 val_seq;
      u64 timestamp, ns;
      struct skb_shared_hwtstamps shhwtstamps;

      /* Read Tx timestamp registers */
      val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
                       NIG_REG_P0_TLLH_PTP_BUF_SEQID);
      if (val_seq & 0x1) {
          [...]
      } else {
          DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
          /* Reschedule to keep checking for a valid timestamp value */
          schedule_work(&bp->ptp_task);
      }
  }

  It appears that val_seq & 0x1 is never true, so the task
  constantly reschedules itself immediately. Instrumenting the function
  shows that it is being called in excess of 100,000 times per second.
  The REG_RD call does appear to be expensive (as it's a register read
  from the device) and shows high in the perf report, but that by itself
  doesn't appear to be the root cause (i.e., it's not hanging forever in
  the REG_RD).

  The cause appears to be that the driver is not prepared to deal with
  the PTP request never being completed by the hardware. It's unclear
  why it isn't completing, but regardless, the driver should not loop
  forever here.

  
  Additional info: 

  
  ubuntu@infra-1:~$ uname -a 
  Linux infra-1 4.15.0-50-generic #54-Ubuntu SMP Mon May 6 18:46:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Lin

  
  ubuntu@infra-1:~$ lspci | grep Broadcom 
  01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
  01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
  01:00.2 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)
  01:00.3 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)

  
  ubuntu@infra-1:~$ lspci -n | grep 01:00 
  01:00.0 0200: 14e4:168a (rev 10) 
  01:00.1 0200: 14e4:168a (rev 10) 
  01:00.2 0200: 14e4:168a (rev 10) 
  01:00.3 0200: 14e4:168a (rev 10) 

  
  ubuntu@infra-1:~/deploy$ sudo lshw -c network 
  *-network:0 
  description: Ethernet interface 
  product: NetXtreme II BCM57800 1/10 Gigabit Ethernet 
  vendor: Broadcom Inc. and subsidiaries 
  physical id: 0 
  bus info: pci@:01:00.0 
  logical name: eno1 
  version: 10 
  serial: 42:39:92:e0:66:b6 
  size: 10Gbit/s 
  capacity: 10Gbit/s 
  width: 64 bits 
  clock: 33MHz 
  capabilities: pm vpd msi msix pciexpress bus_master cap_list rom ethernet physical tp 100bt 100bt-fd 1000bt-fd 1bt-fd autonegotiation
  configuration: autonegotiation=on broadcast=yes driver=bnx2x driverversion=1.712.30-0 duplex=full firmware=FFV14.10.07 bc 7.14.11 phy 1.45 latency=0 link=yes multicast=yes port=twisted pair slave=yes