[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2020-02-17 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-88.88

---
linux (4.15.0-88.88) bionic; urgency=medium

  * bionic/linux: 4.15.0-88.88 -proposed tracker (LP: #1862824)

  * Segmentation fault (kernel oops) with memory-hotplug in
ubuntu_kernel_selftests on Bionic kernel (LP: #1862312)
- Revert "mm/memory_hotplug: fix online/offline_pages called w.o.
  mem_hotplug_lock"
- mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

linux (4.15.0-87.87) bionic; urgency=medium

  * bionic/linux: 4.15.0-87.87 -proposed tracker (LP: #1861165)

  * Bionic update: upstream stable patchset 2020-01-22 (LP: #1860602)
- scsi: lpfc: Fix discovery failures when target device connectivity bounces
- scsi: mpt3sas: Fix clear pending bit in ioctl status
- scsi: lpfc: Fix locking on mailbox command completion
- Input: atmel_mxt_ts - disable IRQ across suspend
- iommu/tegra-smmu: Fix page tables in > 4 GiB memory
- scsi: target: compare full CHAP_A Algorithm strings
- scsi: lpfc: Fix SLI3 hba in loop mode not discovering devices
- scsi: csiostor: Don't enable IRQs too early
- powerpc/pseries: Mark accumulate_stolen_time() as notrace
- powerpc/pseries: Don't fail hash page table insert for bolted mapping
- powerpc/tools: Don't quote $objdump in scripts
- dma-debug: add a schedule point in debug_dma_dump_mappings()
- clocksource/drivers/asm9260: Add a check for of_clk_get
- powerpc/security/book3s64: Report L1TF status in sysfs
- powerpc/book3s64/hash: Add cond_resched to avoid soft lockup warning
- ext4: update direct I/O read lock pattern for IOCB_NOWAIT
- jbd2: Fix statistics for the number of logged blocks
- scsi: tracing: Fix handling of TRANSFER LENGTH == 0 for READ(6) and
  WRITE(6)
- scsi: lpfc: Fix duplicate unreg_rpi error in port offline flow
- f2fs: fix to update dir's i_pino during cross_rename
- clk: qcom: Allow constant ratio freq tables for rcg
- irqchip/irq-bcm7038-l1: Enable parent IRQ if necessary
- irqchip: ingenic: Error out if IRQ domain creation failed
- fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned
  long
- scsi: lpfc: fix: Coverity: lpfc_cmpl_els_rsp(): Null pointer dereferences
- scsi: ufs: fix potential bug which ends in system hang
- powerpc/pseries/cmm: Implement release() function for sysfs device
- powerpc/security: Fix wrong message when RFI Flush is disable
- scsi: atari_scsi: sun3_scsi: Set sg_tablesize to 1 instead of SG_NONE
- clk: pxa: fix one of the pxa RTC clocks
- bcache: at least try to shrink 1 node in bch_mca_scan()
- HID: logitech-hidpp: Silence intermittent get_battery_capacity errors
- libnvdimm/btt: fix variable 'rc' set but not used
- HID: Improve Windows Precision Touchpad detection.
- scsi: pm80xx: Fix for SATA device discovery
- scsi: ufs: Fix error handing during hibern8 enter
- scsi: scsi_debug: num_tgts must be >= 0
- scsi: NCR5380: Add disconnect_mask module parameter
- scsi: iscsi: Don't send data to unbound connection
- scsi: target: iscsi: Wait for all commands to finish before freeing a
  session
- gpio: mpc8xxx: Don't overwrite default irq_set_type callback
- apparmor: fix unsigned len comparison with less than zero
- scripts/kallsyms: fix definitely-lost memory leak
- cdrom: respect device capabilities during opening action
- perf script: Fix brstackinsn for AUXTRACE
- perf regs: Make perf_reg_name() return "unknown" instead of NULL
- s390/zcrypt: handle new reply code FILTERED_BY_HYPERVISOR
- libfdt: define INT32_MAX and UINT32_MAX in libfdt_env.h
- s390/cpum_sf: Check for SDBT and SDB consistency
- ocfs2: fix passing zero to 'PTR_ERR' warning
- kernel: sysctl: make drop_caches write-only
- userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
- x86/mce: Fix possibly incorrect severity calculation on AMD
- net, sysctl: Fix compiler warning when only cBPF is present
- netfilter: nf_queue: enqueue skbs with NULL dst
- ALSA: hda - Downgrade error message for single-cmd fallback
- bonding: fix active-backup transition after link failure
- perf strbuf: Remove redundant va_end() in strbuf_addv()
- Make filldir[64]() verify the directory entry filename is valid
- filldir[64]: remove WARN_ON_ONCE() for bad directory entries
- netfilter: ebtables: compat: reject all padding in matches/watchers
- 6pack,mkiss: fix possible deadlock
- netfilter: bridge: make sure to pull arp header in br_nf_forward_arp()
- inetpeer: fix data-race in inet_putpeer / inet_putpeer
- net: add a READ_ONCE() in skb_peek_tail()
- net: icmp: fix data-race in cmp_global_allow()
- hrtimer: Annotate lockless access to timer->state
- spi: fsl: don't map irq during probe
- tty/serial: atmel: fix out of range clock divider handling
- pinctrl: baytrail: 

[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2020-02-13 Thread Guilherme G. Piccoli
Thank you, Khaled! I've just checked the source from bionic-proposed
(4.15.0-88) and the patch is there, as expected. I verified it by
comparing it with the upstream patch, and it's all correct.

Given we don't have the HW and the bug reporter seems to be unable to
verify it (and the patch is self-contained/straightforward), we can
consider it ready to go!

Cheers,


Guilherme

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Fix Released
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released

Bug description:
  [Impact]

  * The PTP feature in the qede driver is implemented in such a way that,
  if the NIC firmware takes some time to perform the timestamping, the PTP
  worker function reschedules itself indefinitely until the value read
  from a device register is meaningful. With that behavior, if a
  userspace tool requests a badly configured TX/RX filter (or if the NIC
  firmware has any other issue in timestamping), the function
  qede_ptp_task() will reschedule itself forever and cause unbounded
  resource consumption. This manifests as a kworker thread consuming
  100% of CPU.

  * The dmesg log will show a message like this:
  "qede_ptp_tx_ts:533(eno3)]Timestamping in progress"

  Also, by using perf, the user can observe a stack like the following:
  - 44.76% 0.00% kworker/16:5 [kernel.kallsyms]
       ret_from_fork
     - kthread
        - 44.74% worker_thread
           - 44.57% process_one_work
              - 42.67% qede_ptp_task
                 - 38.86% qed_ptp_hw_read_tx_ts
                      qed_rd
                 - 3.03% queue_work_on
                    - 2.06% __queue_work
                       - 0.68% get_work_pool
                          - 0.61% radix_tree_lookup
                               __radix_tree_lookup
                0.50% set_work_pool_and_clear_pending

  * The patch proposed in this SRU request refactors the PTP worker in
  qede by adding a time limit, after which the task no longer reschedules
  itself and the timestamp procedure fails: 9adebac37e7d ("qede:
  Handle infinite driver spinning for Tx timestamp.")
  http://git.kernel.org/linus/9adebac37e7d

  Besides fixing the issue, it also adds an ethtool statistic for
  accounting the PTP errors.
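
  To illustrate the behavior change described above, here is a minimal
  userspace sketch in Python (it is not the driver code). It simulates a
  worker that gives up after a time limit instead of rescheduling itself
  forever. The names PtpTxState, ptp_task and run, as well as the
  TX_TIMEOUT_S value, are assumptions made for illustration; the actual
  limit and counter names used by commit 9adebac37e7d are not stated in
  this report.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical limit in seconds; the real driver's value is not given here.
TX_TIMEOUT_S = 2.0


@dataclass
class PtpTxState:
    """Illustrative stand-in for the driver's per-device PTP state."""
    read_tx_ts: Callable[[], Optional[int]]  # None means "still in progress"
    start: float = 0.0    # when the timestamp request was issued
    tx_errors: int = 0    # error counter, similar in spirit to an ethtool stat
    rescheduled: int = 0  # how many times the task re-queued itself


def ptp_task(state: PtpTxState, now: float) -> str:
    """Run one pass of the worker and report 'done', 'retry' or 'timeout'."""
    if state.read_tx_ts() is not None:
        return "done"
    if now - state.start >= TX_TIMEOUT_S:
        # Past the limit: count the failure and stop rescheduling.
        state.tx_errors += 1
        return "timeout"
    state.rescheduled += 1
    return "retry"


def run(state: PtpTxState, clock=time.monotonic, tick=lambda: None) -> str:
    """Drive ptp_task() until it stops asking to be rescheduled."""
    state.start = clock()
    while True:
        result = ptp_task(state, clock())
        if result != "retry":
            return result
        tick()  # stand-in for the delay before the work item runs again
```

  Without the timeout branch, a register read that never becomes valid
  would keep returning 'retry', which corresponds to the kworker thread
  spinning at 100% CPU.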

  [Test case]

  Using chrony on Bionic, the following steps reproduce the issue:

  a) Install chrony on Bionic in a system with a working NIC managed by qede;
  b) Edit the chrony configuration and add "hwtimestamp *" to the top of its
  conf file;
  c) Restart the chrony service.

  Check dmesg for the "[...]Timestamping in progress" message and the
  overall CPU workload using a tool like "top" to observe a kthread
  consuming 100% of CPU.
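
  The dmesg check can be scripted. The sketch below counts matching
  messages per interface in a block of log text. The regular expression is
  an assumption derived from the single sample line quoted in this report
  ("qede_ptp_tx_ts:533(eno3)]Timestamping in progress"); other kernel
  versions may format the message differently. The function name
  count_timestamping_messages is hypothetical.

```python
import re
from collections import Counter

# Assumed format, based only on the sample message quoted in the bug report:
#   qede_ptp_tx_ts:<line>(<interface>)]Timestamping in progress
PATTERN = re.compile(
    r"qede_ptp_tx_ts:\d+\(([^)]+)\)\]\s*Timestamping in progress"
)


def count_timestamping_messages(log_text: str) -> Counter:
    """Return how many matching messages were logged per interface."""
    return Counter(match.group(1) for match in PATTERN.finditer(log_text))
```

  The input text could be the saved output of dmesg; a steadily growing
  count for one interface would be consistent with the symptom described
  above.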

  [Regression potential]

  The patch scope is restricted to the qede PTP handler, and it has been
  upstream for more than 7 months. If there's any possibility of
  regressions, the worst would be an issue affecting packet timestamping,
  not the regular xmit path of the driver.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855409/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2020-02-12 Thread Khaled El Mously
Thanks @gpiccoli
Doing the same for bionic, based on the same logic as comment #12

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic


[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2020-02-03 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!
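
The tag change requested above can be expressed as a small pure function.
This is only a sketch of the naming rule; it does not talk to Launchpad,
and the function name mark_verified is hypothetical.

```python
from typing import List


def mark_verified(tags: List[str], series: str, passed: bool = True) -> List[str]:
    """Replace verification-needed-<series> with the done/failed tag.

    Returns a new list; tags that are not the 'needed' tag are kept as-is.
    """
    needed = f"verification-needed-{series}"
    outcome = f"verification-{'done' if passed else 'failed'}-{series}"
    return [outcome if tag == needed else tag for tag in tags]
```

If the 'needed' tag is absent, the list comes back unchanged.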


** Tags added: verification-needed-bionic


[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2020-01-27 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 5.0.0-40.44

---
linux (5.0.0-40.44) disco; urgency=medium

  * disco/linux: 5.0.0-40.44 -proposed tracker (LP: #1859724)

  * use-after-free in i915_ppgtt_close (LP: #1859522) // CVE-2020-7053
- SAUCE: drm/i915: Fix use-after-free when destroying GEM context

  * CVE-2019-14615
- drm/i915/gen9: Clear residual context state on context switch

  * System hang with kernel traces while entering reboot process on a Disco
ARM64 moonshot node (LP: #1859582)
- Revert "RDMA/cm: Fix memory leak in cm_add/remove_one"

linux (5.0.0-39.43) disco; urgency=medium

  * disco/linux: 5.0.0-39.43 -proposed tracker (LP: #1858547)

  * [Regression] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
(LP: #1856608)
- SAUCE: Revert "usb: handle warm-reset port requests on hub resume"

  * PAN is broken for execute-only user mappings on ARMv8 (LP: #1858815)
- arm64: Revert support for execute-only user mappings

  * Fix unusable USB hub on Dell TB16 after S3 (LP: #1855312)
- SAUCE: USB: core: Make port power cycle a seperate helper function
- SAUCE: USB: core: Attempt power cycle port when it's in eSS.Disabled state

  * [sas-1126]scsi: hisi_sas: Fix out of bound at debug_I_T_nexus_reset()
(LP: #1853992)
- scsi: hisi_sas: Fix out of bound at debug_I_T_nexus_reset()

  * [sas-1126]scsi: hisi_sas: Assign NCQ tag for all NCQ commands (LP: #1853995)
- scsi: hisi_sas: Assign NCQ tag for all NCQ commands

  * [sas-1126]scsi: hisi_sas: Fix the conflict between device gone and host
reset (LP: #1853997)
- scsi: hisi_sas: Fix the conflict between device gone and host reset

  * scsi: hisi_sas: Check sas_port before using it (LP: #1855952)
- scsi: hisi_sas: Check sas_port before using it

  * CVE-2019-18885
- btrfs: refactor btrfs_find_device() take fs_devices as argument
- btrfs: merge btrfs_find_device and find_device

  *  Integrate Intel SGX driver into linux-azure (LP: #1844245)
- [Packaging] Add systemd service to load intel_sgx

  * [SRU][B/OEM-B/OEM-OSP1/D/E/F] Add LG I2C touchscreen multitouch support
(LP: #1857541)
- SAUCE: HID: multitouch: Add LG MELF0410 I2C touchscreen support

  * cifs: DFS Caching feature causing problems traversing multi-tier DFS setups
(LP: #1854887)
- cifs: Fix retrieval of DFS referrals in cifs_mount()

  * qede driver causes 100% CPU load (LP: #1855409)
- qede: Handle infinite driver spinning for Tx timestamp.

  * [roce-1126]RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver
(LP: #1853989)
- RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver
- RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver

  * [roce-1126]RDMA/hns: Fixs hw access invalid dma memory error (LP: #1853990)
- RDMA/hns: Fixs hw access invalid dma memory error

  * [hns-1126]net: hns3: revert to old channel when setting new channel num fail
(LP: #1853983)
- net: hns3: revert to old channel when setting new channel num fail

  * [hns-1126]net: hns3: fix port setting handle for fibre port
(LP: #1853984)
- net: hns3: fix port setting handle for fibre port

  * [hns-1126] net: hns: add support for vlan TSO (LP: #1853937)
- net: hns: add support for vlan TSO

  * [hns-1126]net: hns3: fix flow control configure issue for fibre port
(LP: #1853948)
- net: hns3: fix flow control configure issue for fibre port

  * mce: ras:  When inject 1bit ecc error,  there is no mce log recorded in the
dmesg (LP: #1857413)
- RAS/CEC: Increment cec_entered under the mutex lock
- RAS/CEC: Check count_threshold unconditionally

  * efivarfs test in ubuntu_kernel_selftest failed on the second run
(LP: #1809704)
- selftests/efivarfs: clean up test files from test_create*()

  * CVE-2019-19082
- drm/amd/display: prevent memory leak

  * CVE-2019-19078
- ath10k: fix memory leak

  * CVE-2019-19077
- RDMA: Fix goto target to release the allocated memory

  * Disco update: upstream stable patchset 2019-12-17 (LP: #1856754)
- rsi: release skb if rsi_prepare_beacon fails
- arm64: tegra: Fix 'active-low' warning for Jetson TX1 regulator
- sparc64: implement ioremap_uc
- lp: fix sparc64 LPSETTIMEOUT ioctl
- usb: gadget: u_serial: add missing port entry locking
- tty: serial: fsl_lpuart: use the sg count from dma_map_sg
- tty: serial: msm_serial: Fix flow control
- serial: pl011: Fix DMA ->flush_buffer()
- serial: serial_core: Perform NULL checks for break_ctl ops
- serial: ifx6x60: add missed pm_runtime_disable
- autofs: fix a leak in autofs_expire_indirect()
- RDMA/hns: Correct the value of HNS_ROCE_HEM_CHUNK_LEN
- iwlwifi: pcie: don't consider IV len in A-MSDU
- exportfs_decode_fh(): negative pinned may become positive without the
  parent locked
- audit_get_nd(): don't unlock parent too early
- NFC: nxp-nci: Fix NULL pointer 

[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2020-01-24 Thread Guilherme G. Piccoli
Unfortunately, it seems we don't have the hardware to fully verify the patch.
But given it's a minor fix, restricted to a single driver and upstream
for a while (it's also in E/F already), I consider it safe to ship
without full verification.

Also, I've downloaded the Disco kernel source and validated the fix against
the upstream patch - all fine. Hence, I'm marking Disco as verified here.

Cheers,


Guilherme

** Tags removed: verification-needed-disco
** Tags added: verification-done-disco

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released


[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2020-01-21 Thread Khaled El Mously
I reached out to phausman on #canonical-support


[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2020-01-10 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag
'verification-needed-disco' to 'verification-done-disco'. If the problem
still exists, change the tag 'verification-needed-disco' to
'verification-failed-disco'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!


** Tags added: verification-needed-disco


[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2020-01-07 Thread Kleber Sacilotto de Souza
** Changed in: linux (Ubuntu Bionic)
   Status: Confirmed => Fix Committed

** Changed in: linux (Ubuntu Disco)
   Status: Confirmed => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released

Bug description:
  [Impact]

  * The PTP feature in the qede driver is implemented in such a way that,
  if the NIC firmware takes some time to perform the timestamping, the PTP
  worker function reschedules itself indefinitely until the value
  read from a device register is meaningful. With that behavior, if a
  userspace tool requests a badly configured TX/RX filter (or if the NIC
  firmware has any other timestamping issue), the function
  qede_ptp_task() will reschedule itself forever and cause unbounded
  resource consumption. This manifests as a kworker thread consuming
  100% of CPU.

  * The dmesg log will show a message like this:
  "qede_ptp_tx_ts:533(eno3)]Timestamping in progress"

  Also, using perf, the user can observe a stack like the following:
  - 44.76% 0.00% kworker/16:5 [kernel.kallsyms]
       ret_from_fork
     - kthread
        - 44.74% worker_thread
           - 44.57% process_one_work
              - 42.67% qede_ptp_task
                 - 38.86% qed_ptp_hw_read_tx_ts
                      qed_rd
                 - 3.03% queue_work_on
                    - 2.06% __queue_work
                       - 0.68% get_work_pool
                          - 0.61% radix_tree_lookup
                               __radix_tree_lookup
                0.50% set_work_pool_and_clear_pending

  * The patch proposed in this SRU request refactors the PTP worker in
  qede by adding a time limit, after which the task no longer reschedules
  itself and the timestamp procedure fails: 9adebac37e7d ("qede:
  Handle infinite driver spinning for Tx timestamp.")
  http://git.kernel.org/linus/9adebac37e7d

  Besides fixing the issue, it also adds an ethtool statistic for
  counting PTP errors.
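
  The time-limit idea can be illustrated with a minimal Python sketch. This
  is not the driver code (which is C inside the kernel) and is only a model
  of the behavior described above. The function name poll_tx_timestamp, the
  counter name "tx_ts_errors", and the 2-second default are all assumptions
  made for illustration; the real logic is in commit 9adebac37e7d.

```python
import time
from typing import Callable, Dict, Optional


def poll_tx_timestamp(
    read_ts: Callable[[], Optional[int]],
    stats: Dict[str, int],
    timeout_s: float = 2.0,  # assumed limit, for illustration only
    clock: Callable[[], float] = time.monotonic,
) -> Optional[int]:
    """Poll read_ts() until it yields a value or the deadline passes.

    read_ts stands in for reading the timestamp register: it returns None
    while no valid timestamp is available yet.
    """
    deadline = clock() + timeout_s
    while True:
        ts = read_ts()
        if ts is not None:
            return ts  # a valid timestamp arrived in time
        if clock() >= deadline:
            # Give up instead of retrying forever, and count the failure
            # (hypothetical counter standing in for the new statistic).
            stats["tx_ts_errors"] = stats.get("tx_ts_errors", 0) + 1
            return None
        # Without the deadline check above, this loop would never end
        # when read_ts() keeps returning None.
```

  With a reader that never produces a value, the call returns None once
  the deadline passes and the counter is incremented, instead of spinning
  indefinitely.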

  [Test case]

  Using chrony on Bionic, the following steps reproduce the issue:

  a) Install chrony on Bionic on a system with a working NIC managed by qede;
  b) Edit the chrony configuration and add "hwtimestamp *" to the top of its
     conf file;
  c) Restart the chrony service.

  Check dmesg for the "[...]Timestamping in progress" message and the
  overall CPU workload using a tool like "top" to observe a kthread
  consuming 100% of CPU.
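
  The dmesg check can be scripted. The sketch below is one possible way to
  do it; find_ptp_stall_lines is a hypothetical helper that only looks for
  the message text quoted in this report and takes the log text as input
  rather than invoking dmesg itself.

```python
from typing import List

# Message text quoted in the bug description; the prefix before "]" varies
# with the function name, line number, and interface.
MARKER = "Timestamping in progress"


def find_ptp_stall_lines(log_text: str) -> List[str]:
    """Return the kernel log lines that contain the qede PTP message."""
    return [line for line in log_text.splitlines() if MARKER in line]
```

  It can be fed the output of dmesg, for example the stdout of
  subprocess.run(["dmesg"], capture_output=True, text=True); reading the
  kernel log may require root on some systems.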

  [Regression potential]

  The patch scope is restricted to the qede PTP handler, and the patch has
  been upstream for more than 7 months. If there is any regression, the
  worst case would be an issue affecting packet timestamping, without
  touching the regular xmit path of the driver.



[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2020-01-07 Thread Stefan Bader
** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Disco)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released



[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2019-12-18 Thread Guilherme G. Piccoli
SRU submitted today: 
https://lists.ubuntu.com/archives/kernel-team/2019-December/106479.html
Cheers,

Guilherme

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released



[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2019-12-18 Thread Guilherme G. Piccoli
** Description changed:

- This bug is similar to #1832082 (bnx2x driver causes 100% CPU load) but
- applies for qede driver instead of bnx2x. The symptoms are the same:
+ [Impact]
  
- With chrony installed, and configured with "hwtimestamp *", I observe
- 100% CPU load on 2 CPU cores.
+ * The PTP feature in the qede driver is implemented in such a way that, if
+ the NIC firmware takes some time to perform the timestamping, the PTP worker
+ function reschedules itself indefinitely until the value read from a
+ device register is meaningful. With that behavior, if a userspace tool
+ requests a badly configured TX/RX filter (or if the NIC firmware has any
+ other timestamping issue), the function qede_ptp_task() will reschedule
+ itself forever and cause unbounded resource consumption. This manifests
+ as a kworker thread consuming 100% of CPU.
  
- Running perf report shows that kernel is busy executing qede_ptp_task
- function in qede driver.
+ * The dmesg log will show a message like this:
+ "qede_ptp_tx_ts:533(eno3)]Timestamping in progress"
  
- A workaround is to disable "hwtimestamp *" in chrony configuration.
- 
- ---
- 
- $ modinfo qede
- filename:       /lib/modules/4.15.0-72-generic/kernel/drivers/net/ethernet/qlogic/qede/qede.ko
- version:8.10.10.21
- license:GPL
- description:QLogic FastLinQ 4 Ethernet Driver
- srcversion: D5EC89D815FC81B973EE9F0
- alias:  pci:v1077d8090sv*sd*bc*sc*i*
- alias:  pci:v1077d8070sv*sd*bc*sc*i*
- alias:  pci:v1077d1664sv*sd*bc*sc*i*
- alias:  pci:v1077d1656sv*sd*bc*sc*i*
- alias:  pci:v1077d1654sv*sd*bc*sc*i*
- alias:  pci:v1077d1644sv*sd*bc*sc*i*
- alias:  pci:v1077d1636sv*sd*bc*sc*i*
- alias:  pci:v1077d1666sv*sd*bc*sc*i*
- alias:  pci:v1077d1634sv*sd*bc*sc*i*
- depends:ptp,qed
- retpoline:  Y
- intree: Y
- name:   qede
- vermagic:   4.15.0-72-generic SMP mod_unload 
- signat: PKCS#7
- signer: 
- sig_key:
- sig_hashalgo:   md4
- parm:   debug: Default debug msglevel (uint)
- 
- 
- $ uname -a
- Linux dcn1-clm-inf-1 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- 
- 
- $ lspci | grep -i ether
- 19:00.0 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
- 19:00.1 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
- 19:00.2 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
- 19:00.3 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
- 
- 
- # perf report snippet:
- 
-   Children  Self  Command  Shared Object
- -   44.76% 0.00%  kworker/16:5 [kernel.kallsyms]
+ Also, using perf, the user can observe a stack like the following:
+ - 44.76% 0.00% kworker/16:5 [kernel.kallsyms]
       ret_from_fork
     - kthread
        - 44.74% worker_thread
           - 44.57% process_one_work
              - 42.67% qede_ptp_task
                 - 38.86% qed_ptp_hw_read_tx_ts
                      qed_rd
                 - 3.03% queue_work_on
                    - 2.06% __queue_work
                       - 0.68% get_work_pool
                          - 0.61% radix_tree_lookup
                               __radix_tree_lookup
                0.50% set_work_pool_and_clear_pending
+ 
+ * The patch proposed in this SRU request refactors the PTP worker in
+ qede by adding a time limit, after which the task no longer reschedules
+ itself and the timestamp procedure fails: 9adebac37e7d ("qede:
+ Handle infinite driver spinning for Tx timestamp.")
+ http://git.kernel.org/linus/9adebac37e7d
+ 
+ Besides fixing the issue, it also adds an ethtool statistic for
+ counting PTP errors.
+ 
+ [Test case]
+ 
+ By using chrony in Bionic, the following steps will reproduce the issue:
+ 
+ a) Install chrony on Bionic in a system with working NIC managed by qede;
+ b) Edit chrony configuration and add: "hwtimestamp *" to the top of its conf file;
+ c) Restart chrony service
+ 
+ Check dmesg for the "[...]Timestamping in progress" message and the
+ overall CPU workload using a tool like "top" to observe a kthread
+ consuming 100% of CPU.
+ 
+ [Regression potential]
+ 
+ The patch scope is restricted to qede PTP handler, and is upstream for
+ more than 7 months. If there's any possibility of regressions, the worst
+ would be an issue affecting the packet timestamping, not messing with
+ the regular xmit path of the driver.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid

[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2019-12-18 Thread Guilherme G. Piccoli
Hi Przemyslaw, thank you for the great report and debugging! I'll proceed
with the SRU proposal.
Cheers,

Guilherme

** Tags added: disco sts

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released

Bug description:
  This bug is similar to #1832082 (bnx2x driver causes 100% CPU load)
  but applies to the qede driver instead of bnx2x. The symptoms are the
  same:

  With chrony installed and configured with "hwtimestamp *", I observe
  100% CPU load on 2 CPU cores.

  Running perf report shows that the kernel is busy executing the
  qede_ptp_task function in the qede driver.

  A workaround is to disable "hwtimestamp *" in the chrony configuration.

  ---

  $ modinfo qede
  filename:       /lib/modules/4.15.0-72-generic/kernel/drivers/net/ethernet/qlogic/qede/qede.ko
  version:        8.10.10.21
  license:        GPL
  description:    QLogic FastLinQ 4 Ethernet Driver
  srcversion:     D5EC89D815FC81B973EE9F0
  alias:          pci:v1077d8090sv*sd*bc*sc*i*
  alias:          pci:v1077d8070sv*sd*bc*sc*i*
  alias:          pci:v1077d1664sv*sd*bc*sc*i*
  alias:          pci:v1077d1656sv*sd*bc*sc*i*
  alias:          pci:v1077d1654sv*sd*bc*sc*i*
  alias:          pci:v1077d1644sv*sd*bc*sc*i*
  alias:          pci:v1077d1636sv*sd*bc*sc*i*
  alias:          pci:v1077d1666sv*sd*bc*sc*i*
  alias:          pci:v1077d1634sv*sd*bc*sc*i*
  depends:        ptp,qed
  retpoline:      Y
  intree:         Y
  name:           qede
  vermagic:       4.15.0-72-generic SMP mod_unload
  signat:         PKCS#7
  signer:
  sig_key:
  sig_hashalgo:   md4
  parm:           debug: Default debug msglevel (uint)

  
  $ uname -a
  Linux dcn1-clm-inf-1 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  
  $ lspci | grep -i ether
  19:00.0 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
  19:00.1 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
  19:00.2 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
  19:00.3 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)

  
  # perf report snippet:

      Children   Self  Command       Shared Object
  -   44.76%    0.00%  kworker/16:5  [kernel.kallsyms]
       ret_from_fork
     - kthread
        - 44.74% worker_thread
           - 44.57% process_one_work
              - 42.67% qede_ptp_task
                 - 38.86% qed_ptp_hw_read_tx_ts
                      qed_rd
                 - 3.03% queue_work_on
                    - 2.06% __queue_work
                       - 0.68% get_work_pool
                          - 0.61% radix_tree_lookup
                               __radix_tree_lookup
                0.50% set_work_pool_and_clear_pending



[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2019-12-18 Thread Przemyslaw Hausman
@gpiccoli, I can now confirm that I've been running the system with the
test kernel from your PPA for a day and the issue did not appear
anymore. Looks like the upstream commit fixes the problem.

Thanks a lot for figuring it out and for providing the test kernel!

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released



[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2019-12-17 Thread Guilherme G. Piccoli
The kernel commit 9adebac37e7d ("qede: Handle infinite driver spinning
for Tx timestamp.") is present in kernels 5.3 and later, and
according to preliminary results from @phausman, the commit appears to
fix the issue.

Hence, we will start the SRU process for Bionic/Disco.
Cheers,


Guilherme

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released



[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2019-12-17 Thread Guilherme G. Piccoli
** Changed in: linux (Ubuntu Focal)
   Status: Incomplete => Fix Released

** Changed in: linux (Ubuntu Eoan)
   Status: Incomplete => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Fix Released
Status in linux source package in Focal:
  Fix Released



[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2019-12-12 Thread Guilherme G. Piccoli
I've built a kernel for Bionic with the commit 9adebac37e7d ("qede: Handle
infinite driver spinning for Tx timestamp.") to check whether the patch
fixes, or at least alleviates, the symptom.
It's available in the following PPA:
https://launchpad.net/~gpiccoli/+archive/ubuntu/test1855409

@phausman or anybody affected, I'd appreciate it if you could give it a try;
I don't currently have the hardware for the experiment.

Thanks,


Guilherme

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  Incomplete
Status in linux source package in Focal:
  Incomplete



[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2019-12-09 Thread Guilherme G. Piccoli
Thanks for the report @phausman! I've checked mainline and there's a
commit that matches your report, we should try this one: 9adebac37e7d
("qede: Handle infinite driver spinning for Tx timestamp.") [0].

This is present in Eoan and Focal, but not in Bionic/Disco. Once we
confirm this is the fix, we can proceed with the SRU. If this patch
doesn't alleviate the problem, then Eoan/Focal will be targeted
too and we'll need to think of a proper solution.
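
Going only by the commit title, the problem looks like a retry loop for the
Tx timestamp read that never gives up. The snippet below is a toy Python
sketch of the general technique of bounding such a loop with a deadline. It is
not the driver's code: poll_tx_timestamp, read_ts, and the timeout values are
hypothetical names and placeholders chosen for illustration.

```python
import time


def poll_tx_timestamp(read_ts, timeout_s=2.0, interval_s=0.01,
                      clock=time.monotonic, sleep=time.sleep):
    """Retry read_ts() until it yields a value or the deadline passes.

    read_ts is a hypothetical callable that returns a timestamp, or None
    when no timestamp is available yet. Returns the timestamp, or None if
    the deadline expired, so the caller can drop the pending request
    instead of retrying forever.
    """
    deadline = clock() + timeout_s
    while True:
        ts = read_ts()
        if ts is not None:
            return ts
        if clock() >= deadline:
            # Give up: this is what keeps the loop from spinning forever.
            return None
        sleep(interval_s)
```

Without the deadline check, a timestamp that never arrives would keep the
loop (and one CPU core) busy indefinitely, which is consistent with the
symptom reported here.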

Note that the Xenial GA kernel (v4.4) does not have PTP support in the qede
driver, so this issue doesn't apply to Xenial (and once it's fixed in Bionic,
the fix will organically reach the Xenial-HWE kernel).

I'll work on building a 4.15 kernel for Bionic with the aforementioned fix so
that you can test it.
Cheers,


Guilherme


[0] http://git.kernel.org/linus/9adebac37e7d

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1855409

Title:
  qede driver causes 100% CPU load

Status in linux package in Ubuntu:
  New
Status in linux source package in Xenial:
  Invalid
Status in linux source package in Bionic:
  Confirmed
Status in linux source package in Disco:
  Confirmed
Status in linux source package in Eoan:
  New
Status in linux source package in Focal:
  New

Bug description:
  This bug is similar to #1832082 (bnx2x driver causes 100% CPU load)
  but applies to the qede driver instead of bnx2x. The symptoms are the
  same:

  With chrony installed, and configured with "hwtimestamp *", I observe
  100% CPU load on 2 CPU cores.

  Running perf report shows that the kernel is busy executing the
  qede_ptp_task function in the qede driver.

  A workaround is to disable "hwtimestamp *" in the chrony configuration.
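
The workaround above could be scripted roughly as follows. This is a minimal
sketch that comments out every active hwtimestamp directive in the text of a
chrony configuration file. The helper name disable_hwtimestamp is
hypothetical, and the location of the configuration file is left to the
caller because it is not stated in this report.

```python
def disable_hwtimestamp(conf_text):
    """Comment out every active 'hwtimestamp' directive in chrony.conf text.

    Lines that are already comments or that hold other directives are
    returned unchanged.
    """
    out = []
    for line in conf_text.splitlines():
        parts = line.split()
        if parts and parts[0].lower() == "hwtimestamp":
            out.append("#" + line)
        else:
            out.append(line)
    # Preserve the trailing newline if the input had one.
    return "\n".join(out) + ("\n" if conf_text.endswith("\n") else "")
```

The function only transforms text, so the result can be reviewed before it is
written back. chronyd has to be restarted afterwards to pick up the change.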

  ---

  $ modinfo qede
  filename:       /lib/modules/4.15.0-72-generic/kernel/drivers/net/ethernet/qlogic/qede/qede.ko
  version:        8.10.10.21
  license:        GPL
  description:    QLogic FastLinQ 4 Ethernet Driver
  srcversion:     D5EC89D815FC81B973EE9F0
  alias:          pci:v1077d8090sv*sd*bc*sc*i*
  alias:          pci:v1077d8070sv*sd*bc*sc*i*
  alias:          pci:v1077d1664sv*sd*bc*sc*i*
  alias:          pci:v1077d1656sv*sd*bc*sc*i*
  alias:          pci:v1077d1654sv*sd*bc*sc*i*
  alias:          pci:v1077d1644sv*sd*bc*sc*i*
  alias:          pci:v1077d1636sv*sd*bc*sc*i*
  alias:          pci:v1077d1666sv*sd*bc*sc*i*
  alias:          pci:v1077d1634sv*sd*bc*sc*i*
  depends:        ptp,qed
  retpoline:      Y
  intree:         Y
  name:           qede
  vermagic:       4.15.0-72-generic SMP mod_unload
  signat:         PKCS#7
  signer:
  sig_key:
  sig_hashalgo:   md4
  parm:           debug: Default debug msglevel (uint)

  
  $ uname -a
  Linux dcn1-clm-inf-1 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

  $ lspci | grep -i ether
  19:00.0 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
  19:00.1 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
  19:00.2 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
  19:00.3 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)

  
  # perf report snippet:

  Children  Self  Command  Shared Object
  -   44.76% 0.00%  kworker/16:5 [kernel.kallsyms]
       ret_from_fork
     - kthread
        - 44.74% worker_thread
           - 44.57% process_one_work
              - 42.67% qede_ptp_task
                 - 38.86% qed_ptp_hw_read_tx_ts
                      qed_rd
                 - 3.03% queue_work_on
                    - 2.06% __queue_work
                       - 0.68% get_work_pool
                          - 0.61% radix_tree_lookup
                               __radix_tree_lookup
                0.50% set_work_pool_and_clear_pending
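
A call-graph snippet like the one above can also be filtered mechanically. The
following is a rough Python sketch that extracts "percent symbol" pairs from
such text and keeps those at or above a threshold. The name hot_symbols, the
regular expression, and the default threshold are assumptions made for this
illustration; they are not part of perf.

```python
import re

# Matches lines such as "- 42.67% qede_ptp_task" or "0.50% some_symbol".
# The summary row with two percentages and extra columns does not match.
LINE_RE = re.compile(r"^\s*(?:-\s+)?(\d+(?:\.\d+)?)%\s+(\S+)\s*$")


def hot_symbols(report_text, threshold=10.0):
    """Return (percent, symbol) pairs at or above threshold, highest first."""
    hits = []
    for line in report_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            pct = float(m.group(1))
            if pct >= threshold:
                hits.append((pct, m.group(2)))
    return sorted(hits, reverse=True)
```

The percentages in this view include time spent in callees, so nested frames
overlap and should not be summed.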



[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2019-12-09 Thread Guilherme G. Piccoli
** Also affects: linux (Ubuntu Disco)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
   Status: Incomplete

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Xenial)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Bionic)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Disco)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Eoan)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Focal)
 Assignee: (unassigned) => Guilherme G. Piccoli (gpiccoli)

** Changed in: linux (Ubuntu Focal)
   Status: Incomplete => New

** Changed in: linux (Ubuntu Disco)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Bionic)
   Status: New => Confirmed

** Changed in: linux (Ubuntu Xenial)
   Status: New => Invalid



[Kernel-packages] [Bug 1855409] Re: qede driver causes 100% CPU load

2019-12-06 Thread Przemyslaw Hausman
** Attachment added: "perf-report.txt"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1855409/+attachment/5310185/+files/perf-report.txt


Status in linux package in Ubuntu:
  New
