Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-09-04 Thread Salvatore Bonaccorso
Hi Dirk,

On Fri, Sep 04, 2020 at 05:51:58PM +0200, Dirk Kostrewa wrote:
> Hi Salvatore,
> 
> meanwhile, Dell has replaced the mainboard of my laptop, and after that,
> both the USB over-current kernel messages and the kworker processes with
> high CPU load are gone.
> 
> Many thanks for caring about my bug report!

Thanks for reporting back! I'm closing the bug report as well, since
there is nothing to be done on the Linux side for it.

Glad if this was of help.

Regards,
Salvatore



Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-09-04 Thread Dirk Kostrewa

Hi Salvatore,

meanwhile, Dell has replaced the mainboard of my laptop, and after that, 
both the USB over-current kernel messages and the kworker processes with 
high CPU load are gone.


Many thanks for caring about my bug report!

Best regards,

Dirk.





Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-29 Thread Salvatore Bonaccorso
Hi Dirk,

Thanks for testing that.

On Sat, Aug 29, 2020 at 11:04:43AM +0200, Dirk Kostrewa wrote:
> Hi Salvatore,
> 
> I have enabled the verbose debugging mode on the command line and have
> appended the first 5000 lines of the dmesg output to this e-mail, running
> the current kernel from the Buster backports with the two kworker processes
> with high CPU load present.
> 
> After that, I have applied your patch to this kernel and rebooted with the
> patched kernel:
> 
> 5.7.0-0.bpo.2-amd64 #1 SMP Debian 5.7.10-1~bpo10+1a~test (2020-08-28) x86_64
> GNU/Linux
> 
> With your patch applied, the two kworker processes with high CPU load
> completely disappeared!

Unfortunately, I suspect this indicates either a HW fault or a HW
design error, as stated in the upstream kernel mailing list thread I
found, which was merely uncovered by the mentioned kernel fix (the one
we temporarily reverted with the patch). I can try to ask Alan Stern.

There might be a workaround that works for you: the issue should
disappear if you prevent the system from automatically trying to
suspend the usb2 root hub (but you have the same problem on the usb1
root hub).

# echo on >/sys/bus/usb/devices/usb2/power/control

will do that for the usb2 root hub.
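
Since both root hubs are affected, the same setting can be applied to
every root hub in one go. The following is only a minimal sketch: the
function name disable_roothub_autosuspend and its optional directory
argument are made up for illustration, and it assumes root hubs show up
as usbN under /sys/bus/usb/devices. Writing "on" to power/control
disables runtime autosuspend for that device; the setting does not
survive a reboot.

```shell
#!/bin/sh
# Sketch: turn off runtime autosuspend for all USB root hubs.
# disable_roothub_autosuspend is a hypothetical helper name; the optional
# first argument overrides the sysfs directory (handy for a dry run on a
# scratch directory).
disable_roothub_autosuspend() {
    base="${1:-/sys/bus/usb/devices}"
    for ctl in "$base"/usb*/power/control; do
        # Skip an unmatched glob pattern or entries we cannot write to.
        [ -w "$ctl" ] || continue
        echo on > "$ctl"
    done
}
```

Run as root, calling disable_roothub_autosuspend without arguments
touches usb1, usb2 and any further root hubs, but leaves the devices
attached to them alone.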

Regards,
Salvatore



Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-29 Thread Dirk Kostrewa

Hi Salvatore,

I have enabled the verbose debugging mode on the command line and have 
appended the first 5000 lines of the dmesg output to this e-mail, 
running the current kernel from the Buster backports with the two 
kworker processes with high CPU load present.


After that, I have applied your patch to this kernel and rebooted with 
the patched kernel:


5.7.0-0.bpo.2-amd64 #1 SMP Debian 5.7.10-1~bpo10+1a~test (2020-08-28) 
x86_64 GNU/Linux


With your patch applied, the two kworker processes with high CPU load 
completely disappeared!


A snapshot of the "top" command shows the following top 10 processes:

$ top
top - 10:54:43 up 5 min,  3 users,  load average: 0.18, 0.26, 0.13
Tasks: 225 total,   1 running, 224 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.1 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15928.9 total,  14186.5 free,    900.8 used,    841.7 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  14711.4 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  344 root     -51   0       0      0      0 S   0.3   0.0   0:00.10 irq/134-iwlwifi
  425 root      20   0       0      0      0 I   0.3   0.0   0:00.09 kworker/5:3-events
 1216 rtkit     21   1  152652   2856   2616 S   0.3   0.0   0:00.01 rtkit-daemon
 1272 dirk      20   0   52376  17908   5460 S   0.3   0.1   0:00.02 hp-systray
    1 root      20   0  169784  10436   7844 S   0.0   0.1   0:01.64 systemd
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
    5 root      20   0       0      0      0 I   0.0   0.0   0:00.04 kworker/0:0-events_+
    6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-kblockd
...

Many thanks for looking after this issue and having found a fix for this!

Best regards,

Dirk




dmesg.txt.gz
Description: application/gzip


Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-28 Thread Salvatore Bonaccorso
Hi Dirk,

On Wed, Aug 12, 2020 at 05:53:57PM +0200, Dirk Kostrewa wrote:
> Hi Salvatore,
> 
> I just found out, that if none of the two USB ports is connected, there are
> two kworker processes with permanently high CPU load, if one USB port is
> connected and the other not, there is one such kworker process, and if both
> USB ports are connected, there is no kworker process with high CPU load.
> I think, this supports your suspicion that these kworker processes are
> connected with the overcurrent condition for both USB ports that I also see
> in the dmesg output.
> What puzzles me, is that I've observed these oddly behaving kworker
> processes also with the 5.6 kernel that I've tried from the Buster Backports
> repository.

The kernel parameter variant did not work correctly, as there is no
dynamic debug output afaics (the double quotes seem to be placed in
the wrong place). Please just try the setting at runtime instead:

# echo 'file drivers/usb/* +p;' > /sys/kernel/debug/dynamic_debug/control
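
The control file only exists when debugfs is mounted and the kernel was
built with dynamic debug support, so it can help to check for it before
writing. A minimal sketch, assuming the standard debugfs mount point;
the function name enable_usb_dyndbg and its optional path argument are
hypothetical:

```shell
#!/bin/sh
# Sketch: enable dynamic debug output for drivers/usb/ at runtime.
# enable_usb_dyndbg is a hypothetical helper; the optional argument
# overrides the control file path.
enable_usb_dyndbg() {
    control="${1:-/sys/kernel/debug/dynamic_debug/control}"
    if [ ! -e "$control" ]; then
        echo "dynamic debug control file not found: $control" >&2
        echo "(is debugfs mounted?)" >&2
        return 1
    fi
    # Same query as in the command above: add the 'p' (print) flag to
    # every debug call site under drivers/usb/.
    echo 'file drivers/usb/* +p;' > "$control"
}
```

Run as root, enable_usb_dyndbg either enables the messages or reports
why it could not.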

What I meant is (and this is confirmed if you see the issue with the
more recent kernels as well) that the specified commit actually
uncovers an issue possibly present in the HW.

Someone else, in a known case of faulty HW, reported a similar issue
upstream:

https://lore.kernel.org/lkml/20200720083956.ga4...@dhcp22.suse.cz/

I would like to see if we can collect as much information as possible
and possibly cross-check with upstream.

If you build the kernel with the attached patch (that is, with the
commit which is suspected to uncover the issue reverted), does the
issue then go away?

You can follow the guide in
https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#s4.2.2
for "simple patching and building" to quickly check a patch.

Regards,
Salvatore
From 61ca5460d93d1a60a9aee0e46d51ae126593fda2 Mon Sep 17 00:00:00 2001
From: Salvatore Bonaccorso 
Date: Fri, 28 Aug 2020 16:31:04 +0200
Subject: [PATCH] Revert "xhci: prevent bus suspend if a roothub port detected
 a over-current condition"

This reverts commit d3ee95dedd88ed3fcd4647e4a8e265acaf27b2f0.
---
 drivers/usb/host/xhci-hub.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index a58ef53e4ae1..eb4284696f25 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1481,8 +1481,6 @@ int xhci_hub_status_data(struct usb_hcd *hcd, char *buf)
 		}
 		if ((temp & PORT_RC))
 			reset_change = true;
-		if (temp & PORT_OC)
-			status = 1;
 	}
 	if (!status && !reset_change) {
 		xhci_dbg(xhci, "%s: stopping port polling.\n", __func__);
@@ -1548,13 +1546,6 @@ int xhci_bus_suspend(struct usb_hcd *hcd)
  port_index);
 			goto retry;
 		}
-		/* bail out if port detected a over-current condition */
-		if (t1 & PORT_OC) {
-			bus_state->bus_suspended = 0;
-			spin_unlock_irqrestore(&xhci->lock, flags);
-			xhci_dbg(xhci, "Bus suspend bailout, port over-current detected\n");
-			return -EBUSY;
-		}
 		/* suspend ports in U0, or bail out for new connect changes */
 		if ((t1 & PORT_PE) && (t1 & PORT_PLS_MASK) == XDEV_U0) {
 			if ((t1 & PORT_CSC) && wake_enabled) {
-- 
2.28.0



Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-21 Thread Dirk Kostrewa

Hi Salvatore,

I just want to inform you that I've installed the recent kernel from the 
Buster backports, 5.7.0-0.bpo.2-amd64 #1 SMP Debian 5.7.10-1~bpo10+1 
(2020-07-30) x86_64 GNU/Linux, and I'm still seeing the two kworker 
processes with high CPU load, probably related to the two USB ports with 
over-current condition.


Regards,

Dirk.





Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-13 Thread Dirk Kostrewa

Hi Salvatore,

I have kernel "linux-image-4.19.0-10-amd64/stable,now 4.19.132-1 amd64" 
installed, so it should already include the mentioned commit, if I 
understand correctly (I'm a bit confused by the two different version 
numbers used by Debian). I have also tried the most recent kernel 
"linux-image-5.6.0-0.bpo.2-amd64/buster-backports 5.6.14-2~bpo10+1 
amd64". For both kernels, I see the two kworker processes with high CPU 
load.


Regards,

Dirk.





Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-12 Thread Salvatore Bonaccorso
Hi,

Just commenting on the following:

On Wed, Aug 12, 2020 at 05:53:57PM +0200, Dirk Kostrewa wrote:
[...]
> What puzzles me, is that I've observed these oddly behaving kworker
> processes also with the 5.6 kernel that I've tried from the Buster Backports
> repository.

The mentioned commit is included in the following upstream versions
(relevant for Debian): v4.19.119 (so in buster), v5.6.8 (and so in the
buster-backports kernel), v5.7-rc3.
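
To check a given kernel against these versions, compare the upstream
part of the Debian package version (for example 4.19.132 in 4.19.132-1),
not the ABI name in the package name (4.19.0-10). A minimal sketch,
assuming GNU sort with its -V option; version_ge and has_oc_commit are
hypothetical helper names, and stable series not listed above are
reported as unknown rather than guessed:

```shell
#!/bin/sh
# version_ge A B: succeed if version A is greater than or equal to B.
# Relies on GNU sort -V (version sort); hypothetical helper.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# has_oc_commit UPSTREAM_VERSION
# Returns 0 if the version contains the over-current commit according to
# the list above, 1 if it does not, 2 if the series is not covered here.
has_oc_commit() {
    case "$1" in
        4.19.*) version_ge "$1" 4.19.119 ;;
        5.6.*)  version_ge "$1" 5.6.8 ;;
        *)      version_ge "$1" 5.7 || return 2 ;;
    esac
}
```

For instance, has_oc_commit 4.19.132 and has_oc_commit 5.6.14 both
succeed, which matches the two kernels discussed in this report.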

Regards,
Salvatore



Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-12 Thread Dirk Kostrewa

Hi Salvatore,

I just found out that if neither of the two USB ports is connected,
there are two kworker processes with permanently high CPU load; if one
USB port is connected and the other is not, there is one such kworker
process; and if both USB ports are connected, there is no kworker
process with high CPU load.
I think this supports your suspicion that these kworker processes are
connected with the over-current condition for both USB ports that I
also see in the dmesg output.
What puzzles me is that I've also observed these oddly behaving kworker
processes with the 5.6 kernel that I tried from the Buster Backports
repository.


Cheers,

Dirk.





Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-12 Thread Dirk Kostrewa

Hi Salvatore,

yesterday, I installed the kernel 5.6.0 from the Buster Backports and 
saw again a kworker process with high CPU load.
Oddly, this morning, my laptop didn't boot, so I decided to do a fresh 
install of Debian Buster 10.5.0 (image with non-free firmware because of 
my wifi card) and installed only thunderbird and vim. There is still one 
kworker process with permanently high CPU load.


I passed the dyndbg setting that you gave me as a kernel parameter at 
boot and have appended the dmesg output as file dmesg.txt.gz.


Cheers,

Dirk.



dmesg.txt.gz
Description: application/gzip


Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-11 Thread Salvatore Bonaccorso
Hi Dirk,

On Tue, Aug 11, 2020 at 12:58:15PM +0200, Dirk Kostrewa wrote:
> Hi Salvatore,
> 
> as an additional control, I have completely uninstalled the nvidia graphics
> driver and repeated the kworker observations using the nouveau graphics
> driver with the kernel 4.19.0-10-amd64. This time, there are even two
> kworker processes constantly running with high CPU load:
> 
> $ top
> top - 12:37:20 up 10 min,  4 users,  load average: 2.79, 2.54, 1.56
> Tasks: 197 total,   3 running, 194 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.0 us, 24.2 sy,  0.0 ni, 74.2 id,  0.0 wa,  0.0 hi,  1.6 si,  0.0 st
> MiB Mem :  15889.4 total,  13964.7 free,    626.8 used,   1297.9 buff/cache
> MiB Swap:      0.0 total,      0.0 free,      0.0 used.  14849.1 avail Mem
> 
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   164 root      20   0       0      0      0 R  80.0   0.0   8:41.67 kworker/6:2+pm
>   455 root      20   0       0      0      0 R  80.0   0.0   8:28.23 kworker/2:2+pm
>    22 root      20   0       0      0      0 S  20.0   0.0   2:14.82 ksoftirqd/2
>    42 root      20   0       0      0      0 S  20.0   0.0   2:08.67 ksoftirqd/6
>     1 root      20   0  169644  10212   7796 S   0.0   0.1   0:01.52 systemd
>     2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
>     3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
>     4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
>     6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-kblockd
>     7 root      20   0       0      0      0 I   0.0   0.0   0:00.05 kworker/u16:0-event+
> 
> The stacks of the two kworker processes show the same output:
> 
> [<0>] 0x
> 
> I have appended the first 5000 lines of the trace as a compressed ASCII
> file out-cut.txt.gz and the dmesg output as compressed ASCII file
> dmesg.txt.gz.
> 
> I hope this helps to find out where the problem with the high CPU load of
> the kworker processes comes from.

Thanks, this is very helpful.

I suspect what you are seeing is an issue with the USB hub port that was
present before but is now uncovered by the upstream change e9fb08d617bf
("xhci: prevent bus suspend if a roothub port detected a over-current
condition")[1], which was also backported to v4.19.y in 4.19.119.

Can you enable dynamic debugging for 'drivers/usb/'[2], ideally at
boot time? At runtime it is

# echo 'file drivers/usb/* +p;' > /sys/kernel/debug/dynamic_debug/control

or, as a kernel parameter, to have the debug messages enabled at boot
time already:

dyndbg="file drivers/usb/* +p;"

Can you attach the dmesg output with debugging enabled?

Regards,
Salvatore

 [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9fb08d617bfae5471d902112667d0eeb9dee3c4
 [2] https://www.kernel.org/doc/html/latest/admin-guide/dynamic-debug-howto.html



Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-11 Thread Dirk Kostrewa

Hi Salvatore,

as an additional control, I have completely uninstalled the nvidia 
graphics driver and repeated the kworker observations using the nouveau 
graphics driver with the kernel 4.19.0-10-amd64. This time, there are 
even two kworker processes constantly running with high CPU load:


$ top
top - 12:37:20 up 10 min,  4 users,  load average: 2.79, 2.54, 1.56
Tasks: 197 total,   3 running, 194 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us, 24.2 sy,  0.0 ni, 74.2 id,  0.0 wa,  0.0 hi,  1.6 si,  0.0 st
MiB Mem :  15889.4 total,  13964.7 free,    626.8 used,   1297.9 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  14849.1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  164 root      20   0       0      0      0 R  80.0   0.0   8:41.67 kworker/6:2+pm
  455 root      20   0       0      0      0 R  80.0   0.0   8:28.23 kworker/2:2+pm
   22 root      20   0       0      0      0 S  20.0   0.0   2:14.82 ksoftirqd/2
   42 root      20   0       0      0      0 S  20.0   0.0   2:08.67 ksoftirqd/6
    1 root      20   0  169644  10212   7796 S   0.0   0.1   0:01.52 systemd
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
    6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-kblockd
    7 root      20   0       0      0      0 I   0.0   0.0   0:00.05 kworker/u16:0-event+


The stacks of the two kworker processes show the same output:

[<0>] 0x

I have appended the first 5000 lines of the trace as a compressed ASCII 
file out-cut.txt.gz and the dmesg output as compressed ASCII file 
dmesg.txt.gz.


I hope this helps to find out where the problem with the high CPU load 
of the kworker processes comes from.
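
Picking the busy kworker out of the top listing by hand can also be
scripted. A minimal sketch, assuming top's default column layout as
shown above (PID in column 1, %CPU in column 9, COMMAND in column 12);
busiest_kworker is a hypothetical helper name. It reads top's batch
output on standard input and prints the PID of the kworker with the
highest %CPU, or nothing if no kworker is using CPU:

```shell
#!/bin/sh
# Sketch: print the PID of the busiest kworker found in top's batch output.
# busiest_kworker is a hypothetical helper; it only parses text on stdin.
busiest_kworker() {
    awk '
        $12 ~ /^kworker/ {
            cpu = $9
            sub(/,/, ".", cpu)   # accept "86,7" from comma-decimal locales
            if (cpu + 0 > max) { max = cpu + 0; pid = $1 }
        }
        END { if (pid != "") print pid }
    '
}
```

A possible use, as root, is pid=$(top -b -n 2 -d 1 | busiest_kworker)
followed by cat /proc/$pid/stack.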


Cheers,

Dirk.

On 02.08.20 at 18:22, Salvatore Bonaccorso wrote:

Hi Dirk,

On Sun, Aug 02, 2020 at 03:44:09PM +0200, Salvatore Bonaccorso wrote:

Control: tags -1 + moreinfo

Hi Dirk

On Sun, Aug 02, 2020 at 10:00:27AM +0200, Dirk Kostrewa wrote:

Package: src:linux
Version: 4.19.132-1
Severity: normal

Dear Maintainer,

after booting the kernel 4.19.0-10-amd64, there is a kworker process running
with a permanent high CPU load of almost 90% as reported by the "top"
command:

$ top
top - 09:48:19 up 0 min,  4 users,  load average: 1.91, 0.58, 0.20
Tasks: 218 total,   2 running, 216 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us, 12.4 sy,  0.0 ni, 84.5 id,  0.0 wa,  0.0 hi,  2.3 si,  0.0 st
MiB Mem :  15889.4 total,  14173.1 free,    889.3 used,    827.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  14677.7 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   64 root      20   0       0      0      0 R  86.7   0.0   0:47.41 kworker/0:2+pm
    9 root      20   0       0      0      0 S  20.0   0.0   0:08.84 ksoftirqd/0
  364 root     -51   0       0      0      0 S   6.7   0.0   0:00.50 irq/126-nvidia
 1177 dirk      20   0 2921696 122848  94268 S   6.7   0.8   0:02.23 kwin_x11
    1 root      20   0  169652  10280   7740 S   0.0   0.1   0:01.56 systemd
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
...

The expected result after booting the kernel 4.19.0-10-amd64 is a kworker
process with a CPU load close to 0%.

As a control, booting the previous kernel 4.19.0-9-amd64 does not show a
high CPU load for the kworker process. Instead, the kworker CPU load
reported by the "top" command is 0.0%.

Therefore, I suspect a bug in the kernel 4.19.0-10-amd64.

Neither "dmesg" nor "journalctl -b" show any messages containing "kworker".

I am using Debian GNU/Linux 10.5 with kernel 4.19.0-10-amd64 and libc6:amd64
2.28-10.

If you need more information, I would be happy to provide it.

To find out what could be the cause, could you have a look at
https://www.kernel.org/doc/html/latest/core-api/workqueue.html#debugging
as this could help isolate why the kworker goes crazy.

In addition to the above, one more thing: can you reproduce the issue
when the kernel does not get tainted, that is, without loading the
proprietary, out-of-tree modules?

This is particularly important if the issue can be tracked down, found
in upstream and needs to be reported upstream.

Regards,
Salvatore


dmesg.txt.gz
Description: application/gzip


out-cut.txt.gz
Description: application/gzip


Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-02 Thread Dirk Kostrewa

Hi Salvatore,

I have removed the xorg.conf with the Nvidia graphics driver and any 
nvidia-related *.conf files in /etc/modprobe.d/, and I have rebooted the 
laptop. The following output should show that only the default nouveau 
driver is loaded:


# lsmod | grep nvidia

# lsmod | grep nouveau
nouveau              2179072  0
ttm                   131072  1 nouveau
i2c_algo_bit           16384  2 i915,nouveau
drm_kms_helper        208896  2 i915,nouveau
mxm_wmi                16384  1 nouveau
drm                   495616  12 drm_kms_helper,i915,ttm,nouveau
wmi                    28672  6 dell_wmi,wmi_bmof,dell_smbios,dell_wmi_descriptor,mxm_wmi,nouveau
video                  45056  4 dell_wmi,dell_laptop,i915,nouveau
button                 16384  1 nouveau

# lspci -k | egrep 'VGA|3D' -A2
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
    Subsystem: Dell HD Graphics 530
    Kernel driver in use: i915
--
01:00.0 3D controller: NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2)
    Subsystem: Dell GM107GLM [Quadro M1000M]
    Kernel driver in use: nouveau

# dmesg | grep -i nvidia
[    4.282530] nouveau :01:00.0: NVIDIA GM107 (117310a2)
[    4.547712] audit: type=1400 audit(1596389563.639:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=543 comm="apparmor_parser"
[    4.547714] audit: type=1400 audit(1596389563.639:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=543 comm="apparmor_parser"
[    5.944911] nvidia: loading out-of-tree module taints kernel.
[    5.944918] nvidia: module license 'NVIDIA' taints kernel.
[    5.949482] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    5.962949] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[    5.963181] NVRM: The NVIDIA probe routine was not called for 1 device(s).
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
               NVRM: driver(s)), then try loading the NVIDIA kernel module
[    5.963182] NVRM: No NVIDIA graphics adapter probed!
[    6.005267] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241
[    6.075128] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[    6.075448] NVRM: The NVIDIA probe routine was not called for 1 device(s).
               NVRM: nouveau, rivafb, nvidiafb or rivatv
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
               NVRM: driver(s)), then try loading the NVIDIA kernel module
[    6.075449] NVRM: No NVIDIA graphics adapter probed!
[    6.097310] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241


Apparently, the nvidia driver was loaded first, and after that, the 
nouveau driver took over.


Here is the "top" result, again with a permanent high CPU load for a 
kworker process:


# top
top - 19:50:57 up 18 min,  4 users,  load average: 1,26, 1,22, 0,93
Tasks: 198 total,   2 running, 196 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,0 us, 11,3 sy,  0,0 ni, 87,1 id,  0,0 wa,  0,0 hi,  1,6 si,  0,0 st
MiB Mem :  15889,5 total,  13903,9 free,    808,5 used,   1177,0 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.  14617,1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   72 root      20   0       0      0      0 R  86,7   0,0  15:23.97 kworker/7:1+pm
   47 root      20   0       0      0      0 S  13,3   0,0   2:52.21 ksoftirqd/7
  684 root      20   0  505356 126896 102732 S   6,7   0,8   0:20.77 Xorg
    1 root      20   0  169624  10312   7880 S   0,0   0,1   0:01.34 systemd
    2 root      20   0       0      0      0 S   0,0   0,0   0:00.00 kthreadd


Here is the stack of PID 72:

# cat /proc/72/stack
[<0>] 0x

The file with a few seconds tracing, cut after line 5000 and compressed, 
is attached as "out-no-nvidia.txt.gz".


Please let me know whether my way of not loading the nvidia driver was 
sufficient. If the Nvidia driver has to be completely uninstalled for a 
truly untainted system, I will do it, but I would need more time for 
this.


Regards,

Dirk.


Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-02 Thread Dirk Kostrewa

Hi Salvatore,

thank you for taking care of this!

I first did the tracing for a few seconds, and I have appended the 
compressed output "out.txt.gz", cut after line 5000, to this e-mail. 
Since some "nvidia"-related processes also appear, I want to inform you 
that I have an Optimus laptop where the Nvidia GPU renders images and 
the integrated Intel GPU sends the images to the monitor, just in case.


I also tried the stack trace, but was not sure whether I did it right, 
so this is what I did:


# top
top - 16:29:42 up 7 min,  3 users,  load average: 1,82, 1,52, 0,80
Tasks: 200 total,   2 running, 198 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,5 us, 12,4 sy,  0,0 ni, 86,5 id,  0,0 wa,  0,0 hi,  0,5 si,  0,0 st
MiB Mem :  15889,4 total,  13390,9 free,   1263,3 used,   1235,2 buff/cache
MiB Swap:      0,0 total,      0,0 free,      0,0 used.  14265,6 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   70 root      20   0       0      0      0 R  84,0   0,0   6:23.39 kworker/4:1+pm
   32 root      20   0       0      0      0 S  16,0   0,0   1:12.32 ksoftirqd/4
  761 root      20   0  349132 104820  67088 S   3,7   0,6   0:21.44 Xorg
...

I saw the kworker process with PID 70 and thus looked at the stack of 
this process:


# cat /proc/70/stack
[<0>] usb_start_wait_urb+0x65/0x160 [usbcore]
[<0>] usb_control_msg+0xdd/0x140 [usbcore]
[<0>] set_port_feature+0x30/0x40 [usbcore]
[<0>] hub_suspend+0x1e3/0x250 [usbcore]
[<0>] usb_suspend_both+0x9d/0x230 [usbcore]
[<0>] usb_runtime_suspend+0x2a/0x70 [usbcore]
[<0>] __rpm_callback+0xc7/0x200
[<0>] rpm_callback+0x1f/0x70
[<0>] rpm_suspend+0x138/0x670
[<0>] __pm_runtime_suspend+0x41/0x80
[<0>] usb_runtime_idle+0x2d/0x40 [usbcore]
[<0>] __rpm_callback+0xc7/0x200
[<0>] rpm_idle+0xa5/0x310
[<0>] pm_runtime_work+0x73/0x90
[<0>] process_one_work+0x1a7/0x3a0
[<0>] worker_thread+0x30/0x390
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x35/0x40
[<0>] 0x
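
The stack above reads bottom-up from the workqueue worker through runtime power management into the USB hub suspend path. As a rough aid for comparing several such dumps, the following Python sketch splits each frame into function and module. The regular expression is an assumption derived only from the lines shown here, and the names parse_stack and RUNTIME_PM_FUNCS are hypothetical.

```python
import re

# Frame format as it appears above: "[<0>] func+0xOFF/0xSIZE [module]"
FRAME = re.compile(
    r"^\[<[0-9a-fx]+>\]\s+(?P<func>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?"
)

# Functions from the dump above that belong to the runtime-PM core.
RUNTIME_PM_FUNCS = {"pm_runtime_work", "rpm_idle", "rpm_suspend", "rpm_callback"}


def parse_stack(text):
    """Return a list of (function, module or None), innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line.strip())
        if match:  # lines without a symbol, such as "[<0>] 0x", are skipped
            frames.append((match.group("func"), match.group("module")))
    return frames


def summarize(frames):
    """Report the modules involved and whether runtime PM is on the stack."""
    modules = sorted({mod for _, mod in frames if mod})
    in_runtime_pm = any(func in RUNTIME_PM_FUNCS for func, _ in frames)
    return modules, in_runtime_pm


dump = """\
[<0>] usb_start_wait_urb+0x65/0x160 [usbcore]
[<0>] hub_suspend+0x1e3/0x250 [usbcore]
[<0>] rpm_suspend+0x138/0x670
[<0>] pm_runtime_work+0x73/0x90
[<0>] 0x
"""
print(summarize(parse_stack(dump)))
```

Reading the file itself requires root, e.g. open("/proc/70/stack").read() in place of the dump string.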

I hope this was right. If I can give you any more information, please 
let me know.


Regards,

Dirk.



out.txt.gz
Description: application/gzip


Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-02 Thread Salvatore Bonaccorso
Hi Dirk,

On Sun, Aug 02, 2020 at 03:44:09PM +0200, Salvatore Bonaccorso wrote:
> Control: tags -1 + moreinfo
> 
> Hi Dirk
> 
> On Sun, Aug 02, 2020 at 10:00:27AM +0200, Dirk Kostrewa wrote:
> > Package: src:linux
> > Version: 4.19.132-1
> > Severity: normal
> > 
> > Dear Maintainer,
> > 
> > after booting the kernel 4.19.0-10-amd64, there is a kworker process running
> > with a permanent high CPU load of almost 90% as reported by the "top"
> > command:
> > 
> > $ top
> > top - 09:48:19 up 0 min,  4 users,  load average: 1.91, 0.58, 0.20
> > Tasks: 218 total,   2 running, 216 sleeping,   0 stopped,   0 zombie
> > %Cpu(s):  0.8 us, 12.4 sy,  0.0 ni, 84.5 id,  0.0 wa,  0.0 hi,  2.3 si,  0.0
> > st
> > MiB Mem :  15889.4 total,  14173.1 free,    889.3 used,    827.0 buff/cache
> > MiB Swap:      0.0 total,      0.0 free,      0.0 used.  14677.7 avail Mem
> > 
> >   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
> >    64 root      20   0       0      0      0 R  86.7   0.0 0:47.41
> > kworker/0:2+pm
> >     9 root      20   0       0      0      0 S  20.0   0.0 0:08.84
> > ksoftirqd/0
> >   364 root     -51   0       0      0      0 S   6.7   0.0 0:00.50
> > irq/126-nvidia
> >  1177 dirk      20   0 2921696 122848  94268 S   6.7   0.8 0:02.23 kwin_x11
> >     1 root      20   0  169652  10280   7740 S   0.0   0.1 0:01.56 systemd
> >     2 root      20   0       0      0      0 S   0.0   0.0 0:00.00 kthreadd
> > ...
> > 
> > The expected result after booting the kernel 4.19.0-10-amd64 is a kworker
> > process with a CPU load close to 0%.
> > 
> > As a control, booting the previous kernel 4.19.0-9-amd64 does not show a
> > high CPU load for the kworker process. Instead, the kworker CPU load
> > reported by the "top" command is 0.0%.
> > 
> > Therefore, I suspect a bug in the kernel 4.19.0-10-amd64.
> > 
> > Neither "dmesg" nor "journalctl -b" shows any messages containing "kworker".
> > 
> > I am using Debian GNU/Linux 10.5 with kernel 4.19.0-10-amd64 and libc6:amd64
> > 2.28-10.
> > 
> > If you need more information, I would be happy to provide it.
> 
> To find out what could be the cause, could you have a look at
> https://www.kernel.org/doc/html/latest/core-api/workqueue.html#debugging
> This could help isolate why the kworker goes crazy.

In addition to the above, one more thing: can you reproduce the issue
when the kernel does not get tainted, that is, without loading the
proprietary, out-of-tree modules?

This is particularly important if the issue can be tracked down, is
found to be present upstream as well, and needs to be reported upstream.

Regards,
Salvatore



Processed: Re: Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-02 Thread Debian Bug Tracking System
Processing control commands:

> tags -1 + moreinfo
Bug #966703 [src:linux] linux-image-4.19.0-10-amd64: kworker process with 
permanent high CPU load
Added tag(s) moreinfo.

-- 
966703: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=966703
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-02 Thread Salvatore Bonaccorso
Control: tags -1 + moreinfo

Hi Dirk

On Sun, Aug 02, 2020 at 10:00:27AM +0200, Dirk Kostrewa wrote:
> Package: src:linux
> Version: 4.19.132-1
> Severity: normal
> 
> Dear Maintainer,
> 
> after booting the kernel 4.19.0-10-amd64, there is a kworker process running
> with a permanent high CPU load of almost 90% as reported by the "top"
> command:
> 
> $ top
> top - 09:48:19 up 0 min,  4 users,  load average: 1.91, 0.58, 0.20
> Tasks: 218 total,   2 running, 216 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  0.8 us, 12.4 sy,  0.0 ni, 84.5 id,  0.0 wa,  0.0 hi,  2.3 si,  0.0
> st
> MiB Mem :  15889.4 total,  14173.1 free,    889.3 used,    827.0 buff/cache
> MiB Swap:      0.0 total,      0.0 free,      0.0 used.  14677.7 avail Mem
> 
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
>    64 root      20   0       0      0      0 R  86.7   0.0 0:47.41
> kworker/0:2+pm
>     9 root      20   0       0      0      0 S  20.0   0.0 0:08.84
> ksoftirqd/0
>   364 root     -51   0       0      0      0 S   6.7   0.0 0:00.50
> irq/126-nvidia
>  1177 dirk      20   0 2921696 122848  94268 S   6.7   0.8 0:02.23 kwin_x11
>     1 root      20   0  169652  10280   7740 S   0.0   0.1 0:01.56 systemd
>     2 root      20   0       0      0      0 S   0.0   0.0 0:00.00 kthreadd
> ...
> 
> The expected result after booting the kernel 4.19.0-10-amd64 is a kworker
> process with a CPU load close to 0%.
> 
> As a control, booting the previous kernel 4.19.0-9-amd64 does not show a
> high CPU load for the kworker process. Instead, the kworker CPU load
> reported by the "top" command is 0.0%.
> 
> Therefore, I suspect a bug in the kernel 4.19.0-10-amd64.
> 
> Neither "dmesg" nor "journalctl -b" shows any messages containing "kworker".
> 
> I am using Debian GNU/Linux 10.5 with kernel 4.19.0-10-amd64 and libc6:amd64
> 2.28-10.
> 
> If you need more information, I would be happy to provide it.

To find out what could be the cause, could you have a look at
https://www.kernel.org/doc/html/latest/core-api/workqueue.html#debugging
This could help isolate why the kworker goes crazy.
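
The debugging section linked above suggests enabling the workqueue:workqueue_queue_work trace event and capturing trace_pipe for a few seconds. A capture like that can be summarized with a small Python sketch such as the one below, which counts how often each work function was queued. It relies only on the "function=<name>" field of the event; the two sample lines are made up for illustration and are not taken from the attached trace.

```python
import re
from collections import Counter

# Pick the work function out of a workqueue_queue_work trace line.
FUNC = re.compile(r"\bworkqueue_queue_work:.*?\bfunction=(\S+)")


def count_work_functions(lines):
    """Count queued work items per function name in trace_pipe output."""
    counts = Counter()
    for line in lines:
        match = FUNC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative lines only; the struct and workqueue values are placeholders.
sample = [
    "kworker/0:2-64 [000] d... 47.41: workqueue_queue_work: "
    "work struct=0000000000000001 function=pm_runtime_work "
    "workqueue=0000000000000002 req_cpu=8192 cpu=0",
    "kworker/0:2-64 [000] d... 47.42: workqueue_queue_work: "
    "work struct=0000000000000001 function=pm_runtime_work "
    "workqueue=0000000000000002 req_cpu=8192 cpu=0",
]
for func, n in count_work_functions(sample).most_common():
    print(f"{n:6d}  {func}")
```

With a real capture, pass the open file instead of the list, e.g. count_work_functions(open("out.txt")); the function that dominates the count is the one to look at first.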

Regards,
Salvatore



Bug#966703: linux-image-4.19.0-10-amd64: kworker process with permanent high CPU load

2020-08-02 Thread Dirk Kostrewa

Package: src:linux
Version: 4.19.132-1
Severity: normal

Dear Maintainer,

after booting the kernel 4.19.0-10-amd64, there is a kworker process 
running with a permanent high CPU load of almost 90% as reported by the 
"top" command:


$ top
top - 09:48:19 up 0 min,  4 users,  load average: 1.91, 0.58, 0.20
Tasks: 218 total,   2 running, 216 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.8 us, 12.4 sy,  0.0 ni, 84.5 id,  0.0 wa,  0.0 hi,  2.3 si,  0.0 st
MiB Mem :  15889.4 total,  14173.1 free,    889.3 used,    827.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  14677.7 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   64 root      20   0       0      0      0 R  86.7   0.0   0:47.41 kworker/0:2+pm
    9 root      20   0       0      0      0 S  20.0   0.0   0:08.84 ksoftirqd/0
  364 root     -51   0       0      0      0 S   6.7   0.0   0:00.50 irq/126-nvidia
 1177 dirk      20   0 2921696 122848  94268 S   6.7   0.8   0:02.23 kwin_x11
    1 root      20   0  169652  10280   7740 S   0.0   0.1   0:01.56 systemd
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
...

The expected result after booting the kernel 4.19.0-10-amd64 is a 
kworker process with a CPU load close to 0%.


As a control, booting the previous kernel 4.19.0-9-amd64 does not show a 
high CPU load for the kworker process. Instead, the kworker CPU load 
reported by the "top" command is 0.0%.


Therefore, I suspect a bug in the kernel 4.19.0-10-amd64.

Neither "dmesg" nor "journalctl -b" shows any messages containing "kworker".

I am using Debian GNU/Linux 10.5 with kernel 4.19.0-10-amd64 and 
libc6:amd64 2.28-10.


If you need more information, I would be happy to provide it.

Cheers,

Dirk.

-- Package-specific info:
** Version:
Linux version 4.19.0-10-amd64 (debian-kernel@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.132-1 (2020-07-24)


** Command line:
BOOT_IMAGE=/boot/vmlinuz-4.19.0-10-amd64 root=UUID=7eb1c27f-5474-41cb-a4fc-de2944149287 ro quiet


** Tainted: PWOE (12801)
 * Proprietary module has been loaded.
 * Taint on warning.
 * Out-of-tree module has been loaded.
 * Unsigned module has been loaded.
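
The number in parentheses is the taint bitmask, as exposed in /proc/sys/kernel/tainted, and the letters are its decoded form. As a cross-check, this short Python sketch decodes such a value. The bit-to-letter table follows the kernel's documented taint flags for bits 0 to 15, and the function name decode_taint is made up for illustration.

```python
# Taint bits 0-15 with their flag letters, per the kernel's taint documentation.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force-loaded"),
    2: ("S", "kernel running on an out-of-specification system"),
    3: ("R", "module was force-unloaded"),
    4: ("M", "processor reported a Machine Check Exception"),
    5: ("B", "bad page referenced or unexpected page flags"),
    6: ("U", "taint requested by a userspace application"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver was loaded"),
    11: ("I", "workaround for a platform firmware bug applied"),
    12: ("O", "out-of-tree module was loaded"),
    13: ("E", "unsigned module was loaded"),
    14: ("L", "soft lockup occurred"),
    15: ("K", "kernel has been live patched"),
}


def decode_taint(value):
    """Return [(letter, description)] for every set bit, lowest bit first."""
    return [
        flag
        for bit, flag in sorted(TAINT_FLAGS.items())
        if value & (1 << bit)
    ]


flags = decode_taint(12801)  # value from the report above
print("".join(letter for letter, _ in flags))
for letter, description in flags:
    print(f" * {letter}: {description}")
```

12801 is 1 + 512 + 4096 + 8192, that is bits 0, 9, 12 and 13, which matches the four lines listed above.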

** Kernel log:
Unable to read kernel log; any relevant messages should be attached

** Model information
sys_vendor: Dell Inc.
product_name: Precision 5510
product_version:
chassis_vendor: Dell Inc.
chassis_version:
bios_vendor: Dell Inc.
bios_version: 1.13.1
board_vendor: Dell Inc.
board_name: 0N8J4R
board_version: A00

** Loaded modules:
rfcomm
ctr
ccm
cmac
bnep
snd_hda_codec_hdmi
arc4
intel_rapl
dell_rbtn
iwlmvm
nls_ascii
nls_cp437
vfat
fat
snd_hda_codec_realtek
x86_pkg_temp_thermal
fuse
intel_powerclamp
mac80211
snd_hda_codec_generic
coretemp
mei_wdt
btusb
btrtl
btbcm
kvm_intel
btintel
dell_laptop
dell_wmi
bluetooth
kvm
iwlwifi
dell_smbios
snd_hda_intel
irqbypass
dcdbas
crct10dif_pclmul
crc32_pclmul
snd_hda_codec
sg
dell_smm_hwmon
hid_multitouch
joydev
wmi_bmof
dell_wmi_descriptor
ghash_clmulni_intel
drbg
snd_hda_core
serio_raw
cfg80211
ansi_cprng
intel_cstate
snd_hwdep
efi_pstore
snd_pcm
snd_timer
ecdh_generic
intel_uncore
snd
rtsx_pci_ms
mei_me
nvidia_drm(POE)
iTCO_wdt
memstick
intel_rapl_perf
rfkill
efivars
pcspkr
soundcore
idma64
pcc_cpufreq
iTCO_vendor_support
mei
intel_pch_thermal
nvidia_modeset(POE)
processor_thermal_device
tpm_tis
intel_soc_dts_iosf
tpm_tis_core
tpm
battery
rng_core
int3403_thermal
intel_hid
dell_smo8800
evdev
int3400_thermal
int3402_thermal
acpi_thermal_rel
sparse_keymap
int340x_thermal_zone
acpi_pad
ac
nvidia(POE)
ipmi_devintf
ipmi_msghandler
parport_pc
ppdev
lp
parport
efivarfs
ip_tables
x_tables
autofs4
ext4
crc16
mbcache
jbd2
crc32c_generic
fscrypto
ecb
usbhid
hid_generic
sd_mod
i915
crc32c_intel
i2c_designware_platform
i2c_designware_core
rtsx_pci_sdmmc
xhci_pci
i2c_algo_bit
mmc_core
xhci_hcd
drm_kms_helper
ahci
libahci
libata
aesni_intel
drm
mxm_wmi
aes_x86_64
psmouse
usbcore
i2c_i801
crypto_simd
scsi_mod
cryptd
glue_helper
i2c_hid
rtsx_pci
intel_lpss_pci
hid
intel_lpss
mfd_core
usb_common
thermal
fan
video
wmi
button

** PCI devices:
00:00.0 Host bridge [0600]: Intel Corporation Skylake Host Bridge/DRAM Registers [8086:1910] (rev 07)
Subsystem: Dell Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [1028:06e5]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- 
Latency: 0
Capabilities: 
Kernel driver in use: skl_uncore

00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
Latency: 0
Interrupt: pin A routed to IRQ 16
Bus: primary=00, secondary=01,