Hi,
I have some further info. I partially know cause and have a workaround.

There is my investigation. Some minor inaccuracies might be caused by 
retrospective writing:

1. I have tried to debug using strace. (Prerequisite: sudo 
qubes-dom0-install strace) After finding pid of qubesd, I ran:
sudo strace -s 256 -p PID_OF_QUBESD -o /tmp/qubesd.log

It looks like few seconds is enough to get a reasonable sample, see below.

2. I ran sort /tmp/qubesd.log | uniq -c | sort -n (one can also add “ -r | 
head -n 50”).

I have noticed an interesting line that repeats frequently:
sendto(270, "QubesNoSuchPropertyError\0", 25, 0, NULL, 0) = 25

3. Look closer:
$ grep --before=5 --after=5 QubesNoSuchPropertyError /tmp/qubesd.log 
The output contains many repeated occurrences of this, just with a 
different VM name. It seems to iterate over all the VMs (even those that 
are not running):
--
epoll_wait(3, [], <some_number>, 0)                = 0
getpid()                                = <some_number>
epoll_wait(3, [], <some_number>, 0)                = 0
epoll_wait(3, [], <some_number>, 0)                = 0
sendto(<some_number>, "2\0", 2, 0, NULL, 0)       = 2
sendto(<some_number>, "QubesNoSuchPropertyError\0", 25, 0, NULL, 0) = 25
sendto(<some_number>, "\0", 1, 0, NULL, 0)        = 1
sendto(<some_number>, "Invalid property 'internal' of <some-vm-name>\0", 
38, 0, NULL, 0) = 38
shutdown(<some_number>, SHUT_WR)                  = 0
epoll_wait(3, [{EPOLLIN, {u32=<some_number>, u64=<some_number>}}], 18, 0) = 
1
close(<some_number>)                              = 0

4. WTF, what would iterate over all the VMs? Maybe some script repeatedly 
runs qvm-ls? Let's ps aux | grep qvm-ls that! During increased CPU 
workload, I have identified:

qvm-ls --no-spinner --raw-data --fields NAME,FLAGS

5. During the current random CPU workload, I cannot reliably verify if it 
is the cause of the increased CPU usage, but at least I can verify if it is 
the cause of the error messages. So, I have tried the command while running 
this:

(sudo strace -s 256 -p PID_OF_QUBESD 2>&1) | grep 'Invalid property'

And yes, it seems to be the cause of the error messages and maybe also the 
source of increased CPU load.

6. Let's identify the script that runs the command: I ran htop, switched to 
tree mode (key t), waited for the qvm-ls (using watch + ps aux + grep) and 
typed “/qvm-ls”.

And the script to blame is – qubes-i3status

7. And yes, killing qubes-i3status has helped to decrease the CPU load. 
After doing that, I was able to confirm that qvm-ls --no-spinner --raw-data 
--fields NAME,FLAGS also causes the CPU load.


So, there are multiple causes combined:

* I have many VMs in my computer.
* I use i3 with qubes-i3status
* The script qubes-i3status calls command qvm-ls --no-spinner --raw-data 
--fields NAME,FLAGS quite frequently.
* The command qvm-ls --no-spinner --raw-data --fields NAME,FLAGS seems to 
cause high CPU load. Unfortunately, the process that shows the high CPU 
usage is qubesd, not qvm-ls.

What can be improved:

a. Don't use qubes-i3status. Problem solved.
b. Optimize qvm-ls. Not sure how hard it is.
c. Optimize qubes-i3status. I am not sure about the ideal way of doing 
that, but clearly running qvm-ls --no-spinner --raw-data --fields 
NAME,FLAGS just to compute the number of running qubes is far from optimal. 
One could add --running. And maybe it could have been written without 
flags. The script just ignores VMs with the first flag being “0” (maybe in 
order to ignore dom0) and the second flag being “r” (probably not needed 
with --running).

Regards,
Vít Šesták 'v6ak'
On Monday, January 4, 2021 at 11:51:23 PM UTC+1 Vít Šesták wrote:

> Hello,
> I have dual-core i7 7500U with disabled hyperthreading. In dom0, I often 
> have total CPU usage in tens of percents (often about 50 %, i.e., about 
> fully utilized single core). When I look at htop in dom0, it is clearly 
> caused by qubesd, which clearly uses the vast majority of CPU during these 
> peaks. Note that these peaks look rather random, I see no relation to any 
> activity. But they are quite frequent.
>
> When looking at the process tree, it has many child processes, probably 
> one for each domU qube. But they utilize near zero CPU.
>
> The column TIME+ confirms my CPU% observation in long term.
>
> I am not sure where to find any relevant log. Maybe journalctl, but I have 
> seen nothing suspicious there.
>
> Do you have any idea about the cause, solution or even a suggestion for 
> debugging?
>
> Regards,
> Vít Šesták 'v6ak'
>

-- 
You received this message because you are subscribed to the Google Groups 
"qubes-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to qubes-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/qubes-users/9ac8afbe-55ee-485c-a29b-b7c75b92ecden%40googlegroups.com.

Reply via email to