Hi, I have some further info. I partially know the cause and have a workaround.
Here is my investigation. Minor inaccuracies are possible, because I am writing this retrospectively:

1. I tried to debug using strace. (Prerequisite: sudo qubes-dom0-install strace.) After finding the PID of qubesd, I ran:

       sudo strace -s 256 -p PID_OF_QUBESD -o /tmp/qubesd.log

   It looks like a few seconds are enough to get a reasonable sample, see below.

2. I ran:

       sort /tmp/qubesd.log | uniq -c | sort -n

   (one can also append "-r | head -n 50"). I noticed an interesting line that repeats frequently:

       sendto(270, "QubesNoSuchPropertyError\0", 25, 0, NULL, 0) = 25

3. Looking closer:

       $ grep --before=5 --after=5 QubesNoSuchPropertyError /tmp/qubesd.log

   The output contains many repeated occurrences of the following, just with a different VM name each time. Something seems to iterate over all the VMs (even those that are not running):

       --
       epoll_wait(3, [], <some_number>, 0) = 0
       getpid() = <some_number>
       epoll_wait(3, [], <some_number>, 0) = 0
       epoll_wait(3, [], <some_number>, 0) = 0
       sendto(<some_number>, "2\0", 2, 0, NULL, 0) = 2
       sendto(<some_number>, "QubesNoSuchPropertyError\0", 25, 0, NULL, 0) = 25
       sendto(<some_number>, "\0", 1, 0, NULL, 0) = 1
       sendto(<some_number>, "Invalid property 'internal' of <some-vm-name>\0", 38, 0, NULL, 0) = 38
       shutdown(<some_number>, SHUT_WR) = 0
       epoll_wait(3, [{EPOLLIN, {u32=<some_number>, u64=<some_number>}}], 18, 0) = 1
       close(<some_number>) = 0

4. WTF, what would iterate over all the VMs? Maybe some script repeatedly runs qvm-ls? Let's ps aux | grep qvm-ls that! During a period of increased CPU load, I identified:

       qvm-ls --no-spinner --raw-data --fields NAME,FLAGS

5. With the CPU load coming and going randomly, I cannot reliably verify whether this command is the cause of the increased CPU usage, but I can at least verify whether it is the cause of the error messages. So I ran the command while this was running:

       (sudo strace -s 256 -p PID_OF_QUBESD 2>&1) | grep 'Invalid property'

   And yes, it seems to be the cause of the error messages, and maybe also the source of the increased CPU load.

6. Let's identify the script that runs the command: I ran htop, switched to tree mode (key t), waited for the qvm-ls to appear (using watch + ps aux + grep) and typed "/qvm-ls". The script to blame is qubes-i3status.

7. And yes, killing qubes-i3status helped to decrease the CPU load. After doing that, I was able to confirm that running qvm-ls --no-spinner --raw-data --fields NAME,FLAGS by hand causes the same CPU load.

So, there are multiple causes combined:

* I have many VMs on my computer.
* I use i3 with qubes-i3status.
* The qubes-i3status script runs qvm-ls --no-spinner --raw-data --fields NAME,FLAGS quite frequently.
* The command qvm-ls --no-spinner --raw-data --fields NAME,FLAGS seems to cause high CPU load. Unfortunately, the process that shows the high CPU usage is qubesd, not qvm-ls.

What can be improved:

a. Don't use qubes-i3status. Problem solved.
b. Optimize qvm-ls. Not sure how hard that is.
c. Optimize qubes-i3status. I am not sure about the ideal way of doing that, but clearly running qvm-ls --no-spinner --raw-data --fields NAME,FLAGS just to compute the number of running qubes is far from optimal. One could add --running, and maybe it could be done without FLAGS at all: the script only uses the flags to skip VMs whose first flag is "0" (maybe in order to ignore dom0) and to check that the second flag is "r" (probably not needed with --running). See the sketch below.
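Here is a rough sketch of what the counting part of qubes-i3status could look like instead. This is only an illustration, not the actual script: I have not checked the exact function and helper names, so status_qubes and json below are placeholders mirroring how I remember the script being structured. The point is that --raw-list plus --running avoids both the FLAGS parsing and the iteration over halted qubes:

    # Hypothetical replacement for the qube-counting part of qubes-i3status.
    # Assumption: the status bar only needs the number of running qubes,
    # excluding dom0.
    status_qubes() {
        # --running restricts the listing to running qubes, so qubesd should not
        # have to resolve properties of every halted VM; --raw-list prints one
        # bare VM name per line, so no FLAGS parsing is needed.
        local count
        count=$(qvm-ls --no-spinner --raw-list --running | grep -vc '^dom0$')
        # "json" stands for whatever helper the script uses to emit an i3bar block.
        json qubes "$count qubes"
    }

This still spawns a qvm-ls process on every refresh, but at least qubesd should no longer have to iterate over the properties of every halted qube each time.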
Regards,
Vít Šesták 'v6ak'

On Monday, January 4, 2021 at 11:51:23 PM UTC+1 Vít Šesták wrote:
> Hello,
>
> I have a dual-core i7 7500U with disabled hyperthreading. In dom0, I often
> have a total CPU usage in the tens of percent (often about 50 %, i.e., about
> a fully utilized single core). When I look at htop in dom0, it is clearly
> caused by qubesd, which uses the vast majority of CPU during these peaks.
> Note that these peaks look rather random; I see no relation to any activity.
> But they are quite frequent.
>
> When looking at the process tree, it has many child processes, probably
> one for each domU qube. But they utilize near zero CPU.
>
> The column TIME+ confirms my CPU% observation in the long term.
>
> I am not sure where to find any relevant log. Maybe journalctl, but I have
> seen nothing suspicious there.
>
> Do you have any idea about the cause, a solution, or even a suggestion for
> debugging?
>
> Regards,
> Vít Šesták 'v6ak'