GitHub user TadiosAbebe added a comment to the discussion: Degraded cloudstack agent
> I had a test all-in-one ACS on Ubuntu 24.04 with libvirt 10.0.0, but I couldn't reproduce the issue I'm seeing in the production environment. I repeatedly ran your test script:
>
> ```
> for i in `seq 1 20`; do
>   cmk deploy virtualmachine name=L2-wei-test-$i serviceofferingid=xxx zoneid=xxx templateid=xxx networkids=xxx >/dev/null &
>   sleep 2
> done
> ```
>
> to generate load, and the results were consistently fast:
>
> ```
> mysql> select id,name,created,update_time,(update_time-created) from vm_instance where removed is null and name like "L2-wei%";
> +-----+----------------+---------------------+---------------------+-----------------------+
> | id  | name           | created             | update_time         | (update_time-created) |
> +-----+----------------+---------------------+---------------------+-----------------------+
> | 191 | L2-wei-test-1  | 2025-11-25 11:22:07 | 2025-11-25 11:22:14 |                     7 |
> | 192 | L2-wei-test-2  | 2025-11-25 11:22:09 | 2025-11-25 11:22:16 |                     7 |
> | 193 | L2-wei-test-3  | 2025-11-25 11:22:11 | 2025-11-25 11:22:17 |                     6 |
> | 194 | L2-wei-test-4  | 2025-11-25 11:22:13 | 2025-11-25 11:22:22 |                     9 |
> | 195 | L2-wei-test-5  | 2025-11-25 11:22:15 | 2025-11-25 11:22:20 |                     5 |
> | 196 | L2-wei-test-6  | 2025-11-25 11:22:17 | 2025-11-25 11:22:23 |                     6 |
> | 197 | L2-wei-test-7  | 2025-11-25 11:22:19 | 2025-11-25 11:22:26 |                     7 |
> | 198 | L2-wei-test-8  | 2025-11-25 11:22:21 | 2025-11-25 11:22:27 |                     6 |
> | 199 | L2-wei-test-9  | 2025-11-25 11:22:23 | 2025-11-25 11:22:29 |                     6 |
> | 200 | L2-wei-test-10 | 2025-11-25 11:22:25 | 2025-11-25 11:22:31 |                     6 |
> | 201 | L2-wei-test-11 | 2025-11-25 11:22:27 | 2025-11-25 11:22:34 |                     7 |
> | 202 | L2-wei-test-12 | 2025-11-25 11:22:29 | 2025-11-25 11:22:36 |                     7 |
> | 203 | L2-wei-test-13 | 2025-11-25 11:22:31 | 2025-11-25 11:22:38 |                     7 |
> | 204 | L2-wei-test-14 | 2025-11-25 11:22:33 | 2025-11-25 11:22:41 |                     8 |
> | 205 | L2-wei-test-15 | 2025-11-25 11:22:35 | 2025-11-25 11:22:42 |                     7 |
> | 206 | L2-wei-test-16 | 2025-11-25 11:22:37 | 2025-11-25 11:22:45 |                     8 |
> | 207 | L2-wei-test-17 | 2025-11-25 11:22:39 | 2025-11-25 11:22:48 |                     9 |
> | 208 | L2-wei-test-18 | 2025-11-25 11:22:41 | 2025-11-25 11:22:49 |                     8 |
> | 209 | L2-wei-test-19 | 2025-11-25 11:22:43 | 2025-11-25 11:22:51 |                     8 |
> | 210 | L2-wei-test-20 | 2025-11-25 11:22:45 | 2025-11-25 11:22:55 |                    10 |
> +-----+----------------+---------------------+---------------------+-----------------------+
> ```
>
> I'll try to create a full test environment consisting of Ceph and multiple KVM hosts later this week, to see if I can replicate the issue there and whether libvirt 10.6.0 fixes it.

My update: after yesterday's test I left the all-in-one ACS sitting for a while without restarting libvirtd or cloudstack-agent. Initially the resource utilization of the java process was:

```
CPU: 0.5% MEM: 2.4% FD: 253 Threads: 83 Conn: 1
```

After about 6 or 7 hours, with no interaction on the host beyond the 20 small cirros VMs launched earlier, CPU had crept up to about 1.1%. I then simulated some workload by creating and destroying those 20 instances in a loop; after about 10 hours, CPU utilization had increased to about 4.5%:

```
CPU: 4.5% MEM: 2.6% FD: 253 Threads: 78 Conn: 1
```
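For reference, the create/destroy churn was along the lines of the sketch below (not the exact script; the ids are placeholders, and parsing `cmk`'s default JSON output with `grep` is only illustrative):

```
# Churn loop sketch: deploy 20 small VMs, then destroy+expunge them, forever.
# serviceofferingid/zoneid/templateid/networkids are placeholders.
while true; do
  for i in `seq 1 20`; do
    cmk deploy virtualmachine name=L2-wei-test-$i serviceofferingid=xxx zoneid=xxx templateid=xxx networkids=xxx >/dev/null &
    sleep 2
  done
  wait   # let all deploy jobs finish before tearing down
  for id in $(cmk list virtualmachines keyword=L2-wei-test filter=id | grep -o '"id": *"[^"]*"' | cut -d'"' -f4); do
    cmk destroy virtualmachine id=$id expunge=true >/dev/null
  done
done
```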
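The CPU/MEM/FD/Threads/Conn snapshots come from standard /proc tooling; a minimal sketch of how such a snapshot can be taken (assuming a single java process matches the agent, and root access for `/proc/<pid>/fd`):

```
# Snapshot of the agent JVM's resource usage.
PID=$(pgrep -of 'cloudstack-agent')                 # oldest matching process
read CPU MEM NLWP < <(ps -p "$PID" -o %cpu=,%mem=,nlwp=)
FD=$(ls /proc/"$PID"/fd | wc -l)                    # open file descriptors
CONN=$(ss -tnp 2>/dev/null | grep -c "pid=$PID,")   # established TCP conns
echo "CPU: $CPU% MEM: $MEM% FD: $FD Threads: $NLWP Conn: $CONN"
```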
One thing I noticed when looking at the java processes in htop: 5 or 6 of them in our production cluster show a very high accumulated CPU time; for example, on one of our compute hosts, about 4 of them have a CPU time of around 53:54:30. Is this normal?

If I restart libvirtd, those entries with high CPU time go away and the maximum becomes 0:05:26.
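As a side note on the htop observation: htop lists userland threads as separate rows by default (toggle with `H`), so the "java processes" with high CPU time are most likely long-lived threads inside a single agent JVM rather than separate processes. Which threads accumulate the time can be checked with GNU ps, e.g. (the pgrep pattern is an assumption):

```
# Show the agent JVM's threads, sorted by accumulated CPU time.
PID=$(pgrep -of 'cloudstack-agent')
ps -L -p "$PID" -o lwp,time,comm --sort=-time | head -n 10
```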
I'll now replace libvirt with 10.6.0 on the all-in-one instance and see what changes.

GitHub link: https://github.com/apache/cloudstack/discussions/12450#discussioncomment-15515843