GitHub user akrasnov-drv added a comment to the discussion: CloudStack fails to start more VMs
I see your point. Unfortunately I currently have no infrastructure for L2 (DHCP, router, etc.), but I could test a pure VM start to see whether VMs are created and started. I used the CloudStack CLI for this. After maybe 10 executions I started getting TimeoutError when connecting to the API, and the CloudStack UI also became unresponsive (maybe I need to properly clean the environment, restart the agents and the server, and try again). Nevertheless, all 120 VMs I requested were created in CloudStack: 9 finished with the error `Error while deploying Vm`, and the rest started.

I also ran another test: I started 120 VMs in parallel in an isolated network, but without additional user-data or additional network configuration (we usually use static NAT). In this case CloudStack managed to start about 80 VMs, several more failed with Error, and the others stayed in the Starting state (maybe they will finish eventually). After start, most VMs got a proper hostname from the VR, but some stayed with localhost. In any case it's a lot better than when we used user-data and static NAT.

Now I see that the primary VR is in the Unknown state. Nevertheless, the backup one does not take over the primary role, and I'm still able to ssh to the "primary" one. I also see that the log on the host where the VR runs is "frozen" again. I executed the VM start at about 15:30, the log was last updated at 15:55, and the last 2 lines are:

```
2025-01-15 15:33:15,742 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-4:[]) (logid:37db33e3) Attempting to create volume c8cbfca6-ea45-4698-8221-a578a7b8576d (Filesystem) in pool a0bdff3a-8016-4760-8cd4-eaeb5809ddc7 with size (6.00 GB) 6442450944
2025-01-15 15:55:31,242 WARN [kvm.resource.LibvirtComputingResource] (Script-5:[]) (logid:) Interrupting script.
```

By the way, this host (where the VR runs) managed to start just 2 of the 120 VMs.
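The silence between the two log lines above can be measured rather than eyeballed. The following is only a rough sketch: it assumes every log record starts with a timestamp in the layout shown above, and the helper names (`parse_ts`, `find_gaps`) and the 5-minute threshold are made up for illustration. The sample lines are shortened copies of the two quoted records.

```python
from datetime import datetime, timedelta

# Timestamp layout as it appears in the log lines quoted above (assumed stable).
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2025-01-15 15:33:15,742")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Yield (previous_ts, current_ts, gap) for each silence longer than threshold."""
    previous = None
    for line in lines:
        current = parse_ts(line)
        if current is None:
            continue  # continuation lines, e.g. stack trace frames
        if previous is not None and current - previous > threshold:
            yield previous, current, current - previous
        previous = current


# Shortened copies of the two records quoted above.
sample = [
    "2025-01-15 15:33:15,742 INFO [kvm.storage.LibvirtStorageAdaptor] Attempting to create volume",
    "2025-01-15 15:55:31,242 WARN [kvm.resource.LibvirtComputingResource] Interrupting script.",
]
gaps = list(find_gaps(sample))
for start, end, gap in gaps:
    print(f"log silent from {start} to {end} ({gap})")
```

Feeding it the lines of a real log file (for example `open(path)`, with the path being whatever the host uses) would list every stall, which makes it easier to compare hosts after a parallel start.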
I restarted libvirtd, and as before the agent immediately came back to life and wrote this to its log:

```
2025-01-15 16:13:49,367 INFO [cloud.agent.Agent] (AgentShutdownThread:[]) (logid:) Stopping the agent: Reason = sig.kill : Detail =
2025-01-15 16:13:49,367 WARN [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-5:[]) (logid:bce375d5) Process [760381] for command [/usr/share/cloudstack-common/scripts/network/domr/router_proxy.sh update_config.py 169.254.206.35 vm_metadata.json.479c1074-4d9c-4bff-b06a-0fb61a9125ed ] timed out. Output is [java.io.IOException: Stream closed
    at java.base/java.io.BufferedReader.ensureOpen(BufferedReader.java:123)
    at java.base/java.io.BufferedReader.ready(BufferedReader.java:444)
    at com.cloud.utils.script.OutputInterpreter$TimedOutLogger.interpret(OutputInterpreter.java:72)
    at com.cloud.utils.script.Script$Task.run(Script.java:490)
    at com.cloud.utils.script.Script.execute(Script.java:298)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeInVR(LibvirtComputingResource.java:553)
    at com.cloud.agent.resource.virtualnetwork.VirtualRoutingResource.applyConfigToVR(VirtualRoutingResource.java:303)
    at com.cloud.agent.resource.virtualnetwork.VirtualRoutingResource.applyConfig(VirtualRoutingResource.java:318)
    at com.cloud.agent.resource.virtualnetwork.VirtualRoutingResource.executeRequest(VirtualRoutingResource.java:165)
    at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtNetworkElementCommandWrapper.execute(LibvirtNetworkElementCommandWrapper.java:35)
    at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtNetworkElementCommandWrapper.execute(LibvirtNetworkElementCommandWrapper.java:29)
    at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRequestWrapper.execute(LibvirtRequestWrapper.java:78)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1945)
    at com.cloud.agent.Agent.processRequest(Agent.java:686)
    at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1109)
    at com.cloud.utils.nio.Task.call(Task.java:83)
    at com.cloud.utils.nio.Task.call(Task.java:29)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
].
2025-01-15 16:13:49,373 WARN [cloud.agent.Agent] (agentRequest-Handler-5:[]) (logid:bce375d5) Unable to send response: Seq 4-6283647380088688564: { Ans: , MgmtId: 108380656826451, via: 4, Ver: v1, Flags: 10, [{"com.cloud.agent.api.routing.GroupAnswer":{"results":["null - success: Creating file in VR, with ip: 169.254.206.35, file: vm_password.json.b67bbb7a-3116-4255-bf06-a198e3faa3b9","null - success: Invalid unit name "cloud-password-server@10.10.240.1,10.10.245.127" escaped as "cloud-password-server@10.10.240.1\x2c10.10.245.127" (maybe you should use systemd-escape?). "],"result":"true","wait":"0","bypassHostMaintenance":"false"}},{"com.cloud.agent.api.routing.GroupAnswer":{"results":["null - success: Creating file in VR, with ip: 169.254.206.35, file: vm_metadata.json.479c1074-4d9c-4bff-b06a-0fb61a9125ed","null - failed: timeout"],"result":"false","wait":"0","bypassHostMaintenance":"false"}}] }
2025-01-15 16:13:52,803 INFO [cloud.agent.AgentShell] (main:[]) (logid:) Agent started
2025-01-15 16:13:52,816 INFO [cloud.agent.AgentShell] (main:[]) (logid:) Implementation Version is 4.20.0.0
```

Then it returned to normal. Two of the VM start tasks (`cloudstack deployVirtualMachine`) finished with error 530 just after that, and only then did the backup VR take over the primary role. After that I managed to start an additional VM, and it got a proper name from the VR.

To sum up, it looks like under load some script running in libvirt makes the agent unusable.
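One way to narrow this down might be to probe libvirt directly with a hard timeout while the agent log is stalled. This is only a sketch under assumptions: that libvirtd itself stops answering is a guess rather than a confirmed diagnosis, the `probe` helper and its return labels are made up for illustration, and `LIBVIRT_PROBE` assumes the standard `virsh` client with the `qemu:///system` URI. The demo calls at the bottom use the Python interpreter as a stand-in command so the snippet runs anywhere.

```python
import subprocess
import sys

# Assumed probe command: the standard virsh client against the system libvirtd.
LIBVIRT_PROBE = ["virsh", "--connect", "qemu:///system", "list", "--all"]


def probe(command, timeout_s=10.0):
    """Run command and return 'ok', 'failed', 'timeout', or 'missing'."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when the timeout expires
        )
    except subprocess.TimeoutExpired:
        return "timeout"  # the command hung, as a blocked libvirtd might
    except FileNotFoundError:
        return "missing"  # the binary is not installed on this machine
    return "ok" if result.returncode == 0 else "failed"


# Stand-in commands so the sketch can be tried without libvirt installed.
quick = probe([sys.executable, "-c", "pass"])
hung = probe([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5)
print(quick, hung)
```

On a KVM host, calling `probe(LIBVIRT_PROBE)` every few seconds during a parallel start would show whether `timeout` results line up with the periods when the agent log stops moving.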
I suspected `/usr/share/cloudstack-agent/lib/libvirtqemuhook`, which is installed to `/etc/libvirt/hooks/qemu`, but it turned out that it exists on all my hosts except the one where the VR was stuck.

GitHub link: https://github.com/apache/cloudstack/discussions/10184#discussioncomment-11845532

----
This is an automatically sent email for users@cloudstack.apache.org.
To unsubscribe, please send an email to: users-unsubscr...@cloudstack.apache.org