On 31 May 2020 15:52:14 GMT+03:00, aigin...@gmail.com wrote:
>Hi,
>
>Our company uses oVirt to host some of its virtual machines. The
>version used is 4.2.6.4-1.el7. There are about 36 virtual hosts in it.
>The specifications of the host machine are 30G RAM and 6 CPUs.
>Some of the VMs on the oVirt host run with 4 CPUs, some with 2 CPUs.
>
>The problem I face now is that recently there was a need for high CPU
>and memory specs to set up a VM for DR. I created a VM with 16G RAM and
>6 CPUs, without checking the CPUs available in the host first. After
>DR, the VM was brought down. Later, another person in the
>team brought the VM back up for a different DR use, a much larger
>DB restoration.
>
>This caused the VM to pause due to a storage error. Then worse things
>happened: 2 other VMs inadvertently went down. Although I
>assumed that this was caused by storage errors/problems, the senior
>admins in the team concluded that the problem was due to fencing,
>because the VM had been allocated the maximum number of CPUs available on the host.
Check the libvirt logs on the host where the VM was running. In the engine, you can check the logs for any fencing, but I have never seen anything like "excessive CPU allocation" cause fencing. Either the VM passes the checks (overcommit rules, scheduling, etc.) and starts running, or the engine refuses to power it up.

Also check journalctl for any messages from 'sanlock.service' around that time. Any issues (storage unavailable or high latency detected) will be reported by the sanlock service on the affected node. If you use multipath, check whether it also reported any failing paths.

>Now what I need to know is how to properly allocate CPU resources to a
>host to run multiple virtual machines in it like the situation above.

The best way is to start with as few CPUs as possible. Here is a short (or maybe not so short) example:
The hypervisor has 8 CPUs/8 threads. The first VM has 1 CPU, the second has 6 CPUs allocated and the third has 8 CPUs allocated. For the hypervisor to allocate CPU time to the third (beefy) VM, it needs to have all 8 CPUs available. As the host itself has only 8 cores and usually some OS activity is going on, the third VM will receive far less CPU time than the first and second VMs.

>I even tried to look for errors in vdsm.log, but this log was not
>available in the host machine nor in the affected VM. My colleague
>asked me to check the "Events" section of the oVirt management interface to
>see the past events. However, I don't find much detail about the
>fencing activity, how the fencing occurred or what caused the
>fencing.
>
>And how did they conclude that the CPU count caused the fencing and not
>the storage?

Interesting question... I think they just assumed it. In the worst-case scenario (CPU starvation), the vdsmd service might not respond to the engine, but then a 'soft' reboot will happen, where the engine restarts this service over SSH.
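The 8-CPU example can be turned into a quick back-of-the-envelope check. The sketch below is a hypothetical helper, not part of oVirt or its scheduler: it computes the vCPU-to-host-CPU overcommit ratio and flags VMs that request more vCPUs than the host has left after setting some CPUs aside for the hypervisor OS. The one-CPU reservation is an assumed rule of thumb, and the 4/2/6 vCPU mix for the 6-CPU host is a hypothetical example based on the question.

```python
from typing import List, Sequence


def overcommit_ratio(host_cpus: int, vm_vcpus: Sequence[int]) -> float:
    """Total vCPUs of the running VMs divided by the host's CPUs."""
    if host_cpus <= 0:
        raise ValueError("host_cpus must be positive")
    return sum(vm_vcpus) / host_cpus


def oversized_vms(host_cpus: int, vm_vcpus: Sequence[int],
                  reserved_for_host: int = 1) -> List[int]:
    """Indices of VMs asking for more vCPUs than the host can offer once
    `reserved_for_host` CPUs are set aside for the hypervisor OS.
    The reservation is an assumed rule of thumb, not an oVirt setting."""
    limit = host_cpus - reserved_for_host
    return [i for i, n in enumerate(vm_vcpus) if n > limit]


if __name__ == "__main__":
    # Example from the reply: 8-CPU hypervisor with VMs of 1, 6 and 8 vCPUs.
    example = [1, 6, 8]
    print(overcommit_ratio(8, example))   # 1.875
    print(oversized_vms(8, example))      # [2] -> the 8-vCPU VM
    # Hypothetical mix on the 6-CPU host from the question.
    question = [4, 2, 6]
    print(overcommit_ratio(6, question))  # 2.0
    print(oversized_vms(6, question))     # [2] -> the 6-vCPU DR VM
```

In both cases the VM sized to the full host CPU count is the one flagged, which matches the advice to start with as few vCPUs as possible and leave headroom for the host.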
>_______________________________________________
>Users mailing list -- users@ovirt.org
>To unsubscribe send an email to users-le...@ovirt.org
>Privacy Statement: https://www.ovirt.org/privacy-policy.html
>oVirt Code of Conduct:
>https://www.ovirt.org/community/about/community-guidelines/
>List Archives:
>https://lists.ovirt.org/archives/list/users@ovirt.org/message/QX7NAZQ67VBA3KLPYIOXYSTPNU46XOBO/

Best Regards,
Strahil Nikolov