In a host failure situation, we see that oVirt tries to restart the VMs on 
other hosts in the cluster, but this (more often than not) fails because 
qemu-kvm cannot acquire a write lock on the qcow2 image. We see oVirt attempt 
to restart the VMs several times, each time on a different host but with the 
same outcome, after which it gives up trying.

After this we must log into the oVirt web interface and start the VM manually, 
which works fine (we assume that by this point enough time has passed for the 
lock to clear itself).
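
For reference, this is how we have been checking the lock state by hand on the affected host (a diagnostic sketch, not from the logs above — the path is a placeholder for the affected disk image, and the -U/--force-share flag assumes qemu-img >= 2.10):

```shell
# Diagnostic sketch: check whether a qemu process still holds the image lock.
# IMG is a placeholder; substitute the path of the affected disk image.
IMG=/rhev/data-center/mnt/<storage>/<sd-id>/images/<image-id>/<volume-id>

# A plain 'info' takes a shared lock itself, so while another qemu process
# still holds the write lock it fails with "Failed to get ... lock":
qemu-img info "$IMG"

# With -U (--force-share, available in QEMU >= 2.10) the locking is bypassed,
# so the image metadata can be inspected even while the lock is held:
qemu-img info -U "$IMG"
```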

This behaviour is experienced with CentOS 7.6, libvirt 4.5.0-10, vdsm 4.30.13-1.

Log excerpt from hosted engine:

2019-04-24 17:05:26,653+01 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(EE-ManagedThreadFactory-engineScheduled-Thread-82) [] VM 
'ef7e04f0-764a-4cfe-96bf-c0862f1f5b83'(vm-21.example.local) moved from 
'WaitForLaunch' --> 'Down'
2019-04-24 17:05:26,710+01 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-82) [] EVENT_ID: 
VM_DOWN_ERROR(119), VM vm-21.example.local is down with error. Exit message: 
internal error: process exited while connecting to monitor: 
2019-04-24T16:04:48.049352Z qemu-kvm: -drive 
file=/rhev/data-center/mnt/192.168.111.111:_/21a1390b-b73b-46b1-85b9-2bbf9bba5308/images/c9d96ab6-cb0b-4fba-9b07-096ff750c7f7/16da3660-1afe-40a3-b868-3a74e74bab2f,format=qcow2,if=none,id=drive-ua-c9d96ab6-cb0b-4fba-9b07-096ff750c7f7,serial=c9d96ab6-cb0b-4fba-9b07-096ff750c7f7,werror=stop,rerror=stop,cache=none,aio=threads:
 'serial' is deprecated, please use the corresponding option of '-device' 
instead
2019-04-24T16:04:48.079989Z qemu-kvm: -drive 
file=/rhev/data-center/mnt/192.168.111.111:_/21a1390b-b73b-46b1-85b9-2bbf9bba5308/images/c9d96ab6-cb0b-4fba-9b07-096ff750c7f7/16da3660-1afe-40a3-b868-3a74e74bab2f,format=qcow2,if=none,id=drive-ua-c9d96ab6-cb0b-4fba-9b07-096ff750c7f7,serial=c9d96ab6-cb0b-4fba-9b07-096ff750c7f7,werror=stop,rerror=stop,cache=none,aio=threads:
 Failed to get "write" lock

So my question is: how can I either force oVirt to keep retrying the VM 
restart, or delay the initial restart attempt long enough for the locks to clear?
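
For what it's worth, the engine-config keys we have found so far that look related to the HA restart retries are below — treat the names and values as assumptions on our part (we have not confirmed they are correct for this engine version), and please correct us if they are wrong:

```shell
# Hypothetical sketch, assuming these engine-config keys exist on our engine
# (names taken from the oVirt docs; not yet verified on this version):

# How many times the engine re-runs a failed highly-available VM before
# giving up:
engine-config -s MaxNumOfTriesToRunFailedAutoStartVm=10

# How long (seconds) the engine waits between those retries -- presumably
# this would need to exceed the time the qemu image lock takes to clear:
engine-config -s RetryToRunAutoStartVmIntervalInSeconds=60

# engine-config changes only take effect after an engine restart:
systemctl restart ovirt-engine
```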

_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/IZ6QSK3L27ZORGJ7ALWUDVBLNK7UVSVH/