I have an HCI cluster running on Gluster storage. I exposed an NFS share to oVirt as a storage domain so that I could clone all of my VMs (I'm preparing to physically move to a new datacenter). Yesterday I cloned 3-4 VMs without any problems. This evening, though, I tried to clone a large VM, and the clone caused the disk to lock up. The VM became completely unresponsive, and I couldn't find a way to cancel the clone. Nagios NRPE (on the client VM) was reporting a server load of 65+, but I was never able to establish an SSH connection.
I eventually tried restarting ovirt-engine, per https://access.redhat.com/solutions/396753. When that didn't work, I powered down the VM completely, but the disks were still locked. I then tried to put the storage domain into maintenance mode, but that put the entire domain into a "locked" state. Eventually the disks unlocked, and I was able to power the VM back on. From start to finish, my VM was down for about 45 minutes, including the time when NRPE was still sending data to Nagios.

Which logs should I look at, how can I troubleshoot what went wrong here, and how can I prevent this from happening again?
_______________________________________________ Users mailing list -- [email protected] To unsubscribe send an email to [email protected] Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/ASEENELT4TRTXQ7MF4FKB6L75D3H75AN/

