On Fri, Jun 4, 2021 at 12:11 PM David White via Users <users@ovirt.org> wrote:
>
> I'm trying to figure out how to keep a "broken" NFS mount point from causing
> the entire HCI cluster to crash.
>
> HCI is working beautifully.
> Last night, I finished adding some NFS storage to the cluster - this is
> storage that I don't necessarily need to be HA, and I was hoping to store
> some backups and less-important VMs on, since my Gluster (SSD) storage
> availability is pretty limited.
>
> But as a test, after I got everything setup, I stopped the nfs-server.
> This caused the entire cluster to go down, and several VMs - that are not
> stored on the NFS storage - went belly up.
Please explain in more detail what "went belly up" means. In general, VMs not
using the NFS storage domain should not be affected. However, due to an
unfortunate design in vdsm, all storage domains share the same global lock,
and when one storage domain has trouble, it can delay operations on other
domains. This may lead to timeouts and VMs being reported as non-responsive,
but the VMs themselves should not be affected.

If you have a good way to reproduce the issue, please file a bug with all the
logs so we can try to improve this situation.

> Once I started the NFS server process again, HCI did what it was supposed to
> do, and was able to automatically recover.
> My concern is that NFS is a single point of failure, and if VMs that don't
> even rely on that storage are affected if the NFS storage goes away, then I
> don't want anything to do with it.

You need to understand the actual effect on the VMs before you reject NFS.

> On the other hand, I'm still struggling to come up with a good way to run
> on-site backups and snapshots without using up more gluster space on my (more
> expensive) SSD storage.

NFS is useful for this purpose. You don't need synchronous replication, and
you want the backups outside of your cluster, so that in case of disaster you
can restore them on another system.

Snapshots always live on the same storage as the VM disks, so NFS will not
help with them.

> Is there any way to setup NFS storage for a Backup Domain - as well as a Data
> domain (for lesser important VMs) - such that, if the NFS server crashed, all
> of my non-NFS stuff would be unaffected?

An NFS storage domain will always affect other storage domains, but if you
mount your NFS storage outside of oVirt, the mount will not affect the system.
Then you can back up to this mount, for example using backup_vm.py:
https://github.com/oVirt/ovirt-engine-sdk/blob/master/sdk/examples/backup_vm.py

Or use one of the existing backup solutions; none of them keep the backups in
a storage domain, so the mount should not affect the system.
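As an illustration of the "mount it outside oVirt" approach, a backup job could check that the NFS mount is alive before writing to it, so that a stopped NFS server makes the job skip instead of hanging. This is a minimal Python sketch under stated assumptions, not part of oVirt or backup_vm.py; the helper name mount_is_usable, the /mnt/backups path, and the 10 second timeout are hypothetical.

```python
import subprocess
import sys

# Script run in a child process: exercise the mount by listing it, then
# confirm the path is really a mount point and not an empty local directory.
# Any exception (e.g. missing path) makes the child exit with status 1.
_CHECK = (
    "import os, sys\n"
    "path = sys.argv[1]\n"
    "os.listdir(path)\n"
    "sys.exit(0 if os.path.ismount(path) else 1)\n"
)


def mount_is_usable(path: str, timeout: float = 10.0) -> bool:
    """Return True if `path` is a mount point that responds within `timeout` seconds.

    The check runs in a separate process because filesystem calls on a dead
    hard-mounted NFS export can block; the parent waits at most `timeout`.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHECK, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Give up without waiting: a child blocked on NFS I/O may not exit promptly.
        proc.kill()
        return False


def main() -> None:
    # Hypothetical mount point for the NFS export, mounted outside oVirt
    # (for example via /etc/fstab), not added as a storage domain.
    backup_dir = "/mnt/backups"
    if not mount_is_usable(backup_dir):
        print(f"{backup_dir} is not mounted or not responding; skipping backup")
        return
    # Start the backup here, writing into backup_dir, e.g. with backup_vm.py
    # (see its --help for the exact options).
    print(f"{backup_dir} is usable; starting backup")


if __name__ == "__main__":
    main()
```

Running the check in a child process is the main design choice: on a hard NFS mount, a directory listing against a dead server can block, so the parent only waits for the timeout and then reports the mount as unusable.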
Nir

_______________________________________________
Users mailing list -- users@ovirt.org
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/MYQAQTMXRAZT7EYAYCMYXBJYZHSNJT7G/