Hi David,

> On Sep 30, 2019, at 3:29 PM, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
>
> * Felipe Franciosi (fel...@nutanix.com) wrote:
>> Heyall,
>>
>> We have a use case where a host should self-fence (and all VMs should
>> die) if it doesn't hear back from a heartbeat within a certain time
>> period. Lots of ideas were floated around where libvirt could take
>> care of killing VMs or a separate service could do it. The concern
>> with those is that various failures could lead to _those_ services
>> being unavailable and the fencing wouldn't be enforced as it should.
>>
>> Ultimately, it feels like Qemu should be responsible for this
>> heartbeat and exit (or execute a custom callback) on timeout.
>
> It doesn't feel doing it inside qemu would be any safer; something
> outside QEMU can forcibly emit a kill -9 and qemu *will* stop.

The argument above is that we would have to rely on this external
service being functional. Consider the case where the host is
dysfunctional, with this service perhaps crashed and a corrupt
filesystem preventing it from restarting. The VMs would never die.

It feels like a Qemu timer-driven heartbeat check that calls
abort() / exit() on timeout would be more reliable.

Thoughts?

Felipe

>
>> Does something already exist for this purpose which could be used?
>> Would a generic Qemu-fencing infrastructure be something of interest?
> Dave
>
>
>> Cheers,
>> F.
>>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
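
For illustration only, here is a minimal sketch of the timer-driven heartbeat check proposed above. It is written in Python rather than as QEMU code and does not use any existing QEMU interface. The names `SelfFenceWatchdog`, `heartbeat_expired`, `beat`, `check` and `run` are hypothetical, and the strict "more than `timeout` seconds" rule is an assumption; the thread does not specify the timeout semantics.

```python
import os
import threading
import time


def heartbeat_expired(last_beat, now, timeout):
    """Return True when more than `timeout` seconds have passed since the last heartbeat."""
    return (now - last_beat) > timeout


class SelfFenceWatchdog:
    """Hypothetical in-process watchdog; an illustration, not a QEMU API."""

    def __init__(self, timeout, on_timeout=os.abort, clock=time.monotonic):
        self.timeout = timeout
        # Default action mirrors the abort() idea; a custom callback can be passed instead.
        self.on_timeout = on_timeout
        # A monotonic clock is unaffected by wall-clock adjustments on the host.
        self.clock = clock
        self._last_beat = clock()
        self._lock = threading.Lock()

    def beat(self):
        """Record that a heartbeat was received."""
        with self._lock:
            self._last_beat = self.clock()

    def check(self):
        """Run the timeout action if the heartbeat is overdue; return whether it fired."""
        with self._lock:
            last = self._last_beat
        if heartbeat_expired(last, self.clock(), self.timeout):
            self.on_timeout()
            return True
        return False

    def run(self, interval, stop_event):
        """Check every `interval` seconds until fenced or `stop_event` is set."""
        while not stop_event.wait(interval):
            if self.check():
                break
```

Because the check runs inside the process being fenced, it does not depend on a separate service still being alive, which is the property argued for above. The sketch does not address the counterpoint that a wedged process may never get to run its own timer.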