Thank you, Wei, as always. This is a glass-half-empty versus glass-half-full issue.
Based on our experience, there is more to lose than to gain. I would suggest setting the default to reboot.host.and.alert.management.on.heartbeat.timeout=false.

Regards,
Antoine

Antoine Boucher
antoi...@haltondc.com
[o] +1-226-505-9734
www.haltondc.com

> On Mar 28, 2025, at 3:22 AM, Wei ZHOU <ustcweiz...@gmail.com> wrote:
>
> Hi,
>
> Currently the default behavior is that the host is rebooted in case of
> NFS heartbeat failure.
>
> You can add the following line to agent.properties and restart
> cloudstack-agent to make it effective:
>
> reboot.host.and.alert.management.on.heartbeat.timeout=false
>
> -Wei
>
> On Fri, Mar 28, 2025 at 5:06 AM Antoine Boucher
> <antoi...@haltondc.com.invalid> wrote:
>
>> We experienced unexpected cascading reboots across all hosts, followed by
>> HA kicking in and migrating VMs. Amid the chaos, we discovered that a newly
>> added zone-wide NFS server, used only by one stopped test VM, had gone
>> offline. Once we disabled that NFS server in the UI, everything slowly
>> stabilized.
>>
>> We have a large number of NFS servers online in the zone. Is this expected
>> behavior? Can one NFS server going offline, with just a single stopped VM
>> using it, trigger mass host reboots? This feels like operational madness.
>>
>> Regards,
>> Antoine
>>
>> Antoine Boucher
>> antoi...@haltondc.com
>> [o] +1-226-505-9734
>> www.haltondc.com
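For anyone following along, Wei's suggestion can be sketched as the shell commands below. This is a sketch only: the config path is the usual one on packaged installs but may differ on yours, and the snippet uses a $CONF variable so it can be tried against a scratch file first before touching a live agent.

```shell
# Location of the agent config on most packaged CloudStack installs
# (assumption; verify on your hosts). Override CONF to test safely.
CONF="${CONF:-/etc/cloudstack/agent/agent.properties}"

# Append the override only if the key is not already set (idempotent).
grep -q '^reboot\.host\.and\.alert\.management\.on\.heartbeat\.timeout=' "$CONF" \
  || echo 'reboot.host.and.alert.management.on.heartbeat.timeout=false' >> "$CONF"

# Show the effective setting.
grep '^reboot\.host\.and\.alert\.management\.on\.heartbeat\.timeout=' "$CONF"

# Then restart the agent on each host for the change to take effect:
#   systemctl restart cloudstack-agent
```

Note that this is per-host: it has to be applied to agent.properties on every KVM host in the zone, not just the one attached to the failed NFS store.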