Re: NFS Server Failure Caused Cascading Reboots and HA Events

Wei ZHOU Fri, 28 Mar 2025 00:23:14 -0700

Hi,

Currently this is the default behavior that the host is rebooted in case of
NFS failure.


You can add the line to agent.properties and restart cloudstack-agent to
make it effective.

reboot.host.and.alert.management.on.heartbeat.timeout=false



-Wei

On Fri, Mar 28, 2025 at 5:06 AM Antoine Boucher
<[email protected]> wrote:

> We experienced unexpected cascading reboots across all hosts, followed by
> HA kicking in and migrating VMs. Amid the chaos, we discovered that a newly
> added zone-wide NFS server, used only by one stopped test VM, had gone
> offline. Once we disabled that NFS server in the UI, everything slowly
> stabilized.
>
> We have a large number of NFS servers online in the zone. Is this expected
> behavior? Can one NFS server going offline with just a single stopped VM
> trigger mass host reboots? This feels like operational madness.
>
> Regards, Antoine
>
> Antoine Boucher
> [email protected]
> [o] +1-226-505-9734
> www.haltondc.com
>

Re: NFS Server Failure Caused Cascading Reboots and HA Events

Reply via email to