Hi Mauro,

that is really interesting to hear - I have not been working with
CloudStack for very long, so this is quite new to me.
However, reading through the admin guide:
http://docs.cloudstack.apache.org/en/latest/adminguide/reliability.html?highlight=Storage%20Outage#primary-storage-outage-and-data-loss
the behaviour you describe does not seem "normal" for the hosts.

Did you already take a look at the issues on GitHub?
Restarting all hosts of the cluster sounds like a bug to me - so it might
be worth opening a new issue for further investigation?
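
For reference, the behaviour Mauro describes below comes from the KVM HA
heartbeat: each host periodically writes a timestamp to a file on the NFS
primary storage, and if that write keeps failing the host fences itself by
rebooting. Here is a rough sketch of that pattern - this is NOT the actual
kvmheartbeat.sh, and the path, retry count, and timeout are made up purely
for illustration:

```shell
#!/bin/sh
# Sketch of the storage-heartbeat pattern discussed in this thread.
# NOT the real CloudStack kvmheartbeat.sh: HB_FILE, MAX_TRIES and the
# 10-second timeout are invented placeholders.
HB_FILE="${HB_FILE:-/tmp/kvmha-heartbeat}"   # stands in for a file on the NFS primary storage
MAX_TRIES=5

write_heartbeat() {
    i=0
    while [ "$i" -lt "$MAX_TRIES" ]; do
        # On a healthy NFS mount this write succeeds immediately; on a
        # dead mount it hangs or fails, so a timeout guards each attempt.
        if timeout 10 sh -c "date +%s > \"$HB_FILE\""; then
            return 0    # heartbeat written: host is considered healthy
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1            # every attempt failed
}

if write_heartbeat; then
    echo "heartbeat ok"
else
    # This is the branch where the stock script forces a reboot, which
    # is what takes down VMs on unrelated storages when one pool dies.
    echo "heartbeat failed; stock script would reboot the host here" >&2
fi
```

This is also why editing the script to skip the reboot is risky: the reboot
is meant as fencing, so removing it can leave a half-dead host still holding
on to VM disks.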

On Sat, 16 Oct 2021 at 01:43, Mauro Ferraro - G2K Hosting <
mferr...@g2khosting.com> wrote:

> Hi guys, how are you?
>
> We are having these problems with ACS when a primary storage fails.
>
> We have several primary storages on Linux NFS servers serving KVM
> images, so every host has all the NFS servers mounted, because a
> single host can be running VMs from different storages. The main
> problem with this is that when one storage fails for any reason, the
> whole cluster goes crazy and starts rebooting the hosts to reconnect
> to that storage, and all the VMs on the cluster (including the VMs
> that were working fine) go down because the connection to one storage
> failed.
> If the problem with the storage is permanent, the cluster never
> starts again and the hosts will reboot indefinitely.
>
> When this problem appears, the logs say this:
>
> host heartbeat: kvmheartbeat.sh will reboot system because it was unable
> to write the heartbeat to the storage.
>
> Many users edit the script kvmheartbeat.sh to avoid the host reboot,
> or restart the agent on the host, but I am really not sure that this
> is the right solution.
>
> Can someone help propose a better solution to this high-risk problem?
>
> Regards,
>
> Mauro
