GitHub user sbrueseke added a comment to the discussion: About HA

We ran into the same situation and did not understand HA in CloudStack
correctly either. It is really hard for CS to be sure that a host is really down.
There are many situations where the management server cannot connect to the
host, but all its VMs are still running. If CS then tried to start those VMs on
other hosts, it would end in a mess (the same VM running on two hosts at once).
Here is how we handle host failures. It is a manual process:
1) Our monitoring informs us that a host is down.
2) We take a look, and a technician decides whether the host is really down and
will not come back up.
3) If the host is really down, we run Force Reconnect from the host's page in the UI.
4) After that, we run Declare Host as Degraded in the UI.
5) Once the host is declared degraded, HA (as configured in the service offering)
kicks in and restarts the VMs on other hosts.
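As far as we know, the UI actions in steps 3 and 4 correspond to the CloudStack API commands `reconnectHost` and `declareHostAsDegraded`, both of which take the host's `id`. The sketch below shows how a technician could issue them by hand after making the decision in step 2. It only builds signed request URLs and does not send anything or automate that decision. It uses the documented CloudStack request signing: the parameters are sorted by name, the query string is lowercased, signed with HMAC-SHA1 using the secret key, then base64- and URL-encoded. The endpoint, keys and host UUID are hypothetical placeholders.

```python
import base64
import hashlib
import hmac
import urllib.parse


def signed_request_url(endpoint: str, api_key: str, secret_key: str,
                       command: str, **params: str) -> str:
    """Build a signed CloudStack API GET URL. The URL is returned, not sent."""
    all_params = dict(params, command=command, apiKey=api_key, response="json")
    # URL-encode each value (spaces become %20, not '+') and sort by parameter name.
    query = "&".join(
        f"{name}={urllib.parse.quote(str(value), safe='')}"
        for name, value in sorted(all_params.items(), key=lambda kv: kv[0].lower())
    )
    # Sign the lowercased query string with HMAC-SHA1 using the secret key.
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{endpoint}?{query}&signature={signature}"


if __name__ == "__main__":
    # Hypothetical values: replace with your management server, keys and host UUID.
    ENDPOINT = "http://mgmt.example.com:8080/client/api"
    HOST_ID = "00000000-0000-0000-0000-000000000000"
    # Step 3 (Force Reconnect).
    print(signed_request_url(ENDPOINT, "API_KEY", "SECRET_KEY", "reconnectHost", id=HOST_ID))
    # Step 4 (Declare Host as Degraded), only after a technician has confirmed step 2.
    print(signed_request_url(ENDPOINT, "API_KEY", "SECRET_KEY",
                             "declareHostAsDegraded", id=HOST_ID))
```

Each URL is meant to be reviewed and run one at a time, so the decision stays with a person.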

Even if it were possible, we are not going to automate this. We want to stay in
control, and one big reason is that we also run SDS (LINSTOR) on all hosts, so a
host failure affects our primary storage, too.

Hope that helps!

GitHub link: 
https://github.com/apache/cloudstack/discussions/9988#discussioncomment-11405383

----
This is an automatically sent email for users@cloudstack.apache.org.