GitHub user sbrueseke added a comment to the discussion: About HA
We ran into the same situation and had also misunderstood how HA works in CloudStack. It is genuinely hard for CloudStack (CS) to be sure that a host is really down: there are many situations in which the management server cannot connect to a host while all of its VMs are still running. If CS then tries to start those VMs on other hosts, it ends in a mess, since the same VM could end up running on two hosts at once.

Here is how we handle host failures. It is a manual process:
1) Our monitoring informs us that a host is down.
2) A technician takes a look and decides whether the host is really down and will not come back up.
3) If the host is really down, we run Force Reconnect on the host's page in the UI.
4) After that, we run Declare Host as Degraded via the UI.
5) Once the host is declared degraded, HA (as set in the VM's service offering) kicks in and restarts all of its VMs on other hosts.

Even though it would be possible, we are not going to automate this. We want to stay in control, and one big reason is that we also run SDS (LINSTOR) on all hosts, so a host failure affects our primary storage, too.

Hope that helps!

GitHub link: https://github.com/apache/cloudstack/discussions/9988#discussioncomment-11405383
----
This is an automatically sent email for users@cloudstack.apache.org.