Apologies for the fragmented messages. In the existing framework, CloudStack does not know for certain whether your VMs are dead, the KVM hypervisor crashed, it is just a network blip, or the KVM agent was stopped (or died). It takes a conservative approach and does not restart the VMs on other hypervisors, in order to avoid a split-brain scenario.
The only time it will restart the KVM hypervisor and move the VMs over is when one of the hypervisors in the cluster loses access to primary storage, which is detected using the NFS heartbeat method I mentioned earlier.

The new framework addresses the limitations above by:

1) checking whether there is any disk activity on the VMs that are in an uncertain state;
2) if there is no activity for ALL VMs for "x" number of seconds, issuing an IPMI fence command to power down/reboot the host (via iLO, DRAC, or something similar);
3) restarting the VMs elsewhere.

Regards
ilya

On Tue, Jul 18, 2017 at 6:10 AM, ilya musayev <ilya.mailing.li...@gmail.com> wrote:

> What shared primary storage backend do you have for your VMs?
>
> If it is NFS - cloudstack agent writes heartbeat. When issue occurs - the
> neighbor hosts will check if the hypervisor that failed - still writes to
> heartbeat file. There are a bunch of corner cases where cloudstack HA does not
> kick in - due to uncertainty.
>
> The new framework should address those uncertainties.
>
> KVM HA with IPMI Fencing - Apache Cloudstack - Apache Software ...
> <https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwi59uv58pLVAhXHslQKHSU_B5YQFgg2MAA&url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FCLOUDSTACK%2FKVM%2BHA%2Bwith%2BIPMI%2BFencing&usg=AFQjCNG_-JHCYhcZm0lM9M4gKM4vKQ3hew>
> [CLOUDSTACK-8943] KVM HA is broken, let's fix it - ASF JIRA
> <https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwi59uv58pLVAhXHslQKHSU_B5YQFgg9MAE&url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FCLOUDSTACK-8943&usg=AFQjCNGkOyC0hR4otCJ1LZF4j-2HSayMyQ>
>
> Regards
> ilya
>
> On Tue, Jul 18, 2017 at 6:06 AM, ilya musayev <
> ilya.mailing.li...@gmail.com> wrote:
>
>> Hi Victor
>>
>> We recently rewrote KVM HA framework. It's being merged into latest build.
>>
>> On Tue, Jul 18, 2017 at 5:39 AM, victor <vic...@ihnetworks.com> wrote:
>>
>>> Hello Guys,
>>>
>>> I am facing the same issue that is mentioned in the following URL.
>>>
>>> -----------------
>>>
>>> https://issues.apache.org/jira/browse/CLOUDSTACK-3535
>>>
>>> -------------
>>>
>>> When the host is put in maintenance mode, the HA-enabled VMs are
>>> automatically migrated to an available host. But when the KVM host is down, no
>>> HA is done. The VMs stay down until I bring the host node back up.
>>>
>>> I have tried everything, including the following.
>>>
>>> =====
>>>
>>> 1, System VMs and client VMs are created in shared storage
>>>
>>> 2, Added ha.tag host tags
>>>
>>> 3, Created host by adding ha tag
>>>
>>> 4, Created VEs in HA-enabled host with HA-enabled service offering
>>>
>>> ====
>>>
>>> Have you guys successfully tested HA? I am really stuck at this part.
>>>
>>> Regards
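
P.S. For reference, the recovery flow of the new framework described at the top of this message (disk-activity check, then IPMI fence, then restart elsewhere) could be sketched roughly as below. This is only an illustration in Python, not CloudStack code: the names `VmActivity`, `host_looks_dead`, `recover_host`, `fence_host` and `restart_vm` are all hypothetical, and two details are my own assumptions rather than confirmed behaviour: a host with no VMs is never treated as dead, and VMs are only restarted if the fence call reports success.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class VmActivity:
    """Hypothetical record: how long ago a VM last touched its disk."""
    name: str
    seconds_since_last_disk_io: float


def host_looks_dead(vms: Sequence[VmActivity], quiet_seconds: float) -> bool:
    """Step 1: the host is suspect only if ALL of its VMs have been idle
    on disk for at least `quiet_seconds` (the "x" seconds in the text)."""
    # Assumption: with no VMs there is no evidence, so stay conservative.
    return bool(vms) and all(
        vm.seconds_since_last_disk_io >= quiet_seconds for vm in vms
    )


def recover_host(
    vms: Sequence[VmActivity],
    quiet_seconds: float,
    fence_host: Callable[[], bool],      # hypothetical: IPMI power off/reboot
    restart_vm: Callable[[str], None],   # hypothetical: start VM on another host
) -> List[str]:
    """Return the names of the VMs that were restarted elsewhere."""
    if not host_looks_dead(vms, quiet_seconds):
        return []  # any disk activity at all: do nothing, avoid split brain
    # Step 2: fence the host before touching its VMs.
    if not fence_host():
        return []  # assumption: an unconfirmed fence means no restart
    # Step 3: restart every VM elsewhere.
    restarted = []
    for vm in vms:
        restart_vm(vm.name)
        restarted.append(vm.name)
    return restarted
```

The point of the ordering is that a restart only happens after the host has been powered off, so a VM can never end up running in two places at once.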