Hi Paul,

Thanks for the clarification. I currently don't have IPMI-enabled hardware
(in my test environment), but it would be helpful if you could clear up some
basic concepts for me:

- If HA-enabled VMs are automatically started on another host when the
  current host goes down, what is the need or purpose of HA-host (other than
  letting the management server remotely control its power interfaces)?
- I understand the "Shoot-the-other-node-in-the-head" (STONITH) approach ACS
  uses to fence the host, but I couldn't find which mechanism or events
  trigger it.

Thanks and regards,
Parth Patel

On Wed, 14 Mar 2018 at 02:22 Paul Angus <paul.an...@shapeblue.com> wrote:

> The management server doesn't ping the host through IPMI. However, if
> IPMI is not available, you will not be able to use Host HA, as there is no
> way for CloudStack to 'fence' the host - that is, shut it down to be sure
> that a VM cannot start again on that host.
>
> I can explain why that is necessary if you wish.
>
> Kind regards,
>
> Paul Angus
>
> paul.an...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HS UK
> @shapeblue
>
> -----Original Message-----
> From: Parth Patel <parthpatel2...@gmail.com>
> Sent: 13 March 2018 16:57
> To: users@cloudstack.apache.org
> Cc: Jon Marshall <jms....@hotmail.co.uk>
> Subject: Re: KVM HostHA
>
> Hi Jon and Victor,
>
> I think the management server pings your host using ipmi (I really hope
> this is not the case).
> In my case, I did not have OOBM enabled at all (my hardware didn't support
> it).
> I think you could disable OOBM and/or HA-Host and give that a try :)
>
> On Tue, 13 Mar 2018 at 20:40 victor <vic...@ihnetworks.com> wrote:
>
> > Hello Guys,
> >
> > I have tried the following two cases.
> >
> > 1, "echo c > /proc/sysrq-trigger"
> >
> > 2, Pulled the network cable of one of the hosts
> >
> > In both cases, the following happened.
> >
> > =====
> > 2018-03-13 08:22:54,978 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentTaskPool-15:ctx-c8d9f5d2) (logid:c0a3d2da) Notifying other nodes of to disconnect
> > 2018-03-13 08:22:54,983 INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-16:ctx-d8204625) (logid:ffe4a426) Host 4 is disconnecting with event AgentDisconnected
> > 2018-03-13 08:22:54,985 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-16:ctx-d8204625) (logid:ffe4a426) Host 4 is already Alert
> > 2018-03-13 08:22:54,985 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-16:ctx-d8204625) (logid:ffe4a426) Deregistering link for 4 with state Alert
> > 2018-03-13 08:22:54,985 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-16:ctx-d8204625) (logid:ffe4a426) Remove Agent : 4
> > =====
> >
> > But nothing happened for the VMs on that node. I have waited for one
> > hour and the VMs on that node have not been migrated to the other
> > available hosts. I think the issue is that the management server still
> > thinks that the VMs on that host are running. Please check the
> > following logs:
> >
> > =======
> > 2018-03-13 11:08:25,882 DEBUG [c.c.c.CapacityManagerImpl] (CapacityChecker:ctx-1d8378af) (logid:ae906a50) Found 1 VMs on host 4
> > 2018-03-13 11:08:25,888 DEBUG [c.c.c.CapacityManagerImpl] (CapacityChecker:ctx-1d8378af) (logid:ae906a50) Found 0 VM, not running on host 4
> > ========
> >
> > On 03/13/2018 04:20 PM, Jon Marshall wrote:
> > > I tried "echo c > /proc/sysrq-trigger", which stopped me getting into
> > > the server, but it did not stop the server responding to an ipmitool
> > > request from the manager, e.g.
> > >
> > > "ipmitool -I lanplus -H 172.16.7.29 -U admin3 -P letmein chassis status"
> > >
> > > run from the management server got an answer saying the chassis power
> > > was on, so CS never registered the compute node as down.
> > >
> > > I am obviously doing something wrong but cannot work it out.
> > >
> > > The management server has one NIC - 172.16.7.4
> > >
> > > Each compute node has 3 NICs -
> > >
> > >                        cnode1          cnode2
> > > management NIC         172.16.7.5      172.16.7.6
> > > vm NIC                 172.16.6.130    172.16.6.131
> > > storage                172.16.250.4    172.16.250.5
> > > Dell LOM (for iDRAC)   172.16.7.29     172.16.7.30
> > >
> > > The Dell LOM IPs are the ones used to configure OOBM in the UI.
> > >
> > > If I pull the storage NIC, presumably nothing will happen, as the
> > > ipmitool check is running across the management NIC, so do I need to
> > > pull both?
> > >
> > > My understanding of host HA was that the management server monitored
> > > the compute nodes using ipmitool and, if it did not get a response
> > > because the host was down, it would fence off that host and move the
> > > VMs to an active compute node.
> > >
> > > This is obviously too simplistic, so could someone explain how it is
> > > meant to work and what it is protecting against?
> > >
> > > ________________________________
> > > From: Paul Angus <paul.an...@shapeblue.com>
> > > Sent: 13 March 2018 07:01
> > > To: users@cloudstack.apache.org
> > > Subject: RE: KVM HostHA
> > >
> > > Hi all,
> > >
> > > One small note: unplugging the management NIC will only cause an HA
> > > event if the storage is running over that NIC also.
> > >
> > > If the storage is over a separate NIC, then the guest VMs will
> > > continue to run when the mgmt. NIC is unplugged. Host HA will detect
> > > the disk activity and conclude that, as the VMs are still running,
> > > there is nothing it can do other than mark the host as degraded.
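
Paul's description above can be read as a small decision table. The Python sketch below is a simplified model of that reasoning, not CloudStack's actual Host HA state machine. The names `Outcome` and `assess_host` are hypothetical, and the final "fence" branch is an assumption drawn from the fencing discussion elsewhere in this thread.

```python
from enum import Enum


class Outcome(Enum):
    NO_ACTION = "agent responding, nothing to do"
    DEGRADED = "VMs still running, mark host degraded"
    FENCE_AND_RECOVER = "fence host via OOBM, restart HA VMs elsewhere"


def assess_host(agent_responding: bool, vm_disk_activity: bool) -> Outcome:
    """Hypothetical model of the Host HA reasoning described in the thread."""
    if agent_responding:
        # The management server can still talk to the agent.
        return Outcome.NO_ACTION
    if vm_disk_activity:
        # Agent unreachable, but the VMs are still writing to storage,
        # so restarting them elsewhere would risk running them twice.
        return Outcome.DEGRADED
    # Assumption: with no agent and no disk activity, the host is fenced
    # (powered off out of band) before its HA VMs are started elsewhere.
    return Outcome.FENCE_AND_RECOVER


# Management NIC unplugged, storage on a separate NIC: VMs keep writing to disk.
print(assess_host(agent_responding=False, vm_disk_activity=True).name)
```

In this model, pulling only the management cable on Jon's layout (storage on 172.16.250.x) lands in the "degraded" branch, which matches what Paul says to expect.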
> > >
> > > Kind regards,
> > >
> > > Paul Angus
> > >
> > > -----Original Message-----
> > > From: Parth Patel <parthpatel2...@gmail.com>
> > > Sent: 12 March 2018 17:35
> > > To: users@cloudstack.apache.org
> > > Subject: Re: KVM HostHA
> > >
> > >> Hi Jon,
> > >>
> > >> As I said, in my case, making the host HA didn't work, but by just
> > >> having an HA VM running on the host and executing (WARNING)
> > >> "echo c > /proc/sysrq-trigger" to simulate a kernel crash on the
> > >> host, the management server registered it as down and started the VM
> > >> on another host. I know I've suggested this before, but I strongly
> > >> suggest you give this a try. Also, you don't need to completely power
> > >> off the machine manually; just unplugging the network cable works
> > >> fine. The CloudStack agent, after losing connection to the management
> > >> server, auto-reboots because of the KVM heartbeat check shell script
> > >> mentioned by Rohit Yadav in reply to one of my earlier queries in
> > >> another thread.
> > >>
> > >> On Mon 12 Mar, 2018, 21:23 Jon Marshall, <jms....@hotmail.co.uk> wrote:
> > >> Hi Paul
> > >>
> > >> Thanks for the response.
> > >>
> > >> I think I am not understanding how it was meant to work then.
> > >> My understanding was that the manager used ipmitool to just keep
> > >> querying the compute nodes as to their status, so I assumed it didn't
> > >> matter how you shut the node down: once it was down, the manager
> > >> would get no response and mark it as down (which it does).
> > >>
> > >> I am in testing mode so I think I will just go and pull the power and
> > >> see what happens :)
> > >>
> > >> Thanks
> > >>
> > >> Jon
> > >>
> > >> ________________________________
> > >> From: Paul Angus <paul.an...@shapeblue.com>
> > >> Sent: 12 March 2018 15:31
> > >> To: users@cloudstack.apache.org
> > >> Subject: RE: KVM HostHA
> > >>
> > >> Hi Jon,
> > >>
> > >> I think that what you guys are finding is that a controlled host
> > >> shutdown, which will cause the agent to shut down cleanly, is not
> > >> considered an HA event. I wouldn't expect CloudStack to take any
> > >> action if you shut down a host, only if the host (agent) stops
> > >> responding.
> > >>
> > >> Kind regards,
> > >>
> > >> Paul Angus
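
One way to check which kind of disconnect the management server actually recorded is to scan its log for lines like the ones victor posted on 13 March. The Python sketch below is a minimal example that assumes the line format shown in this thread. The names `parse_disconnect` and `DisconnectEvent` are illustrative and not part of CloudStack.

```python
import re
from typing import NamedTuple, Optional

# Matches management-server lines of the form quoted in this thread, e.g.
# "... Host 4 is disconnecting with event AgentDisconnected"
DISCONNECT_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\].*?"
    r"Host (?P<host_id>\d+) is disconnecting with event (?P<event>\w+)"
)


class DisconnectEvent(NamedTuple):
    timestamp: str
    host_id: int
    event: str


def parse_disconnect(line: str) -> Optional[DisconnectEvent]:
    """Return the disconnect event recorded on this log line, if any."""
    match = DISCONNECT_RE.search(line)
    if match is None:
        return None
    return DisconnectEvent(
        match["timestamp"], int(match["host_id"]), match["event"]
    )


# Two lines copied from the log excerpt earlier in the thread.
SAMPLE_LINES = [
    "2018-03-13 08:22:54,983 INFO [c.c.a.m.AgentManagerImpl] "
    "(AgentTaskPool-16:ctx-d8204625) (logid:ffe4a426) "
    "Host 4 is disconnecting with event AgentDisconnected",
    "2018-03-13 08:22:54,985 DEBUG [c.c.a.m.AgentManagerImpl] "
    "(AgentTaskPool-16:ctx-d8204625) (logid:ffe4a426) Host 4 is already Alert",
]

events = [e for e in map(parse_disconnect, SAMPLE_LINES) if e is not None]
for event in events:
    print(event.timestamp, event.host_id, event.event)
```

Comparing the event name recorded after a clean shutdown with the one recorded after a crash or a pulled cable should show whether the management server saw the two cases differently.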
> > >>
> > >> -----Original Message-----
> > >> From: Jon Marshall <jms....@hotmail.co.uk>
> > >> Sent: 12 March 2018 15:15
> > >> To: users@cloudstack.apache.org
> > >> Subject: Re: KVM HostHA
> > >>
> > >> I have the same issue here and am not entirely sure what the
> > >> behaviour should be.
> > >>
> > >> I have one manager node and 2 compute nodes running 4.11 with ipmi
> > >> working correctly.
> > >>
> > >> From the UI under HA -
> > >>
> > >> HA Enabled   Yes
> > >> HA State     Available
> > >> HA Provider  kvmhaprovider
> > >>
> > >> although interestingly the "Details" tab shows -
> > >>
> > >> HA enabled   No
> > >>
> > >> which I assume is a cosmetic issue?
> > >>
> > >> On each compute node I have one HA-enabled VM and one non-HA-enabled
> > >> VM.
> > >>
> > >> I power off a compute node and the UI updates the host status and the
> > >> VMs on that node stop responding, but they never fail over to the
> > >> other node.
> > >>
> > >> Couple of things I noticed -
> > >>
> > >> 1) as soon as I power off the compute node, the HA state on the other
> > >> node shows "Ineligible"
> > >>
> > >> 2) in the UI the instances all still show as green even though two of
> > >> them are not available
> > >>
> > >> Any help much appreciated
> > >>
> > >> ________________________________
> > >> From: victor <vic...@ihnetworks.com>
> > >> Sent: 07 March 2018 17:01
> > >> To: users@cloudstack.apache.org
> > >> Subject: KVM HostHA
> > >>
> > >> Hello Guys,
> > >>
> > >> I have installed CloudStack 4.11. I have enabled HA for each host I
> > >> have added. I have also added ipmi successfully (using the ipmi
> > >> driver).
> > >> The hosts are showing the following.
> > >>
> > >> =======
> > >>
> > >> HA Enabled   Yes
> > >> HA State     Available
> > >> HA Provider  kvmhaprovider
> > >>
> > >> ======
> > >>
> > >> Also the host is showing the following correctly:
> > >>
> > >> Resource state --> Enabled
> > >> State --> UP
> > >> Power state --> On
> > >>
> > >> So I have shut down one of the hosts to see how the KVM host HA is
> > >> working. I have waited for half an hour, but nothing has happened.
> > >> What will happen to the VMs on that host if the host fails to come
> > >> back up? There isn't much in the logs.
> > >>
> > >> Regards
> > >> Victor