Hi Jon and Victor,

I think the management server checks your host using IPMI (I really hope this is not the case). In my case, I did not have OOBM enabled at all (my hardware didn't support it). I think you could disable OOBM and/or HA-Host and give that a try :)
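For what it's worth, here is how I read the behaviour described further down this thread, written out as a rough Python sketch. It is only my simplification of what Paul and Jon describe, not CloudStack's actual Host HA code: the function names, the `Action` values and the three boolean inputs are all made up for illustration. The sample text mimics the "System Power" line that `ipmitool chassis status` prints.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical outcomes, named only for this sketch."""
    NONE = "none"
    MARK_DEGRADED = "mark_degraded"
    FENCE_AND_RECOVER = "fence_and_recover"


def chassis_power_on(chassis_status: str) -> Optional[bool]:
    """Read the 'System Power' line from `ipmitool chassis status` text.

    Returns True/False for on/off, or None if the line is missing.
    """
    for line in chassis_status.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "system power":
            return value.strip().lower() == "on"
    return None


def decide(agent_responding: bool, clean_shutdown: bool,
           disk_activity: bool) -> Action:
    """Simplified decision, following the descriptions in this thread.

    - Agent still responding, or the host was shut down cleanly:
      not an HA event.
    - Agent gone but the VMs still write to storage: nothing to do
      except mark the host as degraded.
    - Agent gone and no disk activity: fence the host and restart
      the HA-enabled VMs elsewhere.
    """
    if agent_responding or clean_shutdown:
        return Action.NONE
    if disk_activity:
        return Action.MARK_DEGRADED
    return Action.FENCE_AND_RECOVER


# Illustrative input only, not captured from a real host.
sample = "System Power         : on\nPower Overload       : false\n"
print(chassis_power_on(sample))
print(decide(agent_responding=False, clean_shutdown=False,
             disk_activity=False))
```

The first function is there because of Jon's observation: a host with a crashed kernel still reports chassis power on, so the power state alone cannot tell the management server that the host is down.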
On Tue, 13 Mar 2018 at 20:40 victor <[email protected]> wrote:

> Hello Guys,
>
> I have tried the following two cases.
>
> 1, "echo c > /proc/sysrq-trigger"
>
> 2, Pulled the network cable of one of the hosts
>
> In both cases, the following happened.
>
> =====
> 2018-03-13 08:22:54,978 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
> (AgentTaskPool-15:ctx-c8d9f5d2) (logid:c0a3d2da) Notifying other nodes
> of to disconnect
> 2018-03-13 08:22:54,983 INFO [c.c.a.m.AgentManagerImpl]
> (AgentTaskPool-16:ctx-d8204625) (logid:ffe4a426) Host 4 is disconnecting
> with event AgentDisconnected
> 2018-03-13 08:22:54,985 DEBUG [c.c.a.m.AgentManagerImpl]
> (AgentTaskPool-16:ctx-d8204625) (logid:ffe4a426) Host 4 is already Alert
> 2018-03-13 08:22:54,985 DEBUG [c.c.a.m.AgentManagerImpl]
> (AgentTaskPool-16:ctx-d8204625) (logid:ffe4a426) Deregistering link for
> 4 with state Alert
> 2018-03-13 08:22:54,985 DEBUG [c.c.a.m.AgentManagerImpl]
> (AgentTaskPool-16:ctx-d8204625) (logid:ffe4a426) Remove Agent : 4
> =====
>
> But nothing happened for the VMs on that node. I have waited for one
> hour and the VMs on that node have not been migrated to the other
> available hosts. I think the issue is that the management server still
> thinks that the VMs on that host are running.
> Please check the following logs
>
> =======
> 2018-03-13 11:08:25,882 DEBUG [c.c.c.CapacityManagerImpl]
> (CapacityChecker:ctx-1d8378af) (logid:ae906a50) Found 1 VMs on host 4
> 2018-03-13 11:08:25,888 DEBUG [c.c.c.CapacityManagerImpl]
> (CapacityChecker:ctx-1d8378af) (logid:ae906a50) Found 0 VM, not running
> on host 4
> ========
>
> On 03/13/2018 04:20 PM, Jon Marshall wrote:
> > I tried "echo c > /proc/sysrq-trigger" which stopped me getting into
> > the server but it did not stop the server responding to an ipmitool
> > request on the manager, e.g.
> >
> > "ipmitool -I lanplus -H 172.16.7.29 -U admin3 -P letmein chassis status"
> >
> > from the management server got an answer saying the chassis power was
> > on, so CS never registered the compute node as down.
> >
> > I am obviously doing something wrong but cannot work it out.
> >
> > The management server has one NIC - 172.16.7.4
> >
> > Each compute node has 3 NICs -
> >
> >                        cnode1         cnode2
> >
> > management NIC         172.16.7.5     172.16.7.6
> > vm NIC                 172.16.6.130   172.16.6.131
> > storage                172.16.250.4   172.16.250.5
> >
> > Dell LOM (for iDRAC)   172.16.7.29    172.16.7.30
> >
> > The Dell LOM IPs are the ones used to configure OOBM in the UI.
> >
> > If I pull the storage NIC presumably nothing will happen, as the
> > ipmitool check is running across the management NIC, so I need to pull
> > both?
> >
> > My understanding of host HA was that the management server monitored
> > the compute nodes using ipmitool and if it did not get a response
> > because the host was down it would fence off that host and move the VMs
> > to an active compute node.
> >
> > This is obviously too simplistic, so could someone explain how it is
> > meant to work and what it is protecting against?
> >
> > ________________________________
> > From: Paul Angus <[email protected]>
> > Sent: 13 March 2018 07:01
> > To: [email protected]
> > Subject: RE: KVM HostHA
> >
> > Hi all,
> >
> > One small note, unplugging the management NIC will only cause an HA
> > event if the storage is running over that NIC also.
> >
> > If the storage is over a separate NIC, then the guest VMs will continue
> > to run when the mgmt. NIC is unplugged. Host HA will detect the disk
> > activity and conclude that, as the VMs are still running, there is
> > nothing it can do other than mark the host as degraded.
> >
> > Kind regards,
> >
> > Paul Angus
> >
> > [email protected]
> > www.shapeblue.com
> > 53 Chandos Place, Covent Garden, London WC2N 4HS UK
> > @shapeblue
> >
> > -----Original Message-----
> > From: Parth Patel <[email protected]>
> > Sent: 12 March 2018 17:35
> > To: [email protected]
> > Subject: Re: KVM HostHA
> >
> >> Hi Jon,
> >>
> >> As I said, in my case, making the host HA didn't work but by just
> >> having a HA VM running on host and executing - (WARNING) "echo c >
> >> /proc/sysrq-trigger" to simulate a kernel crash on host, the
> >> management server registered it as down and started the VM on another
> >> host. I know I've suggested this before but I insist you give this a
> >> try. Also, you don't need to completely power off the machine manually;
> >> just unplugging the network cable works fine.
> >> The CloudStack agent, after losing connection to the management
> >> server, auto reboots because of the KVM heartbeat check shell script
> >> mentioned by Rohit Yadav in reply to one of my earlier queries in
> >> another thread.
> >>
> >> On Mon 12 Mar, 2018, 21:23 Jon Marshall, <[email protected]> wrote:
> >> Hi Paul
> >>
> >> Thanks for the response.
> >>
> >> I think I am not understanding how it was meant to work then. My
> >> understanding was that the manager used ipmitool to just keep querying
> >> the compute nodes as to their status so I assumed it didn't matter how
> >> you shut the node down, once it was down the manager would get no
> >> response and mark it as down (which it does).
> >>
> >> I am in testing mode so I think I will just go and pull the power and
> >> see what happens :)
> >>
> >> Thanks
> >>
> >> Jon
> >>
> >> ________________________________
> >> From: Paul Angus <[email protected]>
> >> Sent: 12 March 2018 15:31
> >> To: [email protected]
> >> Subject: RE: KVM HostHA
> >>
> >> Hi Jon,
> >>
> >> I think that what you guys are finding is that a controlled host
> >> shutdown, which will cause the agent to shut down cleanly, is not
> >> considered an HA event. I wouldn't expect CloudStack to take any
> >> action if you shut down a host, only if the host (agent) stops
> >> responding.
> >>
> >> Kind regards,
> >>
> >> Paul Angus
> >>
> >> -----Original Message-----
> >> From: Jon Marshall <[email protected]>
> >> Sent: 12 March 2018 15:15
> >> To: [email protected]
> >> Subject: Re: KVM HostHA
> >>
> >> I have the same issue here and am not entirely sure what the behaviour
> >> should be.
> >>
> >> I have one manager node and 2 compute nodes running 4.11 with ipmi
> >> working correctly.
> >>
> >> From the UI under HA -
> >>
> >> HA Enabled   Yes
> >> HA State     Available
> >> HA Provider  kvmhaprovider
> >>
> >> although interestingly from the "Details" tab it shows -
> >>
> >> HA enabled   No
> >>
> >> which I assume is a cosmetic issue?
> >>
> >> On each compute node I have one HA enabled VM and one non HA enabled VM.
> >>
> >> I power off a compute node and the UI updates the host status and the
> >> VMs on that node stop responding but they never fail over to the other
> >> node.
> >>
> >> Couple of things I noticed -
> >>
> >> 1) as soon as I power off the compute node the HA state on the other
> >> node shows "Ineligible"
> >>
> >> 2) In the UI the instances all still show as green even though two of
> >> them are not available
> >>
> >> Any help much appreciated
> >>
> >> ________________________________
> >> From: victor <[email protected]>
> >> Sent: 07 March 2018 17:01
> >> To: [email protected]
> >> Subject: KVM HostHA
> >>
> >> Hello Guys,
> >>
> >> I have installed cloudstack 4.11. I have enabled HA for each host I
> >> have added. I have also added ipmi successfully (using ipmi driver).
> >> The hosts are showing like the following.
> >>
> >> =======
> >>
> >> HA Enabled   Yes
> >> HA State     Available
> >> HA Provider  kvmhaprovider
> >>
> >> ======
> >>
> >> Also the host is showing the following correctly
> >>
> >> Resource state --> Enabled
> >> State --> UP
> >> Power state --> On
> >>
> >> So I have shut down one of the hosts to see how the KVM host HA is
> >> working. I have waited for half an hour. But nothing has happened. What
> >> will happen to the VMs on that host if the host fails to come back up?
> >> There isn't much in the logs.
> >>
> >> Regards
> >> Victor
