Ok Somesh! I'll do it! Thanks!
On Mon, Jul 20, 2015 at 11:42 AM, Somesh Naidu <somesh.na...@citrix.com> wrote:

> Milamber,
>
> What you mention is not an HA use case but live migration.
>
> As mentioned by Luciano, he shut down the KVM host "A". The mgmt server managing that host realized that and disconnected the agent for that host. I believe it is expected behavior for HA to kick in at this stage, but from the logs Luciano shared it doesn't seem to be happening.
>
> Luciano, I suggest tracking this further in the issue tracking system by raising a product bug.
>
> Regards,
> Somesh
>
> -----Original Message-----
> From: Milamber [mailto:milam...@apache.org]
> Sent: Saturday, July 18, 2015 12:54 PM
> To: users@cloudstack.apache.org
> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>
> On 17/07/2015 22:26, Somesh Naidu wrote:
> >> Perhaps the management server doesn't recognize that host 3 is totally down (ping still alive? or some quorum not OK)?
> >> The only way for the mgmt server to fully accept that host 3 has a real problem is that host 3 has been rebooted (around 12:44)?
> > The host disconnect was triggered at 12:19 on host 3. The mgmt server was pretty sure the host was down (it was a graceful shutdown, I believe), which is why it triggered a disconnect and notified other nodes. There was no checkhealth/checkonhost/etc. triggered; just the agent disconnected and all listeners (ping/etc.) notified.
> >
> > At this time the mgmt server should have scheduled HA on all VMs running on that host. The HA investigators would then work their way through identifying whether the VMs are still running, whether they need to be fenced, etc. But this never happened.
>
> AFAIK, stopping the cloudstack-agent service does not start the HA process for the VMs hosted by the node. It seems normal to me that the HA process doesn't start at this moment.
> If I want to start the HA process on a node, I go to the Web UI (or cloudmonkey) and change the state of the host from Up to Maintenance.
> (Afterwards I can stop the CS-agent service if I need to, for example to reboot the node.)
>
> > Regards,
> > Somesh
> >
> > -----Original Message-----
> > From: Milamber [mailto:milam...@apache.org]
> > Sent: Friday, July 17, 2015 6:01 PM
> > To: users@cloudstack.apache.org
> > Subject: Re: HA feature - KVM - CloudStack 4.5.1
> >
> > On 17/07/2015 21:23, Somesh Naidu wrote:
> >> Ok, so here are my findings.
> >>
> >> 1. Host ID 3 was shut down around 2015-07-16 12:19:09, at which point the management server called a disconnect.
> >> 2. Based on the logs, it seems VM IDs 32, 18, 39 and 46 were running on the host.
> >> 3. No HA tasks for any of these VMs at this time.
> >> 5. Management server restarted at around 2015-07-16 12:30:20.
> >> 6. Host ID 3 connected back at around 2015-07-16 12:44:08.
> >> 7. Management server identified the missing VMs and triggered HA on those.
> >> 8. The VMs were eventually started, all 4 of them.
> >>
> >> I am not 100% sure why HA wasn't triggered until 2015-07-16 12:30 (#3), but I know that the management server restart caused it not to happen until the host was reconnected.
> > Perhaps the management server doesn't recognize that host 3 is totally down (ping still alive? or some quorum not OK)?
> > The only way for the mgmt server to fully accept that host 3 has a real problem is that host 3 has been rebooted (around 12:44)?
> >
> > What is the storage subsystem? CLVMd?
> >
> >> Regards,
> >> Somesh
> >>
> >> -----Original Message-----
> >> From: Luciano Castro [mailto:luciano.cas...@gmail.com]
> >> Sent: Friday, July 17, 2015 12:13 PM
> >> To: users@cloudstack.apache.org
> >> Subject: Re: HA feature - KVM - CloudStack 4.5.1
> >>
> >> No problem Somesh, thanks for your help.
> >>
> >> Link to the log:
> >>
> >> https://dl.dropboxusercontent.com/u/6774061/management-server.log.2015-07-16.gz
> >>
> >> Luciano
> >>
> >> On Fri, Jul 17, 2015 at 12:00 PM, Somesh Naidu <somesh.na...@citrix.com> wrote:
> >>
> >>> How large are the management server logs dated 2015-07-16? I would like to review the logs. All the information I need from that incident should be in there, so I don't need any more testing.
> >>>
> >>> Regards,
> >>> Somesh
> >>>
> >>> -----Original Message-----
> >>> From: Luciano Castro [mailto:luciano.cas...@gmail.com]
> >>> Sent: Friday, July 17, 2015 7:58 AM
> >>> To: users@cloudstack.apache.org
> >>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
> >>>
> >>> Hi Somesh!
> >>>
> >>> [root@1q2 ~]# zgrep -i -E 'SimpleIvestigator|KVMInvestigator|PingInvestigator|ManagementIPSysVMInvestigator' /var/log/cloudstack/management/management-server.log.2015-07-16.gz | tail -5000 > /tmp/management.txt
> >>> [root@1q2 ~]# cat /tmp/management.txt
> >>> 2015-07-16 12:30:45,452 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [KVMInvestigator] in [Ha Investigators Registry]
> >>> 2015-07-16 12:30:45,452 DEBUG [o.a.c.s.l.r.RegistryLifecycle] (main:null) Registered com.cloud.ha.KVMInvestigator@57ceec9a
> >>> 2015-07-16 12:30:45,927 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [PingInvestigator] in [Ha Investigators Registry]
> >>> 2015-07-16 12:30:45,928 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [ManagementIPSysVMInvestigator] in [Ha Investigators Registry]
> >>> 2015-07-16 12:30:53,796 INFO [o.a.c.s.l.r.DumpRegistry] (main:null) Registry [Ha Investigators Registry] contains [SimpleInvestigator, XenServerInvestigator, KVMInv
> >>>
> >>> I searched this log before, but as I thought, it had nothing special.
> >>>
> >>> If you want to propose another test scenario to me, I can do it.
> >>>
> >>> Thanks
> >>>
> >>> On Thu, Jul 16, 2015 at 7:27 PM, Somesh Naidu <somesh.na...@citrix.com> wrote:
> >>>
> >>>> What about the other investigators, specifically "KVMInvestigator, PingInvestigator"? Do they report the VMs as alive=false too?
> >>>>
> >>>> Also, it is recommended that you look at the management-server.log instead of catalina.out (for one, the latter doesn't have timestamps).
> >>>>
> >>>> Regards,
> >>>> Somesh
> >>>>
> >>>> -----Original Message-----
> >>>> From: Luciano Castro [mailto:luciano.cas...@gmail.com]
> >>>> Sent: Thursday, July 16, 2015 1:14 PM
> >>>> To: users@cloudstack.apache.org
> >>>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
> >>>>
> >>>> Hi Somesh!
> >>>>
> >>>> Thanks for the help. I did it again and collected new logs:
> >>>>
> >>>> My vm_instance name is i-2-39-VM. There were some routers on KVM host 'A' (the one that I powered off now):
> >>>>
> >>>> [root@1q2 ~]# grep -i -E 'SimpleInvestigator.*false' /var/log/cloudstack/management/catalina.out
> >>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-e2f91c9c work-3) SimpleInvestigator found VM[DomainRouter|r-4-VM]to be alive? false
> >>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-729acf4f work-7) SimpleInvestigator found VM[User|i-23-33-VM]to be alive? false
> >>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-a66a4941 work-8) SimpleInvestigator found VM[DomainRouter|r-36-VM]to be alive? false
> >>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-5977245e work-10) SimpleInvestigator found VM[User|i-17-26-VM]to be alive? false
> >>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-c7f39be0 work-9) SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive? false
> >>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-ad4f5fda work-10) SimpleInvestigator found VM[DomainRouter|r-46-VM]to be alive? false
> >>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-0257f5af work-11) SimpleInvestigator found VM[User|i-4-52-VM]to be alive? false
> >>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-7ddff382 work-12) SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive? false
> >>>> INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-9f79917e work-13) SimpleInvestigator found VM[User|i-2-39-VM]to be alive? false
> >>>>
> >>>> KVM host 'B' agent log (where the machine would be migrated to):
> >>>>
> >>>> 2015-07-16 16:58:56,537 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Live migration of instance i-2-39-VM initiated
> >>>> 2015-07-16 16:58:57,540 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to complete, waited 1000ms
> >>>> 2015-07-16 16:58:58,541 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to complete, waited 2000ms
> >>>> 2015-07-16 16:58:59,542 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to complete, waited 3000ms
> >>>> 2015-07-16 16:59:00,543 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to complete, waited 4000ms
> >>>> 2015-07-16 16:59:01,245 INFO [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-4:null) Migration thread for i-2-39-VM is done
> >>>>
> >>>> It says the migration of my i-2-39-VM instance is done, but I can't ping this host.
> >>>>
> >>>> Luciano
> >>>
> >>> --
> >>> Luciano Castro

--
Luciano Castro
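
For anyone who wants to follow Milamber's approach of draining a host before stopping the agent, the same thing can be done from cloudmonkey. A minimal sketch, assuming the host UUID is already known and that your cloudmonkey build exposes the prepareHostForMaintenance / cancelHostMaintenance API calls with the verb split shown below (check "help" in your build; the UUID is a placeholder):

    # find the host and check its current state
    > list hosts type=Routing filter=id,name,state
    # ask CloudStack to evacuate the host; its VMs are migrated off
    > prepare hostformaintenance id=<host-uuid>
    # after the reboot / maintenance work, bring the host back
    > cancel hostmaintenance id=<host-uuid>

Only once the host shows state "Maintenance" is it safe to stop cloudstack-agent and reboot the node.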
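
On Somesh's point that no HA tasks existed for the VMs at the time of the disconnect: whether HA work items were ever queued can be cross-checked on the management server itself. A hedged sketch, assuming the MySQL database is named "cloud" and that the 4.x schema still carries the op_ha_work table (verify both against your installation):

    # recent HA work items, if any were scheduled
    mysql -u cloud -p cloud -e "SELECT * FROM op_ha_work ORDER BY id DESC LIMIT 20\G"
    # HA scheduling and worker activity in the management server log
    grep -E 'HighAvailabilityManagerImpl|HA-Worker' \
        /var/log/cloudstack/management/management-server.log | tail -50

An empty table, or no HA-Worker lines between the disconnect at 12:19 and the restart at 12:30, would point to the scheduling step rather than the investigators as the place where the process stalled.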
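
Finally, regarding the agent log that reports "Migration thread for i-2-39-VM is done" while the instance stays unreachable: that message alone does not prove the guest ended up running on host 'B'. Since the CloudStack instance name is also the libvirt domain name on KVM, the domain state can be checked directly on both hosts; a small sketch using standard virsh commands:

    # on KVM host 'A' and host 'B': is the domain defined, and is it running?
    virsh list --all | grep i-2-39-VM
    virsh dominfo i-2-39-VM

If the domain is running on 'B' but still unreachable, the next place to look is the virtual router and the guest's own networking rather than the migration itself.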