Ok Somesh! I'll do it!

Thanks!

On Mon, Jul 20, 2015 at 11:42 AM, Somesh Naidu <somesh.na...@citrix.com>
wrote:

> Milamber,
>
> What you mention is not an HA use case but live migration.
>
> As mentioned by Luciano, he shut down the KVM host "A". The mgmt server
> managing that host realized that and disconnected the agent for that host.
> I believe it is expected behavior for HA to kick in at this stage, but from
> the logs Luciano shared it doesn't seem to be happening.
>
> Luciano, I suggest tracking this further in the issue tracking system by
> raising a product bug.
>
> Regards,
> Somesh
>
>
> -----Original Message-----
> From: Milamber [mailto:milam...@apache.org]
> Sent: Saturday, July 18, 2015 12:54 PM
> To: users@cloudstack.apache.org
> Subject: Re: HA feature - KVM - CloudStack 4.5.1
>
>
>
> On 17/07/2015 22:26, Somesh Naidu wrote:
> >> Perhaps the management server doesn't recognize that host 3 is totally
> >> down (ping still alive? or some quorum not met)?
> >> The only way for the mgmt server to fully accept that host 3 has a real
> >> problem is that host 3 has been rebooted (around 12:44)?
> > The host disconnect was triggered at 12:19 on host 3. The mgmt server was
> > pretty sure the host was down (it was a graceful shutdown, I believe),
> > which is why it triggered a disconnect and notified other nodes. There was
> > no checkhealth/checkonhost/etc. triggered; just the agent disconnected and
> > all listeners (ping/etc.) notified.
> >
> > At this time the mgmt server should have scheduled HA on all VMs running
> > on that host. The HA investigators would then work their way through,
> > identifying whether the VMs are still running, whether they need to be
> > fenced, etc. But this never happened.
>
>
> AFAIK, stopping the cloudstack-agent service doesn't trigger the HA
> process for the VMs hosted by the node. It seems normal to me that the HA
> process doesn't start at this moment.
> If I wanted to start the HA process on a node, I would go to the Web UI (or
> cloudmonkey) to change the state of the host from Up to Maintenance.
>
>
> (Afterwards, I can stop the CS-agent service if I need to, for example to
> reboot a node.)
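For completeness: the Up → Maintenance transition described here corresponds to the `prepareHostForMaintenance` API call (roughly `prepare hostformaintenance id=<host-uuid>` in cloudmonkey). Below is a minimal, hedged sketch of CloudStack's API request signing in Python; the credentials and host id are hypothetical placeholders, not values from this thread:

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

API_KEY = "example-api-key"        # hypothetical credential
SECRET_KEY = "example-secret-key"  # hypothetical credential

def sign_request(params, secret):
    # CloudStack signs the alphabetically sorted, lowercased query string
    # with HMAC-SHA1 and base64-encodes the digest.
    query = "&".join(
        f"{k}={quote(str(v), safe='*')}" for k, v in sorted(params.items())
    ).lower()
    digest = hmac.new(secret.encode("utf-8"), query.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("utf-8")

params = {
    "command": "prepareHostForMaintenance",
    "id": "3",  # host id from the thread; real deployments use the host UUID
    "apikey": API_KEY,
    "response": "json",
}
signature = sign_request(params, SECRET_KEY)
# The signature is then appended to the request as &signature=<url-encoded>.
print(signature)
```

This is a sketch of the signing scheme only; in practice cloudmonkey or the UI handles it for you.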
>
>
> >
> > Regards,
> > Somesh
> >
> >
> > -----Original Message-----
> > From: Milamber [mailto:milam...@apache.org]
> > Sent: Friday, July 17, 2015 6:01 PM
> > To: users@cloudstack.apache.org
> > Subject: Re: HA feature - KVM - CloudStack 4.5.1
> >
> >
> >
> > On 17/07/2015 21:23, Somesh Naidu wrote:
> >> Ok, so here are my findings.
> >>
> >> 1. Host ID 3 was shut down around 2015-07-16 12:19:09, at which point
> >> the management server called a disconnect.
> >> 2. Based on the logs, it seems VM IDs 32, 18, 39 and 46 were running on
> >> the host.
> >> 3. No HA tasks for any of these VMs at this time.
> >> 4. Management server restarted at around 2015-07-16 12:30:20.
> >> 5. Host ID 3 connected back at around 2015-07-16 12:44:08.
> >> 6. Management server identified the missing VMs and triggered HA on
> >> those.
> >> 7. The VMs were eventually started, all 4 of them.
> >>
> >> I am not 100% sure why HA wasn't triggered until 2015-07-16 12:30 (#3),
> >> but I know that the management server restart caused it not to happen
> >> until the host was reconnected.
> > Perhaps the management server doesn't recognize that host 3 is totally
> > down (ping still alive? or some quorum not met)?
> > The only way for the mgmt server to fully accept that host 3 has a real
> > problem is that host 3 has been rebooted (around 12:44)?
> >
> > What is the storage subsystem? CLVMd?
> >
> >
> >> Regards,
> >> Somesh
> >>
> >>
> >> -----Original Message-----
> >> From: Luciano Castro [mailto:luciano.cas...@gmail.com]
> >> Sent: Friday, July 17, 2015 12:13 PM
> >> To: users@cloudstack.apache.org
> >> Subject: Re: HA feature - KVM - CloudStack 4.5.1
> >>
> >> No problems Somesh, thanks for your help.
> >>
> >> Link of log:
> >>
> >>
> https://dl.dropboxusercontent.com/u/6774061/management-server.log.2015-07-16.gz
> >>
> >> Luciano
> >>
> >> On Fri, Jul 17, 2015 at 12:00 PM, Somesh Naidu <somesh.na...@citrix.com
> >
> >> wrote:
> >>
> >>> How large are the management server logs dated 2015-07-16? I would like
> >>> to review the logs. All the information I need from that incident should
> >>> be in there, so I don't need any more testing.
> >>>
> >>> Regards,
> >>> Somesh
> >>>
> >>> -----Original Message-----
> >>> From: Luciano Castro [mailto:luciano.cas...@gmail.com]
> >>> Sent: Friday, July 17, 2015 7:58 AM
> >>> To: users@cloudstack.apache.org
> >>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
> >>>
> >>> Hi Somesh!
> >>>
> >>> [root@1q2 ~]# zgrep -i -E \
> >>> 'SimpleInvestigator|KVMInvestigator|PingInvestigator|ManagementIPSysVMInvestigator' \
> >>> /var/log/cloudstack/management/management-server.log.2015-07-16.gz \
> >>> | tail -5000 > /tmp/management.txt
> >>> [root@1q2 ~]# cat /tmp/management.txt
> >>> 2015-07-16 12:30:45,452 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [KVMInvestigator] in [Ha Investigators Registry]
> >>> 2015-07-16 12:30:45,452 DEBUG [o.a.c.s.l.r.RegistryLifecycle] (main:null) Registered com.cloud.ha.KVMInvestigator@57ceec9a
> >>> 2015-07-16 12:30:45,927 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [PingInvestigator] in [Ha Investigators Registry]
> >>> 2015-07-16 12:30:45,928 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null) Registering extension [ManagementIPSysVMInvestigator] in [Ha Investigators Registry]
> >>> 2015-07-16 12:30:53,796 INFO  [o.a.c.s.l.r.DumpRegistry] (main:null) Registry [Ha Investigators Registry] contains [SimpleInvestigator, XenServerInvestigator, KVMInv
> >>>
> >>> I searched this log before, but I thought it had nothing special.
> >>>
> >>> If you want to propose another test scenario to me, I can do it.
> >>>
> >>> Thanks
> >>>
> >>>
> >>> On Thu, Jul 16, 2015 at 7:27 PM, Somesh Naidu <somesh.na...@citrix.com
> >
> >>> wrote:
> >>>
> >>>> What about the other investigators, specifically "KVMInvestigator,
> >>>> PingInvestigator"? Do they report the VMs as alive=false too?
> >>>>
> >>>> Also, it is recommended that you look at the management-server.log
> >>>> instead of catalina.out (for one, the latter doesn't have timestamps).
> >>>>
> >>>> Regards,
> >>>> Somesh
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: Luciano Castro [mailto:luciano.cas...@gmail.com]
> >>>> Sent: Thursday, July 16, 2015 1:14 PM
> >>>> To: users@cloudstack.apache.org
> >>>> Subject: Re: HA feature - KVM - CloudStack 4.5.1
> >>>>
> >>>> Hi Somesh!
> >>>>
> >>>>
> >>>> Thanks for the help. I did it again, and I collected new logs:
> >>>>
> >>>> My vm_instance name is i-2-39-VM. There were some routers on KVM host
> >>>> 'A' (the one that I powered off now):
> >>>>
> >>>>
> >>>> [root@1q2 ~]# grep -i -E 'SimpleInvestigator.*false'
> >>>> /var/log/cloudstack/management/catalina.out
> >>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-e2f91c9c
> >>> work-3)
> >>>> SimpleInvestigator found VM[DomainRouter|r-4-VM]to be alive? false
> >>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-729acf4f
> >>> work-7)
> >>>> SimpleInvestigator found VM[User|i-23-33-VM]to be alive? false
> >>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-a66a4941
> >>> work-8)
> >>>> SimpleInvestigator found VM[DomainRouter|r-36-VM]to be alive? false
> >>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-5977245e
> >>>> work-10) SimpleInvestigator found VM[User|i-17-26-VM]to be alive?
> false
> >>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-c7f39be0
> >>> work-9)
> >>>> SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive? false
> >>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-ad4f5fda
> >>>> work-10) SimpleInvestigator found VM[DomainRouter|r-46-VM]to be alive?
> >>>> false
> >>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-0257f5af
> >>>> work-11) SimpleInvestigator found VM[User|i-4-52-VM]to be alive? false
> >>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-7ddff382
> >>>> work-12) SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive?
> >>>> false
> >>>> INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-9f79917e
> >>>> work-13) SimpleInvestigator found VM[User|i-2-39-VM]to be alive? false
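As an aside, investigator lines in this shape are easy to tally per VM. A small, hypothetical helper (not part of CloudStack) to summarize such output:

```python
import re
from collections import Counter

# Matches lines like:
#   SimpleInvestigator found VM[User|i-2-39-VM]to be alive? false
PATTERN = re.compile(
    r"(\w+Investigator) found VM\[[^|]+\|([\w-]+)\]to be alive\? (\w+)"
)

def tally(lines):
    # Count how often each (vm_name, alive) result was reported.
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            _investigator, vm, alive = m.groups()
            counts[(vm, alive)] += 1
    return counts

sample = [
    "SimpleInvestigator found VM[DomainRouter|r-4-VM]to be alive? false",
    "SimpleInvestigator found VM[User|i-2-39-VM]to be alive? false",
    "SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive? false",
    "SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive? false",
]
print(tally(sample))
```

Feeding it the full log (e.g. the /tmp/management.txt extract above) quickly shows which VMs were reported dead and how many times.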
> >>>>
> >>>>
> >>>>
> >>>> KVM host 'B' agent log (where the machine would be migrated):
> >>>>
> >>>> 2015-07-16 16:58:56,537 INFO  [kvm.resource.LibvirtComputingResource]
> >>>> (agentRequest-Handler-4:null) Live migration of instance i-2-39-VM
> >>>> initiated
> >>>> 2015-07-16 16:58:57,540 INFO  [kvm.resource.LibvirtComputingResource]
> >>>> (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to
> >>>> complete, waited 1000ms
> >>>> 2015-07-16 16:58:58,541 INFO  [kvm.resource.LibvirtComputingResource]
> >>>> (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to
> >>>> complete, waited 2000ms
> >>>> 2015-07-16 16:58:59,542 INFO  [kvm.resource.LibvirtComputingResource]
> >>>> (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to
> >>>> complete, waited 3000ms
> >>>> 2015-07-16 16:59:00,543 INFO  [kvm.resource.LibvirtComputingResource]
> >>>> (agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to
> >>>> complete, waited 4000ms
> >>>> 2015-07-16 16:59:01,245 INFO  [kvm.resource.LibvirtComputingResource]
> >>>> (agentRequest-Handler-4:null) Migration thread for i-2-39-VM is done
> >>>>
> >>>> It said done for my i-2-39-VM instance, but I can't ping this VM.
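The agent output above reflects a simple poll-until-done loop. A generic sketch of that pattern (illustrative only; this is not the actual LibvirtComputingResource code):

```python
import time

def wait_for(is_done, poll_ms=1000, timeout_ms=10_000,
             sleep=time.sleep, log=print):
    # Poll is_done() until it returns True, logging progress much like the
    # agent does; give up after timeout_ms.
    waited = 0
    while not is_done():
        if waited >= timeout_ms:
            raise TimeoutError(f"gave up after {waited}ms")
        sleep(poll_ms / 1000.0)
        waited += poll_ms
        log(f"Waiting for migration to complete, waited {waited}ms")
    return waited

# Fake "migration" that completes after three polls; sleep is stubbed out
# so the example runs instantly.
state = {"polls": 0}
def migration_done():
    state["polls"] += 1
    return state["polls"] > 3

waited = wait_for(migration_done, sleep=lambda s: None)
print(f"Migration thread is done after {waited}ms")
```

Note that such a loop only tells you the migration call finished; as the thread shows, "done" does not by itself guarantee the guest is reachable afterwards.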
> >>>>
> >>>> Luciano
> >>>>
> >>>
> >>> --
> >>> Luciano Castro
> >>>
> >>
>
>


-- 
Luciano Castro
