On 17/07/2015 22:26, Somesh Naidu wrote:
Perhaps the management server doesn't recognize host 3 as fully down
(ping still alive? or some quorum not satisfied)?
Is the only way for the mgmt server to fully accept that host 3 has a
real problem that host 3 has been rebooted (around 12:44)?
The host disconnect was triggered at 12:19 on host 3. The mgmt server was pretty
sure the host was down (it was a graceful shutdown, I believe), which is why it
triggered a disconnect and notified the other nodes. There was no
checkhealth/checkonhost/etc. triggered; just the agent disconnect, with all
listeners (ping/etc.) notified.

At this point the mgmt server should have scheduled HA on all VMs running on that
host. The HA investigators would then work through identifying whether the VMs
were still running, whether they needed to be fenced, etc. But this never happened.
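A quick way to confirm that from the log (a rough sketch, not authoritative: the
HighAvailabilityManagerImpl class name is taken from the excerpts quoted later in
this thread, and the 'schedul'/'HA-Worker' patterns are only assumptions about how
the scheduling messages are worded):

# any HA manager activity between the disconnect (12:19) and the restart (12:30)?
zgrep 'HighAvailabilityManagerImpl' \
    /var/log/cloudstack/management/management-server.log.2015-07-16.gz \
  | grep -E '^2015-07-16 12:(19|2[0-9]|30)' \
  | grep -i -E 'schedul|HA-Worker'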


AFAIK, stopping the cloudstack-agent service doesn't trigger the HA process for the VMs hosted by that node, so it seems normal to me that the HA process didn't start at that moment. If I wanted to start the HA process for a node, I would go to the Web UI (or CloudMonkey) and change the host's state from Up to Maintenance.


(After that I can stop the CS-agent service if I need to, for example, reboot a node; a rough CloudMonkey sketch of that sequence is below.)
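For completeness, here is that sequence in CloudMonkey form (a sketch only: the host
name and UUID are placeholders, and the commands are assumed to be the usual
CloudMonkey spelling of the listHosts / prepareHostForMaintenance /
cancelHostMaintenance API calls, so verify against your CloudMonkey version):

# find the host's UUID (the host name below is a placeholder)
list hosts name=kvm-host-3 filter=id,name,state,resourcestate

# put the host into maintenance; CloudStack evacuates/migrates its VMs first
prepare hostformaintenance id=<host-uuid>

# ...stop cloudstack-agent, reboot the node, start the agent again...

# take the host out of maintenance once it is back up
cancel hostmaintenance id=<host-uuid>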



Regards,
Somesh


-----Original Message-----
From: Milamber [mailto:milam...@apache.org]
Sent: Friday, July 17, 2015 6:01 PM
To: users@cloudstack.apache.org
Subject: Re: HA feature - KVM - CloudStack 4.5.1



On 17/07/2015 21:23, Somesh Naidu wrote:
Ok, so here are my findings.

1. Host ID 3 was shut down around 2015-07-16 12:19:09, at which point the management
server called a disconnect.
2. Based on the logs, it seems VM IDs 32, 18, 39 and 46 were running on the
host.
3. No HA tasks were scheduled for any of these VMs at this time.
4. The management server restarted at around 2015-07-16 12:30:20.
5. Host ID 3 connected back at around 2015-07-16 12:44:08.
6. The management server identified the missing VMs and triggered HA on those.
7. The VMs were eventually started, all 4 of them.

I am not 100% sure why HA wasn't triggered until 2015-07-16 12:30 (#3), but I
know that the management server restart caused it not to happen until the host was
reconnected.
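The restart time itself is easy to pin down from the same log; the "Registering
extension" lines quoted further down in this thread are emitted at startup, so
something like this (a sketch based on that excerpt) shows when the management
server came back up:

# the first startup-time registry lines of the day mark the restart (~12:30)
zgrep 'Registering extension' \
    /var/log/cloudstack/management/management-server.log.2015-07-16.gz \
  | head -5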
Perhaps the management server doesn't recognize host 3 as fully down
(ping still alive? or some quorum not satisfied)?
Is the only way for the mgmt server to fully accept that host 3 has a
real problem that host 3 has been rebooted (around 12:44)?

What is the storage subsystem? CLVMd?


Regards,
Somesh


-----Original Message-----
From: Luciano Castro [mailto:luciano.cas...@gmail.com]
Sent: Friday, July 17, 2015 12:13 PM
To: users@cloudstack.apache.org
Subject: Re: HA feature - KVM - CloudStack 4.5.1

No problems Somesh, thanks for your help.

Link of log:

https://dl.dropboxusercontent.com/u/6774061/management-server.log.2015-07-16.gz

Luciano

On Fri, Jul 17, 2015 at 12:00 PM, Somesh Naidu <somesh.na...@citrix.com>
wrote:

How large is the management server logs dated 2015-07-16? I would like to
review the logs. All the information I need from that incident should be in
there so I don't need any more testing.

Regards,
Somesh

-----Original Message-----
From: Luciano Castro [mailto:luciano.cas...@gmail.com]
Sent: Friday, July 17, 2015 7:58 AM
To: users@cloudstack.apache.org
Subject: Re: HA feature - KVM - CloudStack 4.5.1

Hi Somesh!

[root@1q2 ~]# zgrep -i -E

'SimpleInvestigator|KVMInvestigator|PingInvestigator|ManagementIPSysVMInvestigator'
/var/log/cloudstack/management/management-server.log.2015-07-16.gz |tail
-5000 > /tmp/management.txt
[root@1q2 ~]# cat /tmp/management.txt
2015-07-16 12:30:45,452 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null)
Registering extension [KVMInvestigator] in [Ha Investigators Registry]
2015-07-16 12:30:45,452 DEBUG [o.a.c.s.l.r.RegistryLifecycle] (main:null)
Registered com.cloud.ha.KVMInvestigator@57ceec9a
2015-07-16 12:30:45,927 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null)
Registering extension [PingInvestigator] in [Ha Investigators Registry]
2015-07-16 12:30:45,928 DEBUG [o.a.c.s.l.r.ExtensionRegistry] (main:null)
Registering extension [ManagementIPSysVMInvestigator] in [Ha Investigators
Registry]
2015-07-16 12:30:53,796 INFO  [o.a.c.s.l.r.DumpRegistry] (main:null)
Registry [Ha Investigators Registry] contains [SimpleInvestigator,
XenServerInvestigator, KVMInv

I searched this log before, but as I thought, there was nothing
special in it.

If you want to propose another test scenario to me, I can do it.

Thanks


On Thu, Jul 16, 2015 at 7:27 PM, Somesh Naidu <somesh.na...@citrix.com>
wrote:

What about the other investigators, specifically "KVMInvestigator,
PingInvestigator"? Do they report the VMs as alive=false too?

Also, it is recommended that you look at management-server.log instead
of catalina.out (for one, the latter doesn't have timestamps).
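For example, the same investigator results can be pulled from the timestamped log
(a sketch; the "SimpleInvestigator found" wording is taken from the catalina.out
excerpt further down in this thread):

# same check as against catalina.out, but with timestamps preserved
grep 'SimpleInvestigator found' \
    /var/log/cloudstack/management/management-server.log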

Regards,
Somesh


-----Original Message-----
From: Luciano Castro [mailto:luciano.cas...@gmail.com]
Sent: Thursday, July 16, 2015 1:14 PM
To: users@cloudstack.apache.org
Subject: Re: HA feature - KVM - CloudStack 4.5.1

Hi Somesh!


Thanks for the help. I did it again and collected new logs:

My vm_instance name is i-2-39-VM. There were some routers on KVM host 'A'
(the one that I powered off this time):


[root@1q2 ~]# grep -i -E 'SimpleInvestigator.*false'
/var/log/cloudstack/management/catalina.out
INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-e2f91c9c
work-3)
SimpleInvestigator found VM[DomainRouter|r-4-VM]to be alive? false
INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-729acf4f
work-7)
SimpleInvestigator found VM[User|i-23-33-VM]to be alive? false
INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-a66a4941
work-8)
SimpleInvestigator found VM[DomainRouter|r-36-VM]to be alive? false
INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-5977245e
work-10) SimpleInvestigator found VM[User|i-17-26-VM]to be alive? false
INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-c7f39be0
work-9)
SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive? false
INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-3:ctx-ad4f5fda
work-10) SimpleInvestigator found VM[DomainRouter|r-46-VM]to be alive?
false
INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-0:ctx-0257f5af
work-11) SimpleInvestigator found VM[User|i-4-52-VM]to be alive? false
INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-7ddff382
work-12) SimpleInvestigator found VM[DomainRouter|r-32-VM]to be alive?
false
INFO  [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-1:ctx-9f79917e
work-13) SimpleInvestigator found VM[User|i-2-39-VM]to be alive? false



KVM host 'B' agent log (where the machine would be migrated to):

2015-07-16 16:58:56,537 INFO  [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-4:null) Live migration of instance i-2-39-VM
initiated
2015-07-16 16:58:57,540 INFO  [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to
complete, waited 1000ms
2015-07-16 16:58:58,541 INFO  [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to
complete, waited 2000ms
2015-07-16 16:58:59,542 INFO  [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to
complete, waited 3000ms
2015-07-16 16:59:00,543 INFO  [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-4:null) Waiting for migration of i-2-39-VM to
complete, waited 4000ms
2015-07-16 16:59:01,245 INFO  [kvm.resource.LibvirtComputingResource]
(agentRequest-Handler-4:null) Migration thread for i-2-39-VM is done

It said the migration of my i-2-39-VM instance was done, but I can't ping this host.
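One way to double-check where the guest actually ended up after that "Migration
thread ... is done" message (a sketch using standard libvirt tooling, run on both
KVM hosts; the domain name is taken from the log above):

# is the domain defined and running on this host?
virsh list --all | grep i-2-39-VM
virsh domstate i-2-39-VM

# if it is running, check that its NIC is attached and on the expected bridge
virsh domiflist i-2-39-VM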

Luciano


--
Luciano Castro


