cs 4.2 HA

Valery Ciareszka Tue, 10 Sep 2013 10:32:22 -0700

Hi all.

I'm testing HA, and when I power down one node, a VM with an HA-enabled
service offering does not start on the other node.


Environment:
management server + 2 hypervisors + NFS server
management/hypervisors: CentOS 6.4 + CS-4.2 (revision 2852) + advanced zone + KVM

I get the following NullPointerException in the logs:

2013-09-10 15:06:48,583 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) Processing HAWork[28-HA-246-Running-Investigating]
2013-09-10 15:06:48,586 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) HA on VM[User|hahaha]
2013-09-10 15:06:48,588 DEBUG [cloud.ha.CheckOnAgentInvestigator]
(HA-Worker-0:work-28) Unable to reach the agent for VM[User|hahaha]:
Resource [Host:4] is unreachable: Host 4: Host with specified id is not in
the right state: Down
2013-09-10 15:06:48,588 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) SimpleInvestigator found VM[User|hahaha]to be alive?
null
2013-09-10 15:06:48,593 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) XenServerInvestigator found VM[User|hahaha]to be
alive? null
2013-09-10 15:06:48,593 DEBUG [cloud.ha.UserVmDomRInvestigator]
(HA-Worker-0:work-28) testing if VM[User|hahaha] is alive
2013-09-10 15:06:48,598 DEBUG [agent.manager.AgentManagerImpl]
(HA-Worker-0:work-28) Host with id null doesn't exist
2013-09-10 15:06:48,598 DEBUG [cloud.ha.UserVmDomRInvestigator]
(HA-Worker-0:work-28) VM[User|hahaha] could not be pinged, returning that
it is unknown
2013-09-10 15:06:48,599 DEBUG [agent.transport.Request]
(HA-Worker-0:work-28) Seq 1-1213400199: Sending  { Cmd , MgmtId:
161332943028, via: 1, Ver: v1, Flags: 100011,
[{"com.cloud.agent.api.PingTestCommand":{"_routerIp":"169.254.0.190","_privateIp":"10.10.10.17","wait":20}}]
}
2013-09-10 15:06:52,804 DEBUG [agent.transport.Request]
(HA-Worker-0:work-28) Seq 1-1213400199: Received:  { Ans: , MgmtId:
161332943028, via: 1, Ver: v1, Flags: 10, { Answer } }
2013-09-10 15:06:52,804 DEBUG [agent.manager.AgentManagerImpl]
(HA-Worker-0:work-28) Details from executing class
com.cloud.agent.api.PingTestCommand: PING 10.10.10.17 (10.10.10.17): 56
data bytes64 bytes from 10.10.10.222: Destination Host UnreachableVr HL TOS
 Len   ID Flg  off TTL Pro  cks      Src      Dst Data 4  5  00 5400 0000
0 0040  40  01 a711 10.10.10.222  10.10.10.17 --- 10.10.10.17 ping
statistics ---1 packets transmitted, 0 packets received, 100% packet
lossUnable to ping the vm, exiting
2013-09-10 15:06:52,804 DEBUG [cloud.ha.UserVmDomRInvestigator]
(HA-Worker-0:work-28) VM[User|hahaha] could not be pinged, returning that
it is unknown
2013-09-10 15:06:52,804 DEBUG [cloud.ha.UserVmDomRInvestigator]
(HA-Worker-0:work-28) Returning null since we're unable to determine state
of VM[User|hahaha]
2013-09-10 15:06:52,804 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) null found VM[User|hahaha]to be alive? null
2013-09-10 15:06:52,804 DEBUG [cloud.ha.ManagementIPSystemVMInvestigator]
(HA-Worker-0:work-28) Not a System Vm, unable to determine state of
VM[User|hahaha] returning null
2013-09-10 15:06:52,804 DEBUG [cloud.ha.ManagementIPSystemVMInvestigator]
(HA-Worker-0:work-28) Testing if VM[User|hahaha] is alive
2013-09-10 15:06:52,808 DEBUG [cloud.ha.ManagementIPSystemVMInvestigator]
(HA-Worker-0:work-28) Unable to find a management nic, cannot ping this
system VM, unable to determine state of VM[User|hahaha] returning null
2013-09-10 15:06:52,808 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) null found VM[User|hahaha]to be alive? null
2013-09-10 15:06:52,812 DEBUG [agent.transport.Request]
(HA-Worker-0:work-28) Seq 1-1213400206: Sending  { Cmd , MgmtId:
161332943028, via: 1, Ver: v1, Flags: 100011,
[{"com.cloud.agent.api.CheckOnHostCommand":{"host":{"guid":"6807c438-876d-3f73-ba01-8ad718fd774d-LibvirtComputingResource","privateNetwork":{"ip":"77.72.128.116","netmask":"255.255.255.240","mac":"00:25:90:36:20:6a","isSecurityGroupEnabled":false},"storageNetwork1":{"ip":"77.72.128.116","netmask":"255.255.255.240","mac":"00:25:90:36:20:6a","isSecurityGroupEnabled":false}},"wait":20}}]
}
2013-09-10 15:06:52,921 DEBUG [agent.transport.Request]
(HA-Worker-0:work-28) Seq 1-1213400206: Received:  { Ans: , MgmtId:
161332943028, via: 1, Ver: v1, Flags: 10, { Answer } }
2013-09-10 15:06:52,921 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) KVMInvestigator found VM[User|hahaha]to be alive? null
2013-09-10 15:06:52,921 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) Fencing off VM that we don't know the state of
2013-09-10 15:06:52,921 DEBUG [cloud.ha.XenServerFencer]
(HA-Worker-0:work-28) Don't know how to fence non XenServer hosts KVM
2013-09-10 15:06:52,921 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) Fencer null returned null
2013-09-10 15:06:52,926 DEBUG [agent.transport.Request]
(HA-Worker-0:work-28) Seq 1-1213400207: Sending  { Cmd , MgmtId:
161332943028, via: 1, Ver: v1, Flags: 100011,
[{"com.cloud.agent.api.FenceCommand":{"vmName":"i-2-246-VM","hostGuid":"6807c438-876d-3f73-ba01-8ad718fd774d-LibvirtComputingResource","hostIp":"77.72.128.116","inSeq":false,"wait":0}}]
}
2013-09-10 15:06:53,038 DEBUG [agent.transport.Request]
(HA-Worker-0:work-28) Seq 1-1213400207: Received:  { Ans: , MgmtId:
161332943028, via: 1, Ver: v1, Flags: 10, { FenceAnswer } }
2013-09-10 15:06:53,038 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) Fencer KVMFenceBuilder returned true
2013-09-10 15:06:53,046 DEBUG [cloud.capacity.CapacityManagerImpl]
(HA-Worker-0:work-28) VM state transitted from :Running to Stopping with
event: StopRequestedvm's original host id: 4 new host id: 4 host id before
state transition: 4
2013-09-10 15:06:53,048 DEBUG [cloud.vm.UserVmManagerImpl]
(HA-Worker-0:work-28) Collect vm disk statistics from host before stopping
Vm
2013-09-10 15:06:53,052 DEBUG [agent.manager.AgentManagerImpl]
(HA-Worker-0:work-28) Can not send command
com.cloud.agent.api.GetVmDiskStatsCommand due to Host 4 is not up
2013-09-10 15:06:53,054 WARN  [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-0:work-28) Unable to stop vm, agent unavailable:
com.cloud.exception.AgentUnavailableException: Resource [Host:4] is
unreachable: Host 4: Host with specified id is not in the right state: Down
2013-09-10 15:06:53,055 WARN  [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-0:work-28) Unable to actually stop VM[User|hahaha] but continue
with release because it's a force stop
2013-09-10 15:06:53,058 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-0:work-28) VM[User|hahaha] is stopped on the host.  Proceeding
to release resource held.
2013-09-10 15:06:53,062 DEBUG [cloud.network.NetworkModelImpl]
(HA-Worker-0:work-28) Service SecurityGroup is not supported in the network
id=205
2013-09-10 15:06:53,065 DEBUG [cloud.network.NetworkManagerImpl]
(HA-Worker-0:work-28) Changing active number of nics for network id=205 on
-1
2013-09-10 15:06:53,070 DEBUG [cloud.network.NetworkManagerImpl]
(HA-Worker-0:work-28) Asking VirtualRouter to release
Nic[942-246-9bc94718-8d0d-4463-83c4-7780cdfbe7d9-10.10.10.17]
2013-09-10 15:06:53,070 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-0:work-28) Successfully released network resources for the vm
VM[User|hahaha]
2013-09-10 15:06:53,071 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-0:work-28) Successfully released storage resources for the vm
VM[User|hahaha]
2013-09-10 15:06:53,084 DEBUG [cloud.network.NetworkModelImpl]
(HA-Worker-0:work-28) Service SecurityGroup is not supported in the network
id=205
2013-09-10 15:06:53,088 DEBUG [cloud.network.NetworkModelImpl]
(HA-Worker-0:work-28) Service SecurityGroup is not supported in the network
id=205
2013-09-10 15:06:53,096 DEBUG [cloud.capacity.CapacityManagerImpl]
(HA-Worker-0:work-28) VM state transitted from :Stopping to Stopped with
event: OperationSucceededvm's original host..
2013-09-10 15:06:53,114 ERROR [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-0:work-28) Terminating HAWork[28-HA-246-Running-Scheduled]
java.lang.NullPointerException
        at com.cloud.storage.VolumeManagerImpl.canVmRestartOnAnotherServer(VolumeManagerImpl.java:2641)
        at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:516)
        at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)
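
For readers less familiar with the HA worker, the sequence recorded in the log can be modeled roughly as below. This is a simplified sketch, not CloudStack's code: the class HaChainSketch, the Investigator/Fencer interfaces and the decide() method are hypothetical names, and the rule "the first non-null answer wins" is an assumption read off the log lines. The model stops before the restart step, which is where canVmRestartOnAnotherServer throws the NullPointerException.

```java
import java.util.List;

// Hypothetical, simplified model of the HA decision sequence seen in the log.
// NOT CloudStack's implementation; every name here is invented for illustration.
public class HaChainSketch {

    // TRUE = VM alive, FALSE = VM dead, null = cannot tell
    // (mirrors the "found VM[...] to be alive? null" log lines).
    public interface Investigator {
        Boolean isVmAlive(String vmName);
    }

    // TRUE = fenced, FALSE = fencing failed, null = hypervisor type not handled
    // (mirrors "Fencer null returned null" / "Fencer KVMFenceBuilder returned true").
    public interface Fencer {
        Boolean fenceOff(String vmName);
    }

    public enum Outcome { LEAVE_RUNNING, RESTART, GIVE_UP }

    public static Outcome decide(String vmName, List<Investigator> investigators, List<Fencer> fencers) {
        // Ask each investigator in turn; the first definite answer decides (assumption).
        for (Investigator investigator : investigators) {
            Boolean alive = investigator.isVmAlive(vmName);
            if (alive != null) {
                return alive ? Outcome.LEAVE_RUNNING : Outcome.RESTART;
            }
        }
        // State unknown: fence the VM off before it may be restarted elsewhere.
        // The first fencer that handles this hypervisor decides (assumption).
        for (Fencer fencer : fencers) {
            Boolean fenced = fencer.fenceOff(vmName);
            if (fenced != null) {
                return fenced ? Outcome.RESTART : Outcome.GIVE_UP;
            }
        }
        // No fencer handled the VM, so restarting it elsewhere would be unsafe.
        return Outcome.GIVE_UP;
    }

    public static void main(String[] args) {
        // Answers taken from the log: all investigators return null,
        // the XenServer-only fencer returns null, the KVM fencer returns true.
        List<Investigator> investigators = List.of(vm -> null, vm -> null, vm -> null);
        List<Fencer> fencers = List.of(vm -> null, vm -> true);
        System.out.println(decide("i-2-246-VM", investigators, fencers)); // RESTART
    }
}
```

Fed the answers from the log, the model reaches RESTART, which matches the log: the VM is force-stopped and HighAvailabilityManagerImpl.restart() is entered before the exception is thrown.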

Is this a bug?

-- 
Regards,
Valery

http://protocol.by/slayer
