This seems to be one of those nasty bugs that are very hard to debug. It appears to be caused by a combination of different factors, such as planetary alignment ;)
At class “VirtualMachineManagerImpl”, line 2395 (method “com.cloud.vm.VirtualMachineManagerImpl.orchestrateMigrateAway(String, long, DeploymentPlanner)”) is the start of the code flow that triggers the migration. The point here is that, if an error occurs during the migration, the VM is stopped; for some reason this is done only for HA-enabled VMs. Going a little further, we find the point that triggers the error in the method “com.cloud.vm.VirtualMachineManagerImpl.migrate(VMInstanceVO, long, DeployDestination)” at line 1941. This line calls the method “com.cloud.vm.VirtualMachineManagerImpl.changeState(VMInstanceVO, Event, Long, ItWorkVO, Step)” to change the VM state to migrating/prepare migration/or something like that (I am guessing here), but for some reason the state of the VM is not the expected one, and this causes a “ConcurrentOperationException” to be thrown. The problem is that “com.cloud.vm.VirtualMachineManagerImpl.changeState(VMInstanceVO, Event, Long, ItWorkVO, Step)” does not seem to produce many useful log entries. Just so we can discuss the problem a little further, could you tell us a bit about your environment? How many ACS management servers do you have? What is the VM workload being migrated (I mean, the number of VMs and their computing resources)? Do the other servers have enough computing resources to host these VMs? Could you also check the XenServer logs on both the source and destination hosts and see what happened when the migration of VM “i-638-1735-VM” was triggered? Do you have the knowledge/skills to use an IDE (e.g. Eclipse) to connect to a running JVM and debug the code? For me it will not be possible to go further without looking at the variables in a running environment. On Mon, Jan 30, 2017 at 4:28 AM, Francois Scheurer < francois.scheu...@everyware.ch> wrote: > Dear Rafael > > > Sorry for the late answer (was sick a couple of days). 
> > I confirm we are migrating inside the same xenserver pool (within a CS > Cluster). > > > Best Regards > Francois > > > > On 26.01.2017 04:35, Rafael Weingärtner wrote: > >> Found the code that is executed. >> I need you to confirm if you are migrating within a cluster or across >> clusters? >> >> On Wed, Jan 25, 2017 at 11:57 AM, Rafael Weingärtner < >> rafaelweingart...@gmail.com> wrote: >> >> I do not have access to ACS code right now, but I suggest starting >>> debugging here: >>> >>> 2017-01-24 18:05:28,427 DEBUG [c.c.c.CapacityManagerImpl] >>>> (Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 ctx-175d37df) >>>> VM state transitted from :Running to Stopping with event: >>>> StopRequestedvm's >>>> original host id: 224 new host id: 15 host id before state transition: >>>> 15 >>>> >>>> I would try to understand first why ACS requested the hypervisor to stop >>> the VM. I mean the following, check the source code for conditions that >>> would make ACS request the shutdown of VMs. >>> BTW: are you migrating within a cluster or across clusters? >>> >>> I will be able to analyze this further only after work hours, so if you >>> find anything, keep me posted. 
>>> >>> On Wed, Jan 25, 2017 at 11:45 AM, Francois Scheurer < >>> francois.scheu...@everyware.ch> wrote: >>> >>> Hello Rafael >>>> >>>> >>>> I think the important log lines are these: >>>> >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,407 INFO >>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e >>>> job-60216/job-191380 ctx-175d37df) Migration cancelled because state has >>>> changed: VM[User|i-638-1736-VM] >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,407 DEBUG >>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e >>>> job-60216/job-191380 ctx-175d37df) Unable to migrate VM due to: >>>> Migration >>>> cancelled because state has changed: VM[User|i-638-1736-VM] >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,427 DEBUG >>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e >>>> job-60216/job-191380 ctx-175d37df) VM state transitted from :Running to >>>> Stopping with event: StopRequestedvm's original host id: 224 new host >>>> id: >>>> 15 host id before state transition: 15 >>>> >>>> The really disturbing thing is that CS stops the VM again if we start it >>>> manually, apparently because the migrating job is still running. >>>> >>>> Note that the VM HA flag is enabled. >>>> >>>> >>>> >>>> Thank you. >>>> >>>> >>>> Best Regards. 
>>>> >>>> Francois >>>> >>>> >>>> >>>> >>>> >>>> >>>> more details: >>>> >>>> >>>> >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,148 DEBUG >>>> [o.a.c.e.o.VolumeOrchestrator] (Work-Job-Executor-156:ctx-14b86f5e >>>> job-60216/job-191380 ctx-175d37df) Preparing 3 volumes for >>>> VM[User|i-638-1736-VM] >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,155 DEBUG >>>> [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-122:ctx-a8bacc64 >>>> job-14953/job-191351) Done with run of VM work job: >>>> com.cloud.vm.VmWorkMigrateAway for VM 1411, job origin: 14953 >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,155 DEBUG >>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-122:ctx-a8bacc64 >>>> job-14953/job-191351) Done executing com.cloud.vm.VmWorkMigrateAway for >>>> job-191351 >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,160 INFO >>>> [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-122:ctx-a8bacc64 >>>> job-14953/job-191351) Remove job-191351 from job monitoring >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,168 INFO >>>> [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-f3bcf43c >>>> work-1989) >>>> Completed HAWork[1989-Migration-1411-Running-Migrating] >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,246 DEBUG >>>> [c.c.a.m.ClusteredAgentAttache] (Work-Job-Executor-156:ctx-14b86f5e >>>> job-60216/job-191380 ctx-175d37df) Seq 19-8526158519542516375: >>>> Forwarding >>>> Seq 19-8526158519542516375: { Cmd , MgmtId: 345049103441, via: >>>> 19(ewcstack-vh010-prod), Ver: v1, Flags: 100111, [{ >>>> "com.cloud.agent.api.PrepareForMigrationCommand":{"vm":{"id" >>>> :1736,"name":"i-638-1736-VM","bootloader":"PyGrub","type":"U >>>> ser","cpus":4,"minSpeed":2100,"maxSpeed":2100,"minRam":17179 >>>> 869184,"maxRam":17179869184,"arch":"x86_64","os":"Other >>>> (64-bit)","platformEmulator":"Other install >>>> media","bootArgs":"","enable >>>> HA":true,"limitCpuUse":true,"enableDynamicallyScaleVm":true, >>>> "vncPassword":"pC1WUC7h1DBH9J36q3H9pg==","params":{"memoryOv >>>> 
ercommitRatio":"1.0","platform":"viridian:true;acpi:1;apic:t >>>> rue;pae:true;nx:true;timeoffset:-3601","keyboard":"us","Mess >>>> age.ReservedCapacityFreed.Flag":"false","cpuOvercommitRatio":"4.0"," >>>> hypervisortoolsversion":"xenserver61"},"uuid":"5bff01de-c033 >>>> -4925-9e47-30dd14539272","disks":[{"data":{"org.apache.cloud >>>> stack.storage.to.VolumeObjectTO":{"uuid":"1ab514b2-f2de- >>>> 4d85-8e0d-9768184a1349","volumeType":"ROOT","dataStore":{"org. >>>> apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"b8751b89-00f >>>> 6-5079-ed7a-c073579c7563","id":64,"poolType":"PreSetup","hos >>>> t":"localhost","path":"/b8751b89-00f6-5079-ed7a-c073579c7563 >>>> ","port":0,"url":"PreSetup://localhost/b8751b89-00f6-5079-ed >>>> 7a-c073579c7563/?ROLE=Primary&STOREUUID=b8751b89-00f6-5079- >>>> ed7a-c073579c7563"}},"name":"ROOT-1736","size":53687091200,"path":"d3 >>>> f87e3c-1efe-42ad-aa93-609d5b980a34","volumeId":8655,"vmName" >>>> :"i-638-1736-VM","accountId":638,"format":"VHD","provisionin >>>> gType":"THIN","id":8655,"deviceId":0,"hypervisorType":" >>>> XenServer"}},"diskSeq":0,"path":"d3f87e3c-1efe-42ad-aa93 >>>> -609d5b980a34","type":"ROOT","_details":{"managed":"false"," >>>> storagePort":"0", >>>> "storageHost":"localhost","volumeSize":"53687091200"}},{"dat >>>> a":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid >>>> ":"733df9dc-5e24-4ab9-b22b-bce78322428e","volumeType":"DATAD >>>> ISK","dataStore":{"org.apache.cloudstack.storage.to.PrimaryD >>>> ataStoreTO":{"uuid":"b8751b89-00f6-5079-ed7a-c073579c7563","id":64," >>>> poolType":"PreSetup","host":"localhost","path":"/b8751b89-00 >>>> f6-5079-ed7a-c073579c7563","port":0,"url":"PreSetup://localh >>>> ost/b8751b89-00f6-5079-ed7a-c073579c7563/?ROLE=Primary&STORE >>>> UUID=b8751b89-00f6-5079-ed7a-c073579c7563"}},"name":"WELLA- >>>> DB1-NCSA_D","size":268435456000,"path":"265f168a-bb92-4591-a32b-ee3e2 >>>> 2c8cb68","volumeId":8656,"vmName":"i-638-1736-VM","accountId >>>> 
":638,"format":"VHD","provisioningType":"THIN","id":8656," >>>> deviceId":1,"hypervisorType":"XenServer"}},"diskSeq":1," >>>> path":"265f168a-bb92-4591-a32b-ee3e22c8cb68","type":"DATADIS >>>> K","_details":{"managed":"false","storagePort":"0","stor >>>> ageHost":"localhos >>>> t","volumeSize":"268435456000"}},{"data":{"org.apache.clouds >>>> tack.storage.to.VolumeObjectTO":{"uuid":"bda6f335-b8a3-479c- >>>> 8c0d-41b344d2a50d","volumeType":"DATADISK","dataStore":{"org. >>>> apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid" >>>> :"09df3ca6-0a20-e4f0-3d79-12a6ed67a13d","id":89,"poolType":"PreSetup"," >>>> host":"localhost","path":"/09df3ca6-0a20-e4f0-3d79-12a6ed67a >>>> 13d","port":0,"url":"PreSetup://localhost/09df3ca6-0a20-e4f0 >>>> -3d79-12a6ed67a13d/?ROLE=Primary&STOREUUID=09df3ca6-0a20- >>>> >>>> e4f0-3d79-12a6ed67a13d"}},"name":"WELLA-DB1-NCSA_E","size >>>> ":429496729600,"path":"612cb586-a70c-4896-b120-062ee7a67894" >>>> ,"volumeId":86 >>>> 57,"vmName":"i-638-1736-VM","accountId":638,"format":"VHD"," >>>> provisioningType":"THIN","id":8657,"deviceId":2,"hypervisorT >>>> ype":"XenServer"}},"diskSeq":2,"path":"612cb586-a70c-4896-b1 >>>> 20-062ee7a67894","type":"DATADISK","_details":{"managed":" >>>> false","storagePort":"0","storageHost":"localhost","volumeSize":"42949 >>>> 6729600"}},{"data":{"org.apache.cloudstack.storage.to.Templa >>>> teObjectTO":{"uuid":"0e5bf4ff-caca-4b08-abed-7bbcdf3c42ef"," >>>> id":201,"format":"ISO","accountId":1,"hvm":true,"displayText >>>> ":"xen-pv-drv-iso","name":"xs-tools.iso","guestOsType":"CentOS 4.5 >>>> (32-bit)","hypervisorType":"XenServer"}},"diskSeq":3,"type":" >>>> ISO"}],"nics":[{"deviceId":0,"networkRateMbps":1000,"default >>>> Nic":true,"pxeDisable":false,"nicUuid":"3523c3d3-9f92-4076-a >>>> d99-f8864d0f0edd","uuid":"e6fbb257-16e1-415f-88fb-9a79b263b3 >>>> a8","ip":"192.168.1.42","netmask":"255.255.255.0"," >>>> gateway":"192.168.1.1","mac":"02:00:0c:ae:00:0f","dns1":"8.8 >>>> .8.8","dns2":"8.8 >>>> 
.4.4","broadcastType":"Vlan","type":"Guest","broadcastUri":" >>>> vlan://2285","isolationUri":"vlan://2285","isSecurityGroupEn >>>> abled":false,"name":"bond0"}],"vcpuMaxLimit":16},"wait":0}}] } to >>>> 345049098498 >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,399 DEBUG [c.c.a.t.Request] >>>> (AgentManager-Handler-10:null) Seq 19-8526158519542516375: Processing: >>>> { >>>> Ans: , MgmtId: 345049103441, via: 19, Ver: v1, Flags: 110, >>>> [{"com.cloud.agent.api.PrepareForMigrationAnswer":{"result": >>>> true,"wait":0}}] >>>> } >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,399 DEBUG >>>> [c.c.a.m.AgentAttache] >>>> (AgentManager-Handler-10:null) Seq 19-8526158519542516375: No more >>>> commands >>>> found >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,399 DEBUG [c.c.a.t.Request] >>>> (Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 ctx-175d37df) >>>> Seq 19-8526158519542516375: Received: { Ans: , MgmtId: 345049103441, >>>> via: >>>> 19, Ver: v1, Flags: 110, { PrepareForMigrationAnswer } } >>>> >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,407 INFO >>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e >>>> job-60216/job-191380 ctx-175d37df) Migration cancelled because state has >>>> changed: VM[User|i-638-1736-VM] >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,407 DEBUG >>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e >>>> job-60216/job-191380 ctx-175d37df) Unable to migrate VM due to: >>>> Migration >>>> cancelled because state has changed: VM[User|i-638-1736-VM] >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,427 DEBUG >>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e >>>> job-60216/job-191380 ctx-175d37df) VM state transitted from :Running to >>>> Stopping with event: StopRequestedvm's original host id: 224 new host >>>> id: >>>> 15 host id before state transition: 15 >>>> >>>> ewcstack-man02-prod: 2017-01-24 18:05:28,438 DEBUG >>>> [c.c.a.m.ClusteredAgentAttache] 
(Work-Job-Executor-156:ctx-14b86f5e >>>> job-60216/job-191380 ctx-175d37df) Seq 15-6889944479923437643: >>>> Forwarding >>>> Seq 15-6889944479923437643: { Cmd , MgmtId: 345049103441, via: >>>> 15(ewcstack-vh011-prod), Ver: v1, Flags: 100011, >>>> [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"execut >>>> eInSequence":false,"checkBeforeCleanup":false,"vmName":"i- >>>> 638-1736-VM","wait":0}}] >>>> } to 345049098498 >>>> ewcstack-man02-prod: 2017-01-24 18:05:30,335 DEBUG >>>> [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-1:null) SeqA >>>> 254-800498: >>>> Processing Seq 254-800498: { Cmd , MgmtId: -1, via: 254, Ver: v1, >>>> Flags: >>>> 11, [{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_ >>>> proxyVmId":1530,"_loadInfo":"{\n >>>> \"connections\": []\n}","wait":0}}] } >>>> ewcstack-man02-prod: 2017-01-24 18:05:30,340 DEBUG >>>> [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-1:null) SeqA >>>> 254-800498: >>>> Sending Seq 254-800498: { Ans: , MgmtId: 345049103441, via: 254, Ver: >>>> v1, >>>> Flags: 100010, [{"com.cloud.agent.api.AgentCo >>>> ntrolAnswer":{"result":true,"wait":0}}] } >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,447 DEBUG [c.c.a.t.Request] >>>> (AgentManager-Handler-11:null) Seq 15-6889944479923437632: Processing: >>>> { >>>> Ans: , MgmtId: 345049103441, via: 15, Ver: v1, Flags: 10, >>>> [{"com.cloud.agent.api.StopAnswer":{"platform":"viridian:tru >>>> e;acpi:1;apic:true;pae:true;nx:true;timeoffset:3601","result >>>> ":true,"details":"Stop >>>> VM i-638-1735-VM Succeed","wait":0}}] } >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,447 DEBUG [c.c.a.t.Request] >>>> (Work-Job-Executor-115:ctx-0d5db528 job-13733/job-191379 ctx-bbabfeae) >>>> Seq 15-6889944479923437632: Received: { Ans: , MgmtId: 345049103441, >>>> via: >>>> 15, Ver: v1, Flags: 10, { StopAnswer } } >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,470 DEBUG >>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) 
VM[User|i-638-1735-VM] is stopped on >>>> the >>>> host. Proceeding to release resource held. >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,475 DEBUG >>>> [c.c.n.NetworkModelImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) Service SecurityGroup is not >>>> supported >>>> in the network id=519 >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,478 DEBUG >>>> [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) Changing active number of nics for >>>> network id=519 on -1 >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,488 DEBUG >>>> [o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) Asking VpcVirtualRouter to release >>>> NicProfile[17403-1735-3364083b-dcc0-4eb9-a7f9-f3e85bf8c7d3-1 >>>> 92.168.1.11-null >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,488 DEBUG >>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) Successfully released network >>>> resources >>>> for the vm VM[User|i-638-1735-VM] >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,488 DEBUG >>>> [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) Successfully released storage >>>> resources >>>> for the vm VM[User|i-638-1735-VM] >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,496 DEBUG >>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) VM state transitted from :Stopping to >>>> Stopped with event: OperationSucceededvm's original host id: 15 new host >>>> id: null host id before state transition: 15 >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,505 DEBUG >>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) Hosts's actual total CPU: 82976 and >>>> CPU >>>> after applying overprovisioning: 331904 >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,505 
DEBUG >>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) Hosts's actual total RAM: >>>> 261961927808 >>>> and RAM after applying overprovisioning: 261961924608 >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,505 DEBUG >>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) release cpu from host: 15, old used: >>>> 71300,reserved: 2100, actual total: 82976, total with overprovisioning: >>>> 331904; new used: 67100,reserved:6300; movedfromreserved: >>>> false,moveToReserveredtrue >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,505 DEBUG >>>> [c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) release mem from host: 15, old used: >>>> 98784247808,reserved: 1073741824, total: 261961924608; new used: >>>> 94489280512,reserved:5368709120; movedfromreserved: >>>> >>>> false,moveToReserveredtrue >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,538 DEBUG >>>> [c.c.n.NetworkModelImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) Service SecurityGroup is not >>>> supported >>>> in the network id=519 >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,557 DEBUG >>>> [c.c.n.NetworkModelImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) Service SecurityGroup is not >>>> supported >>>> in the network id=519 >>>> >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,588 ERROR >>>> [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) Invocation exception, caused by: >>>> com.cloud.utils.exception.CloudRuntimeException: Unable to migrate >>>> VM[User|i-638-1735-VM] >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,589 INFO >>>> [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379 ctx-bbabfeae) Rethrow exception >>>> com.cloud.utils.exception.CloudRuntimeException: Unable to 
migrate >>>> VM[User|i-638-1735-VM] >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,589 DEBUG >>>> [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379) Done with run of VM work job: >>>> com.cloud.vm.VmWorkMigrateAway for VM 1735, job origin: 13733 >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,589 ERROR >>>> [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379) Unable to complete AsyncJobVO {id:191379, userId: >>>> 1, >>>> accountId: 1, instanceType: null, instanceId: null, cmd: >>>> com.cloud.vm.VmWorkMigrateAway, cmdInfo: rO0ABXNyAB5jb20uY2xvdWQudm0uVm >>>> 1Xb3JrTWlncmF0ZUF3YXmt4MX4jtcEmwIAAUoACXNyY0hvc3RJZHhyABNjb2 >>>> 0uY2xvdWQudm0uVm1Xb3Jrn5m2VvAlZ2sCAARKAAlhY2NvdW50SWRKAAZ1c2 >>>> VySWRKAAR2bUlkTAALaGFuZGxlck5hbWV0ABJMamF2YS9sYW5nL1N0cmluZz >>>> t4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAGx3QAGVZpcnR1YWxNYWNoaW5lTW >>>> FuYWdlckltcGwAAAAAAAAA4A, >>>> cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, >>>> result: null, initMsid: 345049098498, completeMsid: null, lastUpdated: >>>> null, lastPolled: null, created: Tue Jan 24 18:05:00 CET 2017}, job >>>> origin:13733 >>>> ewcstack-man02-prod: com.cloud.utils.exception.CloudRuntimeException: >>>> Unable to migrate VM[User|i-638-1735-VM] >>>> ewcstack-man02-prod: at com.cloud.vm.VirtualMachineMan >>>> agerImpl.orchestrateMigrateAway(VirtualMachineManagerImpl.java:2405) >>>> ewcstack-man02-prod: at com.cloud.vm.VirtualMachineMan >>>> agerImpl.orchestrateMigrateAway(VirtualMachineManagerImpl.java:4517) >>>> ewcstack-man02-prod: at sun.reflect.GeneratedMethodAcc >>>> essor402.invoke(Unknown >>>> Source) >>>> ewcstack-man02-prod: at sun.reflect.DelegatingMethodAc >>>> cessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>> ewcstack-man02-prod: at java.lang.reflect.Method.invok >>>> e(Method.java:606) >>>> ewcstack-man02-prod: at com.cloud.vm.VmWorkJobHandlerP >>>> roxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107) >>>> 
ewcstack-man02-prod: at com.cloud.vm.VirtualMachineMan >>>> agerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4636) >>>> ewcstack-man02-prod: at com.cloud.vm.VmWorkJobDispatch >>>> er.runJob(VmWorkJobDispatcher.java:103) >>>> ewcstack-man02-prod: at org.apache.cloudstack.framewor >>>> k.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManag >>>> erImpl.java:537) >>>> ewcstack-man02-prod: at org.apache.cloudstack.managed. >>>> context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) >>>> ewcstack-man02-prod: at org.apache.cloudstack.managed. >>>> context.impl.DefaultManagedContext$1.call(DefaultManagedCont >>>> ext.java:56) >>>> ewcstack-man02-prod: at org.apache.cloudstack.managed. >>>> context.impl.DefaultManagedContext.callWithContext(DefaultMa >>>> nagedContext.java:103) >>>> ewcstack-man02-prod: at org.apache.cloudstack.managed. >>>> context.impl.DefaultManagedContext.runWithContext(DefaultMan >>>> agedContext.java:53) >>>> ewcstack-man02-prod: at org.apache.cloudstack.managed. 
>>>> context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) >>>> ewcstack-man02-prod: at org.apache.cloudstack.framewor >>>> k.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494) >>>> ewcstack-man02-prod: at java.util.concurrent.Executors >>>> $RunnableAdapter.call(Executors.java:471) >>>> ewcstack-man02-prod: at java.util.concurrent.FutureTas >>>> k.run(FutureTask.java:262) >>>> ewcstack-man02-prod: at java.util.concurrent.ThreadPoo >>>> lExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>> ewcstack-man02-prod: at java.util.concurrent.ThreadPoo >>>> lExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>> ewcstack-man02-prod: at java.lang.Thread.run(Thread.java:745) >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,590 DEBUG >>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379) Complete async job-191379, jobStatus: FAILED, >>>> resultCode: 0, result: rO0ABXNyAC9jb20uY2xvdWQudXRpbH >>>> MuZXhjZXB0aW9uLkNsb3VkUnVudGltZUV4Y2VwdGlvbgAAAABWTU9yAwABSQ >>>> ALY3NFcnJvckNvZGV4cgAaamF2YS5sYW5nLlJ1bnRpbWVFeGNlcHRpb26eXw >>>> ZHCjSD5QIAAHhyABNqYXZhLmxhbmcuRXhjZXB0aW9u0P0fPho7HMQCAAB4cg >>>> ATamF2YS5sYW5nLlRocm93YWJsZdXGNSc5d7jLAwAETAAFY2F1c2V0ABVMam >>>> F2YS9sYW5nL1Rocm93YWJsZTtMAA1kZXRhaWxNZXNzYWdldAASTGphdmEvbG >>>> FuZy9TdHJpbmc7WwAKc3RhY2tUcmFjZXQAHltMamF2YS9sYW5nL1N0YWNrVH >>>> JhY2VFbGVtZW50O0wAFHN1cHByZXNzZWRFeGNlcHRpb25zdAAQTGphdmEvdX >>>> RpbC9MaXN0O3hwcQB-AAh0AChVbmFibGUgdG8gbWlncmF0ZSBWTVtVc2VyfG >>>> ktNjM4LTE3MzUtVk1ddXIAHltMamF2YS5sYW5nLlN0YWNrVHJhY2VFbGVtZW >>>> 50OwJGKjw8_SI5AgAAeHAAAAAUc3IAG2phdmEubGFuZy5TdGFja1RyYWNlRW >>>> xlbWVudGEJxZomNt2FAgAESQAKbGluZU51bWJlckwADmRlY2xhcmluZ0NsYX >>>> NzcQB-AAVMAAhmaWxlTmFtZXEAfgAFTAAKbWV0aG9kTmFtZXEAfgAFeHAAAA >>>> lldAAmY29tLmNsb3VkLnZtLlZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGx0AB >>>> 5WaXJ0dWFsTWFjaGluZU1hbmFnZXJJbXBsLmphdmF0ABZvcmNoZXN0cmF0ZU >>>> 1pZ3JhdGVBd2F5c3EAfgAMAAARpXEAfgAOcQB-AA9xAH4AEHNxAH4ADP____ >>>> 
90ACZzdW4ucmVmbGVjdC5HZW5lcmF0ZWRNZXRob2RBY2Nlc3NvcjQwMnB0AA >>>> ZpbnZva2VzcQB-AAwAAAArdAAoc3VuLnJlZmxlY3QuRGVsZWdhdGluZ01ldG >>>> hvZEFjY2Vzc29ySW1wbHQAIURlbGVnYXRpbmdNZXRob2RBY2Nlc3NvckltcG >>>> wuamF2YXEAfgAUc3EAfgAMAAACXnQAGGphdmEubGFuZy5yZWZsZWN0Lk1ldG >>>> hvZHQAC01ldGhvZC5qYXZhcQB-ABRzcQB-AAwAAABrdAAiY29tLmNsb3VkLn >>>> ZtLlZtV29ya0pvYkhhbmRsZXJQcm94eXQAGlZtV29ya0pvYkhhbmRsZXJQcm >>>> 94eS5qYXZhdAAPaGFuZGxlVm1Xb3JrSm9ic3EAfgAMAAASHHEAfgAOcQB- >>>> AA9xAH4AHnNxAH4ADAAAAGd0ACBjb20uY2xvdWQudm0uVm1Xb3JrSm9iRGlz >>>> cGF0Y2hlcnQAGFZtV29ya0pvYkRpc3BhdGNoZXIuamF2YXQABnJ1bkpvYnNx >>>> AH4ADAAAAhl0AD9vcmcuYXBhY2hlLmNsb3Vkc3RhY2suZnJhbWV3b3JrLmpv >>>> YnMuaW1wbC5Bc3luY0pvYk1hbmFnZXJJbXBsJDV0ABhBc3luY0pvYk1hbmFn >>>> ZXJJbXBsLmphdmF0AAxydW5JbkNvbnRleHRzcQB-AAwAAAAxdAA-b3JnLmFw >>>> YWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQuY29udGV4dC5NYW5hZ2VkQ29udGV4 >>>> dFJ1bm5hYmxlJDF0ABtNYW5hZ2VkQ29udGV4dFJ1bm5hYmxlLmphdmF0AANy >>>> dW5zcQB-AAwAAAA4dABCb3JnLmFwYWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQu >>>> Y29udGV4dC5pbXBsLkRlZmF1bHRNYW5hZ2VkQ29udGV4dCQxdAAaRGVmYXVs >>>> dE1hbmFnZWRDb250ZXh0LmphdmF0AARjYWxsc3EAfgAMAAAAZ3QAQG9yZy5h >>>> cGFjaGUuY2xvdWRzdGFjay5tYW5hZ2VkLmNvbnRleHQuaW1wbC5EZWZhdWx0 >>>> TWFuYWdlZENvbnRleHRxAH4ALnQAD2NhbGxXaXRoQ29udGV4dHNxAH4ADAAA >>>> ADVxAH4AMXEAfgAudAAOcnVuV2l0aENvbnRleHRzcQB-AAwAAAAudAA8b3Jn >>>> LmFwYWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQuY29udGV4dC5NYW5hZ2VkQ29u >>>> dGV4dFJ1bm5hYmxlcQB-ACpxAH4AK3NxAH4ADAAAAe5xAH4AJXEAfgAmcQB- >>>> ACtzcQB-AAwAAAHXdAAuamF2YS51dGlsLmNvbmN1cnJlbnQuRXhlY3V0b3Jz >>>> JFJ1bm5hYmxlQWRhcHRlcnQADkV4ZWN1dG9ycy5qYXZhcQB-AC9zcQB-AAwA >>>> AAEGdAAfamF2YS51dGlsLmNvbmN1cnJlbnQuRnV0dXJlVGFza3QAD0Z1dHVy >>>> ZVRhc2suamF2YXEAfgArc3EAfgAMAAAEeXQAJ2phdmEudXRpbC5jb25jdXJy >>>> ZW50LlRocmVhZFBvb2xFeGVjdXRvcnQAF1RocmVhZFBvb2xFeGVjdXRvci5q >>>> YXZhdAAJcnVuV29ya2Vyc3EAfgAMAAACZ3QALmphdmEudXRpbC5jb25jdXJy >>>> ZW50LlRocmVhZFBvb2xFeGVjdXRvciRXb3JrZXJxAH4AQHEAfgArc3EAfgAM >>>> AAAC6XQAEGphdmEubGFuZy5UaHJlYWR0AAtUaHJlYWQuamF2YXEAfgArc3IA >>>> 
JmphdmEudXRpbC5Db2xsZWN0aW9ucyRVbm1vZGlmaWFibGVMaXN0_A8lMbXs >>>> jhACAAFMAARsaXN0cQB-AAd4cgAsamF2YS51dGlsLkNvbGxlY3Rpb25zJFVu >>>> bW9kaWZpYWJsZUNvbGxlY3Rpb24ZQgCAy173HgIAAUwAAWN0ABZMamF2YS91 >>>> dGlsL0NvbGxlY3Rpb247eHBzcgATamF2YS51dGlsLkFycmF5TGlzdHiB0h2Z >>>> x2GdAwABSQAEc2l6ZXhwAAAAAHcEAAAAAHhxAH4ATHgAABCadwgAAAAAAAAAAHg >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,592 DEBUG >>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379) Publish async job-191379 complete on message bus >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,592 DEBUG >>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379) Wake up jobs related to job-191379 >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,592 DEBUG >>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379) Update db status for job-191379 >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,594 DEBUG >>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379) Wake up jobs joined with job-191379 and disjoin >>>> all >>>> subjobs created from job- 191379 >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,604 DEBUG >>>> [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379) Done executing com.cloud.vm.VmWorkMigrateAway for >>>> job-191379 >>>> ewcstack-man02-prod: 2017-01-24 18:05:32,608 INFO >>>> [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-115:ctx-0d5db528 >>>> job-13733/job-191379) Remove job-191379 from job monitoring >>>> >>>> >>>> >>>> On 25.01.2017 15:53, Rafael Weingärtner wrote: >>>> >>>> Did you check the ACS log files looking for some exceptions when the >>>>> migrations for this specific VM started? >>>>> >>>>> On Wed, Jan 25, 2017 at 9:49 AM, Francois Scheurer < >>>>> francois.scheu...@everyware.ch> wrote: >>>>> >>>>> Dear CS contributors >>>>> >>>>>> >>>>>> We use CS 4.5.1 on a 3 Clusters with XenServer 6.5. 
and shared primary >>>>>> storage. >>>>>> >>>>>> When we put a host in maintenance, CS evacuates all VMs to other >>>>>> hosts. >>>>>> >>>>>> But it happens regularly that some VMs are stopped instead of live >>>>>> migrated. >>>>>> >>>>>> If we restart those stopped VMs, CS will then stop them again later. >>>>>> >>>>>> We need to start them several times until they stay up. >>>>>> >>>>>> Is this a known issue? Is it fixed in CS 4.9.2? >>>>>> >>>>>> >>>>>> Migrating all VMs manually works, but it is impracticable because it is >>>>>> too >>>>>> time consuming. >>>>>> >>>>>> >>>>>> Many thanks in advance for your help. >>>>>> >>>>>> >>>>>> >>>>>> Best Regards >>>>>> Francois >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> >>>>>> EveryWare AG >>>>>> François Scheurer >>>>>> Senior Systems Engineer >>>>>> Zurlindenstrasse 52a >>>>>> CH-8003 Zürich >>>>>> >>>>>> tel: +41 44 466 60 00 >>>>>> fax: +41 44 466 60 10 >>>>>> mail: francois.scheu...@everyware.ch >>>>>> web: http://www.everyware.ch >>>>>> >>>>>> >>>>> -- >>>> >>>> >>>> EveryWare AG >>>> François Scheurer >>>> Senior Systems Engineer >>>> Zurlindenstrasse 52a >>>> CH-8003 Zürich >>>> >>>> tel: +41 44 466 60 00 >>>> fax: +41 44 466 60 10 >>>> mail: francois.scheu...@everyware.ch >>>> web: http://www.everyware.ch >>>> >>>> >>>> >>> -- >>> Rafael Weingärtner >>> >>> >> >> > -- > > > EveryWare AG > François Scheurer > Senior Systems Engineer > Zurlindenstrasse 52a > CH-8003 Zürich > > tel: +41 44 466 60 00 > fax: +41 44 466 60 10 > mail: francois.scheu...@everyware.ch > web: http://www.everyware.ch > -- Rafael Weingärtner
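The race discussed in this thread can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not CloudStack code: the names VmStateMachine, changeState, and MigrationGuardSketch are invented for the example, and the real VirtualMachineManagerImpl.changeState(...) works against a persisted state machine in the database rather than an in-memory reference. The idea it models is the one in the logs: the migrate-away job's Running -> Migrating transition is a compare-and-swap that fails with a ConcurrentOperationException when another job (here, an HA StopRequested) has already changed the VM's state.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only -- NOT the actual CloudStack source.
public class MigrationGuardSketch {

    enum State { RUNNING, MIGRATING, STOPPING }

    static class ConcurrentOperationException extends Exception {
        ConcurrentOperationException(String msg) { super(msg); }
    }

    // Plays the role of VirtualMachineManagerImpl.changeState(...): the
    // transition succeeds only if the VM is still in the expected state.
    static class VmStateMachine {
        private final AtomicReference<State> state =
                new AtomicReference<>(State.RUNNING);

        void changeState(State expected, State next)
                throws ConcurrentOperationException {
            if (!state.compareAndSet(expected, next)) {
                throw new ConcurrentOperationException(
                        "Migration cancelled because state has changed: "
                                + state.get());
            }
        }
    }

    // Returns true when the migrate-away job loses the race: the
    // Running -> Migrating transition fails because a concurrent job
    // (e.g. an HA StopRequested) already moved the VM to Stopping.
    static boolean migrationFails() {
        VmStateMachine vm = new VmStateMachine();
        try {
            vm.changeState(State.RUNNING, State.STOPPING);  // concurrent stop wins
            vm.changeState(State.RUNNING, State.MIGRATING); // migrate-away now fails
            return false;
        } catch (ConcurrentOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("migration lost the race: " + migrationFails());
    }
}
```

If the pending HA work item is still queued when the VM is started manually, the same lost race would replay, which would be consistent with the reported behavior of the VM being stopped again after each manual start.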