Dear Rafael

Many thanks for your deep analysis.
Nice humor with the planets alignments BTW ;)

We also suspected the HA flag to be related to this issue, as we never saw it in the past, until we enabled HA on all VMs and offerings.

Here is also the list of HA workers:

mysql> select * from op_ha_work;
+------+-------------+-----------+--------------------+----------+----------------+---------+---------------------+-------+---------------------+-------+-------------+---------+
| id | instance_id | type | vm_type | state | mgmt_server_id | host_id | created | tried | taken | step | time_to_try | updated |
+------+-------------+-----------+--------------------+----------+----------------+---------+---------------------+-------+---------------------+-------+-------------+---------+
| 105 | 343 | HA | User | Running | 345049098498 | 29 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056013 | 3 |
| 109 | 407 | HA | ConsoleProxy | Running | 345049098498 | 33 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056013 | 3 |
| 113 | 407 | HA | ConsoleProxy | Running | 345049098498 | 33 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056013 | 3 |
| 117 | 407 | HA | ConsoleProxy | Running | 345049098498 | 33 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056013 | 3 |
| 121 | 343 | HA | User | Running | 345049098498 | 29 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056013 | 3 |
| 125 | 405 | HA | SecondaryStorageVm | Running | 345049098498 | 29 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056013 | 3 |
| 129 | 393 | HA | User | Running | 345049098498 | 25 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056014 | 30 |
| 133 | 405 | HA | SecondaryStorageVm | Running | 345049098498 | 29 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056014 | 3 |
| 137 | 402 | HA | ConsoleProxy | Running | 345049098498 | 25 | 2015-03-28 03:19:03 | 2 | 2015-03-28 03:30:04 | Error | 1394056014 | 3 |
| 345 | 402 | HA | ConsoleProxy | Starting | 345049098498 | 25 | 2015-07-06 13:09:26 | 1 | 2015-07-06 13:09:26 | Error | 1402527506 | 770 |
| 349 | 687 | HA | SecondaryStorageVm | Starting | 345049098498 | 25 | 2015-07-06 13:09:26 | 1 | 2015-07-06 13:09:26 | Error | 1402527506 | 2 |
| 2057 | 1178 | Migration | User | Running | 345049098498 | 33 | 2017-01-31 07:56:08 | 0 | 2017-01-31 07:56:08 | Done | 1451024774 | 95 |
| 2058 | 1736 | Migration | User | Running | 345049098498 | 224 | 2017-01-31 07:56:08 | 0 | 2017-01-31 07:56:08 | Done | 1451024774 | 45 |
| 2059 | 1240 | Migration | User | Running | 345049098498 | 7 | 2017-01-31 07:56:08 | 0 | 2017-01-31 07:56:08 | Done | 1451024774 | 61 |
| 2060 | 1178 | Migration | User | Running | 345049098122 | 33 | 2017-01-31 07:56:08 | 1 | 2017-01-31 08:06:37 | Done | 1451025374 | 95 |
| 2061 | 1178 | Migration | User | Stopped | NULL | 33 | 2017-01-31 07:56:09 | 5 | NULL | Done | 1451027900 | 109 |
| 2062 | 1690 | Migration | User | Running | 345049098498 | 224 | 2017-01-31 07:56:09 | 0 | 2017-01-31 07:56:09 | Done | 1451024774 | 109 |
| 2063 | 1177 | Migration | User | Running | 345049098498 | 29 | 2017-01-31 07:56:09 | 0 | 2017-01-31 07:56:09 | Done | 1451024774 | 82 |
| 2064 | 1690 | Migration | User | Running | 345049098122 | 224 | 2017-01-31 07:56:09 | 1 | 2017-01-31 07:58:18 | Done | 1451024900 | 109 |
| 2065 | 1240 | Migration | User | Running | 345049098122 | 7 | 2017-01-31 07:56:09 | 1 | 2017-01-31 07:58:18 | Done | 1451024900 | 61 |
| 2066 | 1736 | Migration | User | Running | 345049098122 | 224 | 2017-01-31 07:56:09 | 1 | 2017-01-31 07:58:18 | Done | 1451024900 | 45 |
| 2067 | 1240 | Migration | User | Stopped | NULL | 7 | 2017-01-31 07:56:09 | 5 | NULL | Done | 1451028003 | 80 |
| 2068 | 1178 | Migration | User | Running | 345049098122 | 33 | 2017-01-31 07:56:16 | 0 | 2017-01-31 07:56:16 | Done | 1451024781 | 95 |
| 2069 | 1736 | Migration | User | Running | 345049098122 | 224 | 2017-01-31 07:56:16 | 0 | 2017-01-31 07:56:16 | Done | 1451024781 | 45 |
| 2070 | 1240 | Migration | User | Running | 345049098122 | 7 | 2017-01-31 07:56:16 | 0 | 2017-01-31 07:56:16 | Done | 1451024781 | 61 |
| 2071 | 1177 | Migration | User | Running | 345049098122 | 29 | 2017-01-31 07:56:16 | 0 | 2017-01-31 07:56:16 | Done | 1451024781 | 82 |
| 2072 | 1177 | Migration | User | Stopped | NULL | 29 | 2017-01-31 07:56:16 | 5 | NULL | Done | 1451027973 | 101 |
| 2073 | 1177 | Migration | User | Running | 345049098122 | 29 | 2017-01-31 07:56:16 | 1 | 2017-01-31 07:58:22 | Done | 1451024904 | 82 |
| 2074 | 1690 | Migration | User | Stopped | NULL | 224 | 2017-01-31 07:56:16 | 5 | NULL | Done | 1451027464 | 123 |
| 2075 | 1736 | Migration | User | Stopped | NULL | 224 | 2017-01-31 07:56:16 | 5 | NULL | Done | 1451027477 | 59 |
+------+-------------+-----------+--------------------+----------+----------------+---------+---------------------+-------+---------------------+-------+-------------+---------+
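
As a side note, to get a quick overview of what is still sitting in that table, I run something like this (just a sketch against the columns shown above; I am not sure whether 'step' alone is the authoritative flag):

mysql> -- summarize the work items per type and step
mysql> SELECT type, step, COUNT(*) AS entries
    ->   FROM op_ha_work
    ->  GROUP BY type, step
    ->  ORDER BY type, step;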

If we put a host in maintenance, it may unfortunately happen that a timeout occurs and CS aborts the migration.
As far as we can interpret the logs, it seems (we speculate) that CS detects that the VM is then still running on the original host, which is in maintenance, and somehow decides to stop it because of the "host not up" status.
We then need to start the VM several times until CS gives up stopping it.

After restarting the MSs I guess a similar issue may happen: the hosts are disconnected during the early stage and then reconnect. Maybe CS then again stops the VMs because of the "host not up" status? Cf.:

2017-01-30 08:44:06,157 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f) The last host of this VM is not UP or is not enabled, host status is: Disconnected, host resource state is: Enabled
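
If that theory is right, the host table should briefly show the hosts as Disconnected right after the MSs restart. I plan to watch that with something like this (a sketch; it assumes the host table carries the status and resource_state columns that the log message refers to):

mysql> -- watch the agent status of the hosts while the MSs come back up
mysql> SELECT id, name, status, resource_state
    ->   FROM host
    ->  WHERE removed IS NULL;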

But the problem after the MSs restart is definitely impacting only the VMs listed in the table op_ha_work.
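
To see exactly which VMs that concerns, I use roughly the following (a sketch; it assumes the standard vm_instance table with instance_name and state columns, to be double-checked on 4.5.1):

mysql> -- map the remaining work items to their VM instances
mysql> SELECT w.id AS work_id, w.type, w.step, vm.instance_name, vm.state
    ->   FROM op_ha_work w
    ->   JOIN vm_instance vm ON vm.id = w.instance_id;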


To your questions:
        We have 3 ACS MSs managing several zones. Each zone has a single pod and cluster.
        The cluster of i-638-1735-VM has this summary:
         Total number of VMs: 121
         Total number of hypervisors: 4
         Total allocated RAM: 522.0 / 972 GB (53.7 %)
         Total allocated cores: 211
         Total allocated storage: 11285 GB
        Used memory on the hosts currently varies from 28 to 77%, so there is enough RAM.
        I did not check the XS logs yet. The same problem arose again this morning (shortly after a restart of all MSs), so I will look into the XS logs now.

        I have some fair experience with Eclipse and Java; I will ask my boss if we can go this way.

The main issue for now is that restarting the MSs reproduces the problem, which is quite annoying for the impacted VMs.
I thought of stopping the MSs, deleting all HA/Migration entries in op_ha_work, and starting the MSs again.
Is that a viable option? (Or should I delete the Migration entries only? Or never touch this table?)
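
Concretely, I had something like the following in mind (an untested sketch; the backup table name is only an example, and the MSs would be stopped and the DB dumped beforehand):

mysql> -- keep a copy of the rows before touching anything (example table name)
mysql> CREATE TABLE op_ha_work_backup_20170131 AS SELECT * FROM op_ha_work;
mysql> -- remove the Migration work items that keep being re-executed
mysql> DELETE FROM op_ha_work WHERE type = 'Migration';
mysql> -- or, more conservatively, only the ones already marked Done:
mysql> -- DELETE FROM op_ha_work WHERE type = 'Migration' AND step = 'Done';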
We will also very probably remove the HA flag on all VMs if HA does not work reliably.

Best Regards
Francois Scheurer



Here are new logs regarding the same issue, from today.
I tried to extract the important parts related to the stop of i-189-1178-VM.

regexp: (i-189-1178-VM|VmWorkMigrateAway|job-15393|job-192989|StopRequestedvm|job-14965|job-193019|Stopped-Migrating|ctx-36aae733|work-2049)

2017-01-30 08:44:04,440 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Executing AsyncJobVO {id:192989, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkMigrateAway, cmdInfo: rO0ABXNyAB5jb20uY2xvdWQudm0uVm1Xb3JrTWlncmF0ZUF3YXmt4MX4jtcEmwIAAUoACXNyY0hvc3RJZHhyABNjb20uY2xvdWQudm0uVm1Xb3Jrn5m2VvAlZ2sCAARKAAlhY2NvdW50SWRKAAZ1c2VySWRKAAR2bUlkTAALaGFuZGxlck5hbWV0ABJMamF2YS9sYW5nL1N0cmluZzt4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAEmnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAHQ, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 345049103441, completeMsid: null, lastUpdated: null, lastPolled: null, created: Mon Jan 30 08:44:03 CET 2017}
2017-01-30 08:44:04,440 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Run VM work job: com.cloud.vm.VmWorkMigrateAway for VM 1178, job origin: 15393

2017-01-30 08:44:05,217 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Deploy avoids pods: [], clusters: [], hosts: [29]
2017-01-30 08:44:05,220 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentConnectTaskPool-4:ctx-5b61f2e5) create ClusteredAgentAttache for 232
2017-01-30 08:44:05,220 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) DeploymentPlanner allocation algorithm: com.cloud.deploy.UserDispersingPlanner@1734b2e2
2017-01-30 08:44:05,220 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Trying to allocate a host and storage pools from dc:17, pod:25,cluster:25, requested cpu: 4200, requested ram: 4294967296
2017-01-30 08:44:05,220 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Is ROOT volume READY (pool already allocated)?: Yes
2017-01-30 08:44:05,220 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) This VM has last host_id specified, trying to choose the same host: 25

2017-01-30 08:44:05,614 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) The last host of this VM is not UP or is not enabled, host status is: Disconnected, host resource state is: Enabled
2017-01-30 08:44:05,614 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Cannot choose the last host to deploy this VM
2017-01-30 08:44:05,614 DEBUG [c.c.d.FirstFitPlanner] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Searching resources only under specified Cluster: 25

2017-01-30 08:44:05,777 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea FirstFitRoutingAllocator) Found 0 hosts for allocation after prioritization: []
2017-01-30 08:44:05,777 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea FirstFitRoutingAllocator) Looking for speed=4200Mhz, Ram=4096
2017-01-30 08:44:05,777 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea FirstFitRoutingAllocator) Host Allocator returning 0 suitable hosts
2017-01-30 08:44:05,777 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) No suitable hosts found
2017-01-30 08:44:05,777 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) No suitable hosts found under this Cluster: 25
2017-01-30 08:44:05,787 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Could not find suitable Deployment Destination for this VM under any clusters, returning.
2017-01-30 08:44:05,787 DEBUG [c.c.d.FirstFitPlanner] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Searching resources only under specified Cluster: 25
2017-01-30 08:44:05,798 DEBUG [c.c.d.FirstFitPlanner] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) The specified cluster is in avoid set, returning.
2017-01-30 08:44:05,798 DEBUG [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Unable to find destination for migrating the vm VM[User|i-189-1178-VM]
2017-01-30 08:44:05,798 WARN [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Failed to deploy vm 1178 with original planner, sending HAPlanner

2017-01-30 08:44:05,985 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Deploy avoids pods: [], clusters: [], hosts: [29]
2017-01-30 08:44:05,985 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) DeploymentPlanner allocation algorithm: com.cloud.deploy.SkipHeuresticsPlanner@4fdf86ce
2017-01-30 08:44:05,985 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Trying to allocate a host and storage pools from dc:17, pod:25,cluster:25, requested cpu: 4200, requested ram: 4294967296
2017-01-30 08:44:05,985 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Is ROOT volume READY (pool already allocated)?: Yes
2017-01-30 08:44:05,985 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) This VM has last host_id specified, trying to choose the same host: 25

2017-01-30 08:44:06,022 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) The last host of this VM is not UP or is not enabled, host status is: Disconnected, host resource state is: Enabled
2017-01-30 08:44:06,022 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Cannot choose the last host to deploy this VM
2017-01-30 08:44:06,022 DEBUG [c.c.d.FirstFitPlanner] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Searching resources only under specified Cluster: 25
2017-01-30 08:44:06,039 DEBUG [c.c.d.SkipHeuresticsPlanner] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Deploying vm during HA process, so skipping disable threshold check

2017-01-30 08:44:06,151 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea FirstFitRoutingAllocator) Found 0 hosts for allocation after prioritization: []
2017-01-30 08:44:06,151 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea FirstFitRoutingAllocator) Looking for speed=4200Mhz, Ram=4096
2017-01-30 08:44:06,151 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea FirstFitRoutingAllocator) Host Allocator returning 0 suitable hosts
2017-01-30 08:44:06,151 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) No suitable hosts found
2017-01-30 08:44:06,151 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) No suitable hosts found under this Cluster: 25
2017-01-30 08:44:06,157 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-4:ctx-017064e7 job-13733/job-192986) Publish async job-192986 complete on message bus
2017-01-30 08:44:06,157 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-4:ctx-017064e7 job-13733/job-192986) Wake up jobs related to job-192986
2017-01-30 08:44:06,157 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-4:ctx-017064e7 job-13733/job-192986) Update db status for job-192986
2017-01-30 08:44:06,157 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f) The last host of this VM is not UP or is not enabled, host status is: Disconnected, host resource state is: Enabled
2017-01-30 08:44:06,157 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f) Cannot choose the last host to deploy this VM
2017-01-30 08:44:06,157 DEBUG [c.c.d.FirstFitPlanner] (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f) Searching resources only under specified Cluster: 35
2017-01-30 08:44:06,160 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Could not find suitable Deployment Destination for this VM under any clusters, returning.
2017-01-30 08:44:06,160 DEBUG [c.c.d.FirstFitPlanner] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Searching resources only under specified Cluster: 25
2017-01-30 08:44:06,163 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-eb81fc0b) Execute sync-queue item: SyncQueueItemVO {id:63629, queueId: 64423, contentType: AsyncJob, contentId: 192991, lastProcessMsid: 345049098498, lastprocessNumber: 27, lastProcessTime: Mon Jan 30 08:44:06 CET 2017, created: Mon Jan 30 08:44:05 CET 2017}
2017-01-30 08:44:06,169 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-8:ctx-381e8d1a job-14953/job-192992) Add job-192992 into job monitoring
2017-01-30 08:44:06,184 DEBUG [c.c.d.SkipHeuresticsPlanner] (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f) Deploying vm during HA process, so skipping disable threshold check
2017-01-30 08:44:06,186 DEBUG [c.c.d.FirstFitPlanner] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) The specified cluster is in avoid set, returning.
2017-01-30 08:44:06,186 DEBUG [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Unable to find destination for migrating the vm VM[User|i-189-1178-VM]
2017-01-30 08:44:06,186 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Invocation exception, caused by: com.cloud.exception.InsufficientServerCapacityException: Unable to find a server to migrate to.Scope=interface com.cloud.org.Cluster; id=25
2017-01-30 08:44:06,187 INFO [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989 ctx-ea83d6ea) Rethrow exception com.cloud.exception.InsufficientServerCapacityException: Unable to find a server to migrate to.Scope=interface com.cloud.org.Cluster; id=25
2017-01-30 08:44:06,187 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Done with run of VM work job: com.cloud.vm.VmWorkMigrateAway for VM 1178, job origin: 15393
2017-01-30 08:44:06,187 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Unable to complete AsyncJobVO {id:192989, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkMigrateAway, cmdInfo: rO0ABXNyAB5jb20uY2xvdWQudm0uVm1Xb3JrTWlncmF0ZUF3YXmt4MX4jtcEmwIAAUoACXNyY0hvc3RJZHhyABNjb20uY2xvdWQudm0uVm1Xb3Jrn5m2VvAlZ2sCAARKAAlhY2NvdW50SWRKAAZ1c2VySWRKAAR2bUlkTAALaGFuZGxlck5hbWV0ABJMamF2YS9sYW5nL1N0cmluZzt4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAEmnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAHQ, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 345049103441, completeMsid: null, lastUpdated: null, lastPolled: null, created: Mon Jan 30 08:44:03 CET 2017}, job origin:15393
com.cloud.exception.InsufficientServerCapacityException: Unable to find a server to migrate to.Scope=interface com.cloud.org.Cluster; id=25
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateMigrateAway(VirtualMachineManagerImpl.java:2390)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateMigrateAway(VirtualMachineManagerImpl.java:4520)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
        at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4636)
        at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:103)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2017-01-30 08:44:06,188 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-eb81fc0b) Schedule queued job-192991 2017-01-30 08:44:06,192 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-4:ctx-017064e7 job-13733/job-192986) Wake up jobs joined with job-192986 and disjoin all subjobs created from job- 192986 2017-01-30 08:44:06,196 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Complete async job-192989, jobStatus: FAILED, resultCode: 0, result: rO0ABXNyADdjb20uY2xvdWQuZXhjZXB0aW9uLkluc3VmZmljaWVudFNlcnZlckNhcGFjaXR5RXhjZXB0aW9uAAAAAFZNT3ACAAFaABVhZmZpbml0eUdyb3Vwc0FwcGxpZWR4cgAxY29tLmNsb3VkLmV4Y2VwdGlvbi5JbnN1ZmZpY2llbnRDYXBhY2l0eUV4Y2VwdGlvbgAAAABWTU91AgACTAACaWR0ABBMamF2YS9sYW5nL0xvbmc7TAAFc2NvcGV0ABFMamF2YS9sYW5nL0NsYXNzO3hyACJjb20uY2xvdWQuZXhjZXB0aW9uLkNsb3VkRXhjZXB0aW9ueeiOG_HkRMcCAAJMAAtjc0Vycm9yQ29kZXQAE0xqYXZhL2xhbmcvSW50ZWdlcjtMAAZpZExpc3R0ABVMamF2YS91dGlsL0FycmF5TGlzdDt4cgATamF2YS5sYW5nLkV4Y2VwdGlvbtD9Hz4aOxzEAgAAeHIAE2phdmEubGFuZy5UaHJvd2FibGXVxjUnOXe4ywMABEwABWNhdXNldAAVTGphdmEvbGFuZy9UaHJvd2FibGU7TAANZGV0YWlsTWVzc2FnZXQAEkxqYXZhL2xhbmcvU3RyaW5nO1sACnN0YWNrVHJhY2V0AB5bTGphdmEvbGFuZy9TdGFja1RyYWNlRWxlbWVudDtMABRzdXBwcmVzc2VkRXhjZXB0aW9uc3QAEExqYXZhL3V0aWwvTGlzdDt4cHEAfgANdAAmVW5hYmxlIHRvIGZpbmQgYSBzZXJ2ZXIgdG8gbWlncmF0ZSB0by51cgAeW0xqYXZhLmxhbmcuU3RhY2tUcmFjZUVsZW1lbnQ7AkYqPDz9IjkCAAB4cAAAABVzcgAbamF2YS5sYW5nLlN0YWNrVHJhY2VFbGVtZW50YQnFmiY23YUCAARJAApsaW5lTnVtYmVyTAAOZGVjbGFyaW5nQ2xhc3NxAH4ACkwACGZpbGVOYW1lcQB-AApMAAptZXRob2ROYW1lcQB-AAp4cAAACVZ0ACZjb20uY2xvdWQudm0uVmlydHVhbE1hY2hpbmVNYW5hZ2VySW1wbHQAHlZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwuamF2YXQAFm9yY2hlc3RyYXRlTWlncmF0ZUF3YXlzcQB-ABEAABGocQB-ABNxAH4AFHEAfgAVc3EAfgAR_____nQAJHN1bi5yZWZsZWN0Lk5hdGl2ZU1ldGhvZEFjY2Vzc29ySW1wbHQAHU5hdGl2ZU1ldGhvZEFjY2Vzc29ySW1wbC5qYXZhdAAHaW52b2tlMHNxAH4AEQAAADlxAH4AGHEAfgAZdAAGaW52b2tlc3EAfgARAAAAK3QAKHN1bi5yZWZsZWN0LkRlbGVnYXRpbmdNZXRob2RBY2Nlc3NvckltcGx0ACFEZWxlZ2F0aW5nTWV0aG9kQWNjZXNzb3JJbXBsLmphdmFxAH4AHHNxAH4AEQAAAl50ABhqYXZhLmxhbmcucmVmbGVjdC5NZXRob2R0AAtNZXRob2QuamF2YXEAfgAcc3EAfgARAAAAa3QAImNvbS5jbG91ZC52bS5WbVdvcmtKb2JIYW5kbGVyUHJveHl0ABpWbVdvcmtKb2JIYW5kbGVyUHJveHkuamF2YXQAD2hhbmRsZVZtV29ya0pvYnNxAH4AEQAAEhxxAH4AE3EAfgAUcQB-ACZzcQB-ABEAAABndAAgY29tLmNsb3VkLnZtLlZtV29ya0pvYkRpc3BhdGNoZXJ0ABhWbVdvcmtKb2JEaXNwYXRjaGVyLmphdmF0AAZydW5Kb2JzcQB-ABEAAAIZdAA_b3JnLmFwYWNoZS5jbG91ZHN0YWNrLmZyYW1ld29yay5qb2JzLmltcGwuQXN5bmNKb2JNYW5hZ2VySW1wbCQ1dAAYQXN5bmNKb2JNYW5hZ2VySW1wbC5qYXZhdAAMcnVuSW5Db250ZXh0c3EAfgARAAAAMXQAPm9yZy5hcGFjaGUuY2xvdWRzdGFjay5tYW5hZ2VkLmNvbnRleHQuTWFuYWdlZENvbnRleHRSdW5uYWJsZSQxdAAbTWFuYWdlZENvbnRleHRSdW5uYWJsZS5qYXZhdAADcnVuc3EAfgARAAAAOHQAQm9yZy5hcGFjaGUuY2xvdWRzdGFjay5tYW5hZ2VkLmNvbnRleHQuaW1wbC5EZWZhdWx0TWFuYWdlZENvbnRleHQkMXQAGkRlZmF1bHRNYW5hZ2VkQ29udGV4dC5qYXZhdAAEY2FsbHNxAH4AEQAAAGd0AEBvcmcuYXBhY2hlLmNsb3Vkc3RhY2subWFuYWdlZC5jb250ZXh0LmltcGwuRGVmYXVsdE1hbmFnZWRDb250ZXh0cQB-ADZ0AA9jYWxsV2l0aENvbnRleHRzcQB-ABEAAAA1cQB-ADlxAH4ANnQADnJ1bldpdGhDb250ZXh0c3EAfgARAAAALnQAPG9yZy5hcGFjaGUuY2xvdWRzdGFjay5tYW5hZ2VkLmNvbnRleHQuTWFuYWdlZENvbnRleHRSdW5uYWJsZXEAfgAycQB-ADNzcQB-ABEAAAHucQB-AC1xAH4ALnEAfgAzc3EAfgARAAAB13QALmphdmEudXRpbC5jb25jdXJyZW50LkV4ZWN1dG9ycyRSdW5uYWJsZUFkYXB0ZXJ0AA5FeGVjdXRvcnMuamF2YXEAfgA3c3EAfgARAAABBnQAH2phdmEudXRpbC5jb25jdXJyZW50LkZ1dHVyZVRhc2t0AA9GdXR1cmVUYXNrLmphdmFxAH4AM3NxAH4AEQAABHl0ACdqYXZhLnV0aWwuY29uY3VycmVudC5UaHJlYWRQb29sRXhlY3V0b3J0ABdUaHJlYWRQb29sRXhlY3V0b3IuamF2YXQACXJ1bldvcmtlcnNxAH4AEQAAAmd0AC5qYXZhLnV0aWwuY29uY3VycmVudC5UaHJlYWRQb29sRXhlY3V0b3IkV29ya2VycQB-AEhxAH4AM3NxAH4AEQAAAul
0ABBqYXZhLmxhbmcuVGhyZWFkdAALVGhyZWFkLmphdmFxAH4AM3NyACZqYXZhLnV0aWwuQ29sbGVjdGlvbnMkVW5tb2RpZmlhYmxlTGlzdPwPJTG17I4QAgABTAAEbGlzdHEAfgAMeHIALGphdmEudXRpbC5Db2xsZWN0aW9ucyRVbm1vZGlmaWFibGVDb2xsZWN0aW9uGUIAgMte9x4CAAFMAAFjdAAWTGphdmEvdXRpbC9Db2xsZWN0aW9uO3hwc3IAE2phdmEudXRpbC5BcnJheUxpc3R4gdIdmcdhnQMAAUkABHNpemV4cAAAAAB3BAAAAAB4cQB-AFR4c3IAEWphdmEubGFuZy5JbnRlZ2VyEuKgpPeBhzgCAAFJAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAEO9zcQB-AFMAAAAAdwQAAAAAeHNyAA5qYXZhLmxhbmcuTG9uZzuL5JDMjyPfAgABSgAFdmFsdWV4cQB-AFYAAAAAAAAAGXZyABVjb20uY2xvdWQub3JnLkNsdXN0ZXKqoV-omaW2FAIAAHhwAA
2017-01-30 08:44:06,201 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) Deploy avoids pods: [], clusters: [], hosts: [3]
2017-01-30 08:44:06,201 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) DeploymentPlanner allocation algorithm: com.cloud.deploy.SkipHeuresticsPlanner@4fdf86ce
2017-01-30 08:44:06,201 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) Trying to allocate a host and storage pools from dc:14, pod:17,cluster:17, requested cpu: 4200, requested ram: 4294967296
2017-01-30 08:44:06,201 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) Is ROOT volume READY (pool already allocated)?: Yes
2017-01-30 08:44:06,201 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) This VM has last host_id specified, trying to choose the same host: 213
2017-01-30 08:44:06,213 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Publish async job-192989 complete on message bus
2017-01-30 08:44:06,213 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Wake up jobs related to job-192989
2017-01-30 08:44:06,213 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Update db status for job-192989
2017-01-30 08:44:06,219 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f) Checking resources in Cluster: 35 under Pod: 31
2017-01-30 08:44:06,227 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f FirstFitRoutingAllocator) Looking for hosts in dc: 23 pod:31 cluster:35
2017-01-30 08:44:06,229 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-8:ctx-381e8d1a job-14953/job-192992) Executing AsyncJobVO {id:192992, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkMigrateAway, cmdInfo: rO0ABXNyAB5jb20uY2xvdWQudm0uVm1Xb3JrTWlncmF0ZUF3YXmt4MX4jtcEmwIAAUoACXNyY0hvc3RJZHhyABNjb20uY2xvdWQudm0uVm1Xb3Jrn5m2VvAlZ2sCAARKAAlhY2NvdW50SWRKAAZ1c2VySWRKAAR2bUlkTAALaGFuZGxlck5hbWV0ABJMamF2YS9sYW5nL1N0cmluZzt4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAGwXQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAEw, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 345049098122, completeMsid: null, lastUpdated: null, lastPolled: null, created: Mon Jan 30 08:44:05 CET 2017}
2017-01-30 08:44:06,229 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-8:ctx-381e8d1a job-14953/job-192992) Run VM work job: com.cloud.vm.VmWorkMigrateAway for VM 1729, job origin: 14953
2017-01-30 08:44:06,230 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-eb81fc0b) Execute sync-queue item: SyncQueueItemVO {id:63630, queueId: 67180, contentType: AsyncJob, contentId: 192990, lastProcessMsid: 345049098498, lastprocessNumber: 61, lastProcessTime: Mon Jan 30 08:44:06 CET 2017, created: Mon Jan 30 08:44:05 CET 2017}
2017-01-30 08:44:06,244 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) The last host of this VM is not UP or is not enabled, host status is: Disconnected, host resource state is: Enabled
2017-01-30 08:44:06,262 DEBUG [c.c.d.DeploymentPlanningManagerImpl] (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) Cannot choose the last host to deploy this VM
2017-01-30 08:44:06,263 DEBUG [c.c.d.FirstFitPlanner] (Work-Job-Executor-7:ctx-bcf531e9 job-13729/job-192987 ctx-4c93bc31) Searching resources only under specified Cluster: 17
2017-01-30 08:44:06,262 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-9:ctx-f4d09efb job-14965/job-192991) Add job-192991 into job monitoring
2017-01-30 08:44:06,246 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-4:ctx-017064e7 job-13733/job-192986) Done executing com.cloud.vm.VmWorkMigrateAway for job-192986
2017-01-30 08:44:06,244 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Wake up jobs joined with job-192989 and disjoin all subjobs created from job- 192989
2017-01-30 08:44:06,269 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-4:ctx-017064e7 job-13733/job-192986) Remove job-192986 from job monitoring
2017-01-30 08:44:06,279 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Done executing com.cloud.vm.VmWorkMigrateAway for job-192989
2017-01-30 08:44:06,305 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-9:ctx-f4d09efb job-14965/job-192991) Executing AsyncJobVO {id:192991, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkMigrateAway, cmdInfo: rO0ABXNyAB5jb20uY2xvdWQudm0uVm1Xb3JrTWlncmF0ZUF3YXmt4MX4jtcEmwIAAUoACXNyY0hvc3RJZHhyABNjb20uY2xvdWQudm0uVm1Xb3Jrn5m2VvAlZ2sCAARKAAlhY2NvdW50SWRKAAZ1c2VySWRKAAR2bUlkTAALaGFuZGxlck5hbWV0ABJMamF2YS9sYW5nL1N0cmluZzt4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAGRXQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAABw, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 345049098122, completeMsid: null, lastUpdated: null, lastPolled: null, created: Mon Jan 30 08:44:05 CET 2017}
2017-01-30 08:44:06,305 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-9:ctx-f4d09efb job-14965/job-192991) Run VM work job: com.cloud.vm.VmWorkMigrateAway for VM 1605, job origin: 14965
2017-01-30 08:44:06,306 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-13:null) Ping from 258
2017-01-30 08:44:06,319 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f FirstFitRoutingAllocator) List of hosts in ascending order of number of VMs: [224, 19, 21, 15]
2017-01-30 08:44:06,322 DEBUG [c.c.a.m.a.i.FirstFitAllocator] (Work-Job-Executor-6:ctx-0d738a8e job-13737/job-192988 ctx-6e9c753f FirstFitRoutingAllocator) FirstFitAllocator has 0 hosts to check for allocation: []
2017-01-30 08:44:06,322 INFO [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-13:null) PingMap for agent: 258 will not be updated because agent is no longer in the PingMap
2017-01-30 08:44:06,326 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-5:ctx-32f1e087 job-15393/job-192989) Remove job-192989 from job monitoring

2017-01-30 08:54:38,139 INFO [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-20:ctx-a988090f job-14965/job-193019 ctx-052eb4f5) Migration cancelled because state has changed: VM[User|i-189-1178-VM]
2017-01-30 08:54:38,139 DEBUG [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-20:ctx-a988090f job-14965/job-193019 ctx-052eb4f5) Unable to migrate VM due to: Migration cancelled because state has changed: VM[User|i-189-1178-VM]
2017-01-30 08:54:38,167 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-20:ctx-a988090f job-14965/job-193019 ctx-052eb4f5) VM state transitted from :Running to Stopping with event: StopRequestedvm's original host id: 29 new host id: 33 host id before state transition: 33
2017-01-30 08:54:38,177 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-325:ctx-e540d7f5) Seq 19-2355664080091480198: Response Received:
2017-01-30 08:54:38,177 DEBUG [c.c.a.t.Request] (DirectAgent-325:ctx-e540d7f5) Seq 19-2355664080091480198: Processing: { Ans: , MgmtId: 345049098498, via: 19, Ver: v1, Flags: 10, [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-1706-VM","bytesSent":12276848087,"bytesReceived":5226621450,"result":true,"details":"","wait":0}}] }
2017-01-30 08:54:38,177 DEBUG [c.c.a.t.Request] (RouterMonitor-1:ctx-2e80f405) Seq 19-2355664080091480198: Received: { Ans: , MgmtId: 345049098498, via: 19, Ver: v1, Flags: 10, { NetworkUsageAnswer } }
2017-01-30 08:54:38,177 DEBUG [c.c.a.m.AgentManagerImpl] (RouterMonitor-1:ctx-2e80f405) Details from executing class com.cloud.agent.api.NetworkUsageCommand:
2017-01-30 08:54:38,178 DEBUG [c.c.a.t.Request] (Work-Job-Executor-20:ctx-a988090f job-14965/job-193019 ctx-052eb4f5) Seq 33-1917970491306410035: Sending { Cmd , MgmtId: 345049098498, via: 33(ewcstack-vh003-prod), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"checkBeforeCleanup":false,"vmName":"i-189-1178-VM","wait":0}}] }
2017-01-30 08:54:38,178 DEBUG [c.c.a.t.Request] (Work-Job-Executor-20:ctx-a988090f job-14965/job-193019 ctx-052eb4f5) Seq 33-1917970491306410035: Executing: { Cmd , MgmtId: 345049098498, via: 33(ewcstack-vh003-prod), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"checkBeforeCleanup":false,"vmName":"i-189-1178-VM","wait":0}}] }
2017-01-30 08:54:38,178 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-43:ctx-da30f109) Seq 33-1917970491306410035: Executing request
2017-01-30 08:54:38,202 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-326:ctx-4fb88aa7) Seq 21-1035546439318503505: Executing request

2017-01-30 09:05:55,628 INFO [c.c.h.HighAvailabilityManagerImpl] (HA-Worker-4:ctx-36aae733 work-2049) Processing HAWork[2049-Migration-1178-Stopped-Migrating]
2017-01-30 09:05:55,633 WARN [o.a.c.f.j.AsyncJobExecutionContext] (HA-Worker-4:ctx-36aae733 work-2049) Job is executed without a context, setup psudo job for the executing thread
2017-01-30 09:05:55,645 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (HA-Worker-4:ctx-36aae733 work-2049) Sync job-193045 execution on object VmWorkJobQueue.1178



On 31.01.2017 00:42, Rafael Weingärtner wrote:
This seems to be one of those nasty bugs that are very hard to debug. It
seems to be caused by a combination of different factors, such as planets
alignments ;)

At class “VirtualMachineManagerImpl”, line 2395 (method
“com.cloud.vm.VirtualMachineManagerImpl.orchestrateMigrateAway(String,
long, DeploymentPlanner)”), is the start of the code flow that triggers the
migration. The point here is that, if an error occurs during the migration,
the VM will be stopped; for some reason this is executed only for HA VMs.

Going a little bit further, we find the point that triggers the error
in the method “com.cloud.vm.VirtualMachineManagerImpl.migrate(VMInstanceVO,
long, DeployDestination)” at line 1941. This line calls the method
“com.cloud.vm.VirtualMachineManagerImpl.changeState(VMInstanceVO, Event,
Long, ItWorkVO, Step)” to change the VM state to migrating/prepare
migration/or something like that (I am guessing here), but for some reason
the state of the VM is not the one expected, and this causes a
“ConcurrentOperationException” to be thrown. The problem is that
“com.cloud.vm.VirtualMachineManagerImpl.changeState(VMInstanceVO, Event,
Long, ItWorkVO, Step)” does not seem to produce many useful log entries.

Just to enable us to discuss the problem a little further, could you tell
us a bit about your environment settings? How many ACS MSs do you have?
What is the VM workload being migrated (I mean, the number of VMs and
their computing resources)? Do the other servers have enough computing
resources to support these VMs? Could you also check the XenServer logs,
and see what happened on both the source and destination hosts when the
migration of VM “i-638-1735-VM” was triggered?

Do you have the knowledge/skills to use an IDE (e.g. Eclipse) to connect to
a running JVM and debug the code? For me it will not be possible to go
further without looking at the variables in a running environment.


On Mon, Jan 30, 2017 at 4:28 AM, Francois Scheurer <
francois.scheu...@everyware.ch> wrote:

Dear Rafael


Sorry for the late answer (was sick a couple of days).

I confirm we are migrating inside the same xenserver pool (within a CS
Cluster).


Best Regards
Francois



On 26.01.2017 04:35, Rafael Weingärtner wrote:

Found the code that is executed.
Can you confirm whether you are migrating within a cluster or across
clusters?

On Wed, Jan 25, 2017 at 11:57 AM, Rafael Weingärtner <
rafaelweingart...@gmail.com> wrote:

I do not have access to the ACS code right now, but I suggest starting
debugging here:

2017-01-24 18:05:28,427 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 ctx-175d37df) VM state transitted from :Running to Stopping with event: StopRequestedvm's original host id: 224 new host id: 15 host id before state transition: 15

I would first try to understand why ACS requested the hypervisor to stop
the VM. I mean the following: check the source code for conditions that
would make ACS request the shutdown of VMs.
BTW: are you migrating within a cluster or across clusters?

I will be able to analyze this further only after work hours, so if you
find anything, keep me posted.

On Wed, Jan 25, 2017 at 11:45 AM, Francois Scheurer <
francois.scheu...@everyware.ch> wrote:

Hello Rafael

I think the important log lines are these:

ewcstack-man02-prod: 2017-01-24 18:05:28,407 INFO [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 ctx-175d37df) Migration cancelled because state has changed: VM[User|i-638-1736-VM]
ewcstack-man02-prod: 2017-01-24 18:05:28,407 DEBUG [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 ctx-175d37df) Unable to migrate VM due to: Migration cancelled because state has changed: VM[User|i-638-1736-VM]
ewcstack-man02-prod: 2017-01-24 18:05:28,427 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 ctx-175d37df) VM state transitted from :Running to Stopping with event: StopRequestedvm's original host id: 224 new host id: 15 host id before state transition: 15

The really disturbing thing is that CS stops the VM again if we start it
manually, apparently because the migration job is still running.

Note that the VM's HA flag is enabled.



Thank you.


Best Regards.

Francois






more details:



ewcstack-man02-prod: 2017-01-24 18:05:28,148 DEBUG
[o.a.c.e.o.VolumeOrchestrator] (Work-Job-Executor-156:ctx-14b86f5e
job-60216/job-191380 ctx-175d37df) Preparing 3 volumes for
VM[User|i-638-1736-VM]
ewcstack-man02-prod: 2017-01-24 18:05:28,155 DEBUG
[c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-122:ctx-a8bacc64
job-14953/job-191351) Done with run of VM work job:
com.cloud.vm.VmWorkMigrateAway for VM 1411, job origin: 14953
ewcstack-man02-prod: 2017-01-24 18:05:28,155 DEBUG
[o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-122:ctx-a8bacc64
job-14953/job-191351) Done executing com.cloud.vm.VmWorkMigrateAway for
job-191351
ewcstack-man02-prod: 2017-01-24 18:05:28,160 INFO
[o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-122:ctx-a8bacc64
job-14953/job-191351) Remove job-191351 from job monitoring
ewcstack-man02-prod: 2017-01-24 18:05:28,168 INFO
[c.c.h.HighAvailabilityManagerImpl] (HA-Worker-2:ctx-f3bcf43c
work-1989)
Completed HAWork[1989-Migration-1411-Running-Migrating]
ewcstack-man02-prod: 2017-01-24 18:05:28,246 DEBUG
[c.c.a.m.ClusteredAgentAttache] (Work-Job-Executor-156:ctx-14b86f5e
job-60216/job-191380 ctx-175d37df) Seq 19-8526158519542516375:
Forwarding
Seq 19-8526158519542516375:  { Cmd , MgmtId: 345049103441, via:
19(ewcstack-vh010-prod), Ver: v1, Flags: 100111, [{
"com.cloud.agent.api.PrepareForMigrationCommand":{"vm":{"id"
:1736,"name":"i-638-1736-VM","bootloader":"PyGrub","type":"U
ser","cpus":4,"minSpeed":2100,"maxSpeed":2100,"minRam":17179
869184,"maxRam":17179869184,"arch":"x86_64","os":"Other
(64-bit)","platformEmulator":"Other install
media","bootArgs":"","enable
HA":true,"limitCpuUse":true,"enableDynamicallyScaleVm":true,
"vncPassword":"pC1WUC7h1DBH9J36q3H9pg==","params":{"memoryOv
ercommitRatio":"1.0","platform":"viridian:true;acpi:1;apic:t
rue;pae:true;nx:true;timeoffset:-3601","keyboard":"us","Mess
age.ReservedCapacityFreed.Flag":"false","cpuOvercommitRatio":"4.0","
hypervisortoolsversion":"xenserver61"},"uuid":"5bff01de-c033
-4925-9e47-30dd14539272","disks":[{"data":{"org.apache.cloud
stack.storage.to.VolumeObjectTO":{"uuid":"1ab514b2-f2de-
4d85-8e0d-9768184a1349","volumeType":"ROOT","dataStore":{"org.
apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"b8751b89-00f
6-5079-ed7a-c073579c7563","id":64,"poolType":"PreSetup","hos
t":"localhost","path":"/b8751b89-00f6-5079-ed7a-c073579c7563
","port":0,"url":"PreSetup://localhost/b8751b89-00f6-5079-ed
7a-c073579c7563/?ROLE=Primary&STOREUUID=b8751b89-00f6-5079-
ed7a-c073579c7563"}},"name":"ROOT-1736","size":53687091200,"path":"d3
f87e3c-1efe-42ad-aa93-609d5b980a34","volumeId":8655,"vmName"
:"i-638-1736-VM","accountId":638,"format":"VHD","provisionin
gType":"THIN","id":8655,"deviceId":0,"hypervisorType":"
XenServer"}},"diskSeq":0,"path":"d3f87e3c-1efe-42ad-aa93
-609d5b980a34","type":"ROOT","_details":{"managed":"false","
storagePort":"0",
"storageHost":"localhost","volumeSize":"53687091200"}},{"dat
a":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid
":"733df9dc-5e24-4ab9-b22b-bce78322428e","volumeType":"DATAD
ISK","dataStore":{"org.apache.cloudstack.storage.to.PrimaryD
ataStoreTO":{"uuid":"b8751b89-00f6-5079-ed7a-c073579c7563","id":64,"
poolType":"PreSetup","host":"localhost","path":"/b8751b89-00
f6-5079-ed7a-c073579c7563","port":0,"url":"PreSetup://localh
ost/b8751b89-00f6-5079-ed7a-c073579c7563/?ROLE=Primary&STORE
UUID=b8751b89-00f6-5079-ed7a-c073579c7563"}},"name":"WELLA-
DB1-NCSA_D","size":268435456000,"path":"265f168a-bb92-4591-a32b-ee3e2
2c8cb68","volumeId":8656,"vmName":"i-638-1736-VM","accountId
":638,"format":"VHD","provisioningType":"THIN","id":8656,"
deviceId":1,"hypervisorType":"XenServer"}},"diskSeq":1,"
path":"265f168a-bb92-4591-a32b-ee3e22c8cb68","type":"DATADIS
K","_details":{"managed":"false","storagePort":"0","stor
ageHost":"localhos
t","volumeSize":"268435456000"}},{"data":{"org.apache.clouds
tack.storage.to.VolumeObjectTO":{"uuid":"bda6f335-b8a3-479c-
8c0d-41b344d2a50d","volumeType":"DATADISK","dataStore":{"org.
apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid"
:"09df3ca6-0a20-e4f0-3d79-12a6ed67a13d","id":89,"poolType":"PreSetup","
host":"localhost","path":"/09df3ca6-0a20-e4f0-3d79-12a6ed67a
13d","port":0,"url":"PreSetup://localhost/09df3ca6-0a20-e4f0
-3d79-12a6ed67a13d/?ROLE=Primary&STOREUUID=09df3ca6-0a20-

e4f0-3d79-12a6ed67a13d"}},"name":"WELLA-DB1-NCSA_E","size
":429496729600,"path":"612cb586-a70c-4896-b120-062ee7a67894"
,"volumeId":86
57,"vmName":"i-638-1736-VM","accountId":638,"format":"VHD","
provisioningType":"THIN","id":8657,"deviceId":2,"hypervisorT
ype":"XenServer"}},"diskSeq":2,"path":"612cb586-a70c-4896-b1
20-062ee7a67894","type":"DATADISK","_details":{"managed":"
false","storagePort":"0","storageHost":"localhost","volumeSize":"42949
6729600"}},{"data":{"org.apache.cloudstack.storage.to.Templa
teObjectTO":{"uuid":"0e5bf4ff-caca-4b08-abed-7bbcdf3c42ef","
id":201,"format":"ISO","accountId":1,"hvm":true,"displayText
":"xen-pv-drv-iso","name":"xs-tools.iso","guestOsType":"CentOS 4.5
(32-bit)","hypervisorType":"XenServer"}},"diskSeq":3,"type":"
ISO"}],"nics":[{"deviceId":0,"networkRateMbps":1000,"default
Nic":true,"pxeDisable":false,"nicUuid":"3523c3d3-9f92-4076-a
d99-f8864d0f0edd","uuid":"e6fbb257-16e1-415f-88fb-9a79b263b3
a8","ip":"192.168.1.42","netmask":"255.255.255.0","
gateway":"192.168.1.1","mac":"02:00:0c:ae:00:0f","dns1":"8.8
.8.8","dns2":"8.8
.4.4","broadcastType":"Vlan","type":"Guest","broadcastUri":"
vlan://2285","isolationUri":"vlan://2285","isSecurityGroupEn
abled":false,"name":"bond0"}],"vcpuMaxLimit":16},"wait":0}}] } to
345049098498
ewcstack-man02-prod: 2017-01-24 18:05:28,399 DEBUG [c.c.a.t.Request]
(AgentManager-Handler-10:null) Seq 19-8526158519542516375: Processing:
{
Ans: , MgmtId: 345049103441, via: 19, Ver: v1, Flags: 110,
[{"com.cloud.agent.api.PrepareForMigrationAnswer":{"result":
true,"wait":0}}]
}
ewcstack-man02-prod: 2017-01-24 18:05:28,399 DEBUG
[c.c.a.m.AgentAttache]
(AgentManager-Handler-10:null) Seq 19-8526158519542516375: No more
commands
found
ewcstack-man02-prod: 2017-01-24 18:05:28,399 DEBUG [c.c.a.t.Request]
(Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 ctx-175d37df)
Seq 19-8526158519542516375: Received:  { Ans: , MgmtId: 345049103441,
via:
19, Ver: v1, Flags: 110, { PrepareForMigrationAnswer } }

ewcstack-man02-prod: 2017-01-24 18:05:28,407 INFO [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 ctx-175d37df) Migration cancelled because state has changed: VM[User|i-638-1736-VM]
ewcstack-man02-prod: 2017-01-24 18:05:28,407 DEBUG [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 ctx-175d37df) Unable to migrate VM due to: Migration cancelled because state has changed: VM[User|i-638-1736-VM]
ewcstack-man02-prod: 2017-01-24 18:05:28,427 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 ctx-175d37df) VM state transitted from :Running to Stopping with event: StopRequestedvm's original host id: 224 new host id: 15 host id before state transition: 15

ewcstack-man02-prod: 2017-01-24 18:05:28,438 DEBUG [c.c.a.m.ClusteredAgentAttache] (Work-Job-Executor-156:ctx-14b86f5e job-60216/job-191380 ctx-175d37df) Seq 15-6889944479923437643: Forwarding Seq 15-6889944479923437643:  { Cmd , MgmtId: 345049103441, via: 15(ewcstack-vh011-prod), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":false,"checkBeforeCleanup":false,"vmName":"i-638-1736-VM","wait":0}}] } to 345049098498
ewcstack-man02-prod: 2017-01-24 18:05:30,335 DEBUG
[c.c.a.m.AgentManagerImpl] (AgentManager-Handler-1:null) SeqA
254-800498:
Processing Seq 254-800498:  { Cmd , MgmtId: -1, via: 254, Ver: v1,
Flags:
11, [{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_
proxyVmId":1530,"_loadInfo":"{\n
\"connections\": []\n}","wait":0}}] }
ewcstack-man02-prod: 2017-01-24 18:05:30,340 DEBUG
[c.c.a.m.AgentManagerImpl] (AgentManager-Handler-1:null) SeqA
254-800498:
Sending Seq 254-800498:  { Ans: , MgmtId: 345049103441, via: 254, Ver:
v1,
Flags: 100010, [{"com.cloud.agent.api.AgentCo
ntrolAnswer":{"result":true,"wait":0}}] }
ewcstack-man02-prod: 2017-01-24 18:05:32,447 DEBUG [c.c.a.t.Request]
(AgentManager-Handler-11:null) Seq 15-6889944479923437632: Processing:
{
Ans: , MgmtId: 345049103441, via: 15, Ver: v1, Flags: 10,
[{"com.cloud.agent.api.StopAnswer":{"platform":"viridian:tru
e;acpi:1;apic:true;pae:true;nx:true;timeoffset:3601","result
":true,"details":"Stop
VM i-638-1735-VM Succeed","wait":0}}] }
ewcstack-man02-prod: 2017-01-24 18:05:32,447 DEBUG [c.c.a.t.Request]
(Work-Job-Executor-115:ctx-0d5db528 job-13733/job-191379 ctx-bbabfeae)
Seq 15-6889944479923437632: Received:  { Ans: , MgmtId: 345049103441,
via:
15, Ver: v1, Flags: 10, { StopAnswer } }
ewcstack-man02-prod: 2017-01-24 18:05:32,470 DEBUG
[c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) VM[User|i-638-1735-VM] is stopped on
the
host. Proceeding to release resource held.
ewcstack-man02-prod: 2017-01-24 18:05:32,475 DEBUG
[c.c.n.NetworkModelImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) Service SecurityGroup is not
supported
in the network id=519
ewcstack-man02-prod: 2017-01-24 18:05:32,478 DEBUG
[o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) Changing active number of nics for
network id=519 on -1
ewcstack-man02-prod: 2017-01-24 18:05:32,488 DEBUG
[o.a.c.e.o.NetworkOrchestrator] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) Asking VpcVirtualRouter to release
NicProfile[17403-1735-3364083b-dcc0-4eb9-a7f9-f3e85bf8c7d3-1
92.168.1.11-null
ewcstack-man02-prod: 2017-01-24 18:05:32,488 DEBUG
[c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) Successfully released network
resources
for the vm VM[User|i-638-1735-VM]
ewcstack-man02-prod: 2017-01-24 18:05:32,488 DEBUG
[c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) Successfully released storage
resources
for the vm VM[User|i-638-1735-VM]
ewcstack-man02-prod: 2017-01-24 18:05:32,496 DEBUG
[c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) VM state transitted from :Stopping to
Stopped with event: OperationSucceededvm's original host id: 15 new host
id: null host id before state transition: 15
ewcstack-man02-prod: 2017-01-24 18:05:32,505 DEBUG
[c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) Hosts's actual total CPU: 82976 and
CPU
after applying overprovisioning: 331904
ewcstack-man02-prod: 2017-01-24 18:05:32,505 DEBUG
[c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) Hosts's actual total RAM:
261961927808
and RAM after applying overprovisioning: 261961924608
ewcstack-man02-prod: 2017-01-24 18:05:32,505 DEBUG
[c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) release cpu from host: 15, old used:
71300,reserved: 2100, actual total: 82976, total with overprovisioning:
331904; new used: 67100,reserved:6300; movedfromreserved:
false,moveToReserveredtrue
ewcstack-man02-prod: 2017-01-24 18:05:32,505 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-115:ctx-0d5db528 job-13733/job-191379 ctx-bbabfeae) release mem from host: 15, old used: 98784247808,reserved: 1073741824, total: 261961924608; new used: 94489280512,reserved:5368709120; movedfromreserved: false,moveToReserveredtrue
ewcstack-man02-prod: 2017-01-24 18:05:32,538 DEBUG
[c.c.n.NetworkModelImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) Service SecurityGroup is not
supported
in the network id=519
ewcstack-man02-prod: 2017-01-24 18:05:32,557 DEBUG
[c.c.n.NetworkModelImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) Service SecurityGroup is not
supported
in the network id=519

ewcstack-man02-prod: 2017-01-24 18:05:32,588 ERROR
[c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) Invocation exception, caused by:
com.cloud.utils.exception.CloudRuntimeException: Unable to migrate
VM[User|i-638-1735-VM]
ewcstack-man02-prod: 2017-01-24 18:05:32,589 INFO
[c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379 ctx-bbabfeae) Rethrow exception
com.cloud.utils.exception.CloudRuntimeException: Unable to migrate
VM[User|i-638-1735-VM]
ewcstack-man02-prod: 2017-01-24 18:05:32,589 DEBUG
[c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379) Done with run of VM work job:
com.cloud.vm.VmWorkMigrateAway for VM 1735, job origin: 13733
ewcstack-man02-prod: 2017-01-24 18:05:32,589 ERROR
[c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379) Unable to complete AsyncJobVO {id:191379, userId:
1,
accountId: 1, instanceType: null, instanceId: null, cmd:
com.cloud.vm.VmWorkMigrateAway, cmdInfo: rO0ABXNyAB5jb20uY2xvdWQudm0uVm
1Xb3JrTWlncmF0ZUF3YXmt4MX4jtcEmwIAAUoACXNyY0hvc3RJZHhyABNjb2
0uY2xvdWQudm0uVm1Xb3Jrn5m2VvAlZ2sCAARKAAlhY2NvdW50SWRKAAZ1c2
VySWRKAAR2bUlkTAALaGFuZGxlck5hbWV0ABJMamF2YS9sYW5nL1N0cmluZz
t4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAGx3QAGVZpcnR1YWxNYWNoaW5lTW
FuYWdlckltcGwAAAAAAAAA4A,
cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
result: null, initMsid: 345049098498, completeMsid: null, lastUpdated:
null, lastPolled: null, created: Tue Jan 24 18:05:00 CET 2017}, job
origin:13733
ewcstack-man02-prod: com.cloud.utils.exception.CloudRuntimeException: Unable to migrate VM[User|i-638-1735-VM]
ewcstack-man02-prod:    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateMigrateAway(VirtualMachineManagerImpl.java:2405)
ewcstack-man02-prod:    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateMigrateAway(VirtualMachineManagerImpl.java:4517)
ewcstack-man02-prod:    at sun.reflect.GeneratedMethodAccessor402.invoke(Unknown Source)
ewcstack-man02-prod:    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
ewcstack-man02-prod:    at java.lang.reflect.Method.invoke(Method.java:606)
ewcstack-man02-prod:    at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
ewcstack-man02-prod:    at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4636)
ewcstack-man02-prod:    at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:103)
ewcstack-man02-prod:    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
ewcstack-man02-prod:    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
ewcstack-man02-prod:    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
ewcstack-man02-prod:    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
ewcstack-man02-prod:    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
ewcstack-man02-prod:    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
ewcstack-man02-prod:    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
ewcstack-man02-prod:    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
ewcstack-man02-prod:    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
ewcstack-man02-prod:    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
ewcstack-man02-prod:    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
ewcstack-man02-prod:    at java.lang.Thread.run(Thread.java:745)
ewcstack-man02-prod: 2017-01-24 18:05:32,590 DEBUG
[o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379) Complete async job-191379, jobStatus: FAILED,
resultCode: 0, result: rO0ABXNyAC9jb20uY2xvdWQudXRpbH
MuZXhjZXB0aW9uLkNsb3VkUnVudGltZUV4Y2VwdGlvbgAAAABWTU9yAwABSQ
ALY3NFcnJvckNvZGV4cgAaamF2YS5sYW5nLlJ1bnRpbWVFeGNlcHRpb26eXw
ZHCjSD5QIAAHhyABNqYXZhLmxhbmcuRXhjZXB0aW9u0P0fPho7HMQCAAB4cg
ATamF2YS5sYW5nLlRocm93YWJsZdXGNSc5d7jLAwAETAAFY2F1c2V0ABVMam
F2YS9sYW5nL1Rocm93YWJsZTtMAA1kZXRhaWxNZXNzYWdldAASTGphdmEvbG
FuZy9TdHJpbmc7WwAKc3RhY2tUcmFjZXQAHltMamF2YS9sYW5nL1N0YWNrVH
JhY2VFbGVtZW50O0wAFHN1cHByZXNzZWRFeGNlcHRpb25zdAAQTGphdmEvdX
RpbC9MaXN0O3hwcQB-AAh0AChVbmFibGUgdG8gbWlncmF0ZSBWTVtVc2VyfG
ktNjM4LTE3MzUtVk1ddXIAHltMamF2YS5sYW5nLlN0YWNrVHJhY2VFbGVtZW
50OwJGKjw8_SI5AgAAeHAAAAAUc3IAG2phdmEubGFuZy5TdGFja1RyYWNlRW
xlbWVudGEJxZomNt2FAgAESQAKbGluZU51bWJlckwADmRlY2xhcmluZ0NsYX
NzcQB-AAVMAAhmaWxlTmFtZXEAfgAFTAAKbWV0aG9kTmFtZXEAfgAFeHAAAA
lldAAmY29tLmNsb3VkLnZtLlZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGx0AB
5WaXJ0dWFsTWFjaGluZU1hbmFnZXJJbXBsLmphdmF0ABZvcmNoZXN0cmF0ZU
1pZ3JhdGVBd2F5c3EAfgAMAAARpXEAfgAOcQB-AA9xAH4AEHNxAH4ADP____
90ACZzdW4ucmVmbGVjdC5HZW5lcmF0ZWRNZXRob2RBY2Nlc3NvcjQwMnB0AA
ZpbnZva2VzcQB-AAwAAAArdAAoc3VuLnJlZmxlY3QuRGVsZWdhdGluZ01ldG
hvZEFjY2Vzc29ySW1wbHQAIURlbGVnYXRpbmdNZXRob2RBY2Nlc3NvckltcG
wuamF2YXEAfgAUc3EAfgAMAAACXnQAGGphdmEubGFuZy5yZWZsZWN0Lk1ldG
hvZHQAC01ldGhvZC5qYXZhcQB-ABRzcQB-AAwAAABrdAAiY29tLmNsb3VkLn
ZtLlZtV29ya0pvYkhhbmRsZXJQcm94eXQAGlZtV29ya0pvYkhhbmRsZXJQcm
94eS5qYXZhdAAPaGFuZGxlVm1Xb3JrSm9ic3EAfgAMAAASHHEAfgAOcQB-
AA9xAH4AHnNxAH4ADAAAAGd0ACBjb20uY2xvdWQudm0uVm1Xb3JrSm9iRGlz
cGF0Y2hlcnQAGFZtV29ya0pvYkRpc3BhdGNoZXIuamF2YXQABnJ1bkpvYnNx
AH4ADAAAAhl0AD9vcmcuYXBhY2hlLmNsb3Vkc3RhY2suZnJhbWV3b3JrLmpv
YnMuaW1wbC5Bc3luY0pvYk1hbmFnZXJJbXBsJDV0ABhBc3luY0pvYk1hbmFn
ZXJJbXBsLmphdmF0AAxydW5JbkNvbnRleHRzcQB-AAwAAAAxdAA-b3JnLmFw
YWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQuY29udGV4dC5NYW5hZ2VkQ29udGV4
dFJ1bm5hYmxlJDF0ABtNYW5hZ2VkQ29udGV4dFJ1bm5hYmxlLmphdmF0AANy
dW5zcQB-AAwAAAA4dABCb3JnLmFwYWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQu
Y29udGV4dC5pbXBsLkRlZmF1bHRNYW5hZ2VkQ29udGV4dCQxdAAaRGVmYXVs
dE1hbmFnZWRDb250ZXh0LmphdmF0AARjYWxsc3EAfgAMAAAAZ3QAQG9yZy5h
cGFjaGUuY2xvdWRzdGFjay5tYW5hZ2VkLmNvbnRleHQuaW1wbC5EZWZhdWx0
TWFuYWdlZENvbnRleHRxAH4ALnQAD2NhbGxXaXRoQ29udGV4dHNxAH4ADAAA
ADVxAH4AMXEAfgAudAAOcnVuV2l0aENvbnRleHRzcQB-AAwAAAAudAA8b3Jn
LmFwYWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQuY29udGV4dC5NYW5hZ2VkQ29u
dGV4dFJ1bm5hYmxlcQB-ACpxAH4AK3NxAH4ADAAAAe5xAH4AJXEAfgAmcQB-
ACtzcQB-AAwAAAHXdAAuamF2YS51dGlsLmNvbmN1cnJlbnQuRXhlY3V0b3Jz
JFJ1bm5hYmxlQWRhcHRlcnQADkV4ZWN1dG9ycy5qYXZhcQB-AC9zcQB-AAwA
AAEGdAAfamF2YS51dGlsLmNvbmN1cnJlbnQuRnV0dXJlVGFza3QAD0Z1dHVy
ZVRhc2suamF2YXEAfgArc3EAfgAMAAAEeXQAJ2phdmEudXRpbC5jb25jdXJy
ZW50LlRocmVhZFBvb2xFeGVjdXRvcnQAF1RocmVhZFBvb2xFeGVjdXRvci5q
YXZhdAAJcnVuV29ya2Vyc3EAfgAMAAACZ3QALmphdmEudXRpbC5jb25jdXJy
ZW50LlRocmVhZFBvb2xFeGVjdXRvciRXb3JrZXJxAH4AQHEAfgArc3EAfgAM
AAAC6XQAEGphdmEubGFuZy5UaHJlYWR0AAtUaHJlYWQuamF2YXEAfgArc3IA
JmphdmEudXRpbC5Db2xsZWN0aW9ucyRVbm1vZGlmaWFibGVMaXN0_A8lMbXs
jhACAAFMAARsaXN0cQB-AAd4cgAsamF2YS51dGlsLkNvbGxlY3Rpb25zJFVu
bW9kaWZpYWJsZUNvbGxlY3Rpb24ZQgCAy173HgIAAUwAAWN0ABZMamF2YS91
dGlsL0NvbGxlY3Rpb247eHBzcgATamF2YS51dGlsLkFycmF5TGlzdHiB0h2Z
x2GdAwABSQAEc2l6ZXhwAAAAAHcEAAAAAHhxAH4ATHgAABCadwgAAAAAAAAAAHg
ewcstack-man02-prod: 2017-01-24 18:05:32,592 DEBUG
[o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379) Publish async job-191379 complete on message bus
ewcstack-man02-prod: 2017-01-24 18:05:32,592 DEBUG
[o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379) Wake up jobs related to job-191379
ewcstack-man02-prod: 2017-01-24 18:05:32,592 DEBUG
[o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379) Update db status for job-191379
ewcstack-man02-prod: 2017-01-24 18:05:32,594 DEBUG
[o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379) Wake up jobs joined with job-191379 and disjoin
all
subjobs created from job- 191379
ewcstack-man02-prod: 2017-01-24 18:05:32,604 DEBUG
[o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379) Done executing com.cloud.vm.VmWorkMigrateAway for
job-191379
ewcstack-man02-prod: 2017-01-24 18:05:32,608 INFO
[o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-115:ctx-0d5db528
job-13733/job-191379) Remove job-191379 from job monitoring



On 25.01.2017 15:53, Rafael Weingärtner wrote:

Did you check the ACS log files looking for some exceptions when the
migrations for this specific VM started?

On Wed, Jan 25, 2017 at 9:49 AM, Francois Scheurer <
francois.scheu...@everyware.ch> wrote:

Dear CS contributors

We use CS 4.5.1 on 3 clusters with XenServer 6.5 and shared primary
storage.

When we put a host in maintenance, CS evacuates all VMs onto other
hosts.

But it happens regularly that some VMs are stopped instead of
live-migrated.

If we restart those stopped VMs, CS will then stop them again later.

We need to start them several times until they stay up.

Is it a known issue? Is it fixed in CS 4.9.2?


Migrating all VMs manually works, but is impracticable because it is
too time-consuming.


Many thanks in advance for your help.



Best Regards
Francois






--


EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheu...@everyware.ch
web: http://www.everyware.ch



--

EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheu...@everyware.ch
web: http://www.everyware.ch



--
Rafael Weingärtner



--


EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheu...@everyware.ch
web: http://www.everyware.ch




