Ilya,

Point to note: my job didn't fail because of the timeout, but rather because of some VDI error at the XenServer end, with the exception below:
[SR_BACKEND_FAILURE_80, , Failed to mark VDI hidden [opterr=SR 96e879bf-93aa-47ca-e2d5-e595afbab294: error aborting existing process]]

I am still digging into this error in SMlog etc. on the XenServer. But in reality the volume was migrated, and I think that's important. I did, of course, face the timeout error during initial testing, and after some trial and error I realised that there is a "not so properly named" parameter called *wait* (default value 1800) that needs to be modified in the end to make the timeout error go away. So all in all I modified the parameters as below:

migratewait: 36000
storage.pool.max.waitseconds: 36000
vm.op.cancel.interval: 36000
vm.op.cleanup.wait: 36000
wait: 18000

--
Best,
Makrand

On Tue, Aug 9, 2016 at 6:07 AM, ilya <[email protected]> wrote:
> this happened to us on a non-XEN hypervisor as well.
>
> CloudStack has a timeout for long-running jobs - which, I assume, in
> your case has been exceeded.
>
> Changing the volumes table should be enough, referencing the proper pool_id.
> Just make sure that the data size matches on both ends.
>
> Consider changing
> "copy.volume.wait"; if that does not help, also "vm.job.timeout".
>
> Regards
> ilya
>
> On 8/8/16 3:54 AM, Makrand wrote:
> > Guys,
> >
> > My setup: ACS 4.4.2. Hypervisor: XenServer 6.2.
> >
> > I tried moving a volume of a running VM from primary storage A to primary
> > storage B (using the CloudStack GUI). Please note, the primary storage A
> > LUN (LUN7) is coming out of one storage box and the primary storage B LUN
> > (LUN14) is from another.
> >
> > For VM1 with a 250 GB data volume (51 GB used space), I was able to move
> > this volume without any glitch in about 26 minutes.
> >
> > But for VM2 with a 250 GB data volume (182 GB used space), the migration
> > continued for about ~110 minutes and then failed with the following
> > exception at the very end, with a message like:
> >
> > 2016-08-06 14:30:57,481 WARN [c.c.h.x.r.CitrixResourceBase]
> > (DirectAgent-192:ctx-5716ad6d) Task failed!
> > Task record:
> >     uuid: 308a8326-2622-e4c5-2019-3beb87b0d183
> >     nameLabel: Async.VDI.pool_migrate
> >     nameDescription:
> >     allowedOperations: []
> >     currentOperations: {}
> >     created: Sat Aug 06 12:36:27 UTC 2016
> >     finished: Sat Aug 06 14:30:32 UTC 2016
> >     status: failure
> >     residentOn: com.xensource.xenapi.Host@f242d3ca
> >     progress: 1.0
> >     type: <none/>
> >     result:
> >     errorInfo: [SR_BACKEND_FAILURE_80, , Failed to mark VDI hidden
> >         [opterr=SR 96e879bf-93aa-47ca-e2d5-e595afbab294: error aborting
> >         existing process]]
> >     otherConfig: {}
> >     subtaskOf: com.xensource.xenapi.Task@aaf13f6f
> >     subtasks: []
> >
> > So CloudStack just removed the job, saying it failed, per the management
> > server log.
> >
> > A) But when I check at the hypervisor level, the volume is on the new SR,
> > i.e. on LUN14. Strange, huh? So now the new uuid for this volume from the
> > xe CLI looks like:
> >
> > [root@gcx-bom-compute1 ~]# xe vbd-list vm-uuid=3fcb3070-e373-3cf9-d0aa-0a657142a38d
> > uuid ( RO)             : f15dc54a-3868-8de8-5427-314e341879c6
> >     vm-uuid ( RO)      : 3fcb3070-e373-3cf9-d0aa-0a657142a38d
> >     vm-name-label ( RO): i-22-803-VM
> >     vdi-uuid ( RO)     : cc1f8e83-f224-44b7-9359-282a1c1e3db1
> >     empty ( RO)        : false
> >     device ( RO)       : hdb
> >
> > B) But luckily I had taken the entry before migration, and it shows:
> >
> > uuid ( RO)             : f15dc54a-3868-8de8-5427-314e341879c6
> >     vm-uuid ( RO)      : 3fcb3070-e373-3cf9-d0aa-0a657142a38d
> >     vm-name-label ( RO): i-22-803-VM
> >     vdi-uuid ( RO)     : 7c073522-a077-41a0-b9a7-7b61847d413b
> >     empty ( RO)        : false
> >     device ( RO)       : hdb
> >
> > C) Since this failed at the CloudStack end, the DB is still holding the
> > old value.
> > Here is the current volumes table entry in the DB:
> >
> >     id: 1004
> >     account_id: 22
> >     domain_id: 15
> >     pool_id: 18
> >     last_pool_id: NULL
> >     instance_id: 803
> >     device_id: 1
> >     name: cloudx_globalcloudxchange_com_W2797T2808S3112_V1462960751
> >     uuid: a8f01042-d0de-4496-98fa-a0b13648bef7
> >     size: 268435456000
> >     folder: NULL
> >     path: 7c073522-a077-41a0-b9a7-7b61847d413b
> >     pod_id: NULL
> >     data_center_id: 2
> >     iscsi_name: NULL
> >     host_ip: NULL
> >     volume_type: DATADISK
> >     pool_type: NULL
> >     disk_offering_id: 6
> >     template_id: NULL
> >     first_snapshot_backup_uuid: NULL
> >     recreatable: 0
> >     created: 2016-05-11 09:59:12
> >     attached: 2016-05-11 09:59:21
> >     updated: 2016-08-06 14:30:57
> >     removed: NULL
> >     state: Ready
> >     chain_info: NULL
> >     update_count: 42
> >     disk_type: NULL
> >     vm_snapshot_chain_size: NULL
> >     iso_id: NULL
> >     display_volume: 1
> >     format: VHD
> >     min_iops: NULL
> >     max_iops: NULL
> >     hv_ss_reserve: 0
> >     1 row in set (0.00 sec)
> >
> > So the path column shows the value 7c073522-a077-41a0-b9a7-7b61847d413b
> > and the pool_id is 18.
> >
> > The VM is running as of now, but I am sure the moment I reboot, this
> > volume will be gone, or worse, the VM won't boot. This is a production
> > VM, BTW.
> >
> > D) So I think I need to edit the volumes table's path and pool_id
> > columns, put the new values in place, and then reboot the VM. Do I need
> > to make any more changes in the DB, in some other tables, for the same?
> > Any comment/help is much appreciated.
> >
> > --
> > Best,
> > Makrand
>
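P.S. For the record, the DB edit I have in mind for step D above is roughly the sketch below. This is only a sketch: the new vdi-uuid comes from the xe output in my original mail, but NEW_POOL_ID is a placeholder (the id of the new primary storage in the storage_pool table, which I haven't pasted here), so look yours up first and take a DB backup before touching anything.

```sql
-- Find the id of the new primary storage (LUN14) first; NEW_POOL_ID below
-- is a placeholder, not a value taken from this thread.
-- SELECT id, name, uuid FROM storage_pool;

-- Point the volume row at the new VDI and the new pool.
UPDATE volumes
SET path    = 'cc1f8e83-f224-44b7-9359-282a1c1e3db1',  -- new vdi-uuid on LUN14
    pool_id = NEW_POOL_ID                               -- replace with real id
WHERE id = 1004;
```

As ilya said, changing the volumes table with the proper pool_id should be enough, provided the data size matches on both ends.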
