I encountered the same problem a few months ago. With help from this list, I fixed it without any data loss and posted my solution back to the list. If you search for the subject line “corrupt DB after VM live migration with storage migration”, you should find my posts.
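
For reference, the shape of the DB correction being discussed (ilya's note about the volumes table and point D in the quoted thread below) is roughly the following. This is only a sketch, not a tested procedure: the vdi-uuid is the post-migration value from xe vbd-list, the destination pool_id of 19 is a made-up placeholder that you must look up in your own storage_pool table, and you should back up the cloud database and confirm the size column still matches the VDI on the new SR before touching anything.

-- 1) Find the id of the destination primary storage (the LUN14 pool):
SELECT id, name, uuid, pool_type FROM storage_pool WHERE removed IS NULL;

-- 2) Point the volume row at the VDI now on the new SR and at the new pool
--    (path = new vdi-uuid from xe vbd-list; 19 is a placeholder pool id):
UPDATE volumes
SET path = 'cc1f8e83-f224-44b7-9359-282a1c1e3db1',
    pool_id = 19,
    last_pool_id = 18
WHERE id = 1004;

-- 3) Sanity check before rebooting the VM:
SELECT id, name, path, pool_id, last_pool_id, size, state FROM volumes WHERE id = 1004;

The parts that matter are that path matches the vdi-uuid on the new SR and that pool_id matches the destination pool's entry in storage_pool; last_pool_id is just bookkeeping.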
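On the timeout side, the parameters Makrand lists below, plus the two ilya mentions (copy.volume.wait and vm.job.timeout), are ordinary global settings, so you can at least inspect them straight from the configuration table if that is easier than the Global Settings page. Again, just a sketch: check the description column for each setting's unit before changing anything, and note that values edited directly in the DB generally only take effect after a management-server restart.

SELECT name, value, description
FROM configuration
WHERE name IN ('wait', 'migratewait', 'storage.pool.max.waitseconds',
               'vm.op.cancel.interval', 'vm.op.cleanup.wait',
               'copy.volume.wait', 'vm.job.timeout');

-- Example: raise copy.volume.wait (in seconds); 36000 mirrors the values
-- Makrand used. vm.job.timeout is deliberately left alone here because its
-- unit may differ (milliseconds rather than seconds, if memory serves), so
-- check its description first.
UPDATE configuration SET value = '36000' WHERE name = 'copy.volume.wait';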
Good luck,
Yiping

On 8/9/16, 3:30 AM, "Makrand" <[email protected]> wrote:

Ilya,

The point to note is that my job didn't fail because of the timeout, but rather because of a VDI error on the XenServer side, with the exception below:

[SR_BACKEND_FAILURE_80, , Failed to mark VDI hidden [opterr=SR 96e879bf-93aa-47ca-e2d5-e595afbab294: error aborting existing process]]

I am still digging into this error in SMlog etc. on the XenServer. But in reality the volume was migrated, and I think that's important.

I did, of course, hit the timeout error during initial testing, and after some trial and error I realised there is a not-so-aptly-named parameter called *wait* (default value 1800) that also needs to be modified to make the timeout error go away. So, all in all, I modified the parameters as below:

migratewait: 36000
storage.pool.max.waitseconds: 36000
vm.op.cancel.interval: 36000
vm.op.cleanup.wait: 36000
wait: 18000

--
Best,
Makrand

On Tue, Aug 9, 2016 at 6:07 AM, ilya <[email protected]> wrote:
> This happened to us on a non-XEN hypervisor as well.
>
> CloudStack has a timeout for long-running jobs, which I assume has been
> exceeded in your case.
>
> Changing the volumes table should be enough, referencing the proper pool_id.
> Just make sure the data size matches on both ends.
>
> Consider changing "copy.volume.wait" and, if that does not help, also
> "vm.job.timeout".
>
> Regards
> ilya
>
> On 8/8/16 3:54 AM, Makrand wrote:
> > Guys,
> >
> > My setup: ACS 4.4.2, hypervisor XenServer 6.2.
> >
> > I tried moving a volume of a running VM from primary storage A to
> > primary storage B (using the CloudStack GUI). Please note that the
> > primary storage A LUN (LUN7) comes from one storage box and the
> > primary storage B LUN (LUN14) comes from another.
> >
> > For VM1 with a 250 GB data volume (51 GB used space), I was able to
> > move the volume without any glitch in about 26 minutes.
> >
> > But for VM2 with a 250 GB data volume (182 GB used space), the
> > migration ran for about ~110 minutes and then failed at the very end
> > with the following exception:
> >
> > 2016-08-06 14:30:57,481 WARN [c.c.h.x.r.CitrixResourceBase]
> > (DirectAgent-192:ctx-5716ad6d) Task failed! Task record:
> >   uuid: 308a8326-2622-e4c5-2019-3beb87b0d183
> >   nameLabel: Async.VDI.pool_migrate
> >   nameDescription:
> >   allowedOperations: []
> >   currentOperations: {}
> >   created: Sat Aug 06 12:36:27 UTC 2016
> >   finished: Sat Aug 06 14:30:32 UTC 2016
> >   status: failure
> >   residentOn: com.xensource.xenapi.Host@f242d3ca
> >   progress: 1.0
> >   type: <none/>
> >   result:
> >   errorInfo: [SR_BACKEND_FAILURE_80, , Failed to mark VDI hidden
> >   [opterr=SR 96e879bf-93aa-47ca-e2d5-e595afbab294: error aborting
> >   existing process]]
> >   otherConfig: {}
> >   subtaskOf: com.xensource.xenapi.Task@aaf13f6f
> >   subtasks: []
> >
> > So CloudStack just removed the job and marked it failed, according to
> > the management server log.
> >
> > A) But when I check at the hypervisor level, the volume is on the new
> > SR, i.e. on LUN14. Strange, huh?
> > So now the new uuid for this volume from the xe CLI is:
> >
> > [root@gcx-bom-compute1 ~]# xe vbd-list vm-uuid=3fcb3070-e373-3cf9-d0aa-0a657142a38d
> > uuid ( RO)          : f15dc54a-3868-8de8-5427-314e341879c6
> >       vm-uuid ( RO): 3fcb3070-e373-3cf9-d0aa-0a657142a38d
> > vm-name-label ( RO): i-22-803-VM
> >      vdi-uuid ( RO): cc1f8e83-f224-44b7-9359-282a1c1e3db1
> >         empty ( RO): false
> >        device ( RO): hdb
> >
> > B) Luckily I had captured the same entry before the migration, and it
> > looked like this:
> >
> > uuid ( RO)          : f15dc54a-3868-8de8-5427-314e341879c6
> >       vm-uuid ( RO): 3fcb3070-e373-3cf9-d0aa-0a657142a38d
> > vm-name-label ( RO): i-22-803-VM
> >      vdi-uuid ( RO): 7c073522-a077-41a0-b9a7-7b61847d413b
> >         empty ( RO): false
> >        device ( RO): hdb
> >
> > C) Since the job failed on the CloudStack side, the DB is still holding
> > the old values. Here is the current volumes table entry in the DB:
> >
> > id: 1004
> > account_id: 22
> > domain_id: 15
> > pool_id: 18
> > last_pool_id: NULL
> > instance_id: 803
> > device_id: 1
> > name: cloudx_globalcloudxchange_com_W2797T2808S3112_V1462960751
> > uuid: a8f01042-d0de-4496-98fa-a0b13648bef7
> > size: 268435456000
> > folder: NULL
> > path: 7c073522-a077-41a0-b9a7-7b61847d413b
> > pod_id: NULL
> > data_center_id: 2
> > iscsi_name: NULL
> > host_ip: NULL
> > volume_type: DATADISK
> > pool_type: NULL
> > disk_offering_id: 6
> > template_id: NULL
> > first_snapshot_backup_uuid: NULL
> > recreatable: 0
> > created: 2016-05-11 09:59:12
> > attached: 2016-05-11 09:59:21
> > updated: 2016-08-06 14:30:57
> > removed: NULL
> > state: Ready
> > chain_info: NULL
> > update_count: 42
> > disk_type: NULL
> > vm_snapshot_chain_size: NULL
> > iso_id: NULL
> > display_volume: 1
> > format: VHD
> > min_iops: NULL
> > max_iops: NULL
> > hv_ss_reserve: 0
> > 1 row in set (0.00 sec)
> >
> > So the path column still shows 7c073522-a077-41a0-b9a7-7b61847d413b and
> > the pool_id is 18.
> >
> > The VM is running as of now, but I am sure that the moment I reboot it,
> > this volume will be gone, or worse, the VM won't boot. This is a
> > production VM, BTW.
> >
> > D) So I think I need to edit the volumes table's path and pool_id
> > columns, put the new values in place, and then reboot the VM. Do I need
> > to make any other changes in the DB, in other tables, for the same? Any
> > comment/help is much appreciated.
> >
> > --
> > Best,
> > Makrand
