On Thu, Jul 25, 2019 at 6:24 PM Gianluca Cecchi <[email protected]> wrote:
> On Thu, Jul 25, 2019 at 2:21 PM Eyal Shenitzky <[email protected]> wrote:
>
>> On Thu, Jul 25, 2019 at 3:02 PM Gianluca Cecchi <[email protected]> wrote:
>>
>>> On Thu, Jul 25, 2019 at 1:54 PM Eyal Shenitzky <[email protected]> wrote:
>>>
>>>> Please notice that automation Python scripts were created in order to
>>>> facilitate the DR process.
>>>> You can find them under - path/to/your/dr/folder/files.
>>>>
>>>> You can use those scripts to generate the mapping, test the generated
>>>> mapping, and start the failover/failback.
>>>>
>>>> I strongly recommend using them.
>>>>
>>> Yes, I have used them to create the disaster_recovery_vars.yml mapping
>>> file and then populated it with the secondary site information, thanks.
>>> My doubt was about any difference in playbook actions between "failover"
>>> (3.3) and "discreet failover test" (B.1), as the executed playbook and
>>> tags are the same.
>>>
>> No, the only difference is that you disable the storage replication
>> yourself; this way you can test the failover while the other "primary"
>> site is still active.
>>
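For reference, a minimal sketch of how both are invoked, assuming a wrapper
playbook (here called dr_play.yml, a placeholder name) that includes the
oVirt.disaster-recovery role together with disaster_recovery_vars.yml; the
real failover and the discreet failover test run the same play, selected
by tag:

  # generate/refresh the mapping file (the role also ships helper scripts for this)
  ansible-playbook dr_play.yml -t generate_mapping
  # used both for a real failover (3.3) and for the discreet failover test (B.1)
  ansible-playbook dr_play.yml -t fail_over
  # and eventually, back to the primary site
  ansible-playbook dr_play.yml -t fail_back
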
> First "discreet failover test" was a success!!! Great.
> Storage domain attached, templates imported, and the only VM defined at
> the source correctly started (at the source I configured link down for
> the VM, inherited at the target, so no collisions).
> The elapsed time between the beginning of the oVirt connection and the
> first template import was about 6 minutes.
> ...
> Template TOL76 has been successfully imported from the given
> configuration. 7/25/19 3:26:58 PM
> Storage Domain ovsd3910 was attached to Data Center SVIZ3-DR by
> admin@internal-authz 7/25/19 3:26:46 PM
> Storage Domains were attached to Data Center SVIZ3-DR by
> admin@internal-authz 7/25/19 3:26:46 PM
> Storage Domain ovsd3910 (Data Center SVIZ3-DR) was activated by
> admin@internal-authz 7/25/19 3:26:46 PM
> ...
> Storage Pool Manager runs on Host ovh201. (Address: ovh201.), Data Center
> SVIZ3-DR. 7/25/19 3:26:36 PM
> Data Center is being initialized, please wait for initialization to
> complete. 7/25/19 3:23:53 PM
> Storage Domain ovsd3910 was added by admin@internal-authz 7/25/19 3:20:43 PM
> Disk Profile ovsd3910 was successfully added (User: admin@internal-authz).
> 7/25/19 3:20:42 PM
> User admin@internal-authz connecting from '10.4.192.43' using session
> 'xxx' logged in. 7/25/19 3:20:35 PM
>
> Some notes:
>
> 1) iSCSI multipath
> My storage domains are iSCSI based and my hosts have two network cards to
> reach the storage.
> I'm using EQL, which doesn't support bonding and has one portal that all
> initiators use.
> So in my primary env I configured the "iSCSI Multipathing" tab in the
> Compute --> Datacenter --> Datacenter_Name window.
> But this tab appears only when you activate the storage.
> So during the Ansible playbook run, the iSCSI connection was activated
> through the "default" iSCSI interface.
> I can then:
> - configure "iSCSI Multipathing"
> - shut down the VM
> - put the host into maintenance
> - remove the default iSCSI session that has not been removed on the host:
>   iscsiadm -m session -r 6 -u
> - activate the host
> Now I have:
> [root@ov201 ~]# iscsiadm -m session
> tcp: [10] 10.10.100.8:3260,1 iqn.2001-05.com.equallogic:4-771816-99d82fc59-5bdd77031e05beac-ovsd3910 (non-flash)
> tcp: [9] 10.10.100.8:3260,1 iqn.2001-05.com.equallogic:4-771816-99d82fc59-5bdd77031e05beac-ovsd3910 (non-flash)
> [root@ov201 ~]#
> with
> # multipath -l
> 364817197c52fd899acbe051e0377dd5b dm-29 EQLOGIC ,100E-00
> size=1.0T features='1 queue_if_no_path' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=0 status=active
>   |- 23:0:0:0 sdb 8:16 active undef running
>   `- 24:0:0:0 sdc 8:32 active undef running
> - start the VM
>
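As a side note, the manual part of that cleanup can be done without
hard-coding the session id (the sid you pass to -r differs per host, so
list the sessions first):

  iscsiadm -m session -P 1          # show each session together with the iface it uses
  iscsiadm -m session -r <sid> -u   # log out only the leftover session on the "default" iface
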
> Then I do a cleanup:
> 1. Detach the storage domains from the secondary site.
> 2. Enable storage replication between the primary and secondary storage
> domains.
>
> The storage domain remains as "unattached" in the DR environment.
>
> Then I executed the test again, and during the connection I got this
> error about 40 seconds after the run of the playbook:
>
> TASK [oVirt.disaster-recovery : Import iSCSI storage domain] ***************************************
> An exception occurred during task execution. To see the full traceback,
> use -vvv. The error was: Error: Fault reason is "Operation Failed". Fault
> detail is "[]". HTTP response code is 400.
> failed: [localhost]
> (item=iqn.2001-05.com.equallogic:4-771816-99d82fc59-5bdd77031e05beac-ovsd3910)
> => {"ansible_loop_var": "dr_target", "changed": false, "dr_target":
> "iqn.2001-05.com.equallogic:4-771816-99d82fc59-5bdd77031e05beac-ovsd3910",
> "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP
> response code is 400."}
>
> In the webadmin GUI of the DR env I see:
>
> VDSM ov201 command CleanStorageDomainMetaDataVDS failed: Cannot obtain
> lock: "id=56eadc97-5731-40cf-8409-aff58d8ffd11, rc=-243, out=Cannot acquire
> Lease(name='SDM', path='/dev/56eadc97-5731-40cf-8409-aff58d8ffd11/leases',
> offset=1048576), err=(-243, 'Sanlock resource not acquired', 'Lease is held
> by another host')" 7/25/19 4:50:43 PM
>
> What could be the cause of this?
>
> In vdsm.log:
>
> 2019-07-25 16:50:43,196+0200 INFO (jsonrpc/1) [vdsm.api] FINISH
> forcedDetachStorageDomain error=Cannot obtain lock:
> "id=56eadc97-5731-40cf-8409-aff58d8ffd11, rc=-243, out=Cannot acquire
> Lease(name='SDM', path='/dev/56eadc97-5731-40cf-8409-aff58d8ffd11/leases',
> offset=1048576), err=(-243, 'Sanlock resource not acquired', 'Lease is held
> by another host')" from=::ffff:10.4.192.79,49038, flow_id=4bd330d1,
> task_id=c0dfac81-5c58-427d-a7d0-e8c695448d27 (api:52)
> 2019-07-25 16:50:43,196+0200 ERROR (jsonrpc/1) [storage.TaskManager.Task]
> (Task='c0dfac81-5c58-427d-a7d0-e8c695448d27') Unexpected error (task:875)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
>     return fn(*args, **kargs)
>   File "<string>", line 2, in forcedDetachStorageDomain
>   File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 50, in method
>     ret = func(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 856, in forcedDetachStorageDomain
>     self._detachStorageDomainFromOldPools(sdUUID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 834, in _detachStorageDomainFromOldPools
>     dom.acquireClusterLock(host_id)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 910, in acquireClusterLock
>     self._manifest.acquireDomainLock(hostID)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/sd.py", line 527, in acquireDomainLock
>     self._domainLock.acquire(hostID, self.getDomainLease())
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/clusterlock.py", line 419, in acquire
>     "Cannot acquire %s" % (lease,), str(e))
> AcquireLockFailure: Cannot obtain lock:
> "id=56eadc97-5731-40cf-8409-aff58d8ffd11, rc=-243, out=Cannot acquire
> Lease(name='SDM', path='/dev/56eadc97-5731-40cf-8409-aff58d8ffd11/leases',
> offset=1048576), err=(-243, 'Sanlock resource not acquired', 'Lease is held
> by another host')"
> 2019-07-25 16:50:43,196+0200 INFO (jsonrpc/1) [storage.TaskManager.Task]
> (Task='c0dfac81-5c58-427d-a7d0-e8c695448d27') aborting: Task is aborted:
> 'Cannot obtain lock: "id=56eadc97-5731-40cf-8409-aff58d8ffd11, rc=-243,
> out=Cannot acquire Lease(name=\'SDM\',
> path=\'/dev/56eadc97-5731-40cf-8409-aff58d8ffd11/leases\', offset=1048576),
> err=(-243, \'Sanlock resource not acquired\', \'Lease is held by another
> host\')"' - code 651 (task:1181)
> 2019-07-25 16:50:43,197+0200 ERROR (jsonrpc/1) [storage.Dispatcher] FINISH
> forcedDetachStorageDomain error=Cannot obtain lock:
> "id=56eadc97-5731-40cf-8409-aff58d8ffd11, rc=-243, out=Cannot acquire
> Lease(name='SDM', path='/dev/56eadc97-5731-40cf-8409-aff58d8ffd11/leases',
> offset=1048576), err=(-243, 'Sanlock resource not acquired', 'Lease is held
> by another host')" (dispatcher:83)
> 2019-07-25 16:50:43,197+0200 INFO (jsonrpc/1) [jsonrpc.JsonRpcServer] RPC
> call StorageDomain.detach failed (error 651) in 24.12 seconds (__init__:312)
> 2019-07-25 16:50:44,180+0200 INFO (jsonrpc/6) [api.host] START getStats()
> from=::ffff:10.4.192.79,49038 (api:48)
> 2019-07-25 16:50:44,222+0200 INFO (jsonrpc/6) [vdsm.api] START
> repoStats(domains=()) from=::ffff:10.4.192.79,49038,
> task_id=8a7a0302-4ee3-49a8-a3f7-f9636a123765 (api:48)
> 2019-07-25 16:50:44,222+0200 INFO (jsonrpc/6) [vdsm.api] FINISH repoStats
> return={} from=::ffff:10.4.192.79,49038,
> task_id=8a7a0302-4ee3-49a8-a3f7-f9636a123765 (api:54)
> 2019-07-25 16:50:44,223+0200 INFO (jsonrpc/6) [vdsm.api] START
> multipath_health() from=::ffff:10.4.192.79,49038,
> task_id=fb09923c-0888-4c3f-9b8a-a7750592da22 (api:48)
> 2019-07-25 16:50:44,223+0200 INFO (jsonrpc/6) [vdsm.api] FINISH
> multipath_health return={} from=::ffff:10.4.192.79,49038,
> task_id=fb09923c-0888-4c3f-9b8a-a7750592da22 (api:54)
>
> After putting the host into maintenance + a reboot of the host and a
> re-run of the playbook, all went well again.
>
This error is related to Sanlock; it looks like you needed to wait longer
for the leases to expire.
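If it reproduces, you can check from the host which host still holds the
SDM lease before forcing anything (a sketch; it assumes the leases LV of
the domain is active on that host, and the path is the one from the log
above):

  sanlock client status   # lockspaces and resources currently held by this host
  # dump the lease table of the domain; the owner host_id of the SDM lease shows who holds it
  sanlock direct dump /dev/56eadc97-5731-40cf-8409-aff58d8ffd11/leases
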
> 2) MAC address pools
> I noticed that the imported VM has preserved the link state (down in my
> case) but not the MAC address.
> The MAC address is the one defined in my target engine, which is
> different from the source engine's to avoid overlap of MAC addresses.
> Is this an option I can customize?
>
You can customize the MAC of the VM in: Compute -> Virtual Machines ->
Network Interfaces -> create a new one with a custom MAC.

In the DR role, in defaults/main.yaml, you can set the following variable
to false:

# Indicate whether to reset a mac pool of a VM on register.
dr_reset_mac_pool: "True"
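If you'd rather not edit the role's defaults, the same variable can also
be overridden per run, e.g. (dr_play.yml again being a placeholder for
your wrapper playbook):

  ansible-playbook dr_play.yml -t fail_over -e dr_reset_mac_pool=False
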
> In general a VM could have problems when changing its MAC address...
>
> 3) Clean up dest DC after "discreet failover test"
> The guide says:
> 1. Detach the storage domains from the secondary site.
> 2. Enable storage replication between the primary and secondary storage
> domains.
>
> Is it better to also restart the DR hosts?
>
No need for that.

> 4) VM consistency
> Can we say that all the imported VMs will be "crash consistent"?
>
I am not sure what you mean by "crash consistent"; does it mean "highly
available" in oVirt language?

> Thanks,
> Gianluca

--
Regards,
Eyal Shenitzky

_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/[email protected]/message/5SDDECNRDDYKCE7HYMGPC4H37JVE4MZY/

