[ovirt-users] Re: DR on hyperconverged deployment

2020-04-13 Thread Eyal Shenitzky
Sorry for the late response. Please see my comments inline.


On Fri, 3 Apr 2020 at 02:30, wodel youchi  wrote:

> Hi,
> Thank you for your reply. I had already done all of that, but I didn't
> understand everything and I ran into several problems while doing the
> fail-over and then the fail-back. I am writing this mail in the hope of
> clarifying those things.
> I will try to express myself clearly and give as much detail as I can.
>
> *The LAB :*
> My LAB contains two single-host oVirt-HCI platforms, one to act as the
> *primary site (the source)*, the second as the *disaster-recovery site
> (the target)*.
> Each HCI site contains one data domain, which consists of a gluster
> volume backed by one brick. The volumes (source and target) have the
> same size, and they were created during the HCI deployment.
> *At the end of the deployment, I detached then deleted the gluster data
> domain on the target site, but I didn't delete the target volume.*
>
> My goal is to test the disaster recovery (active-passive DR, to be precise)
> process on an HCI implementation, covering the fail-over and the fail-back
> process entirely.
>
> *Documentation*
>
> I followed RHHI 1.7
> Maintaining_Red_Hat_Hyperconverged_Infrastructure_for_Virtualization-en-US
> and started my implementation.
>
> I prepared all the ansible playbooks.
>
>
> *The Test procedure:*
>
> *Fail-over*
>
> 1 - Create a Windows10 VM on the source volume.
>
> 2 - Replicate to the DR site.
>
> 3 - Execute the fail-over procedure and test if the VM is usable in the
> target platform.
>
> 4 - Detach and Delete the data domain in the target platform without
> touching the target volume
>
> 5 - Make changes to the Win10 VM on the source volume (creating files and
> installing software)
>
> 6 - Replicate again to the DR site, then execute another fail-over and see
> if the modifications were synced.
>
>
> *Fail-back*
>
> 1 - Make changes to the Win10 VM on the target volume (deleting files) *and
> especially creating a snapshot*
>
> 2 - Detach and Delete the data domain in the source platform without
> touching the source volume.
>
> 3 - Replicate to the source site.
>
> 4 - Execute the clean up playbook
>
> 5 - Execute the fail-over and check that the VM is usable in the source
> platform and that the modifications were synced, especially the snapshot
>
>
> *Things I need to confirm :*
>
> 1 - When creating the geo-replication from the primary site to the target
> site, we get to a point where we have to create "*Scheduling regular
> backups using geo-replication*". From my understanding, it's like a cron
> job that starts the geo-replication at a specific time (or time of day), and
> from my testing, the geo-replication starts syncing at that precise time
> and stops the synchronization when its "*CRAWL STATUS*" reaches
> "Changelog Crawl". In other words, it stops when the geo-replication
> reaches the same date as the check-point (the specific time).
>
> The smallest interval you can get from the configuration window is 24 hours,
> which means that in the event of a disaster, you can at most recover the
> data from the day before. *Is this correct?*
>

If I understand correctly, you are talking about the storage replication
that is performed on the storage layer.
This question should be referred to the Gluster team/community.


>
> *Problems encountered during the test:*
>
> *Fail-over*
>
> 1 - When executing the fail-over the first time (ansible-playbook
> dr-rhv-failover.yml --tags "fail_over"), the import of the target data
> domain failed with the error : *An exception occurred during task
> execution. To see the full traceback, use -vvv. The error was:
> ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is
> "[Error in creating a Storage Domain. The selected storage path is not
> empty (probably contains another Storage Domain). Either remove the
> existing Storage Domain from this path, or change the Storage path).]".
> HTTP response code is 400.* I tried to import the domain manually from
> oVirt's admin console and got the same error, so I did the following:
>
> - I deleted the target volume and the brick and the sub-directory of the
> brick.
>
> - I recreated the volume from scratch.
>
> - I redid the geo-replication synchronization from the source.
>
> - I executed the fail-over and this time the target data domain was
> imported correctly and the Win10 VM was started correctly.
>

First, I suggest you use the python scripts that help you automate
the DR process. You can find them under
../your-dr-folder/files (please use './ovirt-dr -h' to see the
available options).

According to the error, it seems that you may not have waited for the
sanlock lease to expire. You must wait around 80 seconds before trying to
use it.
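
A minimal sketch of that waiting step, assuming the sanlock lease is indeed the cause of the failure. The 80-second figure comes from the advice above; the function name, the retry count, and the `action` callable (standing in for whatever triggers the storage domain import) are hypothetical and not part of oVirt or its DR scripts.

```python
import time

# "around 80 seconds", taken from the advice above; tune for your setup.
SANLOCK_LEASE_WAIT_SECONDS = 80


def retry_after_lease_wait(action, attempts=3,
                           wait=SANLOCK_LEASE_WAIT_SECONDS,
                           sleep=time.sleep):
    """Call action(); if it raises, wait `wait` seconds and try again.

    `action` is a hypothetical callable that performs the import.
    At most `attempts` calls are made; the last error is re-raised.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # the import failed, e.g. lease still held
            last_error = exc
            if attempt < attempts:
                sleep(wait)
    raise last_error
```

The `sleep` parameter is injectable only so that the helper can be exercised without really waiting.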



> 2 - I detached then deleted the target data domain without touching the
> target volume, then I made changes to the Win10 VM on the source site, then
> I created a new schedule of geo-replication, and after the 

[ovirt-users] Re: DR on hyperconverged deployment

2020-04-02 Thread wodel youchi
Hi,
Thank you for your reply. I had already done all of that, but I didn't
understand everything and I ran into several problems while doing the
fail-over and then the fail-back. I am writing this mail in the hope of
clarifying those things.
I will try to express myself clearly and give as much detail as I can.

*The LAB :*
My LAB contains two single-host oVirt-HCI platforms, one to act as the
*primary site (the source)*, the second as the *disaster-recovery site
(the target)*.
Each HCI site contains one data domain, which consists of a gluster
volume backed by one brick. The volumes (source and target) have the
same size, and they were created during the HCI deployment.
*At the end of the deployment, I detached then deleted the gluster data
domain on the target site, but I didn't delete the target volume.*

My goal is to test the disaster recovery (active-passive DR, to be precise)
process on an HCI implementation, covering the fail-over and the fail-back
process entirely.

*Documentation*

I followed RHHI 1.7
Maintaining_Red_Hat_Hyperconverged_Infrastructure_for_Virtualization-en-US
and started my implementation.

I prepared all the ansible playbooks.


*The Test procedure:*

*Fail-over*

1 - Create a Windows10 VM on the source volume.

2 - Replicate to the DR site.

3 - Execute the fail-over procedure and test if the VM is usable in the
target platform.

4 - Detach and Delete the data domain in the target platform without
touching the target volume

5 - Make changes to the Win10 VM on the source volume (creating files and
installing software)

6 - Replicate again to the DR site, then execute another fail-over and see
if the modifications were synced.


*Fail-back*

1 - Make changes to the Win10 VM on the target volume (deleting files) *and
especially creating a snapshot*

2 - Detach and Delete the data domain in the source platform without
touching the source volume.

3 - Replicate to the source site.

4 - Execute the clean up playbook

5 - Execute the fail-over and check that the VM is usable in the source
platform and that the modifications were synced, especially the snapshot


*Things I need to confirm :*

1 - When creating the geo-replication from the primary site to the target
site, we get to a point where we have to create "*Scheduling regular
backups using geo-replication*". From my understanding, it's like a cron job
that starts the geo-replication at a specific time (or time of day), and from
my testing, the geo-replication starts syncing at that precise time and
stops the synchronization when its "*CRAWL STATUS*" reaches
"Changelog Crawl". In other words, it stops when the geo-replication
reaches the same date as the check-point (the specific time).

The smallest interval you can get from the configuration window is 24 hours,
which means that in the event of a disaster, you can at most recover the
data from the day before. *Is this correct?*
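
The reasoning above can be written down as simple arithmetic. This is only a sketch under stated assumptions: the target is taken to be usable only as of the last completed checkpoint, and each scheduled sync takes a fixed time to finish. The function name and the 2-hour sync duration in the usage note are hypothetical; only the 24-hour interval comes from the message.

```python
from datetime import timedelta


def worst_case_rpo(schedule_interval, sync_duration):
    """Worst-case data loss window for checkpoint-based scheduled sync.

    A checkpoint taken at time T reaches the target at T + sync_duration.
    Until the next checkpoint (taken at T + schedule_interval) has also
    finished syncing, the target still holds the data as of T. A disaster
    just before that moment therefore loses
    schedule_interval + sync_duration worth of changes.
    """
    return schedule_interval + sync_duration
```

For example, `worst_case_rpo(timedelta(hours=24), timedelta(hours=2))` gives 26 hours, which agrees with the observation that a daily schedule cannot give an RPO below roughly one day.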



*Problems encountered during the test:*

*Fail-over*

1 - When executing the fail-over the first time (ansible-playbook
dr-rhv-failover.yml --tags "fail_over"), the import of the target data
domain failed with the error : *An exception occurred during task
execution. To see the full traceback, use -vvv. The error was:
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is
"[Error in creating a Storage Domain. The selected storage path is not
empty (probably contains another Storage Domain). Either remove the
existing Storage Domain from this path, or change the Storage path).]".
HTTP response code is 400.* I tried to import the domain manually from
oVirt's admin console and got the same error, so I did the following:

- I deleted the target volume and the brick and the sub-directory of the
brick.

- I recreated the volume from scratch.

- I redid the geo-replication synchronization from the source.

- I executed the fail-over and this time the target data domain was
imported correctly and the Win10 VM was started correctly.


2 - I detached then deleted the target data domain without touching the
target volume, then I made changes to the Win10 VM on the source site, then
I created a new schedule of geo-replication, and after the replication I
executed another fail-over.

- The Win10 VM started successfully and the changes made were synced.


*Fail-back*
1 - The documentation doesn't explain the fail-back procedure thoroughly.
It doesn't explain what dr-cleanup.yml does.

2 - When launching the fail-back playbook, at some point I get this message:

*TASK [oVirt.disaster-recovery : Failback Replication Sync pause]
[oVirt.disaster-recovery : Failback Replication Sync pause]
[Failback Replication Sync] Please press ENTER once the destination storage
domains are ready to be used for the destination setup:*

What does this mean?

3 - I made some changes to the Win10 VM and created a snapshot of that VM.

4.a - To replicate the data 

[ovirt-users] Re: DR on hyperconverged deployment

2020-04-02 Thread Eyal Shenitzky
If your intention is to use an active-passive disaster recovery solution, you
can have a look at the following guide:
https://ovirt.org/documentation/disaster-recovery-guide/active_passive_overview.html

-- 
Regards,
Eyal Shenitzky
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LSPFJDYXF4A3LL275EQ6WFT7UZDNUPTR/


[ovirt-users] Re: DR on hyperconverged deployment

2020-04-01 Thread wodel youchi
Hi,

I am trying to configure and test disaster recovery on oVirt HCI,
and to understand how it works:
What is the minimum RPO, and what is its relationship with the checkpoint?
And what are the steps to fail back?

Regards

List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GYILSVZYU7CPLJF3EMD36TLBZPED5TS4/


[ovirt-users] Re: DR on hyperconverged deployment

2020-04-01 Thread Eyal Shenitzky
Hi Wodel,

Can you please explain what you are trying to do?
I am not sure I understand it from your question.


-- 
Regards,
Eyal Shenitzky
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WUMLTR5DDQOPM4SOPNHRGW5LTCLF63NN/


[ovirt-users] Re: DR on hyperconverged deployment

2020-04-01 Thread wodel youchi
Hi,

I re-did the test and it seems that the minimum RPO is one day. If
someone could confirm that, it would be great.

As for the snapshot, this time it was synced.

Then I tried to test the fail-back and I found that the documentation is
not clear:
- it is not clear what the purpose of the dr-clear playbook is
- it is not clear what this means: put the target volume in read-write mode
and the source volume in read-only mode
- Do we have to sync back using a new geo-replication link from the DR
volume to the source volume?
I tried to do so. In my first trial, I forced the creation of the back
geo-replication without deleting the content of the source volume, then I
started the replication manually (I didn't use the checkpoint) and I
stopped the replication once it reached the changelog state, but I couldn't
import the source volume. I got the error: volume is not empty

In my second trial, I deleted and recreated the source volume from scratch
and then I started the replication back manually. At the end I got the error

In my third trial, I deleted the source volume and recreated it from scratch,
but I replicated back using the checkpoint method, and this time the
fail-back worked.
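
For reference, a rough sketch of what a checkpoint-driven sync-back could look like at the gluster CLI level. The sequence (start the session, set a checkpoint, poll the status, stop) is an assumption about what "the check point method" involves, and the volume and host names in the usage note are hypothetical. The helper only builds the command lines and does not run anything.

```python
def georep_checkpoint_commands(master_vol, slave_host, slave_vol):
    """Build the gluster CLI calls for one checkpoint-based sync.

    Returns a list of argv lists, in the order they would be run.
    Nothing is executed here; pass each entry to subprocess.run()
    on a node of the master volume if you want to try it.
    """
    session = ["gluster", "volume", "geo-replication",
               master_vol, f"{slave_host}::{slave_vol}"]
    return [
        session + ["start"],                        # begin syncing
        session + ["config", "checkpoint", "now"],  # mark "now" as the target
        session + ["status", "detail"],             # poll until checkpoint is completed
        session + ["stop"],                         # stop once the checkpoint is reached
    ]
```

For a sync-back from the DR site, the DR volume would be the master and the original source volume the slave, for example `georep_checkpoint_commands("data", "source-host.example.com", "data")`.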

Could someone shed some light on this?

Thank you
Regards.

On Sun, 29 Mar 2020 at 19:19, wodel youchi wrote:

> Hi,
>
> I need to understand some things about DR on oVirt-HCI:
>
>
> - What does "Scheduling regular backups using geo-replication"
>   (point 3.3.4, RHHI 1.7 Doc, Maintaining RHHI) mean?
>    - Does this mean creating a check-point?
>    - If yes, does this mean that the geo-replication process will sync
>      data up to that check-point and then stop the synchronization, then
>      repeat the same cycle the day after? Does this mean that the minimum
>      RPO is one day?
> - I created a snapshot of a VM on the source Manager, I synced the
>   volume, then I executed a DR. The VM was started on the Target Manager,
>   but the VM didn't have its snapshot. Any idea?
>
>
> Regards, be safe.
>
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/N2MSZUYT2GE33IVUKGVYHLAO33ZFMJ7N/