[ovirt-users] Re: Fix corrupt self-hosted engine

Yedidyah Bar David Sat, 21 Nov 2020 22:42:51 -0800

On Thu, Nov 19, 2020 at 11:33 PM Alex K <rightkickt...@gmail.com> wrote:


> For the record,
>
> After fixing the major fs issues with guestfish, and since the DB was
> not starting up, I removed everything from the DB data dir and recreated
> it as below:
>
> rm -rf /var/opt/rh/rh-postgresql10/lib/pgsql/data/*
> /opt/rh/rh-postgresql10/root/usr/bin/postgresql-setup --initdb
> systemctl restart rh-postgresql10-postgresql.service
>

Generally speaking, this should not be needed. --provision-all-databases
should do this for you.
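A minimal sketch of the restore done this way, letting --provision-all-databases create the database instead of wiping and re-initializing the PostgreSQL data directory by hand. It uses only the engine-backup options already shown in this thread, followed by engine-setup as the restore instructs. The file names are the ones from the report and are assumptions to adjust; the run helper, the restore_engine function and the DRY_RUN variable are hypothetical additions that only print the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hedged sketch: restore an engine-backup archive inside the engine VM,
# letting engine-backup provision the databases (no manual initdb).
# DRY RUN by default: commands are printed; set DRY_RUN=0 to execute them.

# Print the command (dry run) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# restore_engine [backup-file] [log-file]  (defaults are illustrative)
restore_engine() {
  local backup="${1:-engine-backup.gz}"
  local log="${2:-restore.log}"
  run engine-backup --mode=restore --file="$backup" \
    --provision-all-databases --log="$log" --restore-permissions || return 1
  # The restore tells you to run engine-setup afterwards.
  run engine-setup
}
```

Usage: source the file, run restore_engine engine-backup.gz restore.log to review the commands, then DRY_RUN=0 restore_engine engine-backup.gz restore.log to actually run them.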


>
> Then I proceeded with the restore, asking it to provision all missing
> databases:
>
> engine-backup --mode=restore --file=engine-backup.gz \
> --provision-all-databases \
> --log=restore.log --restore-permissions
>
> Following this, I ran engine-setup, as instructed by the restore
> operation.
> I regained access to the engine web UI, and the same VMs that were
> running were shown as up without issues.
> I only observed one VM unable to start due to an illegal volume, but
> that's another story.
>

Glad to hear that, thanks for the report!

Best regards,


>
>
> On Thu, Nov 19, 2020 at 9:42 PM Alex K <rightkickt...@gmail.com> wrote:
>
>>
>>
>> On Thu, Nov 19, 2020 at 5:31 PM Alex K <rightkickt...@gmail.com> wrote:
>>
>>> Hi Didi,
>>>
>>> On Thu, Nov 19, 2020 at 5:13 PM Yedidyah Bar David <d...@redhat.com>
>>> wrote:
>>>
>>>> On Thu, Nov 19, 2020 at 4:37 PM Alex K <rightkickt...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a corrupt self-hosted engine (several file system errors,
>>>>> postgres unable to start), so the web UI is not accessible.
>>>>> This happened following an unlucky split-brain resolution (I am running
>>>>> 2 nodes). Both hosts are also running VMs which I would like to keep
>>>>> running, as they are needed.
>>>>>
>>>>> When trying to boot into rescue mode (using the
>>>>> systemd.unit=emergency.target boot parameter), I get a cursor and
>>>>> nothing else.
>>>>>
>>>>
>>>> This means that more than just the DB is corrupt...
>>>>
>>>>
>>>>>
>>>>> I have backups of the engine files with scope all (using the
>>>>> engine-backup tool).
>>>>> What is the best approach: try to fix the engine, or redeploy?
>>>>>
>>>>
>>>> If you are careful, and know what you are doing, you can try something
>>>> like the following. I am not giving many details; hopefully you can find
>>>> tutorials on the net about how to use the things I suggest:
>>>>
>>>> 1. Move to global maintenance
>>>>
>>>> 2. Stop the current dead vm (if needed)
>>>>
>>>> 3. Find the current vm conf, edit it to boot from a rescue iso image of
>>>> your choice or from net/PXE etc., and start the vm with '--vm-conf'
>>>> pointing to your edited file.
>>>>
>>>> 4. Connect a console (hosted-engine --console, or 'virsh console', or
>>>> use '--add-console-password' and remote-viewer, if needed)
>>>>
>>>> 5. Clean the disk and install the OS, oVirt, etc.
>>>>
>>>> 6. Copy your backup into the vm and restore with engine-backup
>>>>
>>>> 7. Then cleanly stop the machine, exit global maintenance, and let HA
>>>> start it (or start it yourself with --vm-start).
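The host-side part of the steps above (1 to 4 and 7) might look roughly like the following sketch. It assumes the hosted-engine options --set-maintenance, --vm-poweroff, --vm-shutdown, --vm-start with --vm-conf, and --console, and it assumes the running engine VM conf can be copied from /var/run/ovirt-hosted-engine-ha/vm.conf; verify both against your oVirt version before relying on them. The function names, the rescue conf path and the DRY_RUN helper are hypothetical. Steps 5 and 6 happen inside the VM and are not scripted here, and the copied conf still has to be edited by hand to boot from the rescue ISO or PXE.

```shell
#!/usr/bin/env bash
# Hedged sketch of the host-side steps for rebuilding a dead engine VM.
# DRY RUN by default: commands are printed; set DRY_RUN=0 to execute them.
# ASSUMPTION: vm.conf location below; the rescue copy path is hypothetical.

# Print the command (dry run) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps 1-3a: global maintenance, stop the dead VM, copy its conf for editing.
he_prepare() {
  local src="${HE_VM_CONF:-/var/run/ovirt-hosted-engine-ha/vm.conf}"
  local dst="${HE_RESCUE_CONF:-/root/vm-rescue.conf}"
  run hosted-engine --set-maintenance --mode=global
  run hosted-engine --vm-poweroff
  run cp "$src" "$dst"
  # Now edit the copy by hand so the VM boots from a rescue ISO or PXE.
}

# Steps 3b-4: start the VM with the edited conf and attach to its console.
he_start_rescue() {
  local conf="${HE_RESCUE_CONF:-/root/vm-rescue.conf}"
  run hosted-engine --vm-start --vm-conf="$conf"
  run hosted-engine --console
}

# Step 7: after the restore inside the VM, stop it cleanly and leave global
# maintenance so the HA agents can start it again.
he_finish() {
  run hosted-engine --vm-shutdown
  run hosted-engine --set-maintenance --mode=none
}
```

Splitting the flow into three functions leaves room for the manual parts in between: editing the conf after he_prepare, and reinstalling and restoring inside the VM before he_finish.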
>>>>
>>>> At the time, we had a bug [1] asking to document this. The result is
>>>> [2]. It does not detail how to boot/reinstall the OS etc., only how to
>>>> restore (e.g. if the db is dead but the fs is ok).
>>>> For something somewhat similar to what you want, see also [3], which
>>>> uses guestfish. It might be useful, depending on how badly your disk is
>>>> corrupted.
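For the guestfish route mentioned here, a hedged sketch: with the engine VM stopped (global maintenance), list the filesystems inside the disk image read-only, then run guestfish's xfs-repair command on the damaged device. XFS is an assumption (an ext4 filesystem would need e2fsck instead), and the image path and device name passed in are hypothetical; working on a copy of the disk is safer if space allows. The run helper and DRY_RUN variable are hypothetical and only print the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Hedged sketch: inspect and repair the engine VM disk with guestfish while
# the VM is down. DRY RUN by default; set DRY_RUN=0 to execute.
# ASSUMPTIONS: XFS filesystems; image path and device names are hypothetical.

# Print the command (dry run) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# guest_list_fs IMAGE: show the filesystems guestfish finds (read-only).
guest_list_fs() {
  run guestfish --ro -a "$1" run : list-filesystems
}

# guest_xfs_repair IMAGE DEVICE: repair one XFS filesystem (read-write).
guest_xfs_repair() {
  run guestfish --rw -a "$1" run : xfs-repair "$2"
}
```

Listing first matters: the device names inside the image (plain partitions or LVM volumes) are only known after guest_list_fs reports them.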
>>>>
>>> I went with the guestfish approach. It fixed some fs issues, and now
>>> yum etc. seem fine, apart from postgres.
>>> I had previously tried to uninstall/reinstall packages, so I ended up
>>> installing them again with yum install ovirt\*setup\*.
>>> Now I think I have to run engine-setup, but I get the error:
>>>
>>>  Failed to execute stage 'Environment setup': Cannot connect to Engine
>>> database using existing credentials: engine@localhost:5432
>>>
>> It seems that I need to have postgres running to be able to run
>> engine-backup --mode=restore. Are there any steps for manually preparing
>> pgsql for ovirt in order to attempt a restore?
>>
>>>
>>> So I guess I need to follow [2]. What do you think?
>>>
>>>
>>>> How did you run into a split brain? There is a lock on the shared
>>>> storage that should prevent this.
>>>>
>>>> Good luck and best regards,
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1482710
>>>> [2] https://www.ovirt.org/documentation/administration_guide/#Overwriting_a_Self-Hosted_Engine
>>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1569827#c4
>>>> --
>>>> Didi
>>>>
>>> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/SU6V565Y5GAZ67FF5MUDGFLEJ2L2LZV7/
>


-- 
Didi

