On Sun, Nov 22, 2020 at 8:57 AM Yedidyah Bar David <d...@redhat.com> wrote:

> On Thu, Nov 19, 2020 at 9:43 PM Alex K <rightkickt...@gmail.com> wrote:
>
>>
>>
>> On Thu, Nov 19, 2020 at 5:31 PM Alex K <rightkickt...@gmail.com> wrote:
>>
>>> Hi Didi,
>>>
>>> On Thu, Nov 19, 2020 at 5:13 PM Yedidyah Bar David <d...@redhat.com>
>>> wrote:
>>>
>>>> On Thu, Nov 19, 2020 at 4:37 PM Alex K <rightkickt...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a corrupt self-hosted engine (several file system errors,
>>>>> postgres unable to start), so the web UI is inaccessible.
>>>>> This happened following an unlucky split-brain resolution (I am running 2
>>>>> nodes). The two hosts are also running VMs, which I would like to keep
>>>>> running as they are needed.
>>>>>
>>>>> When trying to boot into rescue mode (using
>>>>> systemd.unit=emergency.target boot parameter) I get a cursor and nothing
>>>>> else.
>>>>>
>>>>
>>>> This means that more than just the DB is corrupt...
>>>>
>>>>
>>>>>
>>>>> I have backups of the engine files with scope all (taken with the
>>>>> engine-backup tool).
>>>>> What is the best approach to try to fix the engine, or to redeploy?
>>>>>
>>>>
>>>> If you are careful and know what you are doing, you can try something
>>>> like the following. I am not giving many details; hopefully you can find
>>>> tutorials on the net for the tools I suggest:
>>>>
>>>> 1. Move to global maintenance
>>>>
>>>> 2. Stop the current dead vm (if needed)
>>>>
>>>> 3. Find current vm conf, edit it to boot from a rescue iso image of
>>>> your preference or from net/PXE etc., and start the vm with '--vm-conf'
>>>> pointing to your edited file.
>>>>
>>>> 4. Connect a console (hosted-engine --console, or 'virsh console', or
>>>> use '--add-console-password' and remote viewer, if needed)
>>>>
>>>> 5. Clean the disk and install the OS, oVirt, etc.
>>>>
>>>> 6. Copy your backup into the vm and restore with engine-backup
>>>>
>>>> 7. Then cleanly stop the machine, exit global maintenance, and let HA
>>>> start it (or start it yourself with --vm-start).
>>>>
>>>> At the time, we had a bug [1] to document this. The result is [2]. It
>>>> does not detail how to boot/reinstall os/etc., only restore (if e.g. db is
>>>> dead but fs is ok).
>>>> For something somewhat similar to what you want, see also [3], which
>>>> uses guestfish. Might be useful, depending on how badly your disk is
>>>> corrupted.
>>>>
>>> I went with the guestfish approach. It fixed some fs issues, and now
>>> yum etc. seem fine, apart from postgres.
>>> I had previously tried to uninstall/install packages, so I ended up
>>> installing them again with yum install ovirt\*setup\*.
>>> Now I think I have to run engine-setup, but I get the error:
>>>
>>>  Failed to execute stage 'Environment setup': Cannot connect to Engine
>>> database using existing credentials: engine@localhost:5432
>>>
>> It seems that I need to have pgsql running to be able to run engine-backup
>> --mode=restore. Are there any steps for manually preparing pgsql
>> for oVirt so that I can attempt the restoration?
>>
>
> Replying again, also to conclude this part of your episode: Generally
> speaking, that's not needed. engine-backup --mode=restore with
> --provision-all-databases should do that for you.
>
It seems that when pgsql is down, nothing can be done. You need at least pgsql
up and running (a clean state will do) to be able to proceed with the
restoration.
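
A minimal sketch of that check-then-restore order, to be run inside the
engine VM. The unit name and the two file paths are assumptions chosen for
illustration (the PostgreSQL unit is named differently on some releases);
the engine-backup options are the ones discussed in this thread plus
--file, --log and --restore-permissions.

```python
import subprocess

# Assumptions: unit name and file paths are placeholders, not from the thread.
PG_UNIT = "postgresql"
BACKUP_FILE = "/root/engine-backup.tar.gz"
RESTORE_LOG = "/root/engine-restore.log"


def postgres_is_active(unit=PG_UNIT, runner=subprocess.run):
    """Return True if systemd reports the PostgreSQL unit as active."""
    result = runner(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def restore_command(backup_file=BACKUP_FILE, log_file=RESTORE_LOG):
    """Build the engine-backup restore command line as an argv list."""
    return [
        "engine-backup",
        "--mode=restore",
        f"--file={backup_file}",
        f"--log={log_file}",
        "--provision-all-databases",
        "--restore-permissions",
    ]


def restore_if_ready(runner=subprocess.run):
    """Refuse to restore unless PostgreSQL is already up."""
    if not postgres_is_active(runner=runner):
        raise RuntimeError("PostgreSQL is not running; repair or re-initialize it first")
    runner(restore_command(), check=True)
```

Calling restore_if_ready() on a VM where postgres is still broken stops
with an error instead of starting a restore that cannot connect.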

>
> I replied to all your interim emails in private, since you replied in
> private.
>
Did not notice I was replying in private :)

>
> Thanks for the final message to the list.
>
> It would be nice if you could send another summary of the main obstacles
> you ran into, what worked and what didn't, and especially any ideas you
> have for improving the code/docs for the next time something similar
> happens (also to you :-) ).
>
> If you feel like that, and have time, it sounds like a nice opportunity
> for a blog post :-) (I know I (almost?) never wrote any myself, sorry, but
> I like reading them - and they are much more approachable and useful, over
> the long run, compared to just posting to the list).
>
Noted. I will look into putting this in a blog post. Generally, the part
missing from the docs was that one cannot proceed with the restoration if
pgsql is not able to start. So I had to do a clean re-install of pgsql and
initialize its data store before proceeding with the restoration.
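
A rough sketch of that re-install and re-initialization, printed as a
dry run by default. Everything specific here is an assumption rather than
something stated in this thread: it presumes stock EL8 packaging (package
postgresql-server, unit postgresql, data directory /var/lib/pgsql/data).
Software-collection builds use different names and paths, so adjust before
executing anything.

```python
import subprocess
from datetime import datetime

# Assumptions (not from the thread): stock EL8 package, unit and data paths.
PG_PACKAGE = "postgresql-server"
PG_UNIT = "postgresql"
PG_DATA = "/var/lib/pgsql/data"


def reinit_commands(stamp):
    """Return the commands, in order, as argv lists."""
    return [
        ["systemctl", "stop", PG_UNIT],
        # Move the damaged data directory aside instead of deleting it.
        ["mv", PG_DATA, f"{PG_DATA}.broken-{stamp}"],
        ["yum", "-y", "reinstall", PG_PACKAGE],
        # Create a fresh, empty database cluster.
        ["postgresql-setup", "--initdb"],
        ["systemctl", "enable", "--now", PG_UNIT],
    ]


def run_all(commands, execute=False):
    """Print each command; run it only when execute=True."""
    for argv in commands:
        print("+", " ".join(argv))
        if execute:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(reinit_commands(datetime.now().strftime("%Y%m%d-%H%M%S")))
```

Once postgres starts from the clean data store, the engine-backup restore
can be attempted.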

>
> Best regards,
>
>
>>
>>> So I guess I need to follow [2]. What do you think?
>>>
>>>
>>>> How did you run into a split brain? There is a lock on the shared
>>>> storage that should prevent this.
>>>>
>>>> Good luck and best regards,
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1482710
>>>> [2] https://www.ovirt.org/documentation/administration_guide/#Overwriting_a_Self-Hosted_Engine
>>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1569827#c4
>>>> --
>>>> Didi
>>>>
>>>
>
> --
> Didi
>
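
For reference, the host-side part of the procedure quoted above (steps 1
to 4 and 7) maps onto hosted-engine subcommands roughly as follows. This
is a sketch that only prints a checklist and runs nothing. The vm.conf
location is an assumption to verify on your host, and the path of the
edited copy is a hypothetical placeholder. Steps 5 and 6 happen inside the
VM and are not covered.

```python
import shlex

# Assumed location of the current engine VM definition; verify on your host.
CURRENT_VM_CONF = "/var/run/ovirt-hosted-engine-ha/vm.conf"
# Hypothetical path for the edited copy that boots a rescue ISO or PXE.
RESCUE_VM_CONF = "/root/engine-rescue-vm.conf"


def rescue_plan():
    """Return (description, argv) pairs for the host-side steps."""
    return [
        ("1. enter global maintenance",
         ["hosted-engine", "--set-maintenance", "--mode=global"]),
        ("2. stop the dead engine VM (if needed)",
         ["hosted-engine", "--vm-poweroff"]),
        ("3a. copy the current VM conf, then edit the copy by hand",
         ["cp", CURRENT_VM_CONF, RESCUE_VM_CONF]),
        ("3b. start the VM from the edited conf",
         ["hosted-engine", "--vm-start", f"--vm-conf={RESCUE_VM_CONF}"]),
        ("4. connect a console",
         ["hosted-engine", "--console"]),
        ("7a. after reinstall and restore, cleanly stop the VM",
         ["hosted-engine", "--vm-shutdown"]),
        ("7b. exit global maintenance so HA can start the engine",
         ["hosted-engine", "--set-maintenance", "--mode=none"]),
    ]


def print_plan():
    """Print the checklist with shell-quoted commands."""
    for description, argv in rescue_plan():
        print(f"{description}\n    {shlex.join(argv)}")


if __name__ == "__main__":
    print_plan()
```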
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/6QZ4OKZTHPE7LLOHNKGJC2HMMBK662GN/
