On Mon, Nov 23, 2020 at 9:54 AM Alex K <rightkickt...@gmail.com> wrote:
>
>
>
> On Sun, Nov 22, 2020 at 8:57 AM Yedidyah Bar David <d...@redhat.com> wrote:
>>
>> On Thu, Nov 19, 2020 at 9:43 PM Alex K <rightkickt...@gmail.com> wrote:
>>>
>>>
>>>
>>> On Thu, Nov 19, 2020 at 5:31 PM Alex K <rightkickt...@gmail.com> wrote:
>>>>
>>>> Hi Didi,
>>>>
>>>> On Thu, Nov 19, 2020 at 5:13 PM Yedidyah Bar David <d...@redhat.com> wrote:
>>>>>
>>>>> On Thu, Nov 19, 2020 at 4:37 PM Alex K <rightkickt...@gmail.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have a corrupt self-hosted engine (several file system errors,
>>>>>> postgres not able to start), so the web UI is not accessible. This
>>>>>> happened following an unlucky split-brain resolution (I am running 2
>>>>>> nodes). The two hosts are also running VMs, which I would like to keep
>>>>>> running as they are needed.
>>>>>>
>>>>>> When trying to boot into rescue mode (using 
>>>>>> systemd.unit=emergency.target boot parameter) I get a cursor and nothing 
>>>>>> else.
>>>>>
>>>>>
>>>>> This means that more than just the DB is corrupt...
>>>>>
>>>>>>
>>>>>>
>>>>>> I have backups of the engine files with scope all (taken with the
>>>>>> engine-backup tool).
>>>>>> What is the best approach: try to fix the engine, or redeploy?
>>>>>
>>>>>
>>>>> If you are careful, and know what you are doing, you can try something
>>>>> like the following. I am not giving many details; hopefully you can find
>>>>> tutorials on the net about how to use the things I suggest:
>>>>>
>>>>> 1. Move to global maintenance
>>>>>
>>>>> 2. Stop the current dead vm (if needed)
>>>>>
>>>>> 3. Find current vm conf, edit it to boot from a rescue iso image of your 
>>>>> preference or from net/PXE etc., and start the vm with '--vm-conf' 
>>>>> pointing to your edited file.
>>>>>
>>>>> 4. Connect a console (hosted-engine --console, or 'virsh console', or use
>>>>> '--add-console-password' and remote-viewer, if needed)
>>>>>
>>>>> 5. Clean the disk and install the OS, oVirt, etc.
>>>>>
>>>>> 6. Copy your backup into the vm and restore with engine-backup
>>>>>
>>>>> 7. Then cleanly stop the machine, exit global maint, and let HA start it 
>>>>> (or start it yourself with --vm-start).
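The host-side commands behind steps 1-7 can be sketched as a list of argv lists. This is a minimal Python sketch, not an official procedure: it only builds and prints the commands, the conf copy, backup and log paths are hypothetical placeholders, and step 5 (cleaning the disk, reinstalling the OS and oVirt) plus copying the backup into the VM are left out. The option names should be checked against `hosted-engine --help` and `engine-backup --help` on your version.

```python
import shlex
from typing import List


def recovery_commands(vm_conf: str, backup_file: str, restore_log: str) -> List[List[str]]:
    """Build (but do not run) the commands for steps 1-4, 6 and 7, in order.

    Step 5 (cleaning the disk, reinstalling the OS and oVirt inside the VM)
    and copying the backup into the VM are manual and not represented.
    """
    return [
        # 1. Host: enter global maintenance so the HA agents leave the engine VM alone.
        ["hosted-engine", "--set-maintenance", "--mode=global"],
        # 2. Host: power off the dead engine VM (only if it is still running).
        ["hosted-engine", "--vm-poweroff"],
        # 3. Host: start the VM from an edited copy of its conf that boots a rescue ISO/PXE.
        ["hosted-engine", "--vm-start", f"--vm-conf={vm_conf}"],
        # 4. Host: attach a console to the VM.
        ["hosted-engine", "--console"],
        # 6. Inside the reinstalled VM: restore the backup, provisioning the databases.
        ["engine-backup", "--mode=restore", f"--file={backup_file}",
         f"--log={restore_log}", "--provision-all-databases"],
        # 7. Host: cleanly stop the VM, then leave global maintenance so HA can start it.
        ["hosted-engine", "--vm-shutdown"],
        ["hosted-engine", "--set-maintenance", "--mode=none"],
    ]


if __name__ == "__main__":
    # Hypothetical paths; substitute your own.
    for argv in recovery_commands("/root/vm-rescue.conf",
                                  "/root/engine-backup.tar.gz",
                                  "/root/restore.log"):
        print(shlex.join(argv))
```

Building the commands without running them keeps the sketch safe to read; on a real host each step is better run by hand, checking its result before moving on, since the engine VM is already damaged.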
>>>>>
>>>>> At the time, we had a bug [1] to document this. The result is [2]. It
>>>>> does not detail how to boot, reinstall the OS, etc., only how to restore
>>>>> (e.g. if the db is dead but the fs is ok).
>>>>> For something somewhat similar to what you want, see also [3], which uses
>>>>> guestfish. It might be useful, depending on how badly your disk is corrupted.
>>>>
>>>> I went with the guestfish approach. It fixed some fs issues, and now
>>>> yum etc. seem fine, apart from postgres.
>>>> I had previously tried to uninstall/reinstall packages, so I ended up
>>>> installing them again with yum install ovirt\*setup\*.
>>>> Now I think I have to run engine-setup, but I get the error:
>>>>
>>>>  Failed to execute stage 'Environment setup': Cannot connect to Engine 
>>>> database using existing credentials: engine@localhost:5432
>>>
>>> It seems that I need to have PostgreSQL running to be able to run
>>> engine-backup --mode=restore. Are there any steps for manually preparing
>>> PostgreSQL for oVirt so that I can attempt the restoration?
>>
>>
>> Replying again, also to conclude this part of your episode: Generally 
>> speaking, that's not needed. restore --provision-all-databases should do 
>> that for you.
>
> It seems that when pgsql is down, nothing can be done. You need at least
> pgsql up and running (a clean state will do) to be able to proceed with the
> restoration.

Do you still have logs from this? Both engine-backup's (defaults to
/var/log/ovirt-engine-backup/something if you do not pass --log) and
those of ovirt-engine-provisiondb, which it runs (at
/var/log/ovirt-engine/setup).
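To locate the logs mentioned above, a small helper that lists the newest files in a directory may help. This Python sketch relies only on the two directories named in the message; the `*provisiondb*` file-name pattern is an assumption, not a documented name.

```python
from pathlib import Path
from typing import List


def newest_logs(directory: str, pattern: str = "*.log", count: int = 3) -> List[Path]:
    """Return up to `count` regular files in `directory` matching `pattern`, newest first."""
    base = Path(directory)
    if not base.is_dir():
        return []
    files = [p for p in base.glob(pattern) if p.is_file()]
    # Sort by modification time so the most recent run comes first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:count]


if __name__ == "__main__":
    # engine-backup's default log directory (used when --log is not passed).
    for p in newest_logs("/var/log/ovirt-engine-backup"):
        print(p)
    # Assumption: provisioning logs have "provisiondb" somewhere in the file name.
    for p in newest_logs("/var/log/ovirt-engine/setup", "*provisiondb*"):
        print(p)
```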

Not sure what you mean by "a clean state will do". If you just install
PG, it is not enabled by default, so it is not "up and running".
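Whether PostgreSQL is actually listening can be checked before retrying. This minimal Python sketch only tests that something accepts TCP connections on localhost:5432 (the address in the engine-setup error quoted earlier in the thread); it does not check credentials, so a True result does not mean engine-setup will be able to log in.

```python
import socket


def port_open(host: str = "localhost", port: int = 5432, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: nothing usable is listening.
        return False


if __name__ == "__main__":
    state = "accepting" if port_open() else "not accepting"
    print(f"localhost:5432 is {state} TCP connections")
```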

Generally speaking:

If you never started/inited PG (e.g. on a clean machine), restore,
with --provision-all-databases, does this for you. Are you sure you
passed this?

If you did, and created DB/user with the same name it wants to restore
to, but left the DB empty, it will use it.

If you populated the DB, it will fail with a suitable error message.

These are the states that are intended to be supported.

Anything else might break it in other ways.
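The supported states above can be summarized as a small decision table. This Python sketch is only a restatement of the prose: the `PgState` names and the `expected_restore_outcome` function are hypothetical and do not mirror engine-backup's actual code.

```python
from enum import Enum


class PgState(Enum):
    # Hypothetical labels for the PostgreSQL states described above.
    NEVER_INITIALIZED = "PG never started/initialized (e.g. a clean machine)"
    EMPTY_DB = "DB and user created with the expected names, DB left empty"
    POPULATED_DB = "DB already populated"
    OTHER = "anything else"


def expected_restore_outcome(state: PgState, provision_all_databases: bool) -> str:
    """Restate the message's description of engine-backup --mode=restore.

    Not engine-backup's real logic; it covers only the states the message lists.
    """
    if state is PgState.NEVER_INITIALIZED:
        if provision_all_databases:
            return "supported: restore initializes PG and provisions the databases"
        return "pass --provision-all-databases so restore can provision PG"
    if state is PgState.EMPTY_DB:
        return "supported: restore uses the existing empty DB"
    if state is PgState.POPULATED_DB:
        return "supported: restore fails with a suitable error message"
    return "unsupported: restore might break in other ways"


if __name__ == "__main__":
    for s in PgState:
        print(f"{s.value}: {expected_restore_outcome(s, provision_all_databases=True)}")
```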

>>
>>
>> I replied to all your interim emails in private, since you replied in 
>> private.
>
> Did not notice I was replying in private :)

NP :-)

>>
>>
>> Thanks for the final message to the list.
>>
>> It would be nice if you send another summary of the main obstacles you ran 
>> into, what worked and didn't work, and especially what ideas you can think 
>> of to improve the code/doc for the next time something similar happens (also 
>> to you :-) ).
>>
>> If you feel like that, and have time, it sounds like a nice opportunity for 
>> a blog post :-) (I know I (almost?) never wrote any myself, sorry, but I 
>> like reading them - and they are much more approachable and useful, over the 
>> long run, compared to just posting to the list).
>
> Noted. I will try to put this in a blog post. Generally, what was missing from
> the docs is that one cannot proceed with the restoration if pgsql is not able
> to start. So I had to do a clean re-install of pgsql and initialize its data
> store before proceeding with the restoration.

Well, I'd definitely not want a blog post saying you must manually
init PG - if you indeed must, that's a bug, so I'd rather fix it
first.

Thanks and best regards,

>>
>>
>> Best regards,
>>
>>>>
>>>>
>>>> So I guess I need to follow [2]. What do you think?
>>>>
>>>>>
>>>>> How did you run into a split brain? There is a lock on the shared storage 
>>>>> that should prevent this.
>>>>>
>>>>> Good luck and best regards,
>>>>>
>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1482710
>>>>> [2] 
>>>>> https://www.ovirt.org/documentation/administration_guide/#Overwriting_a_Self-Hosted_Engine
>>>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1569827#c4
>>>>> --
>>>>> Didi
>>
>>
>>
>> --
>> Didi
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/6QZ4OKZTHPE7LLOHNKGJC2HMMBK662GN/



-- 
Didi
_______________________________________________
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/V67QQOHP3CTEYOELHMMO4UESWVHB4SY7/
