On Mon, Nov 23, 2020 at 9:54 AM Alex K <rightkickt...@gmail.com> wrote:
>
> On Sun, Nov 22, 2020 at 8:57 AM Yedidyah Bar David <d...@redhat.com> wrote:
>>
>> On Thu, Nov 19, 2020 at 9:43 PM Alex K <rightkickt...@gmail.com> wrote:
>>>
>>> On Thu, Nov 19, 2020 at 5:31 PM Alex K <rightkickt...@gmail.com> wrote:
>>>>
>>>> Hi Didi,
>>>>
>>>> On Thu, Nov 19, 2020 at 5:13 PM Yedidyah Bar David <d...@redhat.com> wrote:
>>>>>
>>>>> On Thu, Nov 19, 2020 at 4:37 PM Alex K <rightkickt...@gmail.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have a corrupt self-hosted engine (with several file system errors,
>>>>>> postgres not able to start) and thus it does not give access to the web
>>>>>> UI. This happened following an unlucky split brain resolution (I am
>>>>>> running 2 nodes). The two hosts are running VMs also which I would like
>>>>>> to keep running as they are needed.
>>>>>>
>>>>>> When trying to boot into rescue mode (using
>>>>>> systemd.unit=emergency.target boot parameter) I get a cursor and nothing
>>>>>> else.
>>>>>
>>>>> This means that more than just the DB is corrupt...
>>>>>
>>>>>>
>>>>>> I have backups of engine files with scope all (using the engine-backup
>>>>>> tool).
>>>>>> What is the best approach to try and fix the engine or redeploy?
>>>>>
>>>>> If you are careful, and know what you are doing, you can try something
>>>>> like the following. I am not giving many details, hopefully you can find
>>>>> on the net tutorials about how to use the things I suggest:
>>>>>
>>>>> 1. Move to global maintenance
>>>>>
>>>>> 2. Stop the current dead vm (if needed)
>>>>>
>>>>> 3. Find current vm conf, edit it to boot from a rescue iso image of your
>>>>> preference or from net/PXE etc., and start the vm with '--vm-conf'
>>>>> pointing to your edited file.
>>>>>
>>>>> 4. Connect a console (hosted-engine --console, or 'virsh console', or use
>>>>> '--add-console-password' and remote viewer, if needed)
>>>>>
>>>>> 5. Clean the disk and install the OS, oVirt, etc.
>>>>>
>>>>> 6. Copy your backup into the vm and restore with engine-backup
>>>>>
>>>>> 7. Then cleanly stop the machine, exit global maint, and let HA start it
>>>>> (or start it yourself with --vm-start).
>>>>>
>>>>> At the time, we had a bug [1] to document this. The result is [2]. It
>>>>> does not detail how to boot/reinstall os/etc., only restore (if e.g. db
>>>>> is dead but fs is ok).
>>>>> For something somewhat similar to what you want, see also [3], which uses
>>>>> guestfish. Might be useful, depending on how badly your disk is corrupted.
>>>>
>>>> I went with the guestfish approach. It has fixed some fs issues and now
>>>> the yum etc seem fine apart from postgres.
>>>> I had tried previously to uninstall/install packages so I ended installing
>>>> them again with yum install ovirt\*setup\*.
>>>> Now I think I have to run engine-setup but I get the error:
>>>>
>>>> Failed to execute stage 'Environment setup': Cannot connect to Engine
>>>> database using existing credentials: engine@localhost:5432
>>>
>>> Seems that I need to have psql running to be able to run engine-backup
>>> --mode=restore. Are there any steps how one could manually prepare pgsql
>>> for ovirt so as to attempt restoration?
>>
>> Replying again, also to conclude this part of your episode: Generally
>> speaking, that's not needed. restore --provision-all-databases should do
>> that for you.
>
> Seems that when pgsql is down nothing can be done. You need at least pgsql up
> and running (a clean state will do) so as to be able to proceed with
> restoration.

Do you still have logs from this? Both engine-backup's (default to
/var/log/ovirt-engine-backup/something if you do not pass --log) and
ovirt-engine-provisiondb which it runs (at /var/log/ovirt-engine/setup).

Not sure what you mean in "a clean state will do". If you just install PG,
it is not enabled by default, so is not "up and running".

Generally speaking: If you never started/inited PG (e.g. on a clean
machine), restore, with --provision-all-databases, does this for you. Are
you sure you passed this? If you did, and created DB/user with the same
name it wants to restore to, but left the DB empty, it will use it. If you
populated the DB, it will fail with a suitable error message.

These are the states that are intended to be supported. Anything else might
break it in other ways.

>>
>> I replied to all your interim emails in private, since you replied in
>> private.
>
> Did not notice I was replying in private :)

NP :-)

>>
>> Thanks for the final message to the list.
>>
>> It would be nice if you send another summary of the main obstacles you ran
>> into, what worked and didn't work, and especially what ideas you can think
>> of to improve the code/doc for the next time something similar happens (also
>> to you :-) ).
>>
>> If you feel like that, and have time, it sounds like a nice opportunity for
>> a blog post :-) (I know I (almost?) never wrote any myself, sorry, but I
>> like reading them - and they are much more approachable and useful, over the
>> long run, compared to just posting to the list).
>
> Noted. Will check to put this in a blog. Generally the missing part from the
> docs was that one cannot proceed with the restoration if pgsql is not able to
> start. So I had to clean re-install pgsql and initialize its data store
> before proceeding with the restoration.

Well, I'd definitely not want a blog post saying you must manually init PG -
if you indeed must, that's a bug, so I'd rather fix it first.
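
To make the intended flow concrete, here is a minimal sketch of a restore on
a machine where PG was installed but never started/inited. It is only an
illustration, not a tested procedure: the backup and log paths are
hypothetical placeholders, and depending on your version engine-backup may
require further options (check engine-backup --help). By default the script
only prints the command; set DRY_RUN=0 to actually run it.

```shell
#!/bin/sh
# Sketch only. BACKUP_FILE and RESTORE_LOG are hypothetical example paths;
# replace them with the location of your own engine-backup archive and log.
BACKUP_FILE="${BACKUP_FILE:-/root/engine-backup.tar.gz}"
RESTORE_LOG="${RESTORE_LOG:-/root/engine-restore.log}"
DRY_RUN="${DRY_RUN:-1}"

# Print the engine-backup command line for a restore that also provisions
# the databases (the case where PG was never initialized).
# $1 = backup archive, $2 = log file
restore_cmd() {
    echo "engine-backup --mode=restore --file=$1 --log=$2 --provision-all-databases"
}

# Print the directories mentioned in this thread that hold the relevant logs.
log_dirs() {
    echo /var/log/ovirt-engine-backup
    echo /var/log/ovirt-engine/setup
}

cmd=$(restore_cmd "$BACKUP_FILE" "$RESTORE_LOG")
if [ "$DRY_RUN" = "1" ]; then
    # Default: show what would be executed, change nothing.
    echo "would run: $cmd"
else
    # Word splitting is intentional here; paths must not contain spaces.
    $cmd
fi
```

If the restore fails, the file passed with --log and the provisiondb logs
under /var/log/ovirt-engine/setup are the first things to look at.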

Thanks and best regards,

>>
>> Best regards,
>>
>>>>
>>>> So I guess I need to follow [2]. What do you think?
>>>>
>>>>>
>>>>> How did you run into a split brain? There is a lock on the shared storage
>>>>> that should prevent this.
>>>>>
>>>>> Good luck and best regards,
>>>>>
>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1482710
>>>>> [2] https://www.ovirt.org/documentation/administration_guide/#Overwriting_a_Self-Hosted_Engine
>>>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1569827#c4
>>>>> --
>>>>> Didi
>>
>> --
>> Didi
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/6QZ4OKZTHPE7LLOHNKGJC2HMMBK662GN/

--
Didi
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/V67QQOHP3CTEYOELHMMO4UESWVHB4SY7/