On Thu, Nov 19, 2020 at 11:33 PM Alex K <rightkickt...@gmail.com> wrote:
> For the records,
>
> After having fixed the major fs issues with guestfish and since the DB was
> not starting up, I removed everything from DB data dir and recreated it as
> below:
>
> rm -rf /var/opt/rh/rh-postgresql10/lib/pgsql/data/*
> /opt/rh/rh-postgresql10/root/usr/bin/postgresql-setup --initdb
> systemctl restart rh-postgresql10-postgresql.service
>

Generally speaking, this should not be needed. --provision-all-databases
should do this for you.

>
> Then proceeded with the restoration, where I requested to provision all
> missing databases:
> engine-backup --mode=restore --file=engine-backup.gz
> --provision-all-databases \
> --log=restore.log --restore-permissions
>
> Following this, ran engine-setup, as instructed from the restore
> operation.
> Gained engine web access and saw the same running VMs were shown as up
> without issues.
> I only observed one VM not able to start due to illegal volume, but that's
> another story.
>

Glad to hear that, thanks for the report!

Best regards,

>
>
> On Thu, Nov 19, 2020 at 9:42 PM Alex K <rightkickt...@gmail.com> wrote:
>
>>
>>
>> On Thu, Nov 19, 2020 at 5:31 PM Alex K <rightkickt...@gmail.com> wrote:
>>
>>> Hi Didi,
>>>
>>> On Thu, Nov 19, 2020 at 5:13 PM Yedidyah Bar David <d...@redhat.com>
>>> wrote:
>>>
>>>> On Thu, Nov 19, 2020 at 4:37 PM Alex K <rightkickt...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a corrupt self-hosted engine (with several file system errors,
>>>>> postgres not able to start) and thus it does not give access to the web
>>>>> UI.
>>>>> This happened following an unlucky split brain resolution (I am running 2
>>>>> nodes). The two hosts are running VMs also which I would like to keep
>>>>> running as they are needed.
>>>>>
>>>>> When trying to boot into rescue mode (using
>>>>> systemd.unit=emergency.target boot parameter) I get a cursor and nothing
>>>>> else.
>>>>>
>>>>
>>>> This means that more than just the DB is corrupt...
>>>>
>>>>
>>>>>
>>>>> I have backups of engine files with scope all (using the engine-backup
>>>>> tool).
>>>>> What is the best approach to try and fix the engine or redeploy.
>>>>>
>>>>
>>>> If you are careful, and know what you are doing, you can try something
>>>> like the following. I am not giving many details, hopefully you can find on
>>>> the net tutorials about how to use the things I suggest:
>>>>
>>>> 1. Move to global maintenance
>>>>
>>>> 2. Stop the current dead vm (if needed)
>>>>
>>>> 3. Find current vm conf, edit it to boot from a rescue iso image of
>>>> your preference or from net/PXE etc., and start the vm with '--vm-conf'
>>>> pointing to your edited file.
>>>>
>>>> 4. Connect a console (hosted-engine --console, or 'virsh console', or
>>>> use '--add-console-password' and remote viewer, if needed)
>>>>
>>>> 5. Clean the disk and install the OS, oVirt, etc.
>>>>
>>>> 6. Copy your backup into the vm and restore with engine-backup
>>>>
>>>> 7. Then cleanly stop the machine, exit global maint, and let HA start
>>>> it (or start it yourself with --vm-start).
>>>>
>>>> At the time, we had a bug [1] to document this. The result is [2]. It
>>>> does not detail how to boot/reinstall os/etc., only restore (if e.g. db is
>>>> dead but fs is ok).
>>>> For something somewhat similar to what you want, see also [3], which
>>>> uses guestfish. Might be useful, depending on how badly your disk is
>>>> corrupted.
>>>>
>>> I went with the guestfish approach. It has fixed some fs issues and now
>>> the yum etc seem fine apart from postgres.
>>> I had tried previously to uninstall/install packages so I ended
>>> installing them again with yum install ovirt\*setup\*.
>>> Now I think I have to run engine-setup but I get the error:
>>>
>>> Failed to execute stage 'Environment setup': Cannot connect to Engine
>>> database using existing credentials: engine@localhost:5432
>>>
>> Seems that I need to have psql running to be able to run engine-backup
>> --mode=restore. Are there any steps how one could manually prepare pgsql
>> for ovirt so as to attempt restoration?
>>
>>>
>>> So I guess I need to follow [2]. What do you think?
>>>
>>>
>>>> How did you run into a split brain? There is a lock on the shared
>>>> storage that should prevent this.
>>>>
>>>> Good luck and best regards,
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1482710
>>>> [2]
>>>> https://www.ovirt.org/documentation/administration_guide/#Overwriting_a_Self-Hosted_Engine
>>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1569827#c4
>>>> --
>>>> Didi
>>>>
>>>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/SU6V565Y5GAZ67FF5MUDGFLEJ2L2LZV7/
>

--
Didi
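
For anyone finding this thread later, the restore sequence reported above
could be sketched roughly as below. This is only an illustration, not an
official procedure: DRY_RUN, run() and restore_engine() are hypothetical
helpers made up for this sketch, while the engine-backup options and file
names are the ones quoted in the thread. It assumes you are inside the
engine VM with a working file system, and it skips the manual initdb step,
on the assumption that --provision-all-databases takes care of it.

```shell
#!/bin/sh
# Sketch of the restore sequence from this thread, to be run inside the
# engine VM. DRY_RUN, run() and restore_engine() are hypothetical helpers
# for illustration only; they are not part of oVirt.

DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands, run nothing

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# $1: backup file created by engine-backup, $2: log file for the restore.
restore_engine() {
    backup_file="$1"
    log_file="$2"
    # Restore, letting engine-backup provision any missing databases.
    run engine-backup --mode=restore --file="$backup_file" \
        --log="$log_file" --provision-all-databases --restore-permissions \
        || return 1
    # The restore operation instructs you to run engine-setup afterwards.
    run engine-setup
}

restore_engine engine-backup.gz restore.log
```

By default this only prints what it would do. Setting DRY_RUN=0 would
actually run the commands, so check the printed output first.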