Hello,

On Mon, Feb 7, 2022 at 2:25 PM Yedidyah Bar David <d...@redhat.com> wrote:
> On Mon, Feb 7, 2022 at 1:27 PM Gilboa Davara <gilb...@gmail.com> wrote:
> >
> > Hello,
> >
> > On Mon, Feb 7, 2022 at 8:45 AM Yedidyah Bar David <d...@redhat.com> wrote:
> >>
> >> On Sun, Feb 6, 2022 at 5:09 PM Gilboa Davara <gilb...@gmail.com> wrote:
> >> >
> >> > Unlike my predecessor, I not only lost my vmengine, I also lost the vdsm services on all hosts.
> >> > All seem to be hitting the same issue - read, the certs under /etc/pki/vdsm/certs and /etc/pki/ovirt* all expired a couple of days ago.
> >> > As such, the hosted engine cannot go into global maintenance mode,
> >>
> >> What do you mean by that? What happens if you 'hosted-engine
> >> --set-maintenance --mode=global'?
> >
> >
> > Failed, stating the cluster is not in global maintenance mode.
>
> Please clarify, and/or share relevant logs, if you have them.
>

Sadly enough, no.
When I zapped the old vmengine and hosts configuration, I forgot to save the logs. (In my defense, it was 4am...)
That said, the fix proposed in BZ#1700460 (Let the user skip the global maintenance check) might have saved my cluster.

>
> You had a semi-working existing HE cluster.
> You ran engine-backup on it, took a backup, while it was _not_ in
> global maintenance.
>

It was rather odd.
One of the hosts was still active and running the HE engine.
After I updated the apache certs, I could connect to the WebUI, but the WebUI failed to access the nodes, spewing SSL handshake errors.
I then proceeded to replace the host certs, which seemed to work: vdsm-client Host getCapabilities worked, hosted-engine --vm-status worked, and I could see all 3 hosts. But the engine still failed to communicate with the hosts. Hence, even though I had a working cluster and engine, and I could get the cluster into global maintenance mode, engine-setup --offline continued to spew "not-in-global-maintenance-mode" errors.
At this stage I decided to simply zap the hosted engine and ovirt-hosted-engine-cleanup the hosts.
As my brain was half dead, I decided to do a fresh deployment and not use the daily backup.

> That's ok and expected.
>
> Then you took one of the hosts and evacuated it (or just a new one),
> (re)installed the OS (or somehow cleaned it up), and ran
> 'hosted-engine --deploy --import-from-file' with the backup you took.
> This failed? Where exactly and with what error?
>

Didn't use the backup.
A clean hosted-engine --deploy failed due to a qemu-6.1 failure. (I believe it's a known BZ#.)
Once I remembered to downgrade it to 6.0, everything worked as advertised (minus one export domain, see another email).

>
> If it's the engine-setup running inside the engine VM, with the same
> error as when running 'engine-setup' (perhaps with --offline) manually,
> then this shouldn't happen at this point:
> - engine-backup --mode=restore sets vdc option in the db 'DbJustRestored'
> - engine-setup checks this and sets its own env[JUST_RESTORED] accordingly
>
> > (Understandable, given two of 3 hosts were offline due to certificate issues...)
> >
> >
> >>
> >>
> >> > preventing engine-setup --offline from running.
> >>
> >> Actually just a few days ago I pushed a patch for:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1700460
> >>
> >> But:
> >>
> >> If you really have a problem that you can't set global maintenance,
> >> using this is a risk - HA might intervene in the middle and shut down
> >> the VM. So either make sure global maintenance does work, or stop
> >> all HA services on all hosts.
> >>
> >> > Two questions:
> >> > 1. Is there any automated method to renew the vdsm certificates?
> >>
> >> You mean, without an engine?
> >>
> >> I think that if you have a functional engine one way or another,
> >> you can automate this somehow, didn't check. Try checking e.g. the
> >> python sdk examples - there might be something there you can base
> >> this on.
> >>
> >> > 2. Assuming the previous answer is "no", assuming I'm somewhat versed in using openssl, how can I manually renew them?
> >>
> >> I'd rather not try to invent from memory how this is supposed to work,
> >> and doing this methodically and verifying before replying is quite
> >> an effort.
> >>
> >> If this is really what you want, I suggest something like:
> >>
> >> 1. Set up a test env with an engine and one host
> >> 2. Back up (or use git on) /etc on both
> >> 3. Renew the host cert from the UI
> >> 4. Check what changed
> >>
> >> You should find, IMO, that the key(s) on the host didn't
> >> change. I guess you might also find CSRs on one or both of them.
> >> So basically it should be something like:
> >> 1. Create a CSR on the host for the existing key (one or more,
> >> not sure).
> >> 2. Copy and sign this on the engine using pki-enroll-request.sh
> >> (I think you can find examples for it scattered around, perhaps
> >> even in the main guides)
> >> 3. Copy back the generated certs to the host
> >> 4. Perhaps restart one or more services there (vdsm, imageio?,
> >> ovn, etc.)
> >>
> >> You can check the code in
> >> /usr/share/ovirt-engine/ansible-runner-service-project/project
> >> to see how it's done when initiated from the UI.
> >>
> >> Good luck and best regards,
> >
> >
> > I more or less found a document stating the above somewhere in the middle of the night.
> > Tried it.
> > Got the WebUI working again.
> > However, for the life of me I couldn't get the hosts to talk to the engine. (Even though I could use openssl s_client -showcerts -connect host and got valid certs).
> > In the end, around 4am, I decided to take the brute force route: clean the hosts, upgrade them to -streams, and redeploy the engine again (3rd attempt, after a sufficient amount of coffee reminded me that qemu-6.1 is broken and needed to be downgraded before trying to deploy the HE...).
> > Either way, when I finish importing the VMs, I'll open an RFE to add BIG-WARNING-IN-BOLD-LETTERS in the WebUI to notify the admin that the certificates are about to expire.
>
> You should have already received them, no?
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1258585
>
> Best regards,
> --
> Didi
>
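
For anyone who hits the same problem: since everything above started with certificates expiring unnoticed, here is a minimal sketch of a check that could run from cron on the engine and the hosts. It is not part of oVirt. It only shells out to the stock `openssl x509 -noout -enddate` command and parses the `notAfter=` line with Python's `ssl.cert_time_to_seconds`. The 30-day threshold and the function names are my own assumptions, and the certificate paths are whatever you pass on the command line.

```python
#!/usr/bin/env python3
"""Report how many days are left on PEM certificates (illustrative sketch)."""
import ssl
import subprocess
import sys
import time

WARN_DAYS = 30  # hypothetical warning threshold, not an oVirt default


def parse_not_after(openssl_line):
    """Convert a line like 'notAfter=Feb  5 12:00:00 2022 GMT' to epoch seconds."""
    _, _, stamp = openssl_line.strip().partition("=")
    return ssl.cert_time_to_seconds(stamp)


def days_left(openssl_line, now=None):
    """Days until expiry (negative if already expired)."""
    now = time.time() if now is None else now
    return (parse_not_after(openssl_line) - now) / 86400.0


def cert_days_left(path):
    """Ask the openssl CLI for the certificate's end date and compute days left."""
    out = subprocess.run(
        ["openssl", "x509", "-noout", "-enddate", "-in", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return days_left(out)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        remaining = cert_days_left(path)
        if remaining < 0:
            state = "EXPIRED"
        elif remaining < WARN_DAYS:
            state = "WARNING"
        else:
            state = "ok"
        print(f"{state:8} {remaining:8.1f} days  {path}")
```

Pass it the certificate files to check, for example the files under /etc/pki/vdsm/certs on a host. The exact file names there are not listed in this thread, so check what your installation actually has.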