On Thu, Feb 4, 2021 at 12:15 PM Roderick Mooi <[email protected]> wrote:
>
> Thanks so much, this worked!
>
> For the record/list benefit, I first put the host into maintenance and then
> selected Enroll Certificate - this regenerated the certs.
> (VDSM cert can be checked with: certtool -i --infile
> /etc/pki/vdsm/certs/vdsmcert.pem)
>
> I then took these steps on the affected (incorrectly reported) host to update
> the hosted-engine --vm-status:
> 1. hosted-engine --set-maintenance --mode=global
> 2. systemctl stop ovirt-ha-agent.service
> 3. hosted-engine --clean-metadata
> 4. systemctl start ovirt-ha-agent.service
> 5. hosted-engine --vm-status (after a minute or two - to verify that the host
> details are now correct)
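
For anyone finding this in the archives: the five steps quoted above could be wrapped in a small script roughly like the one below. This is only a sketch and was not run in this thread. DRY_RUN and WAIT_SECONDS are made-up names for illustration, and by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the five quoted steps; meant to run as root on the affected host.
# DRY_RUN and WAIT_SECONDS are hypothetical knobs, not hosted-engine options.
set -eu

DRY_RUN="${DRY_RUN:-1}"              # 1 = only print commands (default)
WAIT_SECONDS="${WAIT_SECONDS:-120}"  # "a minute or two" before checking status

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

refresh_ha_metadata() {
    run hosted-engine --set-maintenance --mode=global
    run systemctl stop ovirt-ha-agent.service
    run hosted-engine --clean-metadata
    run systemctl start ovirt-ha-agent.service
    run sleep "$WAIT_SECONDS"
    run hosted-engine --vm-status
}

refresh_ha_metadata
```

Run it once as-is to review the command list, then with DRY_RUN=0 to execute. The quoted list does not show leaving global maintenance; the outline quoted further down the thread ends with that step, which would be hosted-engine --set-maintenance --mode=none.
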
Mooi! (pun intended) (Yes, I speak a little Dutch)

Thanks for the report!

>
> Cheers :)
>
> On 2021/02/04 10:17, Yedidyah Bar David wrote:
> > On Thu, Feb 4, 2021 at 10:09 AM Roderick Mooi <[email protected]> wrote:
> >>
> >> Hi Didi!
> >>
> >> Ok, I started the clean metadata process and then found the real issue - I
> >> had copied the certs (just /etc/pki/vdsm; other pki folders were intact)
> >> from a working host (host 2) to host 1 following the re-deploy cleanup as
> >> part of the process to get it online again. The problem is the cert
> >> contains the hostname (so now the cert on host 1 contains as Subject CN
> >> the hostname of host 2).
> >
> > Right. Sorry I didn't remember that.
> >
> >> I found some docs on the certs for libvirt but it's not clear what I need
> >> to do to correctly re-generate the vdsm certs on host 1. Can you help? PS
> >> I presume I need to re-generate client certs for that host as well and
> >> copy to the engine?
> >
> > Easiest is to put the host to maintenance, then "Enroll Certificate" -
> > IIRC this should be enough. If you want to make sure, perhaps better
> > remove all certs/keys and do 'Reinstall' instead, and make sure you
> > choose 'Deploy' for 'Hosted Engine'.
> >
> > Good luck,
> >
> >>
> >> Appreciated,
> >>
> >> Roderick
> >>
> >>
> >> On 2021/02/03 16:58, Yedidyah Bar David wrote:
> >>> On Wed, Feb 3, 2021 at 4:52 PM Roderick Mooi <[email protected]>
> >>> wrote:
> >>>>
> >>>> Thanks,
> >>>>
> >>>>> I didn't check, but am pretty certain that it's not related to the
> >>>>> engine db. Do you see such duplicates there as well (using the web ui
> >>>>> or sql against it)? If so, fix these first. If no other means, put the
> >>>>> host to maintenance and reinstall with the correct name.
> >>>>
> >>>> Not seeing duplicates in the web UI, only in the --vm-status. Can you
> >>>> please assist me with the sql commands or reference to the database
> >>>> schema + where to check?
> >>>> I'd like to check that first before doing
> >>>> anything too drastic.
> >>>
> >>> /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c 'select * from vds'
> >>>
> >>>>
> >>>> Note: it only duplicated the hostname after I changed the host_id,
> >>>> before that it had the correct hostname but duplicate host_id.
> >>>>
> >>>> PS I have a recent backup of the database (just before which I could
> >>>> restore if you think that'll do the trick without breaking anything?
> >>>>
> >>>>
> >>>> On 2021/02/03 16:33, Yedidyah Bar David wrote:
> >>>>> On Wed, Feb 3, 2021 at 4:21 PM Roderick Mooi <[email protected]>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>>> Any idea how this happened?
> >>>>>>
> >>>>>> Somehow related to the power being "pulled" at the wrong time?
> >>>>>>
> >>>>>>> Perhaps this is a backup done by emacs?
> >>>>>>
> >>>>>> Not sure what does it but I'm glad it did ;)
> >>>>>>
> >>>>>>> Please compare it to your other hosts. It should be (mostly?)
> >>>>>>> identical, but make sure that host_id= is unique per host. It should
> >>>>>>> match the spm host id for this host in the engine database.
> >>>>>>
> >>>>>> I had to restore one of my hosts (host 1) manually due to a cleanup
> >>>>>> during my re-deploy attempts. I managed to do this successfully by
> >>>>>> copying the missing files from another host (host 2) but the first
> >>>>>> time the host ID matched one of the other hosts (which made at least
> >>>>>> hosted-engine --vm-status unhappy) [I hadn't seen your email yet :(].
> >>>>>> I subsequently corrected the host_id and rebooted the guilty host.
> >>>>>> Things mostly seem to be working now except that in hosted-engine
> >>>>>> --vm-status my first two hosts (the one I copied the .conf from as
> >>>>>> well as the one I copied it to [without changing the ID :O]) now have
> >>>>>> the same hostname :-/ I'm assuming there's a mismatch in the engine
> >>>>>> database - where/how do I fix that?
> >>>>>>
> >>>>>
> >>>>> I didn't check, but am pretty certain that it's not related to the
> >>>>> engine db. Do you see such duplicates there as well (using the web ui
> >>>>> or sql against it)? If so, fix these first. If no other means, put the
> >>>>> host to maintenance and reinstall with the correct name.
> >>>>>
> >>>>> If it's just the shared storage, you can try the following. Carefully.
> >>>>> Didn't try myself. Try on a test system first.
> >>>>>
> >>>>> 1. Set global maintenance
> >>>>>
> >>>>> 2. Stop ovirt-ha-agent, ovirt-ha-broker, perhaps also vdsmd, supervdsmd
> >>>>>
> >>>>> 3. hosted-engine --clean-metadata --host-id=1
> >>>>>
> >>>>> - Perhaps even pass --force-cleanup, not sure when it's needed
> >>>>>
> >>>>> - Repeat for other IDs as needed
> >>>>>
> >>>>> 4. Start ovirt-ha-agent (I think this should start all the others, but
> >>>>> make sure)
> >>>>>
> >>>>> 5. Wait a bit. I am pretty certain that they should recreate their
> >>>>> entries in the shared storage and eventually --vm-status should look
> >>>>> ok.
> >>>>>
> >>>>> 6. Exit global maintenance
> >>>>>
> >>>>> Good luck,
> >>>>>
> >>>>>> Appreciated! (and happy cos our cluster is almost back to normal :) )
> >>>>>>
> >>>>>> On 2021/02/03 11:30, Yedidyah Bar David wrote:
> >>>>>>> On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi <[email protected]>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hello and thanks for assisting!
> >>>>>>>>
> >>>>>>>> I think I may have found the problem :)
> >>>>>>>>
> >>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf
> >>>>>>>>
> >>>>>>>> is blank.
> >>>>>>>>
> >>>>>>>> But I do have hosted-engine.conf~
> >>>>>>>
> >>>>>>> Any idea how this happened?
> >>>>>>>
> >>>>>>> Perhaps this is a backup done by emacs?
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Can I cp this to restore the original?
> >>>>>>>
> >>>>>>> Please compare it to your other hosts. It should be (mostly?)
> >>>>>>> identical, but make sure that host_id= is unique per host.
> >>>>>>> It should match the spm host id for this host in the engine database.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Anything else I need to do?
> >>>>>>>
> >>>>>>> Not sure, but better find the root cause to make sure no other damage
> >>>>>>> was done.
> >>>>>>>
> >>>>>>> Good luck,
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Appreciated
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2021/02/02 11:37, Strahil Nikolov wrote:
> >>>>>>>>> Usually,
> >>>>>>>>>
> >>>>>>>>> I would start with checking the output of the
> >>>>>>>>> /var/log/ovirt-hosted-engine-ha/{broker,agent}.log
> >>>>>>>>>
> >>>>>>>>> I'm typing it on my phone, so the path could have a typo.
> >>>>>>>>>
> >>>>>>>>> Check if the following services (also typed by memory, might have
> >>>>>>>>> to remove the 'd') are running:
> >>>>>>>>> - sanlock
> >>>>>>>>> - supervdsmd
> >>>>>>>>> - vdsmd
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Sometimes, some of my VGs (gluster) are not activated, so if you
> >>>>>>>>> run hyperconverged -> you can 'vgchange -ay'.
> >>>>>>>>>
> >>>>>>>>> Best Regards,
> >>>>>>>>> Strahil Nikolov
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Sent from Yahoo Mail on Android
> >>>>>>>>>
> >>>>>>>>> On Tue, Feb 2, 2021 at 11:28, Roderick Mooi
> >>>>>>>>> <[email protected]> wrote:
> >>>>>>>>> Hi!
> >>>>>>>>>
> >>>>>>>>> We had a power outage and all our servers (oVirt hosts)
> >>>>>>>>> went down. When they started up neither the hosted-engine nor VMs
> >>>>>>>>> were started.
> >>>>>>>>>
> >>>>>>>>> hosted-engine --vm-status
> >>>>>>>>> says:
> >>>>>>>>> You must run deploy first
> >>>>>>>>>
> >>>>>>>>> I tried running deploy with various options but ultimately
> >>>>>>>>> get stuck at:
> >>>>>>>>>
> >>>>>>>>> The Host ID is already known. Is this a re-deployment on an
> >>>>>>>>> additional host that was previously set up (Yes, No)[Yes]?
> >>>>>>>>> ...
> >>>>>>>>> [ ERROR ] Failed to execute stage 'Closing up': <urlopen
> >>>>>>>>> error [Errno 113] No route to host>
> >>>>>>>>>
> >>>>>>>>> OR
> >>>>>>>>>
> >>>>>>>>> The specified storage location already contains a data
> >>>>>>>>> domain. Is this an additional host setup (Yes, No)[Yes]? No
> >>>>>>>>> [ ERROR ] Re-deploying the engine VM over a previously
> >>>>>>>>> (partially) deployed system is not supported. Please clean up the
> >>>>>>>>> storage device or select a different one and retry.
> >>>>>>>>>
> >>>>>>>>> NOTES:
> >>>>>>>>> 1. This is oVirt v3.6 (legacy install, I know...)
> >>>>>>>>> 2. We do have daily engine backups (.bak files) [till the
> >>>>>>>>> day the power failed]
> >>>>>>>>>
> >>>>>>>>> Any advice/assistance appreciated.
> >>>>>>>>>
> >>>>>>>>> Thanks!
> >>>>>>>>>
> >>>>>>>>> Roderick
> >>>>>>>>> _______________________________________________
> >>>>>>>>> Users mailing list -- [email protected]
> >>>>>>>>> To unsubscribe send an email to [email protected]
> >>>>>>>>> Privacy Statement:
> >>>>>>>>> https://www.ovirt.org/privacy-policy.html
> >>>>>>>>> oVirt Code of Conduct:
> >>>>>>>>> https://www.ovirt.org/community/about/community-guidelines/
> >>>>>>>>> List Archives:
> >>>>>>>>> https://lists.ovirt.org/archives/list/[email protected]/message/73VDY7KLYBKCUXOUU4YTS4ZFGXN2ZX2U/
> >>>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> Users mailing list -- [email protected]
> >>>>>>>> To unsubscribe send an email to [email protected]
> >>>>>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >>>>>>>> oVirt Code of Conduct:
> >>>>>>>> https://www.ovirt.org/community/about/community-guidelines/
> >>>>>>>> List Archives:
> >>>>>>>> https://lists.ovirt.org/archives/list/[email protected]/message/HTWNERBX42JNOMONSCG6BL2MCIQZDW7C/
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>

--
Didi
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/RQCJKHTJLBPL4IXQ72BVMDRZRJ3WANMM/
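
A note on the advice quoted in this thread that host_id= must be unique per host: a quick way to spot a copied host_id might look like the sketch below. It assumes each host's /etc/ovirt-hosted-engine/hosted-engine.conf has first been copied to one machine under hypothetical names such as host1.conf and host2.conf, and that the file uses plain key=value lines. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: report host_id values that appear in more than one copy of
# hosted-engine.conf. File names passed on the command line are hypothetical
# local copies collected from each host.
set -eu

host_id_of() {
    # Print the value of the first host_id= line in the given file.
    awk -F= '$1 == "host_id" { print $2; exit }' "$1"
}

duplicate_host_ids() {
    # Print every host_id that occurs in more than one of the given files.
    for f in "$@"; do
        host_id_of "$f"
    done | sort | uniq -d
}

if [ "$#" -gt 0 ]; then
    dups="$(duplicate_host_ids "$@")"
    if [ -n "$dups" ]; then
        echo "duplicate host_id: $dups"
        exit 1
    fi
    echo "all host_id values are unique"
fi
```

Usage would be something like: sh check_host_ids.sh host1.conf host2.conf host3.conf (the script name is arbitrary). This only checks uniqueness between the files. Whether each value matches the spm host id in the engine database still has to be compared by hand, for example against the output of the engine-psql.sh query quoted above.
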

