On Thu, Feb 4, 2021 at 10:09 AM Roderick Mooi <[email protected]> wrote:
>
> Hi Didi!
>
> Ok, I started the clean metadata process and then found the real issue - I
> had copied the certs (just /etc/pki/vdsm; other pki folders were intact) from
> a working host (host 2) to host 1 following the re-deploy cleanup as part of
> the process to get it online again. The problem is the cert contains the
> hostname (so now the cert on host 1 contains as Subject CN the hostname of
> host 2).

Right. Sorry I didn't remember that.

> I found some docs on the certs for libvirt but it's not clear what I need to
> do to correctly re-generate the vdsm certs on host 1. Can you help? PS I
> presume I need to re-generate client certs for that host as well and copy to
> the engine?

Easiest is to put the host to maintenance, then "Enroll Certificate" -
IIRC this should be enough. If you want to make sure, perhaps better
remove all certs/keys and do 'Reinstall' instead, and make sure you
choose 'Deploy' for 'Hosted Engine'.

Good luck,

>
> Appreciated,
>
> Roderick
>
>
> On 2021/02/03 16:58, Yedidyah Bar David wrote:
> > On Wed, Feb 3, 2021 at 4:52 PM Roderick Mooi <[email protected]> wrote:
> >>
> >> Thanks,
> >>
> >>> I didn't check, but am pretty certain that it's not related to the
> >>> engine db. Do you see such duplicates there as well (using the web ui
> >>> or sql against it)? If so, fix these first. If no other means, put the
> >>> host to maintenance and reinstall with the correct name.
> >>
> >> Not seeing duplicates in the web UI, only in the --vm-status. Can you
> >> please assist me with the sql commands or reference to the database schema
> >> + where to check? I'd like to check that first before doing anything too
> >> drastic.
> >
> > /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c 'select * from vds'
> >
> >>
> >> Note: it only duplicated the hostname after I changed the host_id, before
> >> that it had the correct hostname but duplicate host_id.
> >>
> >> PS I have a recent backup of the database (just before which I could
> >> restore if you think that'll do the trick without breaking anything?
> >>
> >>
> >> On 2021/02/03 16:33, Yedidyah Bar David wrote:
> >>> On Wed, Feb 3, 2021 at 4:21 PM Roderick Mooi <[email protected]>
> >>> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>>> Any idea how this happened?
> >>>>
> >>>> Somehow related to the power being "pulled" at the wrong time?
> >>>>
> >>>>> Perhaps this is a backup done by emacs?
> >>>>
> >>>> Not sure what does it but I'm glad it did ;)
> >>>>
> >>>>> Please compare it to your other hosts. It should be (mostly?)
> >>>>> identical, but make sure that host_id= is unique per host. It should
> >>>>> match the spm host id for this host in the engine database.
> >>>>
> >>>> I had to restore one of my hosts (host 1) manually due a cleanup during
> >>>> my re-deploy attempts. I managed to do this successfully by copying the
> >>>> missing files from another host (host 2) but the first time the host ID
> >>>> matched one of the other hosts (which made at least hosted-engine
> >>>> --vm-status unhappy) [I hadn't seen your email yet :(]. I subsequently
> >>>> corrected the host_id and rebooted the guilty host. Things mostly seem
> >>>> to be working now except that in hosted-engine --vm-status my first two
> >>>> hosts (the one I copied the .conf from as well as the one I copied it to
> >>>> [without changing the ID :O]) now have the same hostname :-/ I'm
> >>>> assuming there's a mismatch in the engine database - where/how do I fix
> >>>> that?
> >>>>
> >>>
> >>> I didn't check, but am pretty certain that it's not related to the
> >>> engine db. Do you see such duplicates there as well (using the web ui
> >>> or sql against it)? If so, fix these first. If no other means, put the
> >>> host to maintenance and reinstall with the correct name.
> >>>
> >>> If it's just the shared storage, you can try the following. Carefully.
> >>> Didn't try myself. Try on a test system first.
> >>>
> >>> 1. Set global maintenance
> >>>
> >>> 2. Stop ovirt-ha-agent, ovirt-ha-broker, perhaps also vdsmd, supervdsmd
> >>>
> >>> 3. hosted-engine --clean_metadata --host-id=1
> >>>
> >>>    - Perhaps even pass --force-cleanup, not sure when it's needed
> >>>
> >>>    - Repeat for other IDs as needed
> >>>
> >>> 4. Start ovirt-ha-agent (I think this should start all the others, but
> >>>    make sure)
> >>>
> >>> 5. Wait a bit. I am pretty certain that they should recreate their
> >>>    entries in the shared storage and eventually --vm-status should look
> >>>    ok.
> >>>
> >>> 6. Exit global maintenance
> >>>
> >>> Good luck,
> >>>
> >>>> Appreciated! (and happy cos our cluster is almost back to normal :) )
> >>>>
> >>>> On 2021/02/03 11:30, Yedidyah Bar David wrote:
> >>>>> On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi <[email protected]>
> >>>>> wrote:
> >>>>>>
> >>>>>> Hello and thanks for assisting!
> >>>>>>
> >>>>>> I think I may have found the problem :)
> >>>>>>
> >>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf
> >>>>>>
> >>>>>> is blank.
> >>>>>>
> >>>>>> But I do have hosted-engine.conf~
> >>>>>
> >>>>> Any idea how this happened?
> >>>>>
> >>>>> Perhaps this is a backup done by emacs?
> >>>>>
> >>>>>>
> >>>>>> Can I cp this to restore the original?
> >>>>>
> >>>>> Please compare it to your other hosts. It should be (mostly?)
> >>>>> identical, but make sure that host_id= is unique per host. It should
> >>>>> match the spm host id for this host in the engine database.
> >>>>>
> >>>>>>
> >>>>>> Anything else I need to do?
> >>>>>
> >>>>> Not sure, but better find the root cause to make sure no other damage
> >>>>> was done.
> >>>>>
> >>>>> Good luck,
> >>>>>
> >>>>>>
> >>>>>> Appreciated
> >>>>>>
> >>>>>>
> >>>>>> On 2021/02/02 11:37, Strahil Nikolov wrote:
> >>>>>>> Usually,
> >>>>>>>
> >>>>>>> I would start with checking the output of the
> >>>>>>> /var/log/ovirt-hosted-engine-ha/{broker,agent}.log
> >>>>>>>
> >>>>>>> I'm typing it on my phone, so the path could have a typo.
> >>>>>>>
> >>>>>>> Check if the following services (also typed by memory, might have to
> >>>>>>> remove the 'd') are running:
> >>>>>>> - sanlock
> >>>>>>> - supervdsmd
> >>>>>>> - vdsmd
> >>>>>>>
> >>>>>>>
> >>>>>>> Sometimes, some of my VGs (gluster) are not activated, so if you run
> >>>>>>> hyperconverged -> you can 'vgchange -ay'.
> >>>>>>>
> >>>>>>> Best Regards,
> >>>>>>> Strahil Nikolov
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Feb 2, 2021 at 11:28, Roderick Mooi
> >>>>>>> <[email protected]> wrote:
> >>>>>>> Hi!
> >>>>>>>
> >>>>>>> We had a power outage and all our servers (oVirt hosts) went
> >>>>>>> down. When they started up neither the hosted-engine nor VMs were
> >>>>>>> started.
> >>>>>>>
> >>>>>>> hosted-engine --vm-status
> >>>>>>> says:
> >>>>>>> You must run deploy first
> >>>>>>>
> >>>>>>> I tried running deploy with various options but ultimately get
> >>>>>>> stuck at:
> >>>>>>>
> >>>>>>> The Host ID is already known. Is this a re-deployment on an
> >>>>>>> additional host that was previously set up (Yes, No)[Yes]?
> >>>>>>> ...
> >>>>>>> [ ERROR ] Failed to execute stage 'Closing up': <urlopen error
> >>>>>>> [Errno 113] No route to host>
> >>>>>>>
> >>>>>>> OR
> >>>>>>>
> >>>>>>> The specified storage location already contains a data domain.
> >>>>>>> Is this an additional host setup (Yes, No)[Yes]? No
> >>>>>>> [ ERROR ] Re-deploying the engine VM over a previously
> >>>>>>> (partially) deployed system is not supported. Please clean up the
> >>>>>>> storage device or select a different one and retry.
> >>>>>>>
> >>>>>>> NOTES:
> >>>>>>> 1. This is oVirt v3.6 (legacy install, I know...)
> >>>>>>> 2. We do have daily engine backups (.bak files) [till the day
> >>>>>>> the power failed]
> >>>>>>>
> >>>>>>> Any advice/assistance appreciated.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>> Roderick
> >>>>>>> _______________________________________________
> >>>>>>> Users mailing list -- [email protected]
> >>>>>>> To unsubscribe send an email to [email protected]
> >>>>>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >>>>>>> oVirt Code of Conduct:
> >>>>>>> https://www.ovirt.org/community/about/community-guidelines/
> >>>>>>> List Archives:
> >>>>>>> https://lists.ovirt.org/archives/list/[email protected]/message/73VDY7KLYBKCUXOUU4YTS4ZFGXN2ZX2U/
> >>>>>>>
> >>>>>> _______________________________________________
> >>>>>> List Archives:
> >>>>>> https://lists.ovirt.org/archives/list/[email protected]/message/HTWNERBX42JNOMONSCG6BL2MCIQZDW7C/

--
Didi
_______________________________________________
List Archives:
https://lists.ovirt.org/archives/list/[email protected]/message/BIZFTSQGJHVVMXGA2TDWHLCBQ4I4VE34/
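
For the host_id= check discussed in the thread (it should be unique per host), a minimal shell sketch could look like the following. It assumes that each host's /etc/ovirt-hosted-engine/hosted-engine.conf has been copied to one place and contains a plain "host_id=<number>" line. The function names and the file names in the usage note are hypothetical, not part of oVirt, and the sketch only compares the files with each other; it does not check them against the engine database.

```shell
#!/bin/sh
# Sketch only: compare host_id= across copies of hosted-engine.conf.
# Assumption: the file has a plain "host_id=<number>" line.

# Print the host_id value found in the given file (nothing if absent).
get_host_id() {
    sed -n 's/^host_id=\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Print every host_id that appears in more than one of the given files.
# Returns 0 if all IDs are unique, 1 if at least one duplicate exists.
find_duplicate_host_ids() {
    dups=$(for f in "$@"; do get_host_id "$f"; done | sort | uniq -d)
    if [ -n "$dups" ]; then
        printf '%s\n' "$dups"
        return 1
    fi
    return 0
}
```

With copies saved as, say, host1.conf, host2.conf and host3.conf (hypothetical names), running `find_duplicate_host_ids host1.conf host2.conf host3.conf` prints any ID shared by two or more hosts and returns non-zero in that case.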

