On Thu, Feb 4, 2021 at 12:15 PM Roderick Mooi <[email protected]> wrote:
>
> Thanks so much, this worked!
>
> For the record (and the list's benefit): I first put the host into
> maintenance and then selected Enroll Certificate; this regenerated the certs.
> (The VDSM cert can be checked with: certtool -i --infile
> /etc/pki/vdsm/certs/vdsmcert.pem)
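The same check can be scripted. This is a minimal sketch, not an oVirt tool. It uses OpenSSL instead of GnuTLS certtool, and it assumes the Subject CN should equal the host's FQDN as reported by hostname -f. The thread only says that the CN carries the hostname. The function names cert_cn and check_vdsm_cert_cn are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers (not part of vdsm/oVirt): inspect the Subject CN of a
# PEM certificate with OpenSSL instead of GnuTLS certtool.

# Print the CN from the certificate's subject, e.g. "host1.example.com".
cert_cn() {
    # RFC2253 output looks like "subject=CN=host1.example.com,O=example.com";
    # older OpenSSL prints "subject= CN=...", and the first sed handles both.
    openssl x509 -noout -subject -nameopt RFC2253 -in "$1" |
        sed -n 's/^subject= *//p' | tr ',' '\n' | sed -n 's/^CN=//p'
}

# Compare the CN with the expected host name.
# Assumption: the CN should equal this host's FQDN (hostname -f).
check_vdsm_cert_cn() {
    cert=${1:-/etc/pki/vdsm/certs/vdsmcert.pem}
    expected=${2:-$(hostname -f)}
    cn=$(cert_cn "$cert")
    if [ "$cn" = "$expected" ]; then
        echo "OK: $cert has CN=$cn"
        return 0
    fi
    echo "MISMATCH: $cert has CN=$cn, expected $expected" >&2
    return 1
}
```

Running check_vdsm_cert_cn with no arguments checks /etc/pki/vdsm/certs/vdsmcert.pem against hostname -f. A cert copied from another host, as in this thread, would show up as a MISMATCH.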
>
> I then took these steps on the affected (incorrectly reported) host to update 
> the hosted-engine --vm-status:
> 1. hosted-engine --set-maintenance --mode=global
> 2. systemctl stop ovirt-ha-agent.service
> 3. hosted-engine --clean-metadata
> 4. systemctl start ovirt-ha-agent.service
> 5. hosted-engine --vm-status (after a minute or two - to verify that the host 
> details are now correct)
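The five steps above can be wrapped into one function for reuse on another host. This is a sketch under stated assumptions, not an official procedure. The names reset_he_metadata and run are hypothetical. The default 120-second wait is a guess standing in for "a minute or two". Setting DRY_RUN=1 only prints the commands. Like the steps above, the function leaves the cluster in global maintenance. Didi's earlier procedure ends by exiting it (hosted-engine --set-maintenance --mode=none) once --vm-status looks right.

```shell
#!/bin/sh
# Hypothetical wrapper around the recovery steps reported in this thread.

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run on the affected host; stops at the first failing step.
reset_he_metadata() {
    run hosted-engine --set-maintenance --mode=global || return 1
    run systemctl stop ovirt-ha-agent.service || return 1
    run hosted-engine --clean-metadata || return 1
    run systemctl start ovirt-ha-agent.service || return 1
    # Give the agent time to republish its entry (the wait is an assumption).
    run sleep "${WAIT_SECONDS:-120}"
    run hosted-engine --vm-status
}
```

Calling DRY_RUN=1 reset_he_metadata first shows exactly what would be executed.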

Mooi!

(pun intended) (Yes, I speak a little Dutch)

Thanks for the report!

>
> Cheers :)
>
> On 2021/02/04 10:17, Yedidyah Bar David wrote:
> > On Thu, Feb 4, 2021 at 10:09 AM Roderick Mooi <[email protected]> wrote:
> >>
> >> Hi Didi!
> >>
> >> Ok, I started the clean metadata process and then found the real issue - I 
> >> had copied the certs (just /etc/pki/vdsm; other pki folders were intact) 
> >> from a working host (host 2) to host 1 following the re-deploy cleanup as 
> >> part of the process to get it online again. The problem is the cert 
> >> contains the hostname (so now the cert on host 1 contains as Subject CN 
> >> the hostname of host 2).
> >
> > Right. Sorry I didn't remember that.
> >
> >> I found some docs on the certs for libvirt but it's not clear what I need 
> >> to do to correctly re-generate the vdsm certs on host 1. Can you help? PS 
> >> I presume I need to re-generate client certs for that host as well and 
> >> copy to the engine?
> >
> > Easiest is to put the host to maintenance, then "Enroll Certificate" -
> > IIRC this should be enough. If you want to make sure, perhaps better
> > remove all certs/keys and do 'Reinstall' instead, and make sure you
> > choose 'Deploy' for 'Hosted Engine'.
> >
> > Good luck,
> >
> >>
> >> Appreciated,
> >>
> >> Roderick
> >>
> >>
> >> On 2021/02/03 16:58, Yedidyah Bar David wrote:
> >>> On Wed, Feb 3, 2021 at 4:52 PM Roderick Mooi <[email protected]> 
> >>> wrote:
> >>>>
> >>>> Thanks,
> >>>>
> >>>>> I didn't check, but am pretty certain that it's not related to the
> >>>>> engine db. Do you see such duplicates there as well (using the web ui
> >>>>> or sql against it)? If so, fix these first. If no other means, put the
> >>>>> host to maintenance and reinstall with the correct name.
> >>>>
> >>>> Not seeing duplicates in the web UI, only in the --vm-status. Can you 
> >>>> please assist me with the sql commands or reference to the database 
> >>>> schema + where to check? I'd like to check that first before doing 
> >>>> anything too drastic.
> >>>
> >>> /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c 'select * from vds'
> >>>
> >>>>
> >>>> Note: it only duplicated the hostname after I changed the host_id;
> >>>> before that it had the correct hostname but a duplicate host_id.
> >>>>
> >>>> PS I have a recent backup of the database (from just before the outage),
> >>>> which I could restore if you think that'll do the trick without breaking
> >>>> anything.
> >>>>
> >>>>
> >>>> On 2021/02/03 16:33, Yedidyah Bar David wrote:
> >>>>> On Wed, Feb 3, 2021 at 4:21 PM Roderick Mooi <[email protected]> 
> >>>>> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>>> Any idea how this happened?
> >>>>>>
> >>>>>> Somehow related to the power being "pulled" at the wrong time?
> >>>>>>
> >>>>>>> Perhaps this is a backup done by emacs?
> >>>>>>
> >>>>>> Not sure what does it but I'm glad it did ;)
> >>>>>>
> >>>>>>> Please compare it to your other hosts. It should be (mostly?)
> >>>>>>> identical, but make sure that host_id= is unique per host. It should
> >>>>>>> match the spm host id for this host in the engine database.
> >>>>>>
> >>>>>> I had to restore one of my hosts (host 1) manually due to a cleanup 
> >>>>>> during my re-deploy attempts. I managed to do this successfully by 
> >>>>>> copying the missing files from another host (host 2) but the first 
> >>>>>> time the host ID matched one of the other hosts (which made at least 
> >>>>>> hosted-engine --vm-status unhappy) [I hadn't seen your email yet :(]. 
> >>>>>> I subsequently corrected the host_id and rebooted the guilty host. 
> >>>>>> Things mostly seem to be working now except that in hosted-engine 
> >>>>>> --vm-status my first two hosts (the one I copied the .conf from as 
> >>>>>> well as the one I copied it to [without changing the ID :O]) now have 
> >>>>>> the same hostname :-/ I'm assuming there's a mismatch in the engine 
> >>>>>> database - where/how do I fix that?
> >>>>>>
> >>>>>
> >>>>> I didn't check, but am pretty certain that it's not related to the
> >>>>> engine db. Do you see such duplicates there as well (using the web ui
> >>>>> or sql against it)? If so, fix these first. If no other means, put the
> >>>>> host to maintenance and reinstall with the correct name.
> >>>>>
> >>>>> If it's just the shared storage, you can try the following. Carefully.
> >>>>> Didn't try myself. Try on a test system first.
> >>>>>
> >>>>> 1. Set global maintenance
> >>>>>
> >>>>> 2. Stop ovirt-ha-agent, ovirt-ha-broker, perhaps also vdsmd, supervdsmd
> >>>>>
> >>>>> 3. hosted-engine --clean-metadata --host-id=1
> >>>>>
> >>>>> - Perhaps even pass --force-clean, not sure when it's needed
> >>>>>
> >>>>> - Repeat for other IDs as needed
> >>>>>
> >>>>> 4. Start ovirt-ha-agent (I think this should start all the others, but
> >>>>> make sure)
> >>>>>
> >>>>> 5. Wait a bit. I am pretty certain that they should recreate their
> >>>>> entries in the shared storage and eventually --vm-status should look
> >>>>> ok.
> >>>>>
> >>>>> 6. Exit global maintenance
> >>>>>
> >>>>> Good luck,
> >>>>>
> >>>>>> Appreciated! (and happy cos our cluster is almost back to normal :) )
> >>>>>>
> >>>>>> On 2021/02/03 11:30, Yedidyah Bar David wrote:
> >>>>>>> On Wed, Feb 3, 2021 at 11:12 AM Roderick Mooi <[email protected]> 
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hello and thanks for assisting!
> >>>>>>>>
> >>>>>>>> I think I may have found the problem :)
> >>>>>>>>
> >>>>>>>> /etc/ovirt-hosted-engine/hosted-engine.conf
> >>>>>>>>
> >>>>>>>> is blank.
> >>>>>>>>
> >>>>>>>> But I do have hosted-engine.conf~
> >>>>>>>
> >>>>>>> Any idea how this happened?
> >>>>>>>
> >>>>>>> Perhaps this is a backup done by emacs?
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Can I cp this to restore the original?
> >>>>>>>
> >>>>>>> Please compare it to your other hosts. It should be (mostly?)
> >>>>>>> identical, but make sure that host_id= is unique per host. It should
> >>>>>>> match the spm host id for this host in the engine database.
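To compare host_id values across hosts, one option is to copy each host's /etc/ovirt-hosted-engine/hosted-engine.conf to one machine and check the copies with a small script. This is a minimal sketch. The local file names and the functions get_host_id and check_unique_host_ids are hypothetical. The script only checks that every file has a non-empty host_id= line and that no value repeats. Matching each ID against the SPM host ID in the engine database is still a manual step.

```shell
#!/bin/sh
# Hypothetical helpers: sanity-check host_id= across copies of
# /etc/ovirt-hosted-engine/hosted-engine.conf gathered from every host.

# Print the value of the first host_id= line (empty if there is none).
get_host_id() {
    sed -n 's/^host_id=//p' "$1" | head -n 1
}

# Usage: check_unique_host_ids host1.conf host2.conf ...
# Fails if any file lacks host_id= (e.g. a blanked file) or two files share an ID.
check_unique_host_ids() {
    status=0
    for f in "$@"; do
        if [ -z "$(get_host_id "$f")" ]; then
            echo "$f: no host_id= line (blank or damaged file?)" >&2
            status=1
        fi
    done
    # Collect all non-empty IDs and report any value that occurs more than once.
    dups=$(for f in "$@"; do get_host_id "$f"; done | sed '/^$/d' | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate host_id value(s): $dups" >&2
        status=1
    fi
    return "$status"
}
```

First copy each host's file locally, for example scp host1:/etc/ovirt-hosted-engine/hosted-engine.conf host1.conf. Then run check_unique_host_ids host1.conf host2.conf host3.conf.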
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Anything else I need to do?
> >>>>>>>
> >>>>>>> Not sure, but better to find the root cause to make sure no other
> >>>>>>> damage was done.
> >>>>>>>
> >>>>>>> Good luck,
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Appreciated
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2021/02/02 11:37, Strahil Nikolov wrote:
> >>>>>>>>> Usually,
> >>>>>>>>>
> >>>>>>>>> I would start with checking the output of the 
> >>>>>>>>> /var/log/ovirt-hosted-engine-ha/{broker,agent}.log
> >>>>>>>>>
> >>>>>>>>> I'm typing it on my phone, so the path could have a typo.
> >>>>>>>>>
> >>>>>>>>> Check if the following services (also typed from memory; you might
> >>>>>>>>> have to remove the 'd') are running:
> >>>>>>>>> - sanlock
> >>>>>>>>> - supervdsmd
> >>>>>>>>> - vdsmd
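Those services, plus the HA agent and broker mentioned elsewhere in this thread, can be checked in one go with systemctl is-active. This is a minimal sketch. It assumes the systemd unit names sanlock, supervdsmd, vdsmd, ovirt-ha-broker and ovirt-ha-agent. The function check_services is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named systemd unit is active.
# Returns non-zero if any of them is not.
check_services() {
    status=0
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active (see: systemctl status $svc)"
            status=1
        fi
    done
    return "$status"
}
```

Example: check_services sanlock supervdsmd vdsmd ovirt-ha-broker ovirt-ha-agent. Any unit reported as not active is a good starting point for reading /var/log/ovirt-hosted-engine-ha/{broker,agent}.log.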
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Sometimes some of my VGs (gluster) are not activated, so if you run
> >>>>>>>>> hyperconverged you can run 'vgchange -ay'.
> >>>>>>>>>
> >>>>>>>>> Best Regards,
> >>>>>>>>> Strahil Nikolov
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>         On Tue, Feb 2, 2021 at 11:28, Roderick Mooi
> >>>>>>>>>         <[email protected]> wrote:
> >>>>>>>>>         Hi!
> >>>>>>>>>
> >>>>>>>>>         We had a power outage and all our servers (oVirt hosts) 
> >>>>>>>>> went down. When they started up neither the hosted-engine nor VMs 
> >>>>>>>>> were started.
> >>>>>>>>>
> >>>>>>>>>         hosted-engine --vm-status
> >>>>>>>>>         says:
> >>>>>>>>>         You must run deploy first
> >>>>>>>>>
> >>>>>>>>>         I tried running deploy with various options but ultimately 
> >>>>>>>>> get stuck at:
> >>>>>>>>>
> >>>>>>>>>         The Host ID is already known. Is this a re-deployment on an 
> >>>>>>>>> additional host that was previously set up (Yes, No)[Yes]?
> >>>>>>>>>         ...
> >>>>>>>>>         [ ERROR ] Failed to execute stage 'Closing up': <urlopen 
> >>>>>>>>> error [Errno 113] No route to host>
> >>>>>>>>>
> >>>>>>>>>         OR
> >>>>>>>>>
> >>>>>>>>>         The specified storage location already contains a data 
> >>>>>>>>> domain. Is this an additional host setup (Yes, No)[Yes]? No
> >>>>>>>>>         [ ERROR ] Re-deploying the engine VM over a previously 
> >>>>>>>>> (partially) deployed system is not supported. Please clean up the 
> >>>>>>>>> storage device or select a different one and retry.
> >>>>>>>>>
> >>>>>>>>>         NOTES:
> >>>>>>>>>         1. This is oVirt v3.6 (legacy install, I know...)
> >>>>>>>>>         2. We do have daily engine backups (.bak files) [till the 
> >>>>>>>>> day the power failed]
> >>>>>>>>>
> >>>>>>>>>         Any advice/assistance appreciated.
> >>>>>>>>>
> >>>>>>>>>         Thanks!
> >>>>>>>>>
> >>>>>>>>>         Roderick
> >>>>>>>>>         _______________________________________________
> >>>>>>>>>         Users mailing list -- [email protected] 
> >>>>>>>>> <mailto:[email protected]>
> >>>>>>>>>         To unsubscribe send an email to [email protected] 
> >>>>>>>>> <mailto:[email protected]>
> >>>>>>>>>         Privacy Statement: 
> >>>>>>>>> https://www.ovirt.org/privacy-policy.html 
> >>>>>>>>> <https://www.ovirt.org/privacy-policy.html>
> >>>>>>>>>         oVirt Code of Conduct: 
> >>>>>>>>> https://www.ovirt.org/community/about/community-guidelines/ 
> >>>>>>>>> <https://www.ovirt.org/community/about/community-guidelines/>
> >>>>>>>>>         List Archives:
> >>>>>>>>>         
> >>>>>>>>> https://lists.ovirt.org/archives/list/[email protected]/message/73VDY7KLYBKCUXOUU4YTS4ZFGXN2ZX2U/
> >>>>>>>>>  
> >>>>>>>>> <https://lists.ovirt.org/archives/list/[email protected]/message/73VDY7KLYBKCUXOUU4YTS4ZFGXN2ZX2U/>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>


-- 
Didi
