ovirt-2 is in 'state=GlobalMaintenance', but the other 2 nodes are unknown. Try to start ovirt-ha-broker & ovirt-ha-agent. Also, you may try to move the hosted-engine to ovirt-2 and try again.
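A minimal sketch of the suggestion above, to be run as root on the hosts whose agent is stopped (ovirt-1 and ovirt-3 in this thread). It restarts the broker and then the agent, in the order listed above, and reduces `hosted-engine --vm-status` output to one `<hostname> <state>` line per host so it is easy to see which hosts still report `AgentStopped`. The function names are hypothetical; only `systemctl`, `awk` and the `state=` metadata lines shown in the output below are taken as given.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; run as root on an affected host.

# Restart the HA broker first, then the agent that talks to it,
# and show their status afterwards.
restart_ha_services() {
    systemctl restart ovirt-ha-broker
    systemctl restart ovirt-ha-agent
    # The agent tracebacks in this thread end in "Couldn't connect to VDSM",
    # so vdsmd's status is shown too in case the agent keeps restarting.
    systemctl --no-pager status ovirt-ha-broker ovirt-ha-agent vdsmd
}

# Read `hosted-engine --vm-status` output on stdin and print
# "<hostname> <state>" for each host section.
ha_state_summary() {
    awk '
        # Section header: "--== Host <hostname> (id: N) status ==--"
        /^--== Host / { host = $3 }
        # Metadata line such as "    state=AgentStopped"
        /^[ \t]*state=/ {
            sub(/^[ \t]*state=/, "")
            print host, $0
        }
    '
}
```

Usage: call `restart_ha_services`, wait about a minute, then run `hosted-engine --vm-status | ha_state_summary`.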
Best Regards,
Strahil Nikolov

On Fri, Feb 18, 2022 at 21:48, Joseph Gelinas <jos...@gelinas.cc> wrote:

I may be in maintenance mode; I did try to set it at the beginning of this, but engine-setup doesn't see it. At this point my nodes say they can't connect to the HA daemon, or have stale data.

[root@ovirt-1 ~]# hosted-engine --set-maintenance --mode=global
Cannot connect to the HA daemon, please check the logs.

[root@ovirt-3 ~]# hosted-engine --set-maintenance --mode=global
Cannot connect to the HA daemon, please check the logs.

[root@ovirt-2 ~]# hosted-engine --set-maintenance --mode=global
[root@ovirt-2 ~]# hosted-engine --vm-status

!! Cluster is in GLOBAL MAINTENANCE mode !!

--== Host ovirt-1.xxxxxx.com (id: 1) status ==--

Host ID                : 1
Host timestamp         : 6750990
Score                  : 0
Engine status          : unknown stale-data
Hostname               : ovirt-1.xxxxxx.com
Local maintenance      : False
stopped                : True
crc32                  : 5290657b
conf_on_shared_storage : True
local_conf_timestamp   : 6750950
Status up-to-date      : False
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=6750990 (Thu Feb 17 22:17:53 2022)
    host-id=1
    score=0
    vm_conf_refresh_time=6750950 (Thu Feb 17 22:17:12 2022)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True

--== Host ovirt-3.xxxxxx.com (id: 2) status ==--

Host ID                : 2
Host timestamp         : 6731526
Score                  : 0
Engine status          : unknown stale-data
Hostname               : ovirt-3.xxxxxx.com
Local maintenance      : False
stopped                : True
crc32                  : 12c6b5c9
conf_on_shared_storage : True
local_conf_timestamp   : 6731486
Status up-to-date      : False
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=6731526 (Thu Feb 17 15:29:37 2022)
    host-id=2
    score=0
    vm_conf_refresh_time=6731486 (Thu Feb 17 15:28:57 2022)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True

--== Host ovirt-2.xxxxxx.com (id: 3) status ==--

Host ID                : 3
Host timestamp         : 6829853
Score                  : 3400
Engine status          : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname               : ovirt-2.xxxxxx.com
Local maintenance      : False
stopped                : False
crc32                  : 0779c0b8
conf_on_shared_storage : True
local_conf_timestamp   : 6829853
Status up-to-date      : True
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=6829853 (Fri Feb 18 19:25:17 2022)
    host-id=3
    score=3400
    vm_conf_refresh_time=6829853 (Fri Feb 18 19:25:17 2022)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False

!! Cluster is in GLOBAL MAINTENANCE mode !!

ovirt-ha-agent on ovirt-1 and ovirt-3 just keeps trying to restart:

MainThread::ERROR::2022-02-18 19:34:36,910::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2022-02-18 19:34:36,910::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2022-02-18 19:34:47,268::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.4.5 started
MainThread::INFO::2022-02-18 19:34:47,280::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Certificate common name not found, using hostname to identify host
MainThread::ERROR::2022-02-18 19:35:47,629::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 436, in start_monitoring
    self._initialize_vdsm()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 595, in _initialize_vdsm
    logger=self._log
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 472, in connect_vdsm_json_rpc
    __vdsm_json_rpc_connect(logger, timeout)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 415, in __vdsm_json_rpc_connect
    timeout=VDSM_MAX_RETRY * VDSM_DELAY
RuntimeError: Couldn't connect to VDSM within 60 seconds

ovirt-2's ovirt-hosted-engine-ha/agent.log has entries detecting global maintenance, though `systemctl status ovirt-ha-agent` shows Python exceptions from yesterday.

MainThread::INFO::2022-02-18 19:39:10,452::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Global maintenance detected
MainThread::INFO::2022-02-18 19:39:10,524::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state GlobalMaintenance (score: 3400)

Feb 17 18:49:12 ovirt-2.us1.vricon.com python3[1324125]: detected unhandled Python exception in '/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py'

> On Feb 18, 2022, at 14:20, Strahil Nikolov <hunter86...@yahoo.com> wrote:
>
> To set the engine into maintenance mode you can ssh to any hypervisor and run:
> 'hosted-engine --set-maintenance --mode=global'
> Then wait 1 minute and run 'hosted-engine --vm-status' to validate.
>
> Best Regards,
> Strahil Nikolov
>
> On Fri, Feb 18, 2022 at 19:03, Joseph Gelinas
> <jos...@gelinas.cc> wrote:
> Hi,
>
> The certificates on our oVirt stack recently expired. While all the VMs are
> still up, I can't put the cluster into global maintenance via ovirt-engine,
> or do anything via ovirt-engine for that matter. I just get event logs about
> cert validity.
>
> VDSM ovirt-1.xxxxx.com command Get Host Capabilities failed: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
> VDSM ovirt-2.xxxxx.com command Get Host Capabilities failed: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
> VDSM ovirt-3.xxxxx.com command Get Host Capabilities failed: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
>
> Under Compute -> Hosts, all hosts have status Unassigned. The Default data
> center has status Non Responsive.
>
> I have tried a couple of solutions to regenerate the certificates without
> much luck and have copied the originals back in place.
>
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/html/upgrade_guide/replacing_sha-1_certificates_with_sha-256_certificates_4-1_local_db#Replacing_All_Signed_Certificates_with_SHA-256_4-1_local_db
>
> https://access.redhat.com/solutions/2409751
>
> I have seen posts saying that running engine-setup will generate new certs;
> however, the engine doesn't think the cluster is in global maintenance, so
> engine-setup won't run. I believe I can get around the check with
> `engine-setup --otopi-environment=OVESETUP_CONFIG/continueSetupOnHEVM=bool:True`,
> but is that the right thing to do? Will it deploy the certs onto the hosts as
> well so things communicate properly? It looks like one is supposed to put a
> node into maintenance and re-enroll it after running engine-setup, but will
> it even be possible to put the nodes into maintenance given that I can't do
> anything with them now?
>
> Appreciate any ideas.
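Because everything in this thread starts from expired certificates, it may help to confirm which certificates are actually outside their validity window before and after regenerating them. The sketch below uses only `openssl x509` (`-enddate`, `-checkend`). The function names are hypothetical, and the default certificate paths are an assumption about a typical oVirt 4.x layout (the first two on the engine VM, the last on each host); pass your own paths if they differ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking certificate expiry with openssl.

# cert_status FILE
# Prints one of:
#   FILE: missing
#   FILE: unreadable                 (openssl could not parse it as a certificate)
#   FILE: valid until <date>
#   FILE: EXPIRED (notAfter <date>)
# Returns 0 if valid, 1 if expired, 2 if missing or unreadable.
cert_status() {
    local f="$1" end
    if [ ! -r "$f" ]; then
        echo "$f: missing"
        return 2
    fi
    # -enddate prints "notAfter=<date>"; strip that prefix.
    if ! end="$(openssl x509 -noout -enddate -in "$f" 2>/dev/null)"; then
        echo "$f: unreadable"
        return 2
    fi
    end="${end#notAfter=}"
    # -checkend 0 exits 0 only if the certificate has not expired yet.
    if openssl x509 -noout -checkend 0 -in "$f" >/dev/null 2>&1; then
        echo "$f: valid until $end"
        return 0
    fi
    echo "$f: EXPIRED (notAfter $end)"
    return 1
}

# check_ovirt_certs [FILE...]
# ASSUMPTION: default paths for a typical oVirt 4.x install.
check_ovirt_certs() {
    local f rc=0
    if [ "$#" -eq 0 ]; then
        set -- /etc/pki/ovirt-engine/ca.pem \
               /etc/pki/ovirt-engine/certs/engine.cer \
               /etc/pki/vdsm/certs/vdsmcert.pem
    fi
    for f in "$@"; do
        cert_status "$f" || rc=1
    done
    return "$rc"
}
```

Running `check_ovirt_certs` on the engine VM and on each host prints one line per certificate and returns non-zero if any of them is expired, missing, or unreadable.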
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/QCFPKQ3OKPOUV266MFJUMVTNG2OHLJVW/