ovirt-2 reports 'state=GlobalMaintenance', but the other 2 nodes are in an
unknown state. Try to start ovirt-ha-broker & ovirt-ha-agent on those nodes.
Also, you may try to move the hosted-engine to ovirt-2 and try again.

Best Regards,
Strahil Nikolov
 
 
On Fri, Feb 18, 2022 at 21:48, Joseph Gelinas <jos...@gelinas.cc> wrote:

I may be in maintenance mode. I did try to set it at the beginning of this,
but engine-setup doesn't see it. At this point my nodes say they either can't
connect to the HA daemon or have stale data.

[root@ovirt-1 ~]# hosted-engine --set-maintenance --mode=global
Cannot connect to the HA daemon, please check the logs.

[root@ovirt-3 ~]# hosted-engine --set-maintenance --mode=global
Cannot connect to the HA daemon, please check the logs.

[root@ovirt-2 ~]# hosted-engine --set-maintenance --mode=global
[root@ovirt-2 ~]# hosted-engine --vm-status


!! Cluster is in GLOBAL MAINTENANCE mode !!



--== Host ovirt-1.xxxxxx.com (id: 1) status ==--

Host ID                            : 1
Host timestamp                    : 6750990
Score                              : 0
Engine status                      : unknown stale-data
Hostname                          : ovirt-1.xxxxxx.com
Local maintenance                  : False
stopped                            : True
crc32                              : 5290657b
conf_on_shared_storage            : True
local_conf_timestamp              : 6750950
Status up-to-date                  : False
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=6750990 (Thu Feb 17 22:17:53 2022)
    host-id=1
    score=0
    vm_conf_refresh_time=6750950 (Thu Feb 17 22:17:12 2022)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True


--== Host ovirt-3.xxxxxx.com (id: 2) status ==--

Host ID                            : 2
Host timestamp                    : 6731526
Score                              : 0
Engine status                      : unknown stale-data
Hostname                          : ovirt-3.xxxxxx.com
Local maintenance                  : False
stopped                            : True
crc32                              : 12c6b5c9
conf_on_shared_storage            : True
local_conf_timestamp              : 6731486
Status up-to-date                  : False
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=6731526 (Thu Feb 17 15:29:37 2022)
    host-id=2
    score=0
    vm_conf_refresh_time=6731486 (Thu Feb 17 15:28:57 2022)
    conf_on_shared_storage=True
    maintenance=False
    state=AgentStopped
    stopped=True


--== Host ovirt-2.xxxxxx.com (id: 3) status ==--

Host ID                            : 3
Host timestamp                    : 6829853
Score                              : 3400
Engine status                      : {"vm": "down", "health": "bad", "detail": "unknown", "reason": "vm not running on this host"}
Hostname                          : ovirt-2.xxxxxx.com
Local maintenance                  : False
stopped                            : False
crc32                              : 0779c0b8
conf_on_shared_storage            : True
local_conf_timestamp              : 6829853
Status up-to-date                  : True
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=6829853 (Fri Feb 18 19:25:17 2022)
    host-id=3
    score=3400
    vm_conf_refresh_time=6829853 (Fri Feb 18 19:25:17 2022)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False


!! Cluster is in GLOBAL MAINTENANCE mode !!
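
To compare hosts at a glance, a status report like the one above can be reduced to the fields that matter. The following is a minimal Python sketch, not part of oVirt: the names `parse_vm_status` and `stale_hosts` are made up for illustration, and it assumes the plain-text layout shown above (a `--== Host ... ==--` header followed by `key : value` lines).

```python
import re

# Matches e.g. "--== Host ovirt-1.xxxxxx.com (id: 1) status ==--"
HOST_HEADER = re.compile(
    r"^--== Host (?P<name>\S+) \(id: (?P<id>\d+)\) status ==--$"
)


def parse_vm_status(text):
    """Return {hostname: {field: value}} from `hosted-engine --vm-status` text.

    Only the "key : value" summary lines are kept; the indented
    "Extra metadata" entries (key=value) are ignored.
    """
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HOST_HEADER.match(line)
        if match:
            current = hosts.setdefault(
                match.group("name"), {"id": int(match.group("id"))}
            )
            continue
        if current is None or " : " not in line:
            continue
        key, _, value = line.partition(" : ")
        current[key.strip()] = value.strip()
    return hosts


def stale_hosts(hosts):
    """Return the sorted hostnames whose status is not up to date."""
    return sorted(
        name
        for name, fields in hosts.items()
        if fields.get("Status up-to-date") == "False"
    )
```

Fed the report above, `stale_hosts(parse_vm_status(report))` would list ovirt-1 and ovirt-3, the two hosts whose agents are stopped.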


ovirt-ha-agent on ovirt-1 and ovirt-3 just keeps trying to restart:

MainThread::ERROR::2022-02-18 19:34:36,910::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
MainThread::INFO::2022-02-18 19:34:36,910::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
MainThread::INFO::2022-02-18 19:34:47,268::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.4.5 started
MainThread::INFO::2022-02-18 19:34:47,280::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Certificate common name not found, using hostname to identify host
MainThread::ERROR::2022-02-18 19:35:47,629::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 436, in start_monitoring
    self._initialize_vdsm()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 595, in _initialize_vdsm
    logger=self._log
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 472, in connect_vdsm_json_rpc
    __vdsm_json_rpc_connect(logger, timeout)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 415, in __vdsm_json_rpc_connect
    timeout=VDSM_MAX_RETRY * VDSM_DELAY
RuntimeError: Couldn't  connect to VDSM within 60 seconds
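
The VDSM timeout above may or may not be related to the expired certificates that started this thread, but that is cheap to rule out on each host. Below is a minimal Python sketch, assuming the `openssl` CLI is installed and that the host certificate lives at `/etc/pki/vdsm/certs/vdsmcert.pem` (an assumed path; adjust it for your install). The helper names are hypothetical.

```python
import ssl
import subprocess
import time

# Assumed location of the VDSM host certificate; adjust if yours differs.
VDSM_CERT = "/etc/pki/vdsm/certs/vdsmcert.pem"


def read_enddate(cert_path=VDSM_CERT):
    """Return the `notAfter=...` line printed by `openssl x509 -enddate`."""
    result = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", cert_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def not_after_epoch(enddate_line):
    """Convert a `notAfter=Mon DD HH:MM:SS YYYY GMT` line to epoch seconds."""
    prefix = "notAfter="
    line = enddate_line.strip()
    if not line.startswith(prefix):
        raise ValueError(f"unexpected openssl output: {line!r}")
    return ssl.cert_time_to_seconds(line[len(prefix):])


def is_expired(enddate_line, now=None):
    """True if the certificate's notAfter time is earlier than `now`."""
    now = time.time() if now is None else now
    return not_after_epoch(enddate_line) < now
```

On a host, `is_expired(read_enddate())` returning True would point at the certificate rather than at the HA agent itself.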


ovirt-2's ovirt-hosted-engine-ha/agent.log has entries detecting global
maintenance, though `systemctl status ovirt-ha-agent` shows Python exception
errors from yesterday.

MainThread::INFO::2022-02-18 19:39:10,452::state_decorators::51::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Global maintenance detected
MainThread::INFO::2022-02-18 19:39:10,524::hosted_engine::517::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_monitoring_loop) Current state GlobalMaintenance (score: 3400)
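
When comparing agent.log across the three hosts, it can help to split each entry into its `::`-separated fields and filter by level. This is a rough Python sketch, assuming each entry sits on a single line in the actual log file (the wrapping in this message comes from the mail client) and follows the field order seen in the entries above. `AgentLogEntry` and `parse_agent_log_line` are illustrative names, not part of ovirt-hosted-engine-ha.

```python
from typing import NamedTuple, Optional


class AgentLogEntry(NamedTuple):
    thread: str
    level: str
    timestamp: str
    module: str
    lineno: int
    logger: str
    function: str
    message: str


def parse_agent_log_line(line: str) -> Optional[AgentLogEntry]:
    """Parse one agent.log entry; return None for non-entry lines.

    Traceback continuation lines contain no "::" separators and
    therefore come back as None.
    """
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, timestamp, module, lineno, logger, rest = parts
    if not lineno.isdigit():
        return None
    # `rest` looks like "(function_name) free-form message".
    function, _, message = rest.partition(") ")
    return AgentLogEntry(
        thread, level, timestamp, module, int(lineno), logger,
        function.lstrip("("), message,
    )
```

Keeping only entries whose `level` is `ERROR` leaves the restart loop on ovirt-1 and ovirt-3 without the surrounding INFO noise.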


Feb 17 18:49:12 ovirt-2.us1.vricon.com python3[1324125]: detected unhandled Python exception in '/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py'



> On Feb 18, 2022, at 14:20, Strahil Nikolov <hunter86...@yahoo.com> wrote:
> 
> To set the engine into maintenance mode you can ssh to any Hypervisor and run:
> 'hosted-engine --set-maintenance --mode=global'
> wait 1 minute and run 'hosted-engine --vm-status' to validate.
> 
> Best Regards,
> Strahil Nikolov
> 
> On Fri, Feb 18, 2022 at 19:03, Joseph Gelinas
> <jos...@gelinas.cc> wrote:
> Hi,
> 
> The certificates on our oVirt stack recently expired. While all the VMs are
> still up, I can't put the cluster into global maintenance via ovirt-engine,
> or do anything via ovirt-engine for that matter. I just get event logs about
> cert validity.
> 
> VDSM ovirt-1.xxxxx.com command Get Host Capabilities failed: PKIX path 
> validation failed: java.security.cert.CertPathValidatorException: validity 
> check failed
> VDSM ovirt-2.xxxxx.com command Get Host Capabilities failed: PKIX path 
> validation failed: java.security.cert.CertPathValidatorException: validity 
> check failed
> VDSM ovirt-3.xxxxx.com command Get Host Capabilities failed: PKIX path 
> validation failed: java.security.cert.CertPathValidatorException: validity 
> check failed
> 
> Under Compute -> Hosts, all are status Unassigned. Default data center is 
> status Non Responsive.
> 
> I have tried a couple of solutions to regenerate the certificates without 
> much luck and have copied the originals back in place.
> 
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.3/html/upgrade_guide/replacing_sha-1_certificates_with_sha-256_certificates_4-1_local_db#Replacing_All_Signed_Certificates_with_SHA-256_4-1_local_db
> 
> https://access.redhat.com/solutions/2409751
> 
> 
> I have seen things saying that running engine-setup will generate new certs.
> However, the engine doesn't think the cluster is in global maintenance, so it
> won't run. I believe I can get around the check with
> `engine-setup --otopi-environment=OVESETUP_CONFIG/continueSetupOnHEVM=bool:True`
> but is that the right thing to do? Will it deploy the certs onto the hosts as
> well so things communicate properly? It looks like one is supposed to put a
> node into maintenance and re-enroll it after running engine-setup, but will
> it even be able to put the nodes into maintenance, given that I can't do
> anything with them now?
> 
> Appreciate any ideas.
> 
> 
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/QCFPKQ3OKPOUV266MFJUMVTNG2OHLJVW/
