I have a three-node oVirt cluster where one node shows stale data for the
hosted engine, but the other two nodes do not.

Output of `hosted-engine --vm-status` on a good node:

```


!! Cluster is in GLOBAL MAINTENANCE mode !!



--== Host ovirt2.low.mdds.tcs-sec.com (id: 1) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt2.low.mdds.tcs-sec.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "Up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : f91f57e4
local_conf_timestamp               : 9915242
Host timestamp                     : 9915241
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=9915241 (Fri Mar 27 14:38:14 2020)
        host-id=1
        score=3400
        vm_conf_refresh_time=9915242 (Fri Mar 27 14:38:14 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False


--== Host ovirt1.low.mdds.tcs-sec.com (id: 2) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : ovirt1.low.mdds.tcs-sec.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 48f9c0fc
local_conf_timestamp               : 9218845
Host timestamp                     : 9218845
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=9218845 (Fri Mar 27 14:38:22 2020)
        host-id=2
        score=3400
        vm_conf_refresh_time=9218845 (Fri Mar 27 14:38:22 2020)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False


--== Host ovirt3.low.mdds.tcs-sec.com (id: 3) status ==--

conf_on_shared_storage             : True
Status up-to-date                  : False
Hostname                           : ovirt3.low.mdds.tcs-sec.com
Host ID                            : 3
Engine status                      : unknown stale-data
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 620c8566
local_conf_timestamp               : 1208310
Host timestamp                     : 1208310
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=1208310 (Mon Dec 16 21:14:24 2019)
        host-id=3
        score=3400
        vm_conf_refresh_time=1208310 (Mon Dec 16 21:14:24 2019)
        conf_on_shared_storage=True
        maintenance=False
        state=GlobalMaintenance
        stopped=False


!! Cluster is in GLOBAL MAINTENANCE mode !!

```

I tried the steps in https://access.redhat.com/discussions/3511881, but 
`hosted-engine --vm-status` on the node with stale data shows:

```
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
```
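
For context, the cleanup sequence I have seen suggested elsewhere for a
stale-data host looks roughly like the following (a hedged sketch, not
verbatim from that discussion; host-id 3 is taken from the `--vm-status`
output above, and flag names may vary by version):

```
# On the stale host (ovirt3): stop the HA services first
systemctl stop ovirt-ha-agent ovirt-ha-broker

# From a healthy host, clear the stale host's slot in the shared metadata
# (host-id=3 corresponds to ovirt3 in the --vm-status output above)
hosted-engine --clean-metadata --host-id=3 --force-clean

# Back on the stale host, restart the services so the agent rewrites its state
systemctl start ovirt-ha-broker ovirt-ha-agent
```

Since the stale node cannot even retrieve the configuration from shared
storage, I could not get through steps like these on that node.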

On the stale node, ovirt-ha-agent and ovirt-ha-broker are continually 
restarting. Since the agent seems to depend on the broker, I looked at the 
broker log; it includes this snippet, repeating roughly every 3 seconds:

```
MainThread::INFO::2020-03-27 15:01:06,584::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
MainThread::INFO::2020-03-27 15:01:06,584::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
MainThread::INFO::2020-03-27 15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-03-27 15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-03-27 15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-03-27 15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-03-27 15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-03-27 15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
MainThread::INFO::2020-03-27 15:01:06,588::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-03-27 15:01:06,588::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
MainThread::INFO::2020-03-27 15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
MainThread::INFO::2020-03-27 15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
MainThread::INFO::2020-03-27 15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
MainThread::INFO::2020-03-27 15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-03-27 15:01:06,590::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
MainThread::INFO::2020-03-27 15:01:06,590::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
MainThread::INFO::2020-03-27 15:01:06,590::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
MainThread::INFO::2020-03-27 15:01:06,678::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
MainThread::INFO::2020-03-27 15:01:06,678::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-03-27 15:01:06,717::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2020-03-27 15:01:06,732::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
MainThread::WARNING::2020-03-27 15:01:08,940::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: [Errno 5] Input/output error: '/rhev/data-center/mnt/glusterSD/ovirt2:_engine/182a4a94-743f-4941-89c1-dc2008ae1cf5/ha_agent/hosted-engine.lockspace'
```
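
That Input/output error on the lockspace file makes me suspect the GlusterFS
mount itself rather than the HA services. For what it's worth, these are the
kinds of checks I would run on the stale node (a sketch; the volume name
`engine` is my assumption based on the `ovirt2:_engine` mount path):

```
# Is the gluster mount for the engine storage domain present and readable?
grep glusterSD /proc/mounts
ls -l /rhev/data-center/mnt/glusterSD/ovirt2:_engine/182a4a94-743f-4941-89c1-dc2008ae1cf5/ha_agent/

# Gluster-side health, from any peer in the cluster
# (volume name "engine" assumed from the mount path)
gluster volume status engine
gluster volume heal engine info
```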

I restarted the stale node yesterday, but it still shows stale data from 
December of last year.

What is the recommended way for me to try to recover from this?

(This came to my attention when warnings concerning space on the /var/log 
partition began popping up.)
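
For anyone else hitting the same symptom, a quick way to confirm what is
eating the /var/log partition (standard tooling, nothing oVirt-specific):

```
# Largest consumers under /var/log, biggest first
du -xsh /var/log/* | sort -rh | head

# How much the systemd journal itself is using
journalctl --disk-usage
```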

Thank you,
Randall
