On March 27, 2020 5:06:16 PM GMT+02:00, "Wood, Randall" <[email protected]> 
wrote:
>I have a three node Ovirt cluster where one node has stale-data for the
>hosted engine, but the other two nodes do not:
>
>Output of `hosted-engine --vm-status` on a good node:
>
>```
>
>
>!! Cluster is in GLOBAL MAINTENANCE mode !!
>
>
>
>--== Host ovirt2.low.mdds.tcs-sec.com (id: 1) status ==--
>
>conf_on_shared_storage             : True
>Status up-to-date                  : True
>Hostname                           : ovirt2.low.mdds.tcs-sec.com
>Host ID                            : 1
>Engine status                      : {"health": "good", "vm": "up",
>"detail": "Up"}
>Score                              : 3400
>stopped                            : False
>Local maintenance                  : False
>crc32                              : f91f57e4
>local_conf_timestamp               : 9915242
>Host timestamp                     : 9915241
>Extra metadata (valid at timestamp):
>       metadata_parse_version=1
>       metadata_feature_version=1
>       timestamp=9915241 (Fri Mar 27 14:38:14 2020)
>       host-id=1
>       score=3400
>       vm_conf_refresh_time=9915242 (Fri Mar 27 14:38:14 2020)
>       conf_on_shared_storage=True
>       maintenance=False
>       state=GlobalMaintenance
>       stopped=False
>
>
>--== Host ovirt1.low.mdds.tcs-sec.com (id: 2) status ==--
>
>conf_on_shared_storage             : True
>Status up-to-date                  : True
>Hostname                           : ovirt1.low.mdds.tcs-sec.com
>Host ID                            : 2
>Engine status                      : {"reason": "vm not running on this
>host", "health": "bad", "vm": "down", "detail": "unknown"}
>Score                              : 3400
>stopped                            : False
>Local maintenance                  : False
>crc32                              : 48f9c0fc
>local_conf_timestamp               : 9218845
>Host timestamp                     : 9218845
>Extra metadata (valid at timestamp):
>       metadata_parse_version=1
>       metadata_feature_version=1
>       timestamp=9218845 (Fri Mar 27 14:38:22 2020)
>       host-id=2
>       score=3400
>       vm_conf_refresh_time=9218845 (Fri Mar 27 14:38:22 2020)
>       conf_on_shared_storage=True
>       maintenance=False
>       state=GlobalMaintenance
>       stopped=False
>
>
>--== Host ovirt3.low.mdds.tcs-sec.com (id: 3) status ==--
>
>conf_on_shared_storage             : True
>Status up-to-date                  : False
>Hostname                           : ovirt3.low.mdds.tcs-sec.com
>Host ID                            : 3
>Engine status                      : unknown stale-data
>Score                              : 3400
>stopped                            : False
>Local maintenance                  : False
>crc32                              : 620c8566
>local_conf_timestamp               : 1208310
>Host timestamp                     : 1208310
>Extra metadata (valid at timestamp):
>       metadata_parse_version=1
>       metadata_feature_version=1
>       timestamp=1208310 (Mon Dec 16 21:14:24 2019)
>       host-id=3
>       score=3400
>       vm_conf_refresh_time=1208310 (Mon Dec 16 21:14:24 2019)
>       conf_on_shared_storage=True
>       maintenance=False
>       state=GlobalMaintenance
>       stopped=False
>
>
>!! Cluster is in GLOBAL MAINTENANCE mode !!
>
>```
>
>I tried the steps in https://access.redhat.com/discussions/3511881, but
>`hosted-engine --vm-status` on the node with stale data shows:
>
>```
>The hosted engine configuration has not been retrieved from shared
>storage. Please ensure that ovirt-ha-agent is running and the storage
>server is reachable.
>```
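>
>Checking what the error message points at (standard systemd units; the
>mount path below is taken from the broker log further down) gives:
>
>```
>systemctl status ovirt-ha-agent ovirt-ha-broker
>df -h /rhev/data-center/mnt/glusterSD/ovirt2:_engine
>```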
>
>On the stale node, ovirt-ha-agent and ovirt-ha-broker are continually
>restarting. Since the agent depends on the broker, here is the broker
>log snippet that repeats roughly every 3 seconds:
>
>```
>MainThread::INFO::2020-03-27
>15:01:06,584::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run)
>ovirt-hosted-engine-ha broker 2.3.6 started
>MainThread::INFO::2020-03-27
>15:01:06,584::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Searching for submonitors in
>/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>MainThread::INFO::2020-03-27
>15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor engine-health
>MainThread::INFO::2020-03-27
>15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor storage-domain
>MainThread::INFO::2020-03-27
>15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor network
>MainThread::INFO::2020-03-27
>15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor cpu-load-no-engine
>MainThread::INFO::2020-03-27
>15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor mem-free
>MainThread::INFO::2020-03-27
>15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor network
>MainThread::INFO::2020-03-27
>15:01:06,588::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor mgmt-bridge
>MainThread::INFO::2020-03-27
>15:01:06,588::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor storage-domain
>MainThread::INFO::2020-03-27
>15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor cpu-load-no-engine
>MainThread::INFO::2020-03-27
>15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor engine-health
>MainThread::INFO::2020-03-27
>15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor mgmt-bridge
>MainThread::INFO::2020-03-27
>15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor cpu-load
>MainThread::INFO::2020-03-27
>15:01:06,590::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor cpu-load
>MainThread::INFO::2020-03-27
>15:01:06,590::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Loaded submonitor mem-free
>MainThread::INFO::2020-03-27
>15:01:06,590::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors)
>Finished loading submonitors
>MainThread::INFO::2020-03-27
>15:01:06,678::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect)
>Connecting the storage
>MainThread::INFO::2020-03-27
>15:01:06,678::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>Connecting storage server
>MainThread::INFO::2020-03-27
>15:01:06,717::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>Connecting storage server
>MainThread::INFO::2020-03-27
>15:01:06,732::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
>Refreshing the storage domain
>MainThread::WARNING::2020-03-27
>15:01:08,940::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__)
>Can't connect vdsm storage: [Errno 5] Input/output error:
>'/rhev/data-center/mnt/glusterSD/ovirt2:_engine/182a4a94-743f-4941-89c1-dc2008ae1cf5/ha_agent/hosted-engine.lockspace'
>```
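>
>The same failure can be seen outside the broker by reading the lockspace
>file directly (illustrative only; path copied from the log line above):
>
>```
>dd if=/rhev/data-center/mnt/glusterSD/ovirt2:_engine/182a4a94-743f-4941-89c1-dc2008ae1cf5/ha_agent/hosted-engine.lockspace of=/dev/null bs=4k count=1
>```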
>
>I restarted the stale node yesterday, but it still shows stale data
>from December of last year.
>
>What is the recommended way for me to try to recover from this?
>
>(This came to my attention when warnings concerning space on the
>/var/log partition began popping up.)
>
>Thank you,
>Randall
>

Hey Randall,

This is the key:

Can't connect vdsm storage: [Errno 5] Input/output error: 
'/rhev/data-center/mnt/glusterSD/ovirt2:_engine/182a4a94-743f-4941-89c1-dc2008ae1cf5/ha_agent/hosted-engine.lockspace'

Go to that folder and check the links.
You can actually remove them, and the broker will recreate them.
Sometimes (when using gluster) there can be a split brain - in that case,
just remove the links on the offending brick and the broker will be able to
access or recreate the link.
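
A minimal sketch of those steps, assuming the path from your broker error
and that the gluster volume behind the `ovirt2:_engine` mount is named
`engine` (adjust to your setup):

```
# On the stale node: inspect the ha_agent links in the hosted-engine
# storage domain (path taken from the broker error above)
cd /rhev/data-center/mnt/glusterSD/ovirt2:_engine/182a4a94-743f-4941-89c1-dc2008ae1cf5/ha_agent
ls -l

# If the links give I/O errors, remove them; the broker recreates them
# on restart (hosted-engine.metadata is the lockspace's companion link)
rm hosted-engine.lockspace hosted-engine.metadata
systemctl restart ovirt-ha-broker ovirt-ha-agent

# For the split-brain case: list affected files to locate the offending
# brick ('engine' is assumed to be the volume name behind ovirt2:_engine)
gluster volume heal engine info split-brain
```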

Best Regards,
Strahil Nikolov
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/ZDI6VZOFF7NCHRJWBAMLHAPJBAHUQCL4/
