On March 27, 2020 5:06:16 PM GMT+02:00, "Wood, Randall" <[email protected]> wrote:
>I have a three-node oVirt cluster where one node has stale data for the
>hosted engine, but the other two nodes do not.
>
>Output of `hosted-engine --vm-status` on a good node:
>
>```
>
>!! Cluster is in GLOBAL MAINTENANCE mode !!
>
>--== Host ovirt2.low.mdds.tcs-sec.com (id: 1) status ==--
>
>conf_on_shared_storage : True
>Status up-to-date : True
>Hostname : ovirt2.low.mdds.tcs-sec.com
>Host ID : 1
>Engine status : {"health": "good", "vm": "up", "detail": "Up"}
>Score : 3400
>stopped : False
>Local maintenance : False
>crc32 : f91f57e4
>local_conf_timestamp : 9915242
>Host timestamp : 9915241
>Extra metadata (valid at timestamp):
>    metadata_parse_version=1
>    metadata_feature_version=1
>    timestamp=9915241 (Fri Mar 27 14:38:14 2020)
>    host-id=1
>    score=3400
>    vm_conf_refresh_time=9915242 (Fri Mar 27 14:38:14 2020)
>    conf_on_shared_storage=True
>    maintenance=False
>    state=GlobalMaintenance
>    stopped=False
>
>--== Host ovirt1.low.mdds.tcs-sec.com (id: 2) status ==--
>
>conf_on_shared_storage : True
>Status up-to-date : True
>Hostname : ovirt1.low.mdds.tcs-sec.com
>Host ID : 2
>Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
>Score : 3400
>stopped : False
>Local maintenance : False
>crc32 : 48f9c0fc
>local_conf_timestamp : 9218845
>Host timestamp : 9218845
>Extra metadata (valid at timestamp):
>    metadata_parse_version=1
>    metadata_feature_version=1
>    timestamp=9218845 (Fri Mar 27 14:38:22 2020)
>    host-id=2
>    score=3400
>    vm_conf_refresh_time=9218845 (Fri Mar 27 14:38:22 2020)
>    conf_on_shared_storage=True
>    maintenance=False
>    state=GlobalMaintenance
>    stopped=False
>
>--== Host ovirt3.low.mdds.tcs-sec.com (id: 3) status ==--
>
>conf_on_shared_storage : True
>Status up-to-date : False
>Hostname : ovirt3.low.mdds.tcs-sec.com
>Host ID : 3
>Engine status : unknown stale-data
>Score : 3400
>stopped : False
>Local maintenance : False
>crc32 : 620c8566
>local_conf_timestamp : 1208310
>Host timestamp : 1208310
>Extra metadata (valid at timestamp):
>    metadata_parse_version=1
>    metadata_feature_version=1
>    timestamp=1208310 (Mon Dec 16 21:14:24 2019)
>    host-id=3
>    score=3400
>    vm_conf_refresh_time=1208310 (Mon Dec 16 21:14:24 2019)
>    conf_on_shared_storage=True
>    maintenance=False
>    state=GlobalMaintenance
>    stopped=False
>
>!! Cluster is in GLOBAL MAINTENANCE mode !!
>
>```
>
>I tried the steps in https://access.redhat.com/discussions/3511881, but
>`hosted-engine --vm-status` on the node with stale data shows:
>
>```
>The hosted engine configuration has not been retrieved from shared
>storage. Please ensure that ovirt-ha-agent is running and the storage
>server is reachable.
>```
>
>On the stale node, ovirt-ha-agent and ovirt-ha-broker are continually
>restarting.
>Since it seems the agent depends on the broker, the broker log includes
>this snippet, repeating roughly every 3 seconds:
>
>```
>MainThread::INFO::2020-03-27 15:01:06,584::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.3.6 started
>MainThread::INFO::2020-03-27 15:01:06,584::monitor::40::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/submonitors
>MainThread::INFO::2020-03-27 15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>MainThread::INFO::2020-03-27 15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>MainThread::INFO::2020-03-27 15:01:06,585::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>MainThread::INFO::2020-03-27 15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>MainThread::INFO::2020-03-27 15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
>MainThread::INFO::2020-03-27 15:01:06,587::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
>MainThread::INFO::2020-03-27 15:01:06,588::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
>MainThread::INFO::2020-03-27 15:01:06,588::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
>MainThread::INFO::2020-03-27 15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
>MainThread::INFO::2020-03-27 15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
>MainThread::INFO::2020-03-27 15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
>MainThread::INFO::2020-03-27 15:01:06,589::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>MainThread::INFO::2020-03-27 15:01:06,590::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
>MainThread::INFO::2020-03-27 15:01:06,590::monitor::49::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
>MainThread::INFO::2020-03-27 15:01:06,590::monitor::50::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors
>MainThread::INFO::2020-03-27 15:01:06,678::storage_backends::373::ovirt_hosted_engine_ha.lib.storage_backends::(connect) Connecting the storage
>MainThread::INFO::2020-03-27 15:01:06,678::storage_server::349::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>MainThread::INFO::2020-03-27 15:01:06,717::storage_server::356::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
>MainThread::INFO::2020-03-27 15:01:06,732::storage_server::413::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Refreshing the storage domain
>MainThread::WARNING::2020-03-27 15:01:08,940::storage_broker::97::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: [Errno 5] Input/output error: '/rhev/data-center/mnt/glusterSD/ovirt2:_engine/182a4a94-743f-4941-89c1-dc2008ae1cf5/ha_agent/hosted-engine.lockspace'
>```
>
>I restarted the stale node yesterday, but it still shows stale data
>from December of last year.
>
>What is the recommended way for me to recover from this?
>
>(This came to my attention when warnings about space on the /var/log
>partition began popping up.)
>
>Thank you,
>Randall
Hey Randall,

This is the key:

Can't connect vdsm storage: [Errno 5] Input/output error:
'/rhev/data-center/mnt/glusterSD/ovirt2:_engine/182a4a94-743f-4941-89c1-dc2008ae1cf5/ha_agent/hosted-engine.lockspace'

Go to that folder and check the links. You can actually remove them, and the broker will recreate them. Sometimes (when using Gluster) there can be a split-brain; in that case, just remove the links on the offending brick and the broker will be able to access or recreate them.

Best Regards,
Strahil Nikolov
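For reference, here is a rough shell sketch of those recovery steps. It assumes the Gluster volume is named `engine`, that the `ha_agent` directory holds the usual `hosted-engine.lockspace` and `hosted-engine.metadata` links, and it reuses the mount path from the error above; adjust names and paths to your setup.

```
# On the stale node (ovirt3), inspect the links the broker cannot read:
ls -l /rhev/data-center/mnt/glusterSD/ovirt2:_engine/182a4a94-743f-4941-89c1-dc2008ae1cf5/ha_agent/

# From any Gluster node, check whether the volume is in split-brain
# ("engine" is an assumed volume name; use yours):
gluster volume heal engine info split-brain

# If the links are broken (or split-brained on one brick), remove them;
# the broker recreates them when it starts:
cd /rhev/data-center/mnt/glusterSD/ovirt2:_engine/182a4a94-743f-4941-89c1-dc2008ae1cf5/ha_agent/
rm hosted-engine.lockspace hosted-engine.metadata

# Restart the HA services on the stale node and re-check the status:
systemctl restart ovirt-ha-broker ovirt-ha-agent
hosted-engine --vm-status
```

If `heal info` does report a split-brain, remove the copy on the offending brick (as described above) before restarting the services.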

