Hi everybody, there are couple of things to check here.
- what version of hosted engine agent is this? The logs look like coming from 4.1 - what version of engine is used? - check the host ID in /etc/ovirt-hosted-engine/hosted-engine.conf on both hosts, the numbers must be different - it looks like the agent or broker on host 2 is not active (or there would be a report) - the second host does not see data from the first host (unknown stale-data), wait for a minute and check again, then check the storage connection And then the general troubleshooting: - put hosted engine in global maintenance mode (and check that it is visible from the other host using he --vm-status) - mount storage domain (hosted-engine --connect-storage) - check sanlock client status to see if proper lockspaces are present Best regards Martin Sivak On Tue, Jan 16, 2018 at 1:16 PM, Derek Atkins <[email protected]> wrote: > Why are both hosts reporting as ovirt 1? > Look at the hostname fields to see what mean. > > -derek > Sent using my mobile device. Please excuse any typos. > > On January 16, 2018 7:11:09 AM Artem Tambovskiy <[email protected]> > wrote: >> >> Hello, >> >> Yes, I followed exactly the same procedure while reinstalling the hosts >> (the only difference that I have SSH key configured instead of the >> password). >> >> Just reinstalled the second host one more time, after 20 min the host >> still haven't reached active score of 3400 (Hosted Engine HA:Not Active) and >> I still don't see crown icon for this host. >> >> hosted-engine --vm-status from ovirt1 host >> >> [root@ovirt1 ~]# hosted-engine --vm-status >> >> >> --== Host 1 status ==-- >> >> conf_on_shared_storage : True >> Status up-to-date : True >> Hostname : ovirt1.telia.ru >> Host ID : 1 >> Engine status : {"health": "good", "vm": "up", >> "detail": "up"} >> Score : 3400 >> stopped : False >> Local maintenance : False >> crc32 : 3f94156a >> local_conf_timestamp : 349144 >> Host timestamp : 349144 >> Extra metadata (valid at timestamp): >> metadata_parse_version=1 >> metadata_feature_version=1 >> timestamp=349144 (Tue Jan 16 15:03:45 2018) >> host-id=1 >> score=3400 >> vm_conf_refresh_time=349144 (Tue Jan 16 15:03:45 2018) >> conf_on_shared_storage=True >> maintenance=False >> state=EngineUp >> stopped=False >> >> >> --== Host 2 status ==-- >> >> conf_on_shared_storage : True >> Status up-to-date : False >> Hostname : ovirt1.telia.ru >> Host ID : 2 >> Engine status : unknown stale-data >> Score : 0 >> stopped : True >> Local maintenance : False >> crc32 : c7037c03 >> local_conf_timestamp : 7530 >> Host timestamp : 7530 >> Extra metadata (valid at timestamp): >> metadata_parse_version=1 >> metadata_feature_version=1 >> timestamp=7530 (Fri Jan 12 16:10:12 2018) >> host-id=2 >> score=0 >> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >> conf_on_shared_storage=True >> maintenance=False >> state=AgentStopped >> stopped=True >> >> >> hosted-engine --vm-status output from ovirt2 host >> >> [root@ovirt2 ovirt-hosted-engine-ha]# hosted-engine --vm-status >> >> >> --== Host 1 status ==-- >> >> conf_on_shared_storage : True >> Status up-to-date : False >> Hostname : ovirt1.telia.ru >> Host ID : 1 >> Engine status : unknown stale-data >> Score : 3400 >> stopped : False >> Local maintenance : False >> crc32 : 6d3606f1 >> local_conf_timestamp : 349264 >> Host timestamp : 349264 >> Extra metadata (valid at timestamp): >> metadata_parse_version=1 >> metadata_feature_version=1 >> timestamp=349264 (Tue Jan 16 15:05:45 2018) >> host-id=1 >> score=3400 >> vm_conf_refresh_time=349264 (Tue Jan 16 15:05:45 2018) >> conf_on_shared_storage=True >> maintenance=False >> state=EngineUp >> stopped=False >> >> >> --== Host 2 status ==-- >> >> conf_on_shared_storage : True >> Status up-to-date : False >> Hostname : ovirt1.telia.ru >> Host ID : 2 >> Engine status : unknown stale-data >> Score : 0 >> stopped : True >> Local maintenance : False >> crc32 : c7037c03 >> local_conf_timestamp : 7530 >> Host timestamp : 7530 >> Extra metadata (valid at timestamp): >> metadata_parse_version=1 >> metadata_feature_version=1 >> timestamp=7530 (Fri Jan 12 16:10:12 2018) >> host-id=2 >> score=0 >> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >> conf_on_shared_storage=True >> maintenance=False >> state=AgentStopped >> stopped=True >> >> >> Also I saw some log messages in webGUI about time drift like >> >> "Host ovirt2.telia.ru has time-drift of 5305 seconds while maximum >> configured value is 300 seconds." that is a bit weird as haven't touched any >> time settings since I installed the cluster. >> both host have the same time and timezone (MSK) but hosted engine lives in >> UTC timezone. Is it mandatory to have everything in sync and in the same >> timezone? >> >> Regards, >> Artem >> >> >> >> >> >> >> On Tue, Jan 16, 2018 at 2:20 PM, Kasturi Narra <[email protected]> wrote: >>> >>> Hello, >>> >>> I now see that your hosted engine is up and running. Can you let me >>> know how did you try reinstalling the host? Below is the procedure which is >>> used and hope you did not miss any step while reinstalling. If no, can you >>> try reinstalling again and see if that works ? >>> >>> 1) Move the host to maintenance >>> 2) click on reinstall >>> 3) provide the password >>> 4) uncheck 'automatically configure host firewall' >>> 5) click on 'Deploy' tab >>> 6) click Hosted Engine deployment as 'Deploy' >>> >>> And once the host installation is done, wait till the active score of the >>> host shows 3400 in the general tab then check hosted-engine --vm-status. >>> >>> Thanks >>> kasturi >>> >>> On Mon, Jan 15, 2018 at 4:57 PM, Artem Tambovskiy >>> <[email protected]> wrote: >>>> >>>> Hello, >>>> >>>> I have uploaded 2 archives with all relevant logs to shared hosting >>>> files from host 1 (which is currently running all VM's including >>>> hosted_engine) - https://yadi.sk/d/PttRoYV63RTvhK >>>> files from second host - https://yadi.sk/d/UBducEsV3RTvhc >>>> >>>> I have tried to restart both ovirt-ha-agent and ovirt-ha-broker but it >>>> gives no effect. I have also tried to shutdown hosted_engine VM, stop >>>> ovirt-ha-agent and ovirt-ha-broker services disconnect storage and connect >>>> it again - no effect as well. >>>> Also I tried to reinstall second host from WebGUI - this lead to the >>>> interesting situation - now hosted-engine --vm-status shows that both >>>> hosts have the same address. >>>> >>>> [root@ovirt1 ~]# hosted-engine --vm-status >>>> >>>> --== Host 1 status ==-- >>>> >>>> conf_on_shared_storage : True >>>> Status up-to-date : True >>>> Hostname : ovirt1.telia.ru >>>> Host ID : 1 >>>> Engine status : {"health": "good", "vm": "up", >>>> "detail": "up"} >>>> Score : 3400 >>>> stopped : False >>>> Local maintenance : False >>>> crc32 : a7758085 >>>> local_conf_timestamp : 259327 >>>> Host timestamp : 259327 >>>> Extra metadata (valid at timestamp): >>>> metadata_parse_version=1 >>>> metadata_feature_version=1 >>>> timestamp=259327 (Mon Jan 15 14:06:48 2018) >>>> host-id=1 >>>> score=3400 >>>> vm_conf_refresh_time=259327 (Mon Jan 15 14:06:48 2018) >>>> conf_on_shared_storage=True >>>> maintenance=False >>>> state=EngineUp >>>> stopped=False >>>> >>>> >>>> --== Host 2 status ==-- >>>> >>>> conf_on_shared_storage : True >>>> Status up-to-date : False >>>> Hostname : ovirt1.telia.ru >>>> Host ID : 2 >>>> Engine status : unknown stale-data >>>> Score : 0 >>>> stopped : True >>>> Local maintenance : False >>>> crc32 : c7037c03 >>>> local_conf_timestamp : 7530 >>>> Host timestamp : 7530 >>>> Extra metadata (valid at timestamp): >>>> metadata_parse_version=1 >>>> metadata_feature_version=1 >>>> timestamp=7530 (Fri Jan 12 16:10:12 2018) >>>> host-id=2 >>>> score=0 >>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >>>> conf_on_shared_storage=True >>>> maintenance=False >>>> state=AgentStopped >>>> stopped=True >>>> >>>> Gluster seems working fine. all gluster nodes showing connected state. >>>> >>>> Any advises on how to resolve this situation are highly appreciated! >>>> >>>> Regards, >>>> Artem >>>> >>>> >>>> On Mon, Jan 15, 2018 at 11:45 AM, Kasturi Narra <[email protected]> >>>> wrote: >>>>> >>>>> Hello Artem, >>>>> >>>>> Can you check if glusterd service is running on host1 and all >>>>> the peers are in connected state ? If yes, can you restart ovirt-ha-agent >>>>> and broker services and check if things are working fine ? >>>>> >>>>> Thanks >>>>> kasturi >>>>> >>>>> On Sat, Jan 13, 2018 at 12:33 AM, Artem Tambovskiy >>>>> <[email protected]> wrote: >>>>>> >>>>>> Explored logs on both hosts. >>>>>> broker.log shows no errors. >>>>>> >>>>>> agent.log looking not good: >>>>>> >>>>>> on host1 (which running hosted engine) : >>>>>> >>>>>> MainThread::ERROR::2018-01-12 >>>>>> 21:51:03,883::agent::205::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) >>>>>> Traceback (most recent call last): >>>>>> File >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>> line 191, in _run_agent >>>>>> return action(he) >>>>>> File >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", >>>>>> line 64, in action_proper >>>>>> return he.start_monitoring() >>>>>> File >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>> line 411, in start_monitoring >>>>>> self._initialize_sanlock() >>>>>> File >>>>>> "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", >>>>>> line 749, in _initialize_sanlock >>>>>> "Failed to initialize sanlock, the number of errors has" >>>>>> SanlockInitializationError: Failed to initialize sanlock, the number >>>>>> of errors has exceeded the limit >>>>>> >>>>>> MainThread::ERROR::2018-01-12 >>>>>> 21:51:03,884::agent::206::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) >>>>>> Trying to restart agent >>>>>> MainThread::WARNING::2018-01-12 >>>>>> 21:51:08,889::agent::209::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) >>>>>> Restarting agent, attempt '1' >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:08,919::hosted_engine::242::ovirt_hosted_engine_ha.agenthosted_engine.HostedEngine::(_get_hostname) >>>>>> Found certificate common name: ovirt1.telia.ru >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:08,921::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) >>>>>> Initializing VDSM >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:11,398::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Connecting the storage >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:11,399::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) >>>>>> Validating storage server >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:13,725::storage_server::239::ovirt_hosted_engine_ha.libstorage_server.StorageServer::(connect_storage_server) >>>>>> Connecting storage server >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:18,390::storage_server::246::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>>>> Connecting storage server >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:18,423::storage_server::253::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) >>>>>> Refreshing the storage domain >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:18,689::hosted_engine::663::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Preparing images >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:18,690::image::126::ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) >>>>>> Preparing images >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,895::hosted_engine::666::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Refreshing vm.conf >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,895::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) >>>>>> Reloading vm.conf from the shared storage domain >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,896::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,896::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>> Extracting Engine VM OVF from the OVF_STORE >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,897::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>> OVF_STORE volume path: >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,915::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>> Found an OVF for HE VM, trying to convert >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,918::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>> Got vm.conf from OVF_STORE >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,919::hosted_engine::509::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) >>>>>> Initializing ha-broker connection >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,919::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) >>>>>> Starting monitor ping, options {'addr': '80.239.162.97'} >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,922::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) >>>>>> Success, id 140547104457680 >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,922::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) >>>>>> Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name': >>>>>> 'ovirtmgmt', 'address': '0'} >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,936::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlinkBrokerLink::(start_monitor) >>>>>> Success, id 140547104458064 >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,936::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) >>>>>> Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'} >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,938::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) >>>>>> Success, id 140547104458448 >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,939::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlinkBrokerLink::(start_monitor) >>>>>> Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', >>>>>> 'vm_uuid': >>>>>> 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'} >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,940::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) >>>>>> Success, id 140547104457552 >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,941::brokerlink::130::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) >>>>>> Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': >>>>>> 'b366e466-b0ea-4a09-866b-d0248d7523a6', 'address': '0'} >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:21,942::brokerlink::141::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) >>>>>> Success, id 140547104459792 >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:26,951::brokerlink::179::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(set_storage_domain) >>>>>> Success, id 140546772847056 >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:26,952::hosted_engine::601::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) >>>>>> Broker initialized, all submonitors started >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:51:27,049::hosted_engine::704::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) >>>>>> Ensuring lease for lockspace hosted-engine, host id 1 is acquired (file: >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/093faa75-5e33-4559-84fa-1f1f8d48153b/911c7637-b49d-463e-b186-23b404e50769) >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:53:48,067::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) >>>>>> Failed to acquire the lock. Waiting '5's before the next attempt >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:56:14,088::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) >>>>>> Failed to acquire the lock. Waiting '5's before the next attempt >>>>>> MainThread::INFO::2018-01-12 >>>>>> 21:58:40,111::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) >>>>>> Failed to acquire the lock. Waiting '5's before the next attempt >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:06,133::hosted_engine::745::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) >>>>>> Failed to acquire the lock. Waiting '5's before the next attempt >>>>>> >>>>>> >>>>>> agent.log from second host >>>>>> >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:37,241::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Connecting the storage >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:37,242::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) >>>>>> Validating storage server >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:39,540::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Storage domain reported as valid and reconnect is not forced. >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:41,939::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>> Current state EngineUnexpectedlyDown (score: 0) >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:52,150::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) >>>>>> Reloading vm.conf from the shared storage domain >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:52,150::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:52,151::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>> Extracting Engine VM OVF from the OVF_STORE >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:52,153::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>> OVF_STORE volume path: >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:52,174::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>> Found an OVF for HE VM, trying to convert >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:52,179::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>> Got vm.conf from OVF_STORE >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:52,189::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) >>>>>> Initializing VDSM >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:54,586::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Connecting the storage >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:54,587::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) >>>>>> Validating storage server >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:56,903::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Storage domain reported as valid and reconnect is not forced. >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:59,299::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) >>>>>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:48 2018 >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:01:59,299::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>> Current state EngineUnexpectedlyDown (score: 0) >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:09,659::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) >>>>>> Reloading vm.conf from the shared storage domain >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:09,659::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:09,660::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>> Extracting Engine VM OVF from the OVF_STORE >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:09,663::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>> OVF_STORE volume path: >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:09,683::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>> Found an OVF for HE VM, trying to convert >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:09,688::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>> Got vm.conf from OVF_STORE >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:09,698::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_vdsm) >>>>>> Initializing VDSM >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:12,112::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Connecting the storage >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:12,113::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) >>>>>> Validating storage server >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:14,444::hosted_engine::639::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Storage domain reported as valid and reconnect is not forced. >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:16,859::states::682::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(score) >>>>>> Score is 0 due to unexpected vm shutdown at Fri Jan 12 21:57:47 2018 >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:16,859::hosted_engine::453::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) >>>>>> Current state EngineUnexpectedlyDown (score: 0) >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:27,100::config::493::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf) >>>>>> Reloading vm.conf from the shared storage domain >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:27,100::config::416::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>> Trying to get a fresher copy of vm configuration from the OVF_STORE >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:27,101::ovf_store::132::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>> Extracting Engine VM OVF from the OVF_STORE >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:27,103::ovf_store::134::ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(getEngineVMOVF) >>>>>> OVF_STORE volume path: >>>>>> /var/run/vdsm/storage/4a7f8717-9bb0-4d80-8016-498fa4b88162/5cabd8e1-5f4b-469e-becc-227469e03f5c/8048cbd7-77e2-4805-9af4-d109fa36dfcf >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:27,125::config::435::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>> Found an OVF for HE VM, trying to convert >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:27,129::config::440::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(_get_vm_conf_content_from_ovf_store) >>>>>> Got vm.conf from OVF_STORE >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:27,130::states::667::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) >>>>>> Engine down, local host does not have best score >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:27,139::hosted_engine::604::ovirt_hosted_engine_ha.agent.hosted_engineHostedEngine::(_initialize_vdsm) >>>>>> Initializing VDSM >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:29,584::hosted_engine::630::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) >>>>>> Connecting the storage >>>>>> MainThread::INFO::2018-01-12 >>>>>> 22:02:29,586::storage_server::220::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(validate_storage_server) >>>>>> Validating storage server >>>>>> >>>>>> >>>>>> Any suggestions how to resolve this . >>>>>> >>>>>> regards, >>>>>> Artem >>>>>> >>>>>> >>>>>> On Fri, Jan 12, 2018 at 7:08 PM, Artem Tambovskiy >>>>>> <[email protected]> wrote: >>>>>>> >>>>>>> Trying to fix one thing I broke another :( >>>>>>> >>>>>>> I fixed mnt_options for hosted engine storage domain and installed >>>>>>> latest security patches to my hosts and hosted engine. All VM's up and >>>>>>> running, but hosted_engine --vm-status reports about issues: >>>>>>> >>>>>>> [root@ovirt1 ~]# hosted-engine --vm-status >>>>>>> >>>>>>> >>>>>>> --== Host 1 status ==-- >>>>>>> >>>>>>> conf_on_shared_storage : True >>>>>>> Status up-to-date : False >>>>>>> Hostname : ovirt2 >>>>>>> Host ID : 1 >>>>>>> Engine status : unknown stale-data >>>>>>> Score : 0 >>>>>>> stopped : False >>>>>>> Local maintenance : False >>>>>>> crc32 : 193164b8 >>>>>>> local_conf_timestamp : 8350 >>>>>>> Host timestamp : 8350 >>>>>>> Extra metadata (valid at timestamp): >>>>>>> metadata_parse_version=1 >>>>>>> metadata_feature_version=1 >>>>>>> timestamp=8350 (Fri Jan 12 19:03:54 2018) >>>>>>> host-id=1 >>>>>>> score=0 >>>>>>> vm_conf_refresh_time=8350 (Fri Jan 12 19:03:54 2018) >>>>>>> conf_on_shared_storage=True >>>>>>> maintenance=False >>>>>>> state=EngineUnexpectedlyDown >>>>>>> stopped=False >>>>>>> timeout=Thu Jan 1 05:24:43 1970 >>>>>>> >>>>>>> >>>>>>> --== Host 2 status ==-- >>>>>>> >>>>>>> conf_on_shared_storage : True >>>>>>> Status up-to-date : False >>>>>>> Hostname : ovirt1.telia.ru >>>>>>> Host ID : 2 >>>>>>> Engine status : unknown stale-data >>>>>>> Score : 0 >>>>>>> stopped : True >>>>>>> Local maintenance : False >>>>>>> crc32 : c7037c03 >>>>>>> local_conf_timestamp : 7530 >>>>>>> Host timestamp : 7530 >>>>>>> Extra metadata (valid at timestamp): >>>>>>> metadata_parse_version=1 >>>>>>> metadata_feature_version=1 >>>>>>> timestamp=7530 (Fri Jan 12 16:10:12 2018) >>>>>>> host-id=2 >>>>>>> score=0 >>>>>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >>>>>>> conf_on_shared_storage=True >>>>>>> maintenance=False >>>>>>> state=AgentStopped >>>>>>> stopped=True >>>>>>> [root@ovirt1 ~]# >>>>>>> >>>>>>> >>>>>>> >>>>>>> from second host situation looks a bit different: >>>>>>> >>>>>>> >>>>>>> [root@ovirt2 ~]# hosted-engine --vm-status >>>>>>> >>>>>>> >>>>>>> --== Host 1 status ==-- >>>>>>> >>>>>>> conf_on_shared_storage : True >>>>>>> Status up-to-date : True >>>>>>> Hostname : ovirt2 >>>>>>> Host ID : 1 >>>>>>> Engine status : {"reason": "vm not running on >>>>>>> this host", "health": "bad", "vm": "down", "detail": "unknown"} >>>>>>> Score : 0 >>>>>>> stopped : False >>>>>>> Local maintenance : False >>>>>>> crc32 : 78eabdb6 >>>>>>> local_conf_timestamp : 8403 >>>>>>> Host timestamp : 8402 >>>>>>> Extra metadata (valid at timestamp): >>>>>>> metadata_parse_version=1 >>>>>>> metadata_feature_version=1 >>>>>>> timestamp=8402 (Fri Jan 12 19:04:47 2018) >>>>>>> host-id=1 >>>>>>> score=0 >>>>>>> vm_conf_refresh_time=8403 (Fri Jan 12 19:04:47 2018) >>>>>>> conf_on_shared_storage=True >>>>>>> maintenance=False >>>>>>> state=EngineUnexpectedlyDown >>>>>>> stopped=False >>>>>>> timeout=Thu Jan 1 05:24:43 1970 >>>>>>> >>>>>>> >>>>>>> --== Host 2 status ==-- >>>>>>> >>>>>>> conf_on_shared_storage : True >>>>>>> Status up-to-date : False >>>>>>> Hostname : ovirt1.telia.ru >>>>>>> Host ID : 2 >>>>>>> Engine status : unknown stale-data >>>>>>> Score : 0 >>>>>>> stopped : True >>>>>>> Local maintenance : False >>>>>>> crc32 : c7037c03 >>>>>>> local_conf_timestamp : 7530 >>>>>>> Host timestamp : 7530 >>>>>>> Extra metadata (valid at timestamp): >>>>>>> metadata_parse_version=1 >>>>>>> metadata_feature_version=1 >>>>>>> timestamp=7530 (Fri Jan 12 16:10:12 2018) >>>>>>> host-id=2 >>>>>>> score=0 >>>>>>> vm_conf_refresh_time=7530 (Fri Jan 12 16:10:12 2018) >>>>>>> conf_on_shared_storage=True >>>>>>> maintenance=False >>>>>>> state=AgentStopped >>>>>>> stopped=True >>>>>>> >>>>>>> >>>>>>> WebGUI shows that engine running on host ovirt1. >>>>>>> Gluster looks fine >>>>>>> [root@ovirt1 ~]# gluster volume status engine >>>>>>> Status of volume: engine >>>>>>> Gluster process TCP Port RDMA Port >>>>>>> Online Pid >>>>>>> >>>>>>> ------------------------------------------------------------------------------ >>>>>>> Brick ovirt1.teliaru:/oVirt/engine 49169 0 Y >>>>>>> 3244 >>>>>>> Brick ovirt2.telia.ru:/oVirt/engine 49179 0 Y >>>>>>> 20372 >>>>>>> Brick ovirt3.telia.ru:/oVirt/engine 49206 0 Y >>>>>>> 16609 >>>>>>> Self-heal Daemon on localhost N/A N/A Y >>>>>>> 117868 >>>>>>> Self-heal Daemon on ovirt2.telia.ru N/A N/A Y >>>>>>> 20521 >>>>>>> Self-heal Daemon on ovirt3 N/A N/A Y >>>>>>> 25093 >>>>>>> >>>>>>> Task Status of Volume engine >>>>>>> >>>>>>> ------------------------------------------------------------------------------ >>>>>>> There are no active volume tasks >>>>>>> >>>>>>> How to resolve this issue? >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Users mailing list >>>>>>> [email protected] >>>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Users mailing list >>>>>> [email protected] >>>>>> http://lists.ovirt.org/mailman/listinfo/users >>>>>> >>>>> >>>> >>> >> >> _______________________________________________ >> Users mailing list >> [email protected] >> http://lists.ovirt.org/mailman/listinfo/users >> > > _______________________________________________ > Users mailing list > [email protected] > http://lists.ovirt.org/mailman/listinfo/users > _______________________________________________ Users mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/users

