Hi Jirka, 

the patch works. it stabilized the status of my two hosts. the engine migration 
during failover also works fine. thanks guys! 

Jaicel 


From: "Jiri Moskovcak" <jmosk...@redhat.com> 
To: "Jaicel" <jai...@asti.dost.gov.ph> 
Cc: "Niels de Vos" <nde...@redhat.com>, "Vijay Bellur" <vbel...@redhat.com>, 
users@ovirt.org, "Gluster Devel" <gluster-de...@gluster.org> 
Sent: Monday, November 3, 2014 3:33:16 PM 
Subject: Re: [ovirt-users] Hosted-Engine HA problem 

On 11/01/2014 07:43 AM, Jaicel wrote: 
> Hi, 
> 
> my engine runs on Host1. current status and agent logs below. 
> 
> Host 1 

Hi, 
it seems like you ran into [1], you can either zero-out the metadata 
file or apply the patch from [1] manually. 

--Jirka 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1158925 

> 
> MainThread::INFO::2014-10-31 
> 16:55:39,918::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) 
> ovirt-hosted-engi 
> ne-ha agent 1.1.6 started 
> MainThread::INFO::2014-10-31 
> 16:55:39,985::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(_get_hostname) Found certificate common name: 192.168.12.11 
> MainThread::INFO::2014-10-31 
> 16:55:40,228::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(_initialize_broker) Initializing ha-broker connection 
> MainThread::INFO::2014-10-31 
> 16:55:40,228::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Starting monitor ping, options {'addr': '192.168.12.254'} 
> MainThread::INFO::2014-10-31 
> 16:55:40,231::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Success, id 140634215107920 
> MainThread::INFO::2014-10-31 
> 16:55:40,231::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 
> 'bridge_name': 'ovirtmgmt', 'address': '0'} 
> MainThread::INFO::2014-10-31 
> 16:55:40,237::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Success, id 140634215108432 
> MainThread::INFO::2014-10-31 
> 16:55:40,237::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'} 
> MainThread::INFO::2014-10-31 
> 16:55:40,240::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Success, id 39956688 
> MainThread::INFO::2014-10-31 
> 16:55:40,240::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 
> 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f 
> 9', 'address': '0'} 
> MainThread::INFO::2014-10-31 
> 16:55:40,243::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Success, id 140634215107664 
> MainThread::INFO::2014-10-31 
> 16:55:40,244::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 
> '41d4aff1-54e1-4946-a812-2e656bb7d3f9', ' 
> address': '0'} 
> MainThread::INFO::2014-10-31 
> 16:55:40,249::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Success, id 140634006879632 
> MainThread::INFO::2014-10-31 
> 16:55:40,249::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(_initialize_broker) Broker initialized, all submonitors started 
> MainThread::INFO::2014-10-31 
> 16:55:40,298::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 1 
> is acquired (file: /rhev/data-center/mnt/g 
> luster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
>  
> MainThread::INFO::2014-10-31 
> 16:55:40,322::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(refresh) Global metadata: {'maintenance': False} 
> MainThread::INFO::2014-10-31 
> 16:55:40,322::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(refresh) Host 192.168.12.12 (id 2): {'live-data': False, 'extra': 
> 'metadata_parse_version=1\nmetadata_feature_version 
> =1\ntimestamp=1413882675 (Tue Oct 21 17:11:15 
> 2014)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 
> 'hostname': '192.168.12.12', 'host-id': 2, 'engine-status': {'reason': 'vm 
> not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 
> 'unknown'}, 'score': 2400, 'maintenance': False, 'host-ts': 1413882675} 
> MainThread::INFO::2014-10-31 
> 16:55:40,322::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
>  Local (id 1): {'engine-health': None, 'bridge': True, 'mem-free': None, 
> 'maintenance': False, 'cpu-load': None, 'gateway': True} 
> MainThread::INFO::2014-10-31 
> 16:55:40,323::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Trying: notify time=1414745740.32 type=state_transition 
> detail=StartState-ReinitializeFSM hostname='ovirt1' 
> MainThread::INFO::2014-10-31 
> 16:55:40,392::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (StartState-ReinitializeFSM) 
> sent? ignored 
> MainThread::INFO::2014-10-31 
> 16:55:40,675::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state ReinitializeFSM (score: 0) 
> MainThread::INFO::2014-10-31 
> 16:55:50,710::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  
> Trying: notify time=1414745750.71 type=state_transition 
> detail=ReinitializeFSM-EngineUp hostname='ovirt1' 
> MainThread::INFO::2014-10-31 
> 16:55:50,710::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (ReinitializeFSM-EngineUp) 
> sent? ignored 
> MainThread::INFO::2014-10-31 
> 16:55:51,001::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineUp (score: 2400) 
> MainThread::CRITICAL::2014-10-31 
> 16:56:01,033::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) 
> Could not start ha-agent 
> Traceback (most recent call last): 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", 
> line 97, in run 
> self._run_agent() 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", 
> line 154, in _run_agent 
> hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring() 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>  line 307, in start_monitoring 
> for old_state, state, delay in self.fsm: 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", 
> line 125, in next 
> new_data = self.refresh(self._state.data) 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py",
>  line 77, in refresh 
> stats.update(self.hosted_engine.collect_stats()) 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>  line 700, in collect_stats 
> stats = self.process_remote_metadata(host_id, remote_data) 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>  line 747, in process_remote_metadata 
> md['engine-status'] = engine_status(md["engine-status"]) 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>  line 79, in engine_status 
> in json.loads(status).iteritems()]) 
> AttributeError: 'NoneType' object has no attribute 'iteritems' 
> [root@ovirt1 ~]# hosted-engine --vm-status 
> 
> 
> --== Host 1 status ==-- 
> 
> Status up-to-date : False 
> Hostname : 192.168.12.11 
> Host ID : 1 
> Engine status : unknown stale-data 
> Score : 2400 
> Local maintenance : False 
> Host timestamp : 1414745750 
> Extra metadata (valid at timestamp): 
> metadata_parse_version=1 
> metadata_feature_version=1 
> timestamp=1414745750 (Fri Oct 31 16:55:50 2014) 
> host-id=1 
> score=2400 
> maintenance=False 
> state=EngineUp 
> 
> 
> --== Host 2 status ==-- 
> 
> Status up-to-date : False 
> Hostname : 192.168.12.12 
> Host ID : 2 
> Engine status : unknown stale-data 
> Score : 2400 
> Local maintenance : False 
> Host timestamp : 1414745821 
> Extra metadata (valid at timestamp): 
> metadata_parse_version=1 
> metadata_feature_version=1 
> timestamp=1414745821 (Fri Oct 31 16:57:01 2014) 
> host-id=2 
> score=2400 
> maintenance=False 
> state=EngineStart 
> [root@ovirt1 ~]# service ovirt-ha-agent status 
> ovirt-ha-agent dead but subsys locked 
> 
> Host2 
> 
> MainThread::INFO::2014-10-31 
> 16:55:59,642::agent::52::ovirt_hosted_engine_ha.agent.agent.Agent::(run) 
> ovirt-hosted-engi 
> ne-ha agent 1.1.6 started 
> MainThread::INFO::2014-10-31 
> 16:55:59,678::hosted_engine::223::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(_get_hostname) Found certificate common name: 192.168.12.12 
> MainThread::INFO::2014-10-31 
> 16:55:59,918::hosted_engine::367::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(_initialize_broker) Initializing ha-broker connection 
> MainThread::INFO::2014-10-31 
> 16:55:59,919::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Starting monitor ping, options {'addr': '192.168.12.254'} 
> MainThread::INFO::2014-10-31 
> 16:55:59,922::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Success, id 25353488 
> MainThread::INFO::2014-10-31 
> 16:55:59,922::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Starting monitor mgmt-bridge, options {'use_ssl': 'true', 
> 'bridge_name': 'ovirtmgmt', 'address': '0'} 
> MainThread::INFO::2014-10-31 
> 16:55:59,928::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Success, id 25354128 
> MainThread::INFO::2014-10-31 
> 16:55:59,928::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'} 
> MainThread::INFO::2014-10-31 
> 16:55:59,931::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Success, id 25353552 
> MainThread::INFO::2014-10-31 
> 16:55:59,931::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 
> 'vm_uuid': '41d4aff1-54e1-4946-a812-2e656bb7d3f 
> 9', 'address': '0'} 
> MainThread::INFO::2014-10-31 
> 16:55:59,934::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Success, id 139976608389584 
> MainThread::INFO::2014-10-31 
> 16:55:59,934::brokerlink::126::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Starting monitor engine-health, options {'use_ssl': 'true', 'vm_uuid': 
> '41d4aff1-54e1-4946-a812-2e656bb7d3f9', ' 
> address': '0'} 
> MainThread::INFO::2014-10-31 
> 16:55:59,939::brokerlink::137::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_mo
>  
> nitor) Success, id 139976608447760 
> MainThread::INFO::2014-10-31 
> 16:55:59,939::hosted_engine::391::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(_initialize_broker) Broker initialized, all submonitors started 
> MainThread::INFO::2014-10-31 
> 16:55:59,983::hosted_engine::476::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(_initialize_sanlock) Ensuring lease for lockspace hosted-engine, host id 2 
> is acquired (file: /rhev/data-center/mnt/g 
> luster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.lockspace)
>  
> MainThread::INFO::2014-10-31 
> 16:56:00,001::state_machine::153::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(refresh) Global metadata: {'maintenance': False} 
> MainThread::INFO::2014-10-31 
> 16:56:00,001::state_machine::158::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(refresh) Host 192.168.12.11 (id 1): {'live-data': True, 'extra': 
> 'metadata_parse_version=1\nmetadata_feature_version= 
> 1\ntimestamp=1414745750 (Fri Oct 31 16:55:50 
> 2014)\nhost-id=1\nscore=2400\nmaintenance=False\nstate=EngineUp\n', 'hostn 
> ame': '192.168.12.11', 'host-id': 1, 'engine-status': {'health': 'good', 
> 'vm': 'up', 'detail': 'up'}, 'score': 2400, 'm 
> aintenance': False, 'host-ts': 1414745750} 
> MainThread::INFO::2014-10-31 
> 16:56:00,001::state_machine::161::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(refresh) Local (id 2): {'engine-health': None, 'bridge': True, 'mem-free': 
> None, 'maintenance': False, 'cpu-load': No 
> ne, 'gateway': True} 
> MainThread::INFO::2014-10-31 
> 16:56:00,002::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  
> Trying: notify time=1414745760.0 type=state_transition 
> detail=StartState-ReinitializeFSM hostname='ovirt2' 
> MainThread::INFO::2014-10-31 
> 16:56:00,045::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  
> Success, was notification of state_transition (StartState-ReinitializeFSM) 
> sent? ignored 
> MainThread::INFO::2014-10-31 
> 16:56:00,325::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine:
>  
> :(start_monitoring) Current state ReinitializeFSM (score: 0) 
> MainThread::INFO::2014-10-31 
> 16:56:10,352::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Trying: notify time=1414745770.35 type=state_transition 
> detail=ReinitializeFSM-EngineDown hostname='ovirt2' 
> MainThread::INFO::2014-10-31 
> 16:56:10,353::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (ReinitializeFSM-EngineDown) 
> sent? ignored 
> MainThread::INFO::2014-10-31 
> 16:56:10,638::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineDown (score: 2400) 
> MainThread::INFO::2014-10-31 
> 16:56:20,663::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  The engine is not running, but we do not have enough data to decide which 
> hosts are alive 
> MainThread::INFO::2014-10-31 
> 16:56:20,663::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Trying: notify time=1414745780.66 type=state_transition 
> detail=EngineDown-EngineDown hostname='ovirt2' 
> MainThread::INFO::2014-10-31 
> 16:56:20,664::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (EngineDown-EngineDown) sent? 
> ignored 
> MainThread::INFO::2014-10-31 
> 16:56:20,943::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineDown (score: 2400) 
> MainThread::INFO::2014-10-31 
> 16:56:30,968::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  The engine is not running, but we do not have enough data to decide which 
> hosts are alive 
> MainThread::INFO::2014-10-31 
> 16:56:30,969::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Trying: notify time=1414745790.97 type=state_transition 
> detail=EngineDown-EngineDown hostname='ovirt2' 
> MainThread::INFO::2014-10-31 
> 16:56:30,969::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (EngineDown-EngineDown) sent? 
> ignored 
> MainThread::INFO::2014-10-31 
> 16:56:31,248::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineDown (score: 2400) 
> MainThread::INFO::2014-10-31 
> 16:56:41,274::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  The engine is not running, but we do not have enough data to decide which 
> hosts are alive 
> MainThread::INFO::2014-10-31 
> 16:56:41,275::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Trying: notify time=1414745801.28 type=state_transition 
> detail=EngineDown-EngineDown hostname='ovirt2' 
> MainThread::INFO::2014-10-31 
> 16:56:41,276::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (EngineDown-EngineDown) sent? 
> ignored 
> MainThread::INFO::2014-10-31 
> 16:56:41,555::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineDown (score: 2400) 
> MainThread::INFO::2014-10-31 
> 16:56:51,583::states::441::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  The engine is not running, but we do not have enough data to decide which 
> hosts are alive 
> MainThread::INFO::2014-10-31 
> 16:56:51,584::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Trying: notify time=1414745811.58 type=state_transition 
> detail=EngineDown-EngineDown hostname='ovirt2' 
> MainThread::INFO::2014-10-31 
> 16:56:51,584::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (EngineDown-EngineDown) sent? 
> ignored 
> MainThread::INFO::2014-10-31 
> 16:56:51,864::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineDown (score: 2400) 
> MainThread::INFO::2014-10-31 
> 16:57:01,897::states::454::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
>  Engine down and local host has best score (2400), attempting to start engine 
> VM 
> MainThread::INFO::2014-10-31 
> 16:57:01,898::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Trying: notify time=1414745821.9 type=state_transition 
> detail=EngineDown-EngineStart hostname='ovirt2' 
> MainThread::INFO::2014-10-31 
> 16:57:01,906::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
>  Success, was notification of state_transition (EngineDown-EngineStart) sent? 
> ignored 
> MainThread::INFO::2014-10-31 
> 16:57:02,189::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
>  Current state EngineStart (score: 2400) 
> MainThread::CRITICAL::2014-10-31 
> 16:57:02,207::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) 
> Could not start ha-agent 
> Traceback (most recent call last): 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", 
> line 97, in run 
> self._run_agent() 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", 
> line 154, in _run_agent 
> hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring() 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>  line 307, in start_monitoring 
> for old_state, state, delay in self.fsm: 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", 
> line 125, in next 
> new_data = self.refresh(self._state.data) 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py",
>  line 77, in refresh 
> stats.update(self.hosted_engine.collect_stats()) 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>  line 662, in collect_stats 
> constants.SERVICE_TYPE) 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
> line 171, in get_stats_from_storage 
> result = self._checked_communicate(request) 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
> line 199, in _checked_communicate 
> .format(message or response)) 
> RequestError: Request failed: <type 'exceptions.OSError'> 
> 
> [root@ovirt2 ~]# hosted-engine --vm-status 
> Traceback (most recent call last): 
> File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main 
> "__main__", fname, loader, pkg_name) 
> File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code 
> exec code in run_globals 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", 
> line 111, in <module> 
> if not status_checker.print_status(): 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py", 
> line 58, in print_status 
> all_host_stats = ha_cli.get_all_host_stats() 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", 
> line 137, in get_all_host_stats 
> return self.get_all_stats(self.StatModes.HOST) 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py", 
> line 86, in get_all_stats 
> constants.SERVICE_TYPE) 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
> line 171, in get_stats_from_storage 
> result = self._checked_communicate(request) 
> File 
> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
> line 199, in _checked_communicate 
> .format(message or response)) 
> ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: <type 
> 'exceptions.OSError'> 
> [root@ovirt2 ~]# service ovirt-ha-agent status 
> ovirt-ha-agent dead but subsys locked 
> 
> 
> Thanks, 
> Jaicel 
> 
> ----- Original Message ----- 
> From: "Jiri Moskovcak" <jmosk...@redhat.com> 
> To: "Jaicel" <jai...@asti.dost.gov.ph> 
> Cc: "Niels de Vos" <nde...@redhat.com>, "Vijay Bellur" <vbel...@redhat.com>, 
> users@ovirt.org, "Gluster Devel" <gluster-de...@gluster.org> 
> Sent: Friday, October 31, 2014 11:05:32 PM 
> Subject: Re: [ovirt-users] Hosted-Engine HA problem 
> 
> On 10/31/2014 10:26 AM, Jaicel wrote: 
>> i've increased the limit and then restarted agent and broker. status 
>> normalize, but then right now it went to "False" state again but still both 
>> having 2400 score. agent logs remains the same, with "ovirt-ha-agent dead 
>> but subsys locked" status. ha-broker logs below 
>> 
>> Thread-138::INFO::2014-10-31 
>> 17:24:22,981::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>>  Connection established 
>> Thread-138::INFO::2014-10-31 
>> 17:24:22,991::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>  Connection closed 
>> Thread-139::INFO::2014-10-31 
>> 17:24:38,385::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>>  Connection established 
>> Thread-139::INFO::2014-10-31 
>> 17:24:38,395::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>  Connection closed 
>> Thread-140::INFO::2014-10-31 
>> 17:24:53,816::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>>  Connection established 
>> Thread-140::INFO::2014-10-31 
>> 17:24:53,827::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>  Connection closed 
>> Thread-141::INFO::2014-10-31 
>> 17:25:09,172::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>>  Connection established 
>> Thread-141::INFO::2014-10-31 
>> 17:25:09,182::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>  Connection closed 
>> Thread-142::INFO::2014-10-31 
>> 17:25:24,551::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>>  Connection established 
>> Thread-142::INFO::2014-10-31 
>> 17:25:24,562::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>  Connection closed 
>> 
>> Thanks, 
>> Jaicel 
> 
> ok, now it seems that broker runs fine, so I need the recent agent.log 
> to debug it more. 
> 
> --Jirka 
> 
>> 
>> ----- Original Message ----- 
>> From: "Jiri Moskovcak" <jmosk...@redhat.com> 
>> To: "Jaicel R. Sabonsolin" <jai...@asti.dost.gov.ph>, "Niels de Vos" 
>> <nde...@redhat.com> 
>> Cc: "Vijay Bellur" <vbel...@redhat.com>, users@ovirt.org, "Gluster Devel" 
>> <gluster-de...@gluster.org> 
>> Sent: Friday, October 31, 2014 4:32:02 PM 
>> Subject: Re: [ovirt-users] Hosted-Engine HA problem 
>> 
>> On 10/31/2014 03:53 AM, Jaicel R. Sabonsolin wrote: 
>>> Hi guys, 
>>> 
>>> these logs appear on both hosts just like the result of --vm-status. tried 
>>> to tcpdump on ovirt hosts and gluster nodes but only packets exchange with 
>>> my monitoring VM(zabbix) appeared. 
>>> 
>>> agent.log 
>>> new_data = self.refresh(self._state.data) 
>>> File 
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py",
>>>  line 77, in refresh 
>>> stats.update(self.hosted_engine.collect_stats()) 
>>> File 
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>>>  line 662, in collect_stats 
>>> constants.SERVICE_TYPE) 
>>> File 
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>>  line 171, in get_stats_from_storage 
>>> result = self._checked_communicate(request) 
>>> File 
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>>  line 199, in _checked_communicate 
>>> .format(message or response)) 
>>> RequestError: Request failed: <type 'exceptions.OSError'> 
>>> 
>>> broker.log 
>>> File 
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
>>>  line 165, in handle 
>>> response = "success " + self._dispatch(data) 
>>> File 
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
>>>  line 261, in _dispatch 
>>> .get_all_stats_for_service_type(**options) 
>>> File 
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>>  line 41, in get_all_stats_for_service_type 
>>> d = self.get_raw_stats_for_service_type(storage_dir, service_type) 
>>> File 
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>>  line 74, in get_raw_stats_for_service_type 
>>> f = os.open(path, direct_flag | os.O_RDONLY) 
>>> OSError: [Errno 24] Too many open files: 
>>> '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
>>>  
>> 
>> - ah, there we go ^^^^^^ you might need to tweak the limit of allowed 
>> open files as described here [1] or find the app keeps so many files open 
>> 
>> 
>> --Jirka 
>> 
>> [1] 
>> http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
>>  
>> 
>>> Thread-38160::INFO::2014-10-31 
>>> 10:28:37,989::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>>  Connection closed 
>>> Thread-38161::INFO::2014-10-31 
>>> 10:28:53,656::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
>>>  Connection established 
>>> Thread-38161::ERROR::2014-10-31 
>>> 10:28:53,657::listener::190::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>>  Error handling request, data: 'get-stats 
>>> storage_dir=/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent
>>>  service_type=hosted-engine' 
>>> Traceback (most recent call last): 
>>> File 
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
>>>  line 165, in handle 
>>> response = "success " + self._dispatch(data) 
>>> File 
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/listener.py",
>>>  line 261, in _dispatch 
>>> .get_all_stats_for_service_type(**options) 
>>> File 
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>>  line 41, in get_all_stats_for_service_type 
>>> d = self.get_raw_stats_for_service_type(storage_dir, service_type) 
>>> File 
>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py",
>>>  line 74, in get_raw_stats_for_service_type 
>>> f = os.open(path, direct_flag | os.O_RDONLY) 
>>> OSError: [Errno 24] Too many open files: 
>>> '/rhev/data-center/mnt/gluster1:_engine/6eb220be-daff-4785-8f78-111cc24139c4/ha_agent/hosted-engine.metadata'
>>>  
>>> Thread-38161::INFO::2014-10-31 
>>> 10:28:53,658::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
>>>  Connection closed 
>>> 
>>> Thanks, 
>>> Jaicel 
>>> 
>>> ----- Original Message ----- 
>>> From: "Niels de Vos" <nde...@redhat.com> 
>>> To: "Vijay Bellur" <vbel...@redhat.com> 
>>> Cc: "Jiri Moskovcak" <jmosk...@redhat.com>, "Jaicel R. Sabonsolin" 
>>> <jai...@asti.dost.gov.ph>, users@ovirt.org, "Gluster Devel" 
>>> <gluster-de...@gluster.org> 
>>> Sent: Friday, October 31, 2014 4:11:25 AM 
>>> Subject: Re: [ovirt-users] Hosted-Engine HA problem 
>>> 
>>> On Thu, Oct 30, 2014 at 09:07:24PM +0530, Vijay Bellur wrote: 
>>>> On 10/30/2014 06:45 PM, Jiri Moskovcak wrote: 
>>>>> On 10/30/2014 09:22 AM, Jaicel R. Sabonsolin wrote: 
>>>>>> Hi Guys, 
>>>>>> 
>>>>>> I need help with my ovirt Hosted-Engine HA setup. I am running on 2 
>>>>>> ovirt hosts and 2 gluster nodes with replicated volumes. i already have 
>>>>>> VMs running on my hosts and they can migrate normally once i for example 
>>>>>> power off the host that they are running on. the problem is that the 
>>>>>> engine can't migrate once i switch off the host that hosts the engine. 
>>>>>> 
>>>>>> oVirt 3.4.3-1.el6 
>>>>>> KVM 0.12.1.2 - 2.415.el6_5.10 
>>>>>> LIBVIRT libvirt-0.10.2-29.el6_5.9 
>>>>>> VDSM vdsm-4.14.17-0.el6 
>>>>>> 
>>>>>> 
>>>>>> right now, i have this result from hosted-engine --vm-status. 
>>>>>> 
>>>>>> File "/usr/lib64/python2.6/runpy.py", line 122, in 
>>>>>> _run_module_as_main 
>>>>>> "__main__", fname, loader, pkg_name) 
>>>>>> File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code 
>>>>>> exec code in run_globals 
>>>>>> File 
>>>>>> 
>>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py",
>>>>>>  
>>>>>> 
>>>>>> line 111, in <module> 
>>>>>> if not status_checker.print_status(): 
>>>>>> File 
>>>>>> 
>>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_setup/vm_status.py",
>>>>>>  
>>>>>> 
>>>>>> line 58, in print_status 
>>>>>> all_host_stats = ha_cli.get_all_host_stats() 
>>>>>> File 
>>>>>> 
>>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>>>>>  
>>>>>> 
>>>>>> line 137, in get_all_host_stats 
>>>>>> return self.get_all_stats(self.StatModes.HOST) 
>>>>>> File 
>>>>>> 
>>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/client/client.py",
>>>>>>  
>>>>>> 
>>>>>> line 86, in get_all_stats 
>>>>>> constants.SERVICE_TYPE) 
>>>>>> File 
>>>>>> 
>>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>>>>>  
>>>>>> 
>>>>>> line 171, in get_stats_from_storage 
>>>>>> result = self._checked_communicate(request) 
>>>>>> File 
>>>>>> 
>>>>>> "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py",
>>>>>>  
>>>>>> 
>>>>>> line 199, in _checked_communicate 
>>>>>> .format(message or response)) 
>>>>>> ovirt_hosted_engine_ha.lib.exceptions.RequestError: Request failed: 
>>>>>> <type 'exceptions.OSError'> 
>>>>>> 
>>>>>> 
>>>>>> restarting ha-broker and ha-agent normalizes the status but eventually 
>>>>>> it would become "false" and then return to the result above. hope you 
>>>>>> guys could help me with this. 
>>>>>> 
>>>>> 
>>>>> Hi Jaicel, 
>>>>> please attach agent.log and broker.log from the host where you trying to 
>>>>> run hosted-engine --vm-status. I have a feeling that you ran into a 
>>>>> known problem on gluster - stalled file descriptor, in that case the 
>>>>> only known solution at this time is to restart the broker & agent as you 
>>>>> have already found out. 
>>>>> 
>>>> 
>>>> Adding Niels and gluster-devel to troubleshoot from Gluster NFS 
>>>> perspective. 
>>> 
>>> I'd welcome any details on this "stalled file descriptor" problem. Is 
>>> there a bug filed with some details like logs, sysrq-t and maybe even 
>>> tcpdumps? If there is an easy way to reproduce this behaviour, I can 
>>> surely look into it and hopefully come up with some advise or fix. 
>>> 
>>> Thanks, 
>>> Niels 
>>> 

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

Reply via email to