On 07/01/2015 06:52 AM, Carles Costa wrote:
Dear Experts,

I am experiencing a problem with oVirt: every hour the hosted engine shuts down and reboots. The engine-status value moves to "unknown stale-data" every second minute, and the hosted engine is operational again 14 minutes after that. As far as I can see, the scores remain at 2400 at all times, and it seems a liveliness check is failing, but I am not able to find out why.

Why do I have this problem exactly every hour?
Why does the liveliness check fail?

I would appreciate it if someone could shed some light on this. I am new to oVirt but I really like it so far.


Hi Carles, and welcome.

For the liveliness check, the agent will try to contact a servlet running on the engine VM at http://{ENGINE_IP}/OvirtEngineWeb/HealthStatus
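
To test this by hand, you could request the same URL from the host that runs the agent. Below is a minimal Python sketch, not the agent's actual code. It assumes the engine VM is reachable over plain HTTP; the names health_url and check_health, the 10-second timeout, and the injectable opener are illustrative only.

```python
from urllib.request import urlopen
from urllib.error import URLError


def health_url(engine_ip):
    """Build the health servlet URL for a given engine VM address."""
    return "http://{0}/OvirtEngineWeb/HealthStatus".format(engine_ip)


def check_health(engine_ip, timeout=10, opener=urlopen):
    """Return (ok, detail); ok is True only for an HTTP 200 response.

    `opener` is injectable (hypothetical convenience) so the function
    can be exercised without a network connection.
    """
    try:
        response = opener(health_url(engine_ip), timeout=timeout)
        try:
            status = response.getcode()
            body = response.read().decode("utf-8", errors="replace").strip()
        finally:
            response.close()
        return status == 200, body
    except (URLError, OSError) as exc:
        # Connection refused, timeout, DNS failure, HTTP error status, etc.
        return False, str(exc)
```

Calling check_health with the engine VM address while the engine is in the bad-health state should show whether the servlet is unreachable, slow, or returning an error.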

Debug logs will also help if we can't resolve this. To enable them, change the log level in these conf files:
/etc/ovirt-hosted-engine-ha/broker-log.conf
/etc/ovirt-hosted-engine-ha/agent-log.conf
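
As a rough sketch of that change: assuming these files use the standard Python logging configuration (INI) format with a [logger_root] section containing a level= entry (please verify against the files on your host), a small helper could rewrite the level. The function name set_root_level is illustrative.

```python
def set_root_level(conf_text, level="DEBUG"):
    """Return conf_text with the level= entry of [logger_root] set to `level`.

    Works line by line so comments and all other sections stay untouched.
    The [logger_root] section name is an assumption about the file layout.
    """
    out = []
    section = None
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1]
        elif section == "logger_root" and stripped.split("=")[0].strip() == "level":
            line = "level={0}".format(level)
        out.append(line)
    return "\n".join(out) + "\n"
```

Handlers in such a file may carry their own level, which can also need adjusting. The ovirt-ha-agent and ovirt-ha-broker services would most likely need a restart to pick up the change.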


During the period the machine is down, I can see these messages in /var/log/ovirt-hosted-engine-ha/broker.log:

Thread-170803::INFO::2015-07-01 11:06:47,335::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-170803::INFO::2015-07-01 11:06:47,342::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-170804::INFO::2015-07-01 11:06:47,343::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-170804::INFO::2015-07-01 11:06:47,344::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-170805::INFO::2015-07-01 11:06:47,344::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-170805::INFO::2015-07-01 11:06:47,346::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-170806::INFO::2015-07-01 11:06:47,346::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-170806::INFO::2015-07-01 11:06:47,348::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-170807::INFO::2015-07-01 11:06:47,348::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
Thread-170807::INFO::2015-07-01 11:06:47,350::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
Thread-7::INFO::2015-07-01 11:06:50,394::cpu_load_no_engine::121::cpu_load_no_engine.EngineHealth::(calculate_load) System load total=0.0095, engine=0.0046, non-engine=0.0049
Thread-8::WARNING::2015-07-01 11:06:50,464::engine_health::116::engine_health.CpuLoadNoEngine::(action) bad health status: Hosted Engine is not up!

and here is /var/log/ovirt-hosted-engine-ha/agent.log:

MainThread::INFO::2015-07-01 11:01:04,216::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 2400)
MainThread::INFO::2015-07-01 11:01:04,217::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::INFO::2015-07-01 11:01:14,682::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 2400)
MainThread::INFO::2015-07-01 11:01:14,682::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::INFO::2015-07-01 11:01:24,724::states::393::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm running on localhost
MainThread::INFO::2015-07-01 11:01:25,174::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 2400)
MainThread::INFO::2015-07-01 11:01:25,174::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::INFO::2015-07-01 11:01:35,234::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1435719695.23 type=state_transition detail=EngineUp-EngineUpBadHealth hostname='mc-place-compute-01-live.mc.mcon.net'
MainThread::INFO::2015-07-01 11:03:42,536::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineUp-EngineUpBadHealth) sent? ignored
MainThread::INFO::2015-07-01 11:03:43,018::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01 11:03:43,018::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::INFO::2015-07-01 11:03:53,060::state_machine::160::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Global metadata: {'maintenance': False}
MainThread::INFO::2015-07-01 11:03:53,060::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host mc-place-compute-02-live.mc.mcon.net (id 2): {'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=226246 (Wed Jul 1 11:02:11 2015)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': 'mc-place-compute-02-live.mc.mcon.net', 'alive': True, 'host-id': 2, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False, 'host-ts': 226246}
MainThread::INFO::2015-07-01 11:03:53,060::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host mc-place-compute-03-live.mc.mcon.net (id 3): {'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=226256 (Wed Jul 1 11:02:15 2015)\nhost-id=3\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': 'mc-place-compute-03-live.mc.mcon.net', 'alive': True, 'host-id': 3, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False, 'host-ts': 226256}
MainThread::INFO::2015-07-01 11:03:53,061::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Host mc-place-compute-04-live.mc.mcon.net (id 4): {'extra': 'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=226300 (Wed Jul 1 11:02:11 2015)\nhost-id=4\nscore=2400\nmaintenance=False\nstate=EngineDown\n', 'hostname': 'mc-place-compute-04-live.mc.mcon.net', 'alive': True, 'host-id': 4, 'engine-status': {'reason': 'vm not running on this host', 'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400, 'maintenance': False, 'host-ts': 226300}
MainThread::INFO::2015-07-01 11:03:53,061::state_machine::168::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh) Local (id 1): {'engine-health': {'reason': 'failed liveliness check', 'health': 'bad', 'vm': 'up', 'detail': 'up'}, 'bridge': True, 'mem-free': 136637.0, 'maintenance': False, 'cpu-load': 0.0035, 'gateway': True}
MainThread::ERROR::2015-07-01 11:03:53,061::states::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine VM has bad health status, timeout in 300 seconds
MainThread::INFO::2015-07-01 11:03:53,081::state_decorators::95::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Timeout set to Wed Jul 1 11:08:53 2015 while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineUpBadHealth'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineUpBadHealth'>
MainThread::INFO::2015-07-01 11:03:53,530::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01 11:03:53,530::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::ERROR::2015-07-01 11:04:03,559::states::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine VM has bad health status, timeout in 289 seconds
MainThread::INFO::2015-07-01 11:04:03,980::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01 11:04:03,980::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::ERROR::2015-07-01 11:04:14,007::states::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine VM has bad health status, timeout in 279 seconds
MainThread::INFO::2015-07-01 11:04:14,478::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01 11:04:14,478::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::ERROR::2015-07-01 11:04:24,505::states::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine VM has bad health status, timeout in 268 seconds
MainThread::INFO::2015-07-01 11:04:24,994::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01 11:04:24,994::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 240
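
To check whether the bad-health state really recurs on a fixed schedule, the agent.log entries can be reduced to a timeline of state changes. The following Python sketch assumes one log entry per line in the Thread::LEVEL::timestamp::... layout seen in the excerpt; the names ENTRY_RE and state_changes are illustrative and not part of oVirt.

```python
import re
from datetime import datetime

# Matches "Current state <State> (score: <N>)" entries and captures the
# timestamp (without milliseconds), the state name, and the score.
ENTRY_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+::"
    r".*?Current state (?P<state>\w+) \(score: (?P<score>\d+)\)"
)


def state_changes(lines):
    """Yield (timestamp, state, score) each time the reported state changes."""
    previous = None
    for line in lines:
        match = ENTRY_RE.match(line)
        if not match:
            continue  # not a "Current state" entry
        state = match.group("state")
        if state != previous:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            yield ts, state, int(match.group("score"))
            previous = state
```

Passing an open handle on /var/log/ovirt-hosted-engine-ha/agent.log to state_changes and printing each tuple gives one line per transition, which makes an hourly pattern easy to spot.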


Best Regards

Carles Cortes Costa




_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
