Dear Experts,
I am experiencing a problem with oVirt: every hour the hosted engine shuts
down and reboots. The engine-status value moves to "unknown stale-data"
every second minute, and the hosted engine becomes operative again 14
minutes after that. As far as I can see, the scores remain at 2400 at all times,
and it seems that a liveliness check is failing, but I am not able to find out why.
Why do I have this problem exactly every hour?
Why does the liveliness check fail?
I would appreciate it if someone could shed some light on this. I am new to
oVirt, but I really like it so far.
While the machine is down I can see these messages in
/var/log/ovirt-hosted-engine-ha/broker.log:
Thread-170803::INFO::2015-07-01
11:06:47,335::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
Connection established
Thread-170803::INFO::2015-07-01
11:06:47,342::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
Connection closed
Thread-170804::INFO::2015-07-01
11:06:47,343::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
Connection established
Thread-170804::INFO::2015-07-01
11:06:47,344::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
Connection closed
Thread-170805::INFO::2015-07-01
11:06:47,344::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
Connection established
Thread-170805::INFO::2015-07-01
11:06:47,346::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
Connection closed
Thread-170806::INFO::2015-07-01
11:06:47,346::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
Connection established
Thread-170806::INFO::2015-07-01
11:06:47,348::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
Connection closed
Thread-170807::INFO::2015-07-01
11:06:47,348::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
Connection established
Thread-170807::INFO::2015-07-01
11:06:47,350::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
Connection closed
Thread-7::INFO::2015-07-01
11:06:50,394::cpu_load_no_engine::121::cpu_load_no_engine.EngineHealth::(calculate_load)
System load total=0.0095, engine=0.0046, non-engine=0.0049
Thread-8::WARNING::2015-07-01
11:06:50,464::engine_health::116::engine_health.CpuLoadNoEngine::(action) bad
health status: Hosted Engine is not up!
And here is /var/log/ovirt-hosted-engine-ha/agent.log:
MainThread::INFO::2015-07-01
11:01:04,216::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2015-07-01
11:01:04,217::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::INFO::2015-07-01
11:01:14,682::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2015-07-01
11:01:14,682::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::INFO::2015-07-01
11:01:24,724::states::393::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine vm running on localhost
MainThread::INFO::2015-07-01
11:01:25,174::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2015-07-01
11:01:25,174::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::INFO::2015-07-01
11:01:35,234::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Trying: notify time=1435719695.23 type=state_transition
detail=EngineUp-EngineUpBadHealth
hostname='mc-place-compute-01-live.mc.mcon.net'
MainThread::INFO::2015-07-01
11:03:42,536::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition (EngineUp-EngineUpBadHealth)
sent? ignored
MainThread::INFO::2015-07-01
11:03:43,018::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01
11:03:43,018::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::INFO::2015-07-01
11:03:53,060::state_machine::160::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Global metadata: {'maintenance': False}
MainThread::INFO::2015-07-01
11:03:53,060::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Host mc-place-compute-02-live.mc.mcon.net (id 2): {'extra':
'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=226246 (Wed Jul 1 11:02:11
2015)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n',
'hostname': 'mc-place-compute-02-live.mc.mcon.net', 'alive': True,
'host-id': 2, 'engine-status': {'reason': 'vm not running on this host',
'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400,
'maintenance': False, 'host-ts': 226246}
MainThread::INFO::2015-07-01
11:03:53,060::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Host mc-place-compute-03-live.mc.mcon.net (id 3): {'extra':
'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=226256 (Wed Jul 1 11:02:15
2015)\nhost-id=3\nscore=2400\nmaintenance=False\nstate=EngineDown\n',
'hostname': 'mc-place-compute-03-live.mc.mcon.net', 'alive': True,
'host-id': 3, 'engine-status': {'reason': 'vm not running on this host',
'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400,
'maintenance': False, 'host-ts': 226256}
MainThread::INFO::2015-07-01
11:03:53,061::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Host mc-place-compute-04-live.mc.mcon.net (id 4): {'extra':
'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=226300 (Wed Jul 1 11:02:11
2015)\nhost-id=4\nscore=2400\nmaintenance=False\nstate=EngineDown\n',
'hostname': 'mc-place-compute-04-live.mc.mcon.net', 'alive': True,
'host-id': 4, 'engine-status': {'reason': 'vm not running on this host',
'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400,
'maintenance': False, 'host-ts': 226300}
MainThread::INFO::2015-07-01
11:03:53,061::state_machine::168::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Local (id 1): {'engine-health': {'reason': 'failed liveliness check',
'health': 'bad', 'vm': 'up', 'detail': 'up'}, 'bridge': True,
'mem-free': 136637.0, 'maintenance': False, 'cpu-load': 0.0035,
'gateway': True}
MainThread::ERROR::2015-07-01
11:03:53,061::states::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine VM has bad health status, timeout in 300 seconds
MainThread::INFO::2015-07-01
11:03:53,081::state_decorators::95::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Timeout set to Wed Jul 1 11:08:53 2015 while transitioning <class
'ovirt_hosted_engine_ha.agent.states.EngineUpBadHealth'> -> <class
'ovirt_hosted_engine_ha.agent.states.EngineUpBadHealth'>
MainThread::INFO::2015-07-01
11:03:53,530::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01
11:03:53,530::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::ERROR::2015-07-01
11:04:03,559::states::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine VM has bad health status, timeout in 289 seconds
MainThread::INFO::2015-07-01
11:04:03,980::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01
11:04:03,980::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::ERROR::2015-07-01
11:04:14,007::states::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine VM has bad health status, timeout in 279 seconds
MainThread::INFO::2015-07-01
11:04:14,478::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01
11:04:14,478::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::ERROR::2015-07-01
11:04:24,505::states::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine VM has bad health status, timeout in 268 seconds
MainThread::INFO::2015-07-01
11:04:24,994::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01
11:04:24,994::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 240
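
To check whether the state changes in agent.log really line up with the hour,
the records could be summarized with a small script along these lines. This is
only a rough sketch: the record layout in RECORD_RE is inferred from the
excerpts above rather than taken from the oVirt sources, and the names
RECORD_RE, STATE_RE, SAMPLE and state_changes are made up for illustration.

```python
import re
from datetime import datetime

# Assumed record layout, inferred from the log excerpts in this mail:
#   <thread>::<LEVEL>::<date> <time>,<ms>::<module>::<line>::<logger>::(<func>) <message>
# \s+ is used where the mail client may have wrapped a record onto a new line.
RECORD_RE = re.compile(
    r"(?P<thread>[\w-]+)::(?P<level>[A-Z]+)::"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2}),\d+::"
    r"(?P<module>\w+)::(?P<line>\d+)::(?P<logger>[\w.]+)::"
    r"\((?P<func>\w+)\)\s+(?P<message>.*)"
)

# Matches the "Current state ..." messages written by start_monitoring.
STATE_RE = re.compile(r"Current state (\w+) \(score: (\d+)\)")

# Two states taken from the excerpt above, one record per line.
SAMPLE = (
    "MainThread::INFO::2015-07-01 11:01:25,174::hosted_engine::327::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(start_monitoring) Current state EngineUp (score: 2400)\n"
    "MainThread::INFO::2015-07-01 11:03:43,018::hosted_engine::327::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(start_monitoring) Current state EngineUpBadHealth (score: 2400)\n"
    "MainThread::INFO::2015-07-01 11:03:53,530::hosted_engine::327::"
    "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
    "(start_monitoring) Current state EngineUpBadHealth (score: 2400)\n"
)


def state_changes(text):
    """Yield (timestamp, state, score) each time the reported state changes."""
    previous = None
    for record in RECORD_RE.finditer(text):
        state = STATE_RE.search(record["message"])
        if state is None:
            continue  # not a "Current state" record
        name, score = state.group(1), int(state.group(2))
        if name != previous:
            stamp = datetime.strptime(
                f"{record['date']} {record['time']}", "%Y-%m-%d %H:%M:%S"
            )
            yield stamp, name, score
            previous = name


if __name__ == "__main__":
    for stamp, name, score in state_changes(SAMPLE):
        print(stamp.isoformat(), name, score)
```

Running it over the whole agent.log instead of SAMPLE should list only the
moments at which the state changed, which makes an hourly pattern easier to see.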
Best Regards
Carles Cortes Costa