On 07/01/2015 06:52 AM, Carles Costa wrote:
Dear Experts,
I am experiencing a problem with oVirt: every hour the hosted engine
shuts down and reboots. The engine-status value moves to
"unknown stale-data" every second minute, and the hosted engine
is operative again 14 minutes after that. As far as I can see, the
scores remain at 2400 at all times, and it seems a liveliness
check is failing, but I am not able to find out why.
Why do I have this problem exactly every hour?
Why does the liveliness check fail?
I would appreciate it if someone could shed some light on this. I am new
to oVirt, but I really like it so far.
Hi Carles, and welcome.
The agent will try to reach a servlet running on the engine VM at
http://{ENGINE_IP}/OvirtEngineWeb/HealthStatus
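To see what the agent would see, the same URL can be requested by hand from the host. Below is a minimal sketch, assuming plain HTTP on the default port as in the URL above; health_url and probe are hypothetical helper names, not part of oVirt, and the IP address in the usage note is a placeholder.

```python
from urllib.error import URLError
from urllib.request import urlopen


def health_url(engine_ip):
    """Build the health servlet URL quoted above for a given engine IP."""
    return "http://{}/OvirtEngineWeb/HealthStatus".format(engine_ip)


def probe(engine_ip, timeout=5, opener=urlopen):
    """Request the health page once.

    Returns (http_status, body_text) on success, or (None, error_text)
    when the request fails (connection refused, timeout, HTTP error).
    The opener parameter only exists so the function can be tested
    without a network.
    """
    try:
        with opener(health_url(engine_ip), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace").strip()
            return resp.status, body
    except (URLError, OSError) as exc:
        return None, str(exc)
```

Calling probe("192.0.2.10") in a loop during the window when the engine goes bad would show whether the servlet stops answering or answers with an error.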
A debug log will also help if we can't resolve this. See these conf
files to change the log level:
/etc/ovirt-hosted-engine-ha/broker-log.conf
/etc/ovirt-hosted-engine-ha/agent-log.conf
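A minimal sketch of what raising the log level could look like, assuming these files use Python's logging fileConfig layout with a [logger_root] section that contains a level= line. That layout is an assumption, so check the actual file first. set_root_level is a hypothetical helper, not part of oVirt; it only rewrites text and leaves every other section alone.

```python
def set_root_level(conf_text, level="DEBUG"):
    """Return conf_text with the level= line of [logger_root] replaced.

    Lines in other sections (for example handler levels) and comments
    are copied through unchanged.
    """
    out = []
    in_root = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section header: remember whether it is the root logger.
            in_root = stripped == "[logger_root]"
        elif in_root and stripped.split("=", 1)[0].strip() == "level":
            line = "level=" + level
        out.append(line)
    return "\n".join(out) + "\n"
```

The result would be written back to the conf file, and the ovirt-ha-agent and ovirt-ha-broker services would most likely need a restart to pick up the change.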
While the machine is down I can see these messages in
/var/log/ovirt-hosted-engine-ha/broker.log:
Thread-170803::INFO::2015-07-01
11:06:47,335::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
Connection established
Thread-170803::INFO::2015-07-01
11:06:47,342::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
Connection closed
Thread-170804::INFO::2015-07-01
11:06:47,343::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
Connection established
Thread-170804::INFO::2015-07-01
11:06:47,344::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
Connection closed
Thread-170805::INFO::2015-07-01
11:06:47,344::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
Connection established
Thread-170805::INFO::2015-07-01
11:06:47,346::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
Connection closed
Thread-170806::INFO::2015-07-01
11:06:47,346::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
Connection established
Thread-170806::INFO::2015-07-01
11:06:47,348::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
Connection closed
Thread-170807::INFO::2015-07-01
11:06:47,348::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup)
Connection established
Thread-170807::INFO::2015-07-01
11:06:47,350::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
Connection closed
Thread-7::INFO::2015-07-01
11:06:50,394::cpu_load_no_engine::121::cpu_load_no_engine.EngineHealth::(calculate_load)
System load total=0.0095, engine=0.0046, non-engine=0.0049
Thread-8::WARNING::2015-07-01
11:06:50,464::engine_health::116::engine_health.CpuLoadNoEngine::(action)
bad health status: Hosted Engine is not up!
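Most of the broker records above are connection open/close noise. A small sketch for pulling out only the WARNING and ERROR records, assuming the on-disk log has one record per line (the excerpt above is wrapped by the mail client) with the '::'-separated fields seen here. Record, parse_record and noteworthy are hypothetical names chosen for illustration.

```python
from collections import namedtuple

Record = namedtuple(
    "Record", "thread level timestamp module lineno logger message"
)


def parse_record(line):
    """Split one unwrapped log line into its '::'-separated fields.

    Returns None for lines that do not have the expected shape.
    """
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7 or not parts[4].isdigit():
        return None
    thread, level, timestamp, module, lineno, logger, rest = parts
    return Record(thread, level, timestamp, module, int(lineno), logger,
                  rest.strip())


def noteworthy(lines, levels=("WARNING", "ERROR")):
    """Keep only the parsed records whose level is in levels."""
    records = (parse_record(line) for line in lines)
    return [rec for rec in records if rec is not None and rec.level in levels]
```

With open("/var/log/ovirt-hosted-engine-ha/broker.log") as the source of lines, noteworthy() would leave only records such as the "bad health status" warning.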
And here is /var/log/ovirt-hosted-engine-ha/agent.log:
MainThread::INFO::2015-07-01
11:01:04,216::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2015-07-01
11:01:04,217::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::INFO::2015-07-01
11:01:14,682::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2015-07-01
11:01:14,682::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::INFO::2015-07-01
11:01:24,724::states::393::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine vm running on localhost
MainThread::INFO::2015-07-01
11:01:25,174::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUp (score: 2400)
MainThread::INFO::2015-07-01
11:01:25,174::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::INFO::2015-07-01
11:01:35,234::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Trying: notify time=1435719695.23 type=state_transition
detail=EngineUp-EngineUpBadHealth
hostname='mc-place-compute-01-live.mc.mcon.net'
MainThread::INFO::2015-07-01
11:03:42,536::brokerlink::120::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify)
Success, was notification of state_transition
(EngineUp-EngineUpBadHealth) sent? ignored
MainThread::INFO::2015-07-01
11:03:43,018::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01
11:03:43,018::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::INFO::2015-07-01
11:03:53,060::state_machine::160::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Global metadata: {'maintenance': False}
MainThread::INFO::2015-07-01
11:03:53,060::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Host mc-place-compute-02-live.mc.mcon.net (id 2): {'extra':
'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=226246
(Wed Jul 1 11:02:11 2015)\nhost-id=2\nscore=2400\nmaintenance=False\nstate=EngineDown\n',
'hostname': 'mc-place-compute-02-live.mc.mcon.net', 'alive': True,
'host-id': 2, 'engine-status': {'reason': 'vm not running on this host',
'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400,
'maintenance': False, 'host-ts': 226246}
MainThread::INFO::2015-07-01
11:03:53,060::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Host mc-place-compute-03-live.mc.mcon.net (id 3): {'extra':
'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=226256
(Wed Jul 1 11:02:15 2015)\nhost-id=3\nscore=2400\nmaintenance=False\nstate=EngineDown\n',
'hostname': 'mc-place-compute-03-live.mc.mcon.net', 'alive': True,
'host-id': 3, 'engine-status': {'reason': 'vm not running on this host',
'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400,
'maintenance': False, 'host-ts': 226256}
MainThread::INFO::2015-07-01
11:03:53,061::state_machine::165::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Host mc-place-compute-04-live.mc.mcon.net (id 4): {'extra':
'metadata_parse_version=1\nmetadata_feature_version=1\ntimestamp=226300
(Wed Jul 1 11:02:11 2015)\nhost-id=4\nscore=2400\nmaintenance=False\nstate=EngineDown\n',
'hostname': 'mc-place-compute-04-live.mc.mcon.net', 'alive': True,
'host-id': 4, 'engine-status': {'reason': 'vm not running on this host',
'health': 'bad', 'vm': 'down', 'detail': 'unknown'}, 'score': 2400,
'maintenance': False, 'host-ts': 226300}
MainThread::INFO::2015-07-01
11:03:53,061::state_machine::168::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(refresh)
Local (id 1): {'engine-health': {'reason': 'failed liveliness check',
'health': 'bad', 'vm': 'up', 'detail': 'up'}, 'bridge': True, 'mem-free':
136637.0, 'maintenance': False, 'cpu-load': 0.0035, 'gateway': True}
MainThread::ERROR::2015-07-01
11:03:53,061::states::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine VM has bad health status, timeout in 300 seconds
MainThread::INFO::2015-07-01
11:03:53,081::state_decorators::95::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check)
Timeout set to Wed Jul 1 11:08:53 2015 while transitioning <class
'ovirt_hosted_engine_ha.agent.states.EngineUpBadHealth'> -> <class
'ovirt_hosted_engine_ha.agent.states.EngineUpBadHealth'>
MainThread::INFO::2015-07-01
11:03:53,530::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01
11:03:53,530::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::ERROR::2015-07-01
11:04:03,559::states::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine VM has bad health status, timeout in 289 seconds
MainThread::INFO::2015-07-01
11:04:03,980::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01
11:04:03,980::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::ERROR::2015-07-01
11:04:14,007::states::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine VM has bad health status, timeout in 279 seconds
MainThread::INFO::2015-07-01
11:04:14,478::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01
11:04:14,478::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 2400)
MainThread::ERROR::2015-07-01
11:04:24,505::states::562::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume)
Engine VM has bad health status, timeout in 268 seconds
MainThread::INFO::2015-07-01
11:04:24,994::hosted_engine::327::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Current state EngineUpBadHealth (score: 2400)
MainThread::INFO::2015-07-01
11:04:24,994::hosted_engine::332::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring)
Best remote host mc-place-compute-02-live.mc.mcon.net (id: 2, score: 240
Best Regards
Carles Cortes Costa
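One detail in the agent.log excerpt above: the records normally arrive about every 10 seconds, but there is a jump from 11:01:35 to 11:03:42 between the "Trying: notify" and "Success" records. The sketch below lists such gaps automatically. It assumes the on-disk log has one record per line with the timestamp as the third '::'-separated field, as in the excerpt; record_time and find_gaps are hypothetical names, and the 30 second threshold is an arbitrary choice.

```python
from datetime import datetime


def record_time(line):
    """Parse the timestamp (third '::' field) of one log line, or None."""
    parts = line.split("::", 3)
    if len(parts) < 4:
        return None
    try:
        return datetime.strptime(parts[2], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None


def find_gaps(lines, threshold_s=30):
    """Return (gap_seconds, previous_line, next_line) for each pause
    between consecutive records that is longer than threshold_s."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        current = record_time(line)
        if current is None:
            continue  # skip lines that are not log records
        if prev_time is not None:
            delta = (current - prev_time).total_seconds()
            if delta > threshold_s:
                gaps.append((delta, prev_line, line))
        prev_time, prev_line = current, line
    return gaps
```

Running find_gaps over the whole agent.log would show whether a pause like this appears before every hourly restart, which may help narrow down what the agent is waiting on.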
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users