In the engine, I have "Hosted Engine HA: not active" for host1 and "Hosted Engine HA: active (score 0)" for host2.
2014-04-23 13:52 GMT+02:00 Jiri Moskovcak <[email protected]>:
> Hi,
> I'm not sure yet what causes the problem, but the workaround should be:
>
> open file
> /usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py
> in your favorite editor, go to line 52 and change it:
>
> from: except ValueError:
> to: except (ValueError, TypeError):
>
> --Jirka
>
>
> On 04/23/2014 12:43 PM, Kevin Tibi wrote:
>
>> Hi,
>>
>> /var/log/ovirt-hosted-engine-ha/broker.log
>>
>> Host1:
>> Thread-118327::INFO::2014-04-23 12:34:59,360::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-118327::INFO::2014-04-23 12:34:59,375::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>> Thread-118328::INFO::2014-04-23 12:35:14,546::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-118328::INFO::2014-04-23 12:35:14,549::listener::184::ovirt_hosted_engine_ha.broker.
>> listener.ConnectionHandler::(handle) Connection closed
>>
>> Host2:
>> Thread-4::INFO::2014-04-23 12:36:08,020::mem_free::53::mem_free.MemFree::(action) memFree: 9816
>> Thread-3::INFO::2014-04-23 12:36:08,240::mgmt_bridge::59::mgmt_bridge.MgmtBridge::(action) Found bridge ovirtmgmt
>> Thread-296455::INFO::2014-04-23 12:36:08,678::listener::134::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(setup) Connection established
>> Thread-296455::INFO::2014-04-23 12:36:08,684::listener::184::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Connection closed
>>
>>
>> /var/log/ovirt-hosted-engine-ha/agent.log
>>
>> host1:
>>
>> MainThread::INFO::2014-04-02 17:46:14,856::state_decorators::25::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Unknown local engine vm status no actions taken
>> MainThread::INFO::2014-04-02 17:46:14,857::brokerlink::108::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1396453574.86 type=state_transition detail=UnknownLocalVmState-UnknownLocalVmState hostname='host01.ovirt.lan'
>> MainThread::INFO::2014-04-02 17:46:14,858::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (UnknownLocalVmState-UnknownLocalVmState) sent?
>> ignored
>> MainThread::WARNING::2014-04-02 17:46:15,463::hosted_engine::334::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Error while monitoring engine: float() argument must be a string or a number
>> MainThread::WARNING::2014-04-02 17:46:15,464::hosted_engine::337::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 323, in start_monitoring
>>     state.score(self._log))
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py", line 160, in score
>>     lm, logger, score, score_cfg)
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py", line 61, in _penalize_memory
>>     if self._float_or_default(lm['mem-free'], 0) < vm_mem:
>>   File "/usr/lib/python2.6/site-packages/ovirt_hosted_engine_ha/agent/states.py", line 51, in _float_or_default
>>     return float(value)
>> TypeError: float() argument must be a string or a number
>> MainThread::ERROR::2014-04-02 17:46:15,464::hosted_engine::350::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Shutting down the agent because of 3 failures in a row!
>> MainThread::INFO::2014-04-02 17:46:15,466::agent::116::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
>>
>>
>> host2:
>>
>> MainThread::INFO::2014-04-23 12:36:44,800::hosted_engine::323::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUnexpectedlyDown (score: 0)
>> MainThread::INFO::2014-04-23 12:36:54,844::brokerlink::108::ovirt_hosted_engine_ha.lib.
>> brokerlink.BrokerLink::(notify) Trying: notify time=1398249414.84 type=state_transition detail=EngineUnexpectedlyDown-EngineUnexpectedlyDown hostname='host02.ovirt.lan'
>> MainThread::INFO::2014-04-23 12:36:54,846::brokerlink::117::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Success, was notification of state_transition (EngineUnexpectedlyDown-EngineUnexpectedlyDown) sent? ignored
>>
>> /var/log/vdsm/vdsm.log
>>
>> host1:
>>
>> Thread-116::DEBUG::2014-04-23 12:40:17,060::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_iso/cc51143e-8ad7-4b0b-a4d2-9024dffc1188/dom_md/metadata bs=4096 count=1' (cwd None)
>> Thread-116::DEBUG::2014-04-23 12:40:17,070::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n343 bytes (343 B) copied, 0.000183642 s, 1.9 MB/s\n'; <rc> = 0
>> Thread-37::DEBUG::2014-04-23 12:40:17,504::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_NFS01/aea040f8-ab9d-435b-9ecf-ddd4272e592f/dom_md/metadata bs=4096 count=1' (cwd None)
>> Thread-37::DEBUG::2014-04-23 12:40:17,514::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n472 bytes (472 B) copied, 0.000165064 s, 2.9 MB/s\n'; <rc> = 0
>> Thread-11736::DEBUG::2014-04-23 12:40:18,170::task::595::TaskManager.Task::(_updateState) Task=`8a3a3e42-6e79-4849-9b1c-cad895722884`::moving from state init -> state preparing
>> Thread-11736::INFO::2014-04-23 12:40:18,170::logUtils::44::dispatcher::(wrapper) Run and protect: repoStats(options=None)
>> Thread-11736::INFO::2014-04-23 12:40:18,171::logUtils::47::dispatcher::(wrapper) Run and protect: repoStats, Return response: {'aea040f8-ab9d-435b-9ecf-ddd4272e592f': {'code': 0, 'version': 3, 'acquired': True, 'delay':
>> '0.000165064', 'lastCheck': '0.7', 'valid': True}, '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000174536', 'lastCheck': '3.0', 'valid': True}, 'cc51143e-8ad7-4b0b-a4d2-9024dffc1188': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.000183642', 'lastCheck': '1.1', 'valid': True}, 'ff98d346-4515-4349-8437-fb2f5e9eaadf': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00045492', 'lastCheck': '8.6', 'valid': True}}
>> Thread-11736::DEBUG::2014-04-23 12:40:18,171::task::1185::TaskManager.Task::(prepare) Task=`8a3a3e42-6e79-4849-9b1c-cad895722884`::finished: {'aea040f8-ab9d-435b-9ecf-ddd4272e592f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000165064', 'lastCheck': '0.7', 'valid': True}, '5ae613a4-44e4-42cb-89fc-7b5d34c1f30f': {'code': 0, 'version': 3, 'acquired': True, 'delay': '0.000174536', 'lastCheck': '3.0', 'valid': True}, 'cc51143e-8ad7-4b0b-a4d2-9024dffc1188': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.000183642', 'lastCheck': '1.1', 'valid': True}, 'ff98d346-4515-4349-8437-fb2f5e9eaadf': {'code': 0, 'version': 0, 'acquired': True, 'delay': '0.00045492', 'lastCheck': '8.6', 'valid': True}}
>> Thread-11736::DEBUG::2014-04-23 12:40:18,172::task::595::TaskManager.Task::(_updateState) Task=`8a3a3e42-6e79-4849-9b1c-cad895722884`::moving from state preparing -> state finished
>> Thread-11736::DEBUG::2014-04-23 12:40:18,172::resourceManager::940::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
>> Thread-11736::DEBUG::2014-04-23 12:40:18,172::resourceManager::977::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
>> Thread-11736::DEBUG::2014-04-23 12:40:18,172::task::990::TaskManager.Task::(_decref) Task=`8a3a3e42-6e79-4849-9b1c-cad895722884`::ref 0 aborting False
>> Thread-299::DEBUG::2014-04-23
>> 12:40:19,599::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_export/ff98d346-4515-4349-8437-fb2f5e9eaadf/dom_md/metadata bs=4096 count=1' (cwd None)
>> Thread-299::DEBUG::2014-04-23 12:40:19,610::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n352 bytes (352 B) copied, 0.000525872 s, 669 kB/s\n'; <rc> = 0
>>
>>
>> host2:
>>
>> Thread-1688899::DEBUG::2014-04-23 12:41:30,270::task::990::TaskManager.Task::(_decref) Task=`c23aeaf5-aed4-4285-a8c9-2bffadc0240e`::ref 0 aborting False
>> Thread-159126::DEBUG::2014-04-23 12:41:30,547::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_iso/cc51143e-8ad7-4b0b-a4d2-9024dffc1188/dom_md/metadata bs=4096 count=1' (cwd None)
>> Thread-159126::DEBUG::2014-04-23 12:41:30,569::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n343 bytes (343 B) copied, 0.000480513 s, 714 kB/s\n'; <rc> = 0
>> Thread-159125::DEBUG::2014-04-23 12:41:30,740::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_DATA/5ae613a4-44e4-42cb-89fc-7b5d34c1f30f/dom_md/metadata bs=4096 count=1' (cwd None)
>> Thread-159125::DEBUG::2014-04-23 12:41:30,762::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n545 bytes (545 B) copied, 0.000382036 s, 1.4 MB/s\n'; <rc> = 0
>> Thread-159128::DEBUG::2014-04-23 12:41:32,226::fileSD::225::Storage.Misc.excCmd::(getReadDelay) '/bin/dd iflag=direct if=/rhev/data-center/mnt/host01.ovirt.lan:_home_export/ff98d346-4515-4349-8437-fb2f5e9eaadf/dom_md/metadata bs=4096 count=1' (cwd None)
>> Thread-159128::DEBUG::2014-04-23
>> 12:41:32,245::fileSD::225::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n352 bytes (352 B) copied, 0.000648972 s, 542 kB/s\n'; <rc> = 0
>>
>>
>>
>> 2014-04-23 0:21 GMT+02:00 Doron Fediuck <[email protected]>:
>>
>> ----- Original Message -----
>> > From: "Kevin Tibi" <[email protected]>
>> > To: "users" <[email protected]>
>> > Sent: Tuesday, April 22, 2014 2:12:50 PM
>> > Subject: [ovirt-users] Hosted Engine error -243
>> >
>> > Hi all,
>> >
>> > I have a problem with my hosted engine. Every 10 minutes I get an event in the engine:
>> >
>> > VM HostedEngine is down. Exit message: internal error Failed to acquire lock: error -243
>> >
>> > My data is on a local NFS export.
>> >
>> > Thanks for your help.
>> >
>> > Kevin.
>> >
>>
>> Hi Kevin,
>> can you please check the /var/log/ovirt-hosted-* log files on your hosts and let us know if you see anything else there or in your vdsm log file?
>> _______________________________________________
>> Users mailing list
>> [email protected]
>> http://lists.ovirt.org/mailman/listinfo/users
>
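A note on the workaround quoted above: the host1 traceback ends in _float_or_default (states.py, line 51), where float(value) is called on lm['mem-free']. Python's float() raises ValueError for a string it cannot parse, but TypeError for an argument that is neither a string nor a number (for example None), so "except ValueError:" lets that case escape. Changing line 52 to "except (ValueError, TypeError):" makes it fall back to the default instead. The sketch below is a hypothetical standalone reconstruction inferred from the traceback only: the function names are made up for illustration, the real helper is a method in states.py and may differ, and None is an assumption about what the broker reported for mem-free.

```python
# Hypothetical standalone reconstruction of the agent's _float_or_default
# helper, inferred only from the traceback in this thread (states.py,
# lines 51-52). The real code is a method and may differ in detail.


def float_or_default_old(value, default):
    """Unpatched behaviour: only ValueError falls back to the default."""
    try:
        return float(value)
    except ValueError:  # e.g. float("n/a")
        return default


def float_or_default(value, default):
    """Patched behaviour: also fall back when float() gets e.g. None."""
    try:
        return float(value)
    except (ValueError, TypeError):  # float(None) raises TypeError
        return default


if __name__ == "__main__":
    # Illustrative inputs for lm['mem-free']; None is an assumed value
    # for a missing reading, not something taken from the logs.
    print(float_or_default("9816", 0))
    print(float_or_default("n/a", 0))
    print(float_or_default(None, 0))
    try:
        float_or_default_old(None, 0)
    except TypeError as exc:
        print("unpatched helper raised TypeError: %s" % exc)
```

Run directly, it prints 9816.0, 0 and 0, followed by the TypeError from the unpatched variant. With the narrower clause, such a value makes score() raise, and the host1 agent.log above shows the agent shutting itself down after 3 failures in a row, which looks consistent with host1 reporting "Hosted Engine HA: not active".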

