Can you please also check/attach /var/log/ovirt-hosted-engine-ha/broker.log and /var/log/vdsm/vdsm.log?
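In case it is useful, here is a minimal Python 3 sketch that gathers the tail of both files into one report you can attach. It is not an oVirt tool. The 500-line window, the count of lines containing "ERROR", and the script name `collect_ha_logs.py` used below are all arbitrary, hypothetical choices.

```python
import collections
import sys

# The two logs requested above; adjust if your hosts log elsewhere.
LOG_FILES = [
    "/var/log/ovirt-hosted-engine-ha/broker.log",
    "/var/log/vdsm/vdsm.log",
]


def tail(path, n=500):
    """Return the last n lines of a text file, or None if it cannot be read."""
    try:
        with open(path, errors="replace") as fh:
            # A bounded deque keeps only the most recent n lines in memory.
            return list(collections.deque(fh, maxlen=n))
    except OSError:
        return None


def collect(paths, n=500):
    """Build one attachable report with the last n lines of each file."""
    parts = []
    for path in paths:
        parts.append("===== %s =====\n" % path)
        lines = tail(path, n)
        if lines is None:
            parts.append("(could not read file)\n")
            continue
        # Assumption: failures carry the level name ERROR somewhere in the line.
        n_errors = sum(1 for line in lines if "ERROR" in line)
        parts.append("ERROR lines in window: %d\n" % n_errors)
        parts.extend(lines)
        # Keep the next header on its own line if the file lacks a final newline.
        if lines and not lines[-1].endswith("\n"):
            parts.append("\n")
    return "".join(parts)


if __name__ == "__main__":
    sys.stdout.write(collect(LOG_FILES))
```

On the affected host, something like `python3 collect_ha_logs.py > ha-logs.txt` would produce a single file to attach.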
On Tue, Mar 19, 2019 at 11:36 AM ada per <adap...@gmail.com> wrote:
> Hello everyone,
>
> For some strange reason the hosted engine went down and I cannot restart it.
> I tried restarting it manually without any success. Can you please advise?
>
> For all the nodes the engine status is the same as the one below.
> --== Host nodex. (id: 6) status ==--
> conf_on_shared_storage : True
> Status up-to-date : True
> Hostname : nodex
> Host ID : 6
> Engine status : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
> Score : 3400
> stopped : False
> Local maintenance : False
> crc32 : 323a9f45
> local_conf_timestamp : 2648874
> Host timestamp : 2648874
> Extra metadata (valid at timestamp):
> metadata_parse_version=1
> metadata_feature_version=1
> timestamp=2648874 (Tue Mar 19 12:25:44 2019)
> host-id=6
> score=3400
> vm_conf_refresh_time=2648874 (Tue Mar 19 12:25:44 2019)
> conf_on_shared_storage=True
> maintenance=False
> state=GlobalMaintenance
> stopped=False
>
> When I try the command
> root@node5# hosted-engine --vm-shutdown
> I get the response:
> root@node5# Command VM.shutdown with args {'delay': '120', 'message': 'VM is shutting down!', 'vmID': 'a492d2eb-1dfd-470d-a141-3e55d2189275'}
> failed:(code=1, message=Virtual machine does not exist)
>
> But when I run: hosted-engine --vm-start
> I get the response: VM exists and is down, cleaning up and restarting
>
> Below you can see the output of # journalctl -u ovirt-ha-agent:
>
> Mar 14 12:04:42 node7.
> ovirt-ha-agent[4134]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Unhandled monitoring loop exception
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 430, in start_monitoring
>     self._monitoring_loop()
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 449, in _monitoring_loop
>     for old_state, state, delay in self.fsm:
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 127, in next
>     new_data = self.refresh(self._state.data)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 81, in refresh
>     stats.update(self.hosted_engine.collect_stats())
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 737, in collect_stats
>     all_stats = self._broker.get_stats_from_storage()
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 143, in get_stats_from_storage
>     result = self._proxy.get_stats()
>   File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
>     return self.__send(self.__name, args)
>   File "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
>     verbose=self.__verbose
>   File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
>     return self.single_request(host, handler, request_body, verbose)
>   File "/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
>     self.send_content(h, request_body)
>   File "/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
>     connection.endheaders(request_body)
>   File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
>     self._send_output(message_body)
>   File "/usr/lib64/python2.7/httplib.py", line 881, in _send_output
>     self.send(msg)
>   File "/usr/lib64/python2.7/httplib.py", line 843, in send
>     self.connect()
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 52, in connect
>     self.sock.connect(base64.b16decode(self.host))
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 2] No such file or directory
> Mar 14 12:04:42 node7. ovirt-ha-agent[4134]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
>     return action(he)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
>     return he.start_monitoring()
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring
>     self.publish(stopped)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 337, in publish
>     self._push_to_storage(blocks)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 708, in _push_to_storage
>     self._broker.put_stats_on_storage(self.host_id, blocks)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 113, in put_stats_on_storage
>     self._proxy.put_stats(host_id, xmlrpclib.Binary(data))
>   File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
>     return self.__send(self.__name, args)
>   File "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
>     verbose=self.__verbose
>   File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
>     return self.single_request(host, handler, request_body, verbose)
>   File "/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
>     self.send_content(h, request_body)
>   File "/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
>     connection.endheaders(request_body)
>   File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
>     self._send_output(message_body)
>   File "/usr/lib64/python2.7/httplib.py", line 881, in _send_output
>     self.send(msg)
>   File "/usr/lib64/python2.7/httplib.py", line 843, in send
>     self.connect()
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 52, in connect
>     self.sock.connect(base64.b16decode(self.host))
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
> error: [Errno 2] No such file or directory
> Mar 14 12:04:42 node7. ovirt-ha-agent[4134]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
> Mar 14 12:04:42 node7. systemd[1]: ovirt-ha-agent.service: main process exited, code=exited, status=157/n/a
> Mar 14 12:04:42 node7. systemd[1]: Unit ovirt-ha-agent.service entered failed state.
> Mar 14 12:04:42 node7. systemd[1]: ovirt-ha-agent.service failed.
> Mar 14 12:04:52 node7. systemd[1]: ovirt-ha-agent.service holdoff time over, scheduling restart.
> Mar 14 12:04:52 node7. systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.
> Mar 14 12:04:52 node7. systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
> Mar 14 12:04:52 node7. ovirt-ha-agent[31765]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to start necessary monitors
> Mar 14 12:04:52 node7. ovirt-ha-agent[31765]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
>     return action(he)
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
>     return he.start_monitoring()
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 413, in start_monitoring
>     self._initialize_broker()
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 537, in _initialize_broker
>     m.get('options', {}))
>   File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 86, in start_monitor
>     ).format(t=type, o=options, e=e)
> RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'ping', options: {'addr': '19
> Mar 14 12:04:52 node7. ovirt-ha-agent[31765]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
> Mar 14 12:04:52 node7. systemd[1]: ovirt-ha-agent.service: main process exited, code=exited, status=157/n/a
> Mar 14 12:04:52 node7. systemd[1]: Unit ovirt-ha-agent.service entered failed state.
> Mar 14 12:04:52 node7. systemd[1]: ovirt-ha-agent.service failed.
> Mar 14 12:05:02 node7. systemd[1]: ovirt-ha-agent.service holdoff time over, scheduling restart.
> Mar 14 12:05:02 node7. systemd[1]: Stopped oVirt Hosted Engine High Availability Monitoring Agent.
> Mar 14 12:05:02 node7. systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
> Mar 14 12:06:55 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to stop engine vm with /usr/sbin/hosted-engine --vm-poweroff: Co
> (code=1, message=Virtual machine does not exist: {'vmId': u'a492d2eb-1dfd-470d-a141-3e55d2189275'})
> Mar 14 12:06:55 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to stop engine VM: Command VM.destroy with args {'vmID': 'a492d2
> (code=1, message=Virtual machine does not exist: {'vmId': u'a492d2eb-1dfd-470d-a141-3e55d2189275'})
> Mar 15 14:28:16 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:28:36 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:29:00 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:29:22 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:29:44 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:30:06 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:30:28 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:30:50 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:31:12 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:31:33 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:31:56 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:32:18 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:32:40 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> Mar 15 14:33:02 node7. ovirt-ha-agent[31822]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/NS2SASAK66TEO3MZQYIW64HCDLXVTIL6/
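The `[Errno 2] No such file or directory` that ends the agent tracebacks in the quoted log is raised in `unixrpc.py` by `self.sock.connect(base64.b16decode(self.host))`. The agent talks to ovirt-ha-broker with XML-RPC over a Unix domain socket, and connecting such a socket to a path that does not exist fails with exactly this errno. This minimal sketch repeats that connect so you can check whether the broker socket is present on a host. The default path `/var/run/ovirt-hosted-engine-ha/broker.socket` is an assumption and is not taken from the logs. Confirm it for your ovirt-hosted-engine-ha version, or pass the path as an argument.

```python
import base64
import errno
import socket
import sys

# Assumed default location of the ovirt-ha-broker socket (not taken from the
# logs above); verify it for your ovirt-hosted-engine-ha version.
BROKER_SOCKET = "/var/run/ovirt-hosted-engine-ha/broker.socket"


def probe_unix_socket(path):
    """Connect to a Unix domain socket the way unixrpc.py appears to, and report the result.

    The traceback shows unixrpc.py calling sock.connect(base64.b16decode(self.host)),
    so the path is round-tripped through base16 here to mirror that call.
    """
    host = base64.b16encode(path.encode())
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(base64.b16decode(host))
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return "missing"  # [Errno 2] No such file or directory, as in the agent log
        if exc.errno == errno.ECONNREFUSED:
            return "refused"  # the path exists but no process accepted the connection
        return "error: %s" % exc
    finally:
        sock.close()
    return "ok"


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else BROKER_SOCKET
    print(target, "->", probe_unix_socket(target))
```

A result of `missing` on node7 would be consistent with the agent traceback. A result of `refused` would mean the socket file exists but nothing accepted the connection.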
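With several hosts, a short parser can line up the `hosted-engine --vm-status` blocks for comparison. This is a minimal sketch that assumes the layout quoted above: a `--== Host NAME (id: N) status ==--` header, `key : value` fields, a JSON `Engine status` value, and `key=value` extra-metadata lines. That layout is inferred from this one sample, so treat the parser as illustrative rather than a supported format. The script and file names in the usage line are hypothetical.

```python
import json
import re
import sys

# Header format as it appears in the quoted output, e.g.
# "--== Host nodex. (id: 6) status ==--"
HOST_RE = re.compile(r"--== Host (?P<name>.*?) \(id: (?P<id>\d+)\) status ==--")


def parse_vm_status(text):
    """Map host id -> fields parsed from `hosted-engine --vm-status` text."""
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HOST_RE.search(line)
        if header:
            current = {"name": header.group("name")}
            hosts[int(header.group("id"))] = current
        elif current is None or not line:
            continue
        elif " : " in line:
            key, value = [part.strip() for part in line.split(" : ", 1)]
            # The engine status value is a JSON object.
            current[key] = json.loads(value) if key == "Engine status" else value
        elif "=" in line:
            # Extra metadata lines such as "state=GlobalMaintenance".
            key, value = line.split("=", 1)
            current["meta." + key] = value
    return hosts


def summarize(hosts):
    """One line per host with engine health, VM state, score and HA state."""
    rows = []
    for host_id in sorted(hosts):
        fields = hosts[host_id]
        engine = fields.get("Engine status", {})
        rows.append("%d %s health=%s vm=%s score=%s state=%s" % (
            host_id, fields["name"], engine.get("health"), engine.get("vm"),
            fields.get("Score"), fields.get("meta.state")))
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        print("\n".join(summarize(parse_vm_status(fh.read()))))
```

For example, run `hosted-engine --vm-status > status.txt` and then `python3 he_status.py status.txt`. For the host quoted above, the output line would be `6 nodex. health=bad vm=down_unexpected score=3400 state=GlobalMaintenance`.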