Could you please also check and attach /var/log/ovirt-hosted-engine-ha/broker.log
and /var/log/vdsm/vdsm.log?
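
If the full files are too large to attach, the tail of each is usually a reasonable start. A minimal sketch for pulling the last lines of both logs, assuming Python 3 is available on the host and the logs are readable by the current user; the helper name `tail_lines` and the 200-line default are only illustrative:

```python
from collections import deque

# The two log files requested above.
LOG_PATHS = [
    "/var/log/ovirt-hosted-engine-ha/broker.log",
    "/var/log/vdsm/vdsm.log",
]


def tail_lines(path, count=200):
    """Return the last `count` lines of a text file as a list of strings."""
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the most recent `count` lines in memory.
        return list(deque(handle, maxlen=count))


if __name__ == "__main__":
    for path in LOG_PATHS:
        print("==== {} ====".format(path))
        try:
            print("".join(tail_lines(path)), end="")
        except OSError as exc:
            # Missing file or insufficient permissions; report and continue.
            print("could not read {}: {}".format(path, exc))
```

Redirecting the output to a file gives something small enough to attach to a reply.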

On Tue, Mar 19, 2019 at 11:36 AM ada per <adap...@gmail.com> wrote:

> Hello everyone,
>
> For some reason the hosted engine went down and I cannot restart it.
> I tried restarting it manually without success. Can you please advise?
>
> For all the nodes the engine status is the same as the one below.
> --== Host nodex. (id: 6) status ==--
> conf_on_shared_storage             : True
> Status up-to-date                  : True
> Hostname                           : nodex
> Host ID                            : 6
> Engine status                      : {"reason": "bad vm status", "health":
> "bad", "vm": "down_unexpected", "detail": "Down"}
> Score                              : 3400
> stopped                            : False
> Local maintenance                  : False
> crc32                              : 323a9f45
> local_conf_timestamp               : 2648874
> Host timestamp                     : 2648874
> Extra metadata (valid at timestamp):
>         metadata_parse_version=1
>         metadata_feature_version=1
>         timestamp=2648874 (Tue Mar 19 12:25:44 2019)
>         host-id=6
>         score=3400
>         vm_conf_refresh_time=2648874 (Tue Mar 19 12:25:44 2019)
>         conf_on_shared_storage=True
>         maintenance=False
>         state=GlobalMaintenance
>         stopped=False
>
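
When comparing this status across many nodes, it can help to extract the "Engine status" field programmatically. A minimal sketch, assuming the field is a flat JSON object as in the output quoted above; `parse_engine_status` is a hypothetical helper, not part of the hosted-engine tooling:

```python
import json
import re

# Sample copied from the status output quoted above (mail line wrapping removed).
SAMPLE_STATUS = """\
Hostname                           : nodex
Host ID                            : 6
Engine status                      : {"reason": "bad vm status", "health": "bad", "vm": "down_unexpected", "detail": "Down"}
Score                              : 3400
"""


def parse_engine_status(status_text):
    """Return the 'Engine status' JSON object as a dict, or None if absent."""
    # Assumption: the value contains no nested braces, so the first '}' ends it.
    match = re.search(r"Engine status\s*:\s*(\{.*?\})", status_text, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


if __name__ == "__main__":
    status = parse_engine_status(SAMPLE_STATUS)
    print(status["vm"], status["health"])
```

The same function could be fed the captured output of the status command on each host, so that the "vm" and "health" values can be compared side by side.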
> When I run the command
> root@node5# hosted-engine --vm-shutdown
> I get the response:
> root@node5# Command VM.shutdown with args {'delay': '120', 'message': 'VM
> is shutting down!', 'vmID': 'a492d2eb-1dfd-470d-a141-3e55d2189275'}
> failed:(code=1, message=Virtual machine does not exist)
>
> But when I run: hosted-engine --vm-start
> I get the response: VM exists and is down, cleaning up and restarting
>
>
>
> Below are the logs from # journalctl -u ovirt-ha-agent
>
> Mar 14 12:04:42 node7. ovirt-ha-agent[4134]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Unhandled
> monitoring loop exception
>   Traceback (most recent call last):
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 430, in start_monitoring
>       self._monitoring_loop()
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 449, in _monitoring_loop
>       for old_state, state, delay in self.fsm:
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 127, in next
>       new_data = self.refresh(self._state.data)
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 81, in refresh
>       stats.update(self.hosted_engine.collect_stats())
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 737, in collect_stats
>       all_stats = self._broker.get_stats_from_storage()
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 143, in get_stats_from_storage
>       result = self._proxy.get_stats()
>     File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
>       return self.__send(self.__name, args)
>     File "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
>       verbose=self.__verbose
>     File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
>       return self.single_request(host, handler, request_body, verbose)
>     File "/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
>       self.send_content(h, request_body)
>     File "/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
>       connection.endheaders(request_body)
>     File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
>       self._send_output(message_body)
>     File "/usr/lib64/python2.7/httplib.py", line 881, in _send_output
>       self.send(msg)
>     File "/usr/lib64/python2.7/httplib.py", line 843, in send
>       self.connect()
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 52, in connect
>       self.sock.connect(base64.b16decode(self.host))
>     File "/usr/lib64/python2.7/socket.py", line 224, in meth
>       return getattr(self._sock,name)(*args)
>   error: [Errno 2] No such file or directory
> Mar 14 12:04:42 node7. ovirt-ha-agent[4134]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call
> last):
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
>       return action(he)
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
>       return he.start_monitoring()
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring
>       self.publish(stopped)
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 337, in publish
>       self._push_to_storage(blocks)
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 708, in _push_to_storage
>       self._broker.put_stats_on_storage(self.host_id, blocks)
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 113, in put_stats_on_storage
>       self._proxy.put_stats(host_id, xmlrpclib.Binary(data))
>     File "/usr/lib64/python2.7/xmlrpclib.py", line 1233, in __call__
>       return self.__send(self.__name, args)
>     File "/usr/lib64/python2.7/xmlrpclib.py", line 1591, in __request
>       verbose=self.__verbose
>     File "/usr/lib64/python2.7/xmlrpclib.py", line 1273, in request
>       return self.single_request(host, handler, request_body, verbose)
>     File "/usr/lib64/python2.7/xmlrpclib.py", line 1301, in single_request
>       self.send_content(h, request_body)
>     File "/usr/lib64/python2.7/xmlrpclib.py", line 1448, in send_content
>       connection.endheaders(request_body)
>     File "/usr/lib64/python2.7/httplib.py", line 1037, in endheaders
>       self._send_output(message_body)
>     File "/usr/lib64/python2.7/httplib.py", line 881, in _send_output
>       self.send(msg)
>     File "/usr/lib64/python2.7/httplib.py", line 843, in send
>       self.connect()
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 52, in connect
>       self.sock.connect(base64.b16decode(self.host))
>     File "/usr/lib64/python2.7/socket.py", line 224, in meth
>       return getattr(self._sock,name)(*args)
>   error: [Errno 2] No such file or directory
> Mar 14 12:04:42 node7. ovirt-ha-agent[4134]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
> Mar 14 12:04:42 node7. systemd[1]: ovirt-ha-agent.service: main process
> exited, code=exited, status=157/n/a
> Mar 14 12:04:42 node7. systemd[1]: Unit ovirt-ha-agent.service entered
> failed state.
> Mar 14 12:04:42 node7. systemd[1]: ovirt-ha-agent.service failed.
> Mar 14 12:04:52 node7. systemd[1]: ovirt-ha-agent.service holdoff time
> over, scheduling restart.
> Mar 14 12:04:52 node7. systemd[1]: Stopped oVirt Hosted Engine High
> Availability Monitoring Agent.
> Mar 14 12:04:52 node7. systemd[1]: Started oVirt Hosted Engine High
> Availability Monitoring Agent.
> Mar 14 12:04:52 node7. ovirt-ha-agent[31765]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to
> start necessary monitors
> Mar 14 12:04:52 node7. ovirt-ha-agent[31765]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Traceback (most recent call
> last):
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
>       return action(he)
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
>       return he.start_monitoring()
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 413, in start_monitoring
>       self._initialize_broker()
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 537, in _initialize_broker
>       m.get('options', {}))
>     File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 86, in start_monitor
>       ).format(t=type, o=options, e=e)
>   RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'ping', options: {'addr': '19
> Mar 14 12:04:52 node7. ovirt-ha-agent[31765]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.agent.Agent ERROR Trying to restart agent
> Mar 14 12:04:52 node7. systemd[1]: ovirt-ha-agent.service: main process
> exited, code=exited, status=157/n/a
> Mar 14 12:04:52 node7. systemd[1]: Unit ovirt-ha-agent.service entered
> failed state.
> Mar 14 12:04:52 node7. systemd[1]: ovirt-ha-agent.service failed.
> Mar 14 12:05:02 node7. systemd[1]: ovirt-ha-agent.service holdoff time
> over, scheduling restart.
> Mar 14 12:05:02 node7. systemd[1]: Stopped oVirt Hosted Engine High
> Availability Monitoring Agent.
> Mar 14 12:05:02 node7. systemd[1]: Started oVirt Hosted Engine High
> Availability Monitoring Agent.
> Mar 14 12:06:55 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to
> stop engine vm with /usr/sbin/hosted-engine --vm-poweroff: Co
>                                                            (code=1,
> message=Virtual machine does not exist: {'vmId':
> u'a492d2eb-1dfd-470d-a141-3e55d2189275'})
> Mar 14 12:06:55 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Failed to
> stop engine VM: Command VM.destroy with args {'vmID': 'a492d2
>                                                            (code=1,
> message=Virtual machine does not exist: {'vmId':
> u'a492d2eb-1dfd-470d-a141-3e55d2189275'})
> Mar 15 14:28:16 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:28:36 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:29:00 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:29:22 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:29:44 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:30:06 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:30:28 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:30:50 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:31:12 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:31:33 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:31:56 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:32:18 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:32:40 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> Mar 15 14:33:02 node7. ovirt-ha-agent[31822]: ovirt-ha-agent
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM
> stopped on localhost
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/NS2SASAK66TEO3MZQYIW64HCDLXVTIL6/
>
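
The tracebacks quoted above end in `self.sock.connect(...)` inside unixrpc.py failing with `[Errno 2] No such file or directory`. For a Unix-domain socket that is the error one would expect when the socket path does not exist, which could mean the agent could not find the ovirt-ha-broker socket at that moment. This is only an inference from the traceback; broker.log should confirm or rule it out. A small self-contained sketch that reproduces the same errno, assuming Python 3 on Linux (`probe_unix_socket` is an illustrative helper and the path it probes is a throwaway temporary one, not the real broker socket):

```python
import errno
import os
import socket
import tempfile


def probe_unix_socket(path):
    """Try to connect to a Unix-domain socket; return 'OK' or the errno name."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except OSError as exc:
        # Translate the numeric errno (e.g. 2) into its symbolic name.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()
    return "OK"


if __name__ == "__main__":
    # A path inside a fresh temporary directory is guaranteed not to exist.
    missing = os.path.join(tempfile.mkdtemp(), "no-such.socket")
    print(probe_unix_socket(missing))
```

Pointing the same probe at the broker's actual socket path (which is not shown in the logs above) would indicate whether the broker is currently listening.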
