On Wed, Jul 21, 2021 at 1:01 AM Valerio Luccio <[email protected]> wrote:
>
> Hi list,
>
> I have a hosted engine running on CentOS 8. The engine and all the VMs are
> stored on a 4-node gluster. I had some issues with the gluster and then the
> hosted-engine stopped working (even though the virtualization dashboard
> showed 4 virtual machines running). I tried to "systemctl restart" the
> hosted-engine, but it failed. I then rebooted the server, and the
> hosted-engine still will not come up. Note that the server has no issue
> mounting the gluster:
>
> $ df
> hydra1:/MRIData           390664407040 20530130012 370134277028   6% /rhev/data-center/mnt/glusterSD/hydra1:_MRIData
> $ ls -l /rhev/data-center/mnt/glusterSD/hydra1\:_MRIData/6547dc22-b89e-4f14-8958-c9e8d27b29a4/
> drwxr-xr-x.  2 vdsm kvm 4.0K Mar 29 12:24 dom_md
> drwxr-xr-x.  2 vdsm kvm 4.0K Jul 20 17:47 ha_agent
> drwxr-xr-x. 12 vdsm kvm 4.0K Apr  1 16:32 images
> drwxr-xr-x.  4 vdsm kvm 4.0K Mar 29 12:24 master
>
> Where "hydra1" is one of my gluster nodes and MRIData is the volume name.
>
> Here is the relevant snippet from /var/log/ovirt-hosted-engine-ha/agent.log
>
> MainThread::INFO::2021-07-20 
> 17:29:07,584::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) 
> ovirt-hosted-engine-ha agent 2.4.6 started
> MainThread::INFO::2021-07-20 
> 17:29:07,594::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
>  Certificate common name not found, using hostname to identify host
> MainThread::INFO::2021-07-20 
> 17:29:07,635::hosted_engine::548::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>  Initializing ha-broker connection
> MainThread::INFO::2021-07-20 
> 17:29:07,636::brokerlink::82::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>  Starting monitor network, options {'addr': '192.168.39.65', 'network_test': 
> 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}
> MainThread::ERROR::2021-07-20 
> 17:29:07,636::hosted_engine::564::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
>  Failed to start necessary monitors
> MainThread::ERROR::2021-07-20 
> 17:29:07,637::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>  Traceback (most recent call last):
>   File 
> "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
> line 85, in start_monitor
>     response = self._proxy.start_monitor(type, options)
>   File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
>     return self.__send(self.__name, args)
>   File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
>     verbose=self.__verbose
>   File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
>     return self.single_request(host, handler, request_body, verbose)
>   File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
>     http_conn = self.send_request(host, handler, request_body, verbose)
>   File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
>     self.send_content(connection, request_body)
>   File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
>     connection.endheaders(request_body)
>   File "/usr/lib64/python3.6/http/client.py", line 1249, in endheaders
>     self._send_output(message_body, encode_chunked=encode_chunked)
>   File "/usr/lib64/python3.6/http/client.py", line 1036, in _send_output
>     self.send(msg)
>   File "/usr/lib64/python3.6/http/client.py", line 974, in send
>     self.connect()
>   File 
> "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", 
> line 74, in connect
>     self.sock.connect(base64.b16decode(self.host))
> FileNotFoundError: [Errno 2] No such file or directory

This seems to indicate that the broker (ovirt-ha-broker) is down: the
agent could not find the broker's socket file. Can you please check
it - its log, its status, and whether a restart helps?
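
To illustrate why the traceback above points at the broker: its last
frame is a connect() on a UNIX domain socket, and that call fails with
[Errno 2] when the socket file does not exist, which is typically the
case when the process that should create it is not running. A minimal
sketch in Python, using a temporary path as a stand-in for the broker's
socket (the path and the helper name try_unix_connect are made up for
this example and are not taken from ovirt-hosted-engine-ha):

```python
import errno
import os
import socket
import tempfile


def try_unix_connect(path):
    """Connect to a UNIX domain socket; return None on success, else the errno."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


# Hypothetical path: a file name inside a fresh, empty temporary directory,
# so nothing is listening there and the socket file does not exist.
missing = os.path.join(tempfile.mkdtemp(), "broker.socket")
result = try_unix_connect(missing)

# Print the symbolic name of the error code that connect() returned.
print(errno.errorcode.get(result))
```

On Linux this should report ENOENT, the same "No such file or
directory" that the agent logged.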

>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File 
> "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", 
> line 131, in _run_agent
>     return action(he)
>   File 
> "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", 
> line 55, in action_proper
>     return he.start_monitoring()
>   File 
> "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>  line 437, in start_monitoring
>     self._initialize_broker()
>   File 
> "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py",
>  line 561, in _initialize_broker
>     m.get('options', {}))
>   File 
> "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", 
> line 91, in start_monitor
>     ).format(t=type, o=options, e=e)
> ovirt_hosted_engine_ha.lib.exceptions.RequestError: brokerlink - failed to 
> start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, 
> [monitor: 'network', options: {'addr': '192.168.39.65', 'network_test': 
> 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}]
>
> MainThread::ERROR::2021-07-20 
> 17:29:07,637::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent)
>  Trying to restart agent
> MainThread::INFO::2021-07-20 
> 17:29:07,637::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) 
> Agent shutting down
>
> I'm puzzled by that "Certificate common name not found", which I had not seen 
> before.

This looks like a harmless bug. I have now filed it, mainly for
reference; I am not sure it's worth fixing:

https://bugzilla.redhat.com/show_bug.cgi?id=1984262

> The FQDN of the hosted engine resolves fine on the server, and so does the
> FQDN of the server itself. The IP address it seems to try to use for the
> network monitor is that of one of the university's gateways.
>
> Any ideas? Any way to debug this further?

See above - check the broker.
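
A rough sketch of the status part of that check, in Python. It assumes
the services are named ovirt-ha-broker and ovirt-ha-agent and that
systemctl is on the PATH; the helper name unit_state is made up for
this example. I would expect the broker's log to sit next to the
agent's, in /var/log/ovirt-hosted-engine-ha/broker.log.

```python
import subprocess

# Assumed unit names for the hosted-engine HA services.
UNITS = ("ovirt-ha-broker", "ovirt-ha-agent")


def unit_state(unit, run=subprocess.run):
    """Return what `systemctl is-active <unit>` prints, or 'unknown'.

    `run` is injectable so the helper can be exercised without systemd.
    """
    try:
        proc = run(
            ["systemctl", "is-active", unit],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,  # decode output as text (works on Python 3.6)
        )
    except FileNotFoundError:
        # systemctl itself is not installed on this machine.
        return "unknown"
    return proc.stdout.strip() or "unknown"


if __name__ == "__main__":
    for unit in UNITS:
        print(unit, unit_state(unit))
```

If the broker shows up as anything other than "active", its log and
"systemctl status ovirt-ha-broker" are the next places to look.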

Thanks and best regards,
-- 
Didi
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/OHGLOO5WJ4OTZCUFZKQP4DALPQX3G7UG/
