Hi list,

I have a hosted engine running on CentOS 8. The engine and all the VMs are stored on a 4-node gluster. I had some issues with the gluster and then the hosted engine stopped working (even though the virtualization dashboard showed 4 virtual machines running). I tried to "systemctl restart" the hosted engine, but it failed. I then rebooted the server, and the hosted engine still will not come up. Note that the server has no issue mounting the gluster volume:

   $ df
   hydra1:/MRIData           390664407040 20530130012 370134277028   6% /rhev/data-center/mnt/glusterSD/hydra1:_MRIData
   $ ls -l /rhev/data-center/mnt/glusterSD/hydra1\:_MRIData/6547dc22-b89e-4f14-8958-c9e8d27b29a4/
   drwxr-xr-x.  2 vdsm kvm 4.0K Mar 29 12:24 dom_md
   drwxr-xr-x.  2 vdsm kvm 4.0K Jul 20 17:47 ha_agent
   drwxr-xr-x. 12 vdsm kvm 4.0K Apr  1 16:32 images
   drwxr-xr-x.  4 vdsm kvm 4.0K Mar 29 12:24 master

Where "hydra1" is one of my gluster nodes and MRIData is the volume name.

Here is the relevant snippet from /var/log/ovirt-hosted-engine-ha/agent.log:

   MainThread::INFO::2021-07-20 17:29:07,584::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run) ovirt-hosted-engine-ha agent 2.4.6 started
   MainThread::INFO::2021-07-20 17:29:07,594::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname) Certificate common name not found, using hostname to identify host
   MainThread::INFO::2021-07-20 17:29:07,635::hosted_engine::548::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Initializing ha-broker connection
   MainThread::INFO::2021-07-20 17:29:07,636::brokerlink::82::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'addr': '192.168.39.65', 'network_test': 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}
   MainThread::ERROR::2021-07-20 17:29:07,636::hosted_engine::564::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
   MainThread::ERROR::2021-07-20 17:29:07,637::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 85, in start_monitor
        response = self._proxy.start_monitor(type, options)
      File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
        return self.__send(self.__name, args)
      File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
        verbose=self.__verbose
      File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
        return self.single_request(host, handler, request_body, verbose)
      File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
        http_conn = self.send_request(host, handler, request_body, verbose)
      File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
        self.send_content(connection, request_body)
      File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
        connection.endheaders(request_body)
      File "/usr/lib64/python3.6/http/client.py", line 1249, in endheaders
        self._send_output(message_body, encode_chunked=encode_chunked)
      File "/usr/lib64/python3.6/http/client.py", line 1036, in _send_output
        self.send(msg)
      File "/usr/lib64/python3.6/http/client.py", line 974, in send
        self.connect()
      File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect
        self.sock.connect(base64.b16decode(self.host))
   FileNotFoundError: [Errno 2] No such file or directory

   During handling of the above exception, another exception occurred:

   Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
        return action(he)
      File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
        return he.start_monitoring()
      File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring
        self._initialize_broker()
      File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 561, in _initialize_broker
        m.get('options', {}))
      File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 91, in start_monitor
        ).format(t=type, o=options, e=e)
   ovirt_hosted_engine_ha.lib.exceptions.RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'addr': '192.168.39.65', 'network_test': 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}]

   MainThread::ERROR::2021-07-20 17:29:07,637::agent::144::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Trying to restart agent
   MainThread::INFO::2021-07-20 17:29:07,637::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down
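
For reference, the FileNotFoundError at the bottom of the first traceback is what Python raises when a Unix-domain socket connect() targets a path that does not exist. The sketch below reproduces only that behaviour, mirroring the base64.b16decode(self.host) call seen in unixrpc.py. The socket path and the try_connect helper are hypothetical; the real path used by ovirt-ha-broker does not appear in the log, so this is an illustration and not the agent's actual code.

```python
import base64
import errno
import os
import socket
import tempfile


def try_connect(encoded_path: bytes) -> str:
    """Decode a base16-encoded socket path and try to connect to it."""
    path = base64.b16decode(encoded_path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return "connected"
    except FileNotFoundError as exc:
        # Errno 2 (ENOENT): nothing exists at that path, so no listener created it.
        return "missing: " + errno.errorcode[exc.errno]
    finally:
        sock.close()


# Hypothetical path inside a fresh temp directory, so it cannot already exist.
missing = os.path.join(tempfile.mkdtemp(), "broker.socket").encode()
result = try_connect(base64.b16encode(missing))
print(result)
```

If the same error shows up against the real socket, it would suggest that the listening side (here, presumably ovirt-ha-broker) never created its socket file, rather than a problem in the agent itself.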

I'm puzzled by that "Certificate common name not found" message, which I had not seen before. The FQDN of the hosted engine resolves fine on the server, and so does the FQDN of the server itself. The IP address it seems to be trying to use for the network monitor is that of one of the university's gateways.

Any ideas? Is there any way to debug this further?

Thanks in advance,

--
Valerio Luccio             (212) 998-8736
Center for Brain Imaging   4 Washington Place, Room 157
New York University        New York, NY 10003

    "In an open world, who needs windows or gates ?"

