Hello,

It appears that my Manager / hosted-engine isn't working, and I'm unable to get it to start.
I have a 3-node HCI cluster, but right now Gluster is only running on one host (so no replication). I was hoping to upgrade / replace the storage on my 2nd host today, but I aborted that maintenance when I found that I couldn't even get into the Manager. The storage is mounted, but here's what I see:

> [root@cha2-storage dwhite]# hosted-engine --vm-status
> The hosted engine configuration has not been retrieved from shared storage.
> Please ensure that ovirt-ha-agent is running and the storage server is reachable.
>
> [root@cha2-storage dwhite]# systemctl status ovirt-ha-agent
> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
>    Active: active (running) since Fri 2021-08-13 11:10:51 EDT; 2h 44min ago
>  Main PID: 3591872 (ovirt-ha-agent)
>     Tasks: 1 (limit: 409676)
>    Memory: 21.5M
>    CGroup: /system.slice/ovirt-ha-agent.service
>            └─3591872 /usr/libexec/platform-python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent
>
> Aug 13 11:10:51 cha2-storage.mgt.barredowlweb.com systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.

Any time I try to do anything like connect the engine storage, disconnect the engine storage, or connect to the console, it just sits there and doesn't do anything, and I eventually have to Ctrl-C out of it. Maybe I just need to be patient?
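(For context, when I say the storage is mounted, I'm going by checks like the sketch below; the mount root is vdsm's usual location, which is an assumption here, so adjust for your setup. A listing that hangs would point at a dead fuse mount even though `mount` still shows it.)

```shell
# check_mount PATH -- report whether PATH can be listed without hanging; a dead
# gluster fuse mount often still appears in `mount` but blocks on any access.
check_mount() {
    if timeout 10 ls "$1" >/dev/null 2>&1; then
        echo "$1: OK"
    else
        echo "$1: unreachable"
    fi
}

# /rhev/data-center/mnt is vdsm's usual mount root (assumption for this sketch)
check_mount /rhev/data-center/mnt

# list any gluster fuse mounts for good measure
mount -t fuse.glusterfs
```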
When I Ctrl-C, I get a traceback:

> [root@cha2-storage dwhite]# hosted-engine --console
> ^CTraceback (most recent call last):
>   File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
>     "__main__", mod_spec)
>   File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
>     exec(code, run_globals)
>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 214, in <module>
>     args.command(args)
>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 42, in func
>     f(*args, **kwargs)
>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_setup/vdsm_helper.py", line 91, in checkVmStatus
>     cli = ohautil.connect_vdsm_json_rpc()
>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 472, in connect_vdsm_json_rpc
>     __vdsm_json_rpc_connect(logger, timeout)
>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 395, in __vdsm_json_rpc_connect
>     timeout=timeout)
>   File "/usr/lib/python3.6/site-packages/vdsm/client.py", line 154, in connect
>     outgoing_heartbeat=outgoing_heartbeat, nr_retries=nr_retries)
>   File "/usr/lib/python3.6/site-packages/yajsonrpc/stompclient.py", line 426, in SimpleClient
>     nr_retries, reconnect_interval)
>   File "/usr/lib/python3.6/site-packages/yajsonrpc/stompclient.py", line 448, in StandAloneRpcClient
>     client = StompClient(utils.create_connected_socket(host, port, sslctx),
>   File "/usr/lib/python3.6/site-packages/vdsm/utils.py", line 379, in create_connected_socket
>     sock.connect((host, port))
>   File "/usr/lib64/python3.6/ssl.py", line 1068, in connect
>     self._real_connect(addr, False)
>   File "/usr/lib64/python3.6/ssl.py", line 1059, in _real_connect
>     self.do_handshake()
>   File "/usr/lib64/python3.6/ssl.py", line 1036, in do_handshake
>     self._sslobj.do_handshake()
>   File "/usr/lib64/python3.6/ssl.py", line 648, in do_handshake
>     self._sslobj.do_handshake()
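From that traceback it looks like the helper is stuck in the TLS handshake to vdsm on the local host. These are generic checks (nothing oVirt-specific) I can run to see whether anything is even listening there; port 54321 is vdsm's standard port, so treat that as an assumption:

```shell
# check_listen PORT -- report whether any local TCP socket is in LISTEN state
# on PORT, by scanning /proc/net/tcp{,6} (state 0A = LISTEN, ports in hex).
check_listen() {
    port_hex=$(printf '%04X' "$1")
    if awk -v p=":$port_hex" '$4 == "0A" && $2 ~ p"$" {found=1} END {exit !found}' \
           /proc/net/tcp /proc/net/tcp6 2>/dev/null; then
        echo "port $1: listening"
    else
        echo "port $1: nothing listening"
    fi
}

check_listen 54321                 # vdsm's default port (assumption)
systemctl is-active vdsmd ovirt-ha-broker 2>/dev/null || true
```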
This is what I see in /var/log/ovirt-hosted-engine-ha/broker.log:

> MainThread::WARNING::2021-08-11 10:24:41,596::storage_broker::100::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(__init__) Can't connect vdsm storage: Connection to storage server failed
> MainThread::ERROR::2021-08-11 10:24:41,596::broker::69::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Failed initializing the broker: Connection to storage server failed
> MainThread::ERROR::2021-08-11 10:24:41,598::broker::71::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 64, in run
>     self._storage_broker_instance = self._get_storage_broker()
>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/broker.py", line 143, in _get_storage_broker
>     return storage_broker.StorageBroker()
>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 97, in __init__
>     self._backend.connect()
>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 375, in connect
>     sserver.connect_storage_server()
>   File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/storage_server.py", line 451, in connect_storage_server
>     'Connection to storage server failed'
> RuntimeError: Connection to storage server failed
>
> MainThread::ERROR::2021-08-11 10:24:41,599::broker::72::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Trying to restart the broker
> MainThread::INFO::2021-08-11 10:24:42,439::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.4.7 started
> MainThread::INFO::2021-08-11 10:24:44,442::monitor::45::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Searching for submonitors in /usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/submonitors
> MainThread::INFO::2021-08-11 10:24:44,443::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load
> MainThread::INFO::2021-08-11 10:24:44,449::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor cpu-load-no-engine
> MainThread::INFO::2021-08-11 10:24:44,450::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor engine-health
> MainThread::INFO::2021-08-11 10:24:44,451::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mem-free
> MainThread::INFO::2021-08-11 10:24:44,451::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor mgmt-bridge
> MainThread::INFO::2021-08-11 10:24:44,452::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor network
> MainThread::INFO::2021-08-11 10:24:44,452::monitor::62::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Loaded submonitor storage-domain
> MainThread::INFO::2021-08-11 10:24:44,452::monitor::63::ovirt_hosted_engine_ha.broker.monitor.Monitor::(_discover_submonitors) Finished loading submonitors

And I see this in /var/log/vdsm/vdsm.log:

> 2021-08-13 14:08:10,844-0400 ERROR (Reactor thread) [ProtocolDetector.AcceptorImpl] Unhandled exception in acceptor (protocoldetector:76)
> Traceback (most recent call last):
>   File "/usr/lib64/python3.6/asyncore.py", line 108, in readwrite
>   File "/usr/lib64/python3.6/asyncore.py", line 417, in handle_read_event
>   File "/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py", line 57, in handle_accept
>   File "/usr/lib/python3.6/site-packages/yajsonrpc/betterAsyncore.py", line 173, in _delegate_call
>   File "/usr/lib/python3.6/site-packages/vdsm/protocoldetector.py", line 53, in handle_accept
>   File "/usr/lib64/python3.6/asyncore.py", line 348, in accept
>   File "/usr/lib64/python3.6/socket.py", line 205, in accept
> OSError: [Errno 24] Too many open files

Can anyone help?
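That "[Errno 24] Too many open files" makes me suspect vdsm has exhausted its file-descriptor limit, possibly from the broker's endless reconnect loop. A sketch of how I'd check (the vdsmd service name is standard, but I haven't confirmed this is the fix):

```shell
# fd_usage PID -- print "<open-fds> <soft-limit>" for a process, to see whether
# it is bumping against EMFILE (Errno 24).
fd_usage() {
    pid=$1
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "$open $soft"
}

# For the real check I'd run:  fd_usage "$(pidof -s vdsmd)"
fd_usage $$   # demo on the current shell

# If the limit really is the problem, a systemd drop-in would raise it:
#   systemctl edit vdsmd       ->  [Service]
#                                  LimitNOFILE=65536
#   systemctl restart vdsmd
```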
_______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/YVD2IDBACJX4CILGSCW77WZEWAB2TLNX/