Hello,

On Sun, Aug 8, 2021 at 9:08 PM Strahil Nikolov <hunter86...@yahoo.com> wrote:
> Usually this is not the problem.
>
> Start checking:
> 1. Export FS is mounted
> 2. NFS server is running (after all this is a single node NFS setup)
> 3. Check that vdsmd , supervdsmd and sanlock are running
> 4. If needed, enable debug for the ovirt-ha-{agent,broker} as usually the
> default log level won't show the problem.
>
> Best Regards,
> Strahil Nikolov

1. All NFS shares are exported, hosted storage (used by the hosted engine) is mounted by oVirt.

$ mount | grep rhev
localhost:/exports/hosted on /rhev/data-center/mnt/localhost:_exports_hosted type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1)

2. NFS is working as expected.

$ exportfs | grep exports
/exports/hosted 127.0.0.1/255.255.255.0
/exports/data 127.0.0.1/255.255.255.0
/exports/iso 127.0.0.1/255.255.255.0
/exports/export 127.0.0.1/255.255.255.0

3. All services seem to run just fine (minus broker and agent).

$ ps -AH | /bin/egrep -e 'vdsm|sanlock'
 2282 ? 00:00:00 sanlock
 2284 ? 00:00:00 sanlock-helper
 5065 ? 00:00:02 supervdsmd
12259 ? 00:20:15 vdsmd

4. In both cases I can see the problem in the log.

Broker:
------------------------------------------------------------------------
MainThread::INFO::2021-08-08 19:46:06,962::status_broker::121::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker::(__init__) Status broker initialized.
Listener::INFO::2021-08-08 19:46:06,962::listener::44::ovirt_hosted_engine_ha.broker.listener.Listener::(__init__) Initializing RPCServer
Listener::INFO::2021-08-08 19:46:06,963::listener::57::ovirt_hosted_engine_ha.broker.listener.Listener::(__init__) RPCServer ready
StatusStorageThread::ERROR::2021-08-08 19:46:06,985::storage_broker::167::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats) Corrupted metadata from /run/vdsm/storage/9541c195-9f59-4225-91be-53391b4f1bb3/10cb67f7-6be2-47e4-9268-81fca9862057/deadf86f-b937-4172-8359-90c991dc2ecf
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 163, in get_raw_stats
    data = bdata.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 4191756: invalid start byte
StatusStorageThread::ERROR::2021-08-08 19:46:06,986::status_broker::98::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to read state.
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 163, in get_raw_stats
    data = bdata.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 4191756: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 94, in run
    self._storage_broker.get_raw_stats()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 169, in get_raw_stats
    .format(str(e)))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Corrupted read metadata: 'utf-8' codec can't decode byte 0xb9 in position 4191756: invalid start byte
StatusStorageThread::ERROR::2021-08-08 19:46:06,987::status_broker::70::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(trigger_restart) Trying to restart the broker
Listener::INFO::2021-08-08 19:46:07,464::broker::77::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Server shutting down
Listener::INFO::2021-08-08 19:46:07,464::monitor::117::ovirt_hosted_engine_ha.broker.monitor.Monitor::(stop_all_submonitors) Stopping all submonitors
MainThread::INFO::2021-08-08 19:46:08,060::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.4.7 started

Agent:
------------------------------------------------------------------------
MainThread::INFO::2021-08-08 19:36:25,467::brokerlink::82::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'addr': '192.168.1.9', 'network_test': 'tcp', 'tcp_t_address': '192.168.1.2', 'tcp_t_port': '22'}
MainThread::ERROR::2021-08-08 19:36:25,468::hosted_engine::564::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2021-08-08 19:36:25,470::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 85, in start_monitor
    response = self._proxy.start_monitor(type, options)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python3.6/http/client.py", line 1264, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 978, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect
    self.sock.connect(base64.b16decode(self.host))
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 561, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 91, in start_monitor
    ).format(t=type, o=options, e=e)
ovirt_hosted_engine_ha.lib.exceptions.RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'addr': '192.168.1.9', 'network_test': 'tcp', 'tcp_t_address': '192.168.1.2', 'tcp_t_port': '22'}]

>
> On Sunday, August 8, 2021, 20:06:46 GMT+3, Gilboa Davara <gilb...@gmail.com> wrote:
>
> On Sun, Aug 8, 2021 at 7:53 PM Gilboa Davara <gilb...@gmail.com> wrote:
> > Hello all,
> >
> > During the night, one of my (smaller) setups, a single node self hosted
> > engine (localhost NFS) crashed due to what-looks-like a massive disk
> > failure (Software RAID6, with 10 drives + spare).
> > After a reboot, I let the RAID resync (with a fresh drive) and went on to
> > start oVirt.
> > However, no such luck.
> > Two issues:
> > 1. ovirt-ha-broker fails due to broken hosted engine state (log attached).
> > 2. ovirt-ha-agent fails due to network test (tcp) even though both
> > remote-host and DNS servers are active. (log attached).
> >
> > Two questions:
> > 1. Can I somehow force the agent to disable the network liveliness test?
> > 2. Can I somehow force the broker to rebuild / fix the hosted engine state?
> >
> > - Gilboa
>
> FWIW switching agent network test to none (via hosted-engine
> --set-shared-config network_test none --type=he_local) doesn't seem to work.
> (Unless I'm missing the point and the agent is failing due to broker
> issues and not due to a failed network liveliness check).
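
The parenthetical above fits the agent traceback: its last frame is self.sock.connect(base64.b16decode(self.host)) in unixrpc.py raising FileNotFoundError, which suggests the agent could not find the broker's Unix socket at all, rather than the TCP network test failing. That would be consistent with the broker log, where the broker restarts itself after the metadata read error. A minimal Python sketch of that failure mode follows; probe_unix_socket and demo.socket are made-up names for illustration, and nothing here assumes the broker's real socket path.

```python
import os
import socket
import tempfile


def probe_unix_socket(path: str) -> str:
    """Try to connect to a Unix domain socket and classify the outcome."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return "connected"
    except FileNotFoundError:
        # Errno 2: there is no socket file at this path.
        return "missing"
    except ConnectionRefusedError:
        # The file exists but no process is accepting connections on it.
        return "refused"
    finally:
        sock.close()


# Demonstration in a throwaway directory. The path is hypothetical and is
# not the location used by ovirt-ha-broker.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.socket")
    before = probe_unix_socket(path)  # no listener has created the file yet

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)                 # bind() creates the socket file
    server.listen(1)
    after = probe_unix_socket(path)   # a listener is now present
    server.close()

print(before, after)
```

If the same "missing" result shows up for the broker's socket while ovirt-ha-broker is in its restart loop, the network test setting is probably not the thing to chase first.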
>
> - Gilboa
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OH4H5K2FZXO6YNVFU6W3XL7NHW6N5LAU/
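
The broker traceback fails in bdata.decode() at byte offset 4191756, so one way to gauge the damage is to list where the metadata stops being valid UTF-8. Below is a small Python sketch of that check. find_undecodable is a hypothetical helper, not part of ovirt-hosted-engine-ha, and it only reports offsets; it does not repair anything.

```python
from typing import List, Tuple


def find_undecodable(data: bytes, limit: int = 10) -> List[Tuple[int, int]]:
    """Return up to `limit` (offset, byte value) pairs where UTF-8 decoding fails."""
    bad = []
    pos = 0
    while pos < len(data) and len(bad) < limit:
        try:
            # Same call the broker makes, applied to the remaining bytes.
            data[pos:].decode("utf-8")
            break  # the rest decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start is relative to the slice, so shift it back.
            offset = pos + exc.start
            bad.append((offset, data[offset]))
            pos = offset + 1  # resume just past the offending byte
    return bad


# In-memory example using the byte value from the log (0xb9).
sample = b"abc" + b"\xb9" + b"def"
print(find_undecodable(sample))
```

To try it on the real file, read the path from the "Corrupted metadata from ..." log line in binary mode and pass the bytes in. A handful of bad offsets would point at a small damaged region, while hitting the limit immediately would suggest a larger corrupted range.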
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/OIFK3ZYOL2DBEI62UTVJANZBHT76B5FP/