Hello,

On Sun, Aug 8, 2021 at 9:08 PM Strahil Nikolov <hunter86...@yahoo.com>
wrote:

> Usually this is not the problem.
>
> Start checking:
> 1. Export FS is mounted
> 2. NFS server is running (after all, this is a single-node NFS setup)
> 3. Check that vdsmd, supervdsmd and sanlock are running
> 4. If needed, enable debug for the ovirt-ha-{agent,broker}, as the
> default log level usually won't show the problem.
>
> Best Regards,
> Strahil Nikolov
>
>
>
1. All NFS shares are exported, and the hosted storage (used by the hosted
engine) is mounted by oVirt.
$ mount | grep rhev
localhost:/exports/hosted on /rhev/data-center/mnt/localhost:_exports_hosted type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1)
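
For anyone who wants to script the mount check from step 1, here is a minimal Python sketch. It assumes the text comes from /proc/mounts (whitespace-separated fields, unlike the `mount` output above); the function name rhev_nfs_mounts is made up for illustration and is not part of oVirt.

```python
def rhev_nfs_mounts(mounts_text):
    """Return (source, mountpoint, options) for NFS mounts under /rhev/.

    Expects text in /proc/mounts layout:
    source mountpoint fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        source, mountpoint, fstype, options = fields[:4]
        # Matches both "nfs" and "nfs4" filesystem types.
        if fstype.startswith("nfs") and mountpoint.startswith("/rhev/"):
            found.append((source, mountpoint, options.split(",")))
    return found
```

Feeding it the contents of /proc/mounts should list the hosted storage domain; an empty result would mean the share is not mounted.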

2. NFS is working as expected.
$ exportfs | grep exports
/exports/hosted 127.0.0.1/255.255.255.0
/exports/data   127.0.0.1/255.255.255.0
/exports/iso    127.0.0.1/255.255.255.0
/exports/export 127.0.0.1/255.255.255.0

3. All services seem to run just fine (minus broker and agent).
$ ps -AH | /bin/egrep -e 'vdsm|sanlock'
   2282 ?        00:00:00   sanlock
   2284 ?        00:00:00     sanlock-helper
   5065 ?        00:00:02   supervdsmd
  12259 ?        00:20:15   vdsmd

4. In both cases I can see the problem in the log.

Broker:
--------------------------------------------------------------------------------------------------------------------------------------------------
MainThread::INFO::2021-08-08 19:46:06,962::status_broker::121::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker::(__init__) Status broker initialized.
Listener::INFO::2021-08-08 19:46:06,962::listener::44::ovirt_hosted_engine_ha.broker.listener.Listener::(__init__) Initializing RPCServer
Listener::INFO::2021-08-08 19:46:06,963::listener::57::ovirt_hosted_engine_ha.broker.listener.Listener::(__init__) RPCServer ready
StatusStorageThread::ERROR::2021-08-08 19:46:06,985::storage_broker::167::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(get_raw_stats) Corrupted metadata from /run/vdsm/storage/9541c195-9f59-4225-91be-53391b4f1bb3/10cb67f7-6be2-47e4-9268-81fca9862057/deadf86f-b937-4172-8359-90c991dc2ecf
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 163, in get_raw_stats
    data = bdata.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 4191756: invalid start byte
StatusStorageThread::ERROR::2021-08-08 19:46:06,986::status_broker::98::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(run) Failed to read state.
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 163, in get_raw_stats
    data = bdata.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 4191756: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/status_broker.py", line 94, in run
    self._storage_broker.get_raw_stats()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 169, in get_raw_stats
    .format(str(e)))
ovirt_hosted_engine_ha.lib.exceptions.RequestError: Corrupted read metadata: 'utf-8' codec can't decode byte 0xb9 in position 4191756: invalid start byte
StatusStorageThread::ERROR::2021-08-08 19:46:06,987::status_broker::70::ovirt_hosted_engine_ha.broker.status_broker.StatusBroker.Update::(trigger_restart) Trying to restart the broker
Listener::INFO::2021-08-08 19:46:07,464::broker::77::ovirt_hosted_engine_ha.broker.broker.Broker::(run) Server shutting down
Listener::INFO::2021-08-08 19:46:07,464::monitor::117::ovirt_hosted_engine_ha.broker.monitor.Monitor::(stop_all_submonitors) Stopping all submonitors
MainThread::INFO::2021-08-08 19:46:08,060::broker::47::ovirt_hosted_engine_ha.broker.broker.Broker::(run) ovirt-hosted-engine-ha broker 2.4.7 started
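
The broker error boils down to bdata.decode() hitting a byte that is not valid UTF-8. A small Python sketch of how that failure can be reproduced and located in an arbitrary byte buffer; first_undecodable_byte is a hypothetical helper, not something from ovirt-hosted-engine-ha:

```python
def first_undecodable_byte(data, encoding="utf-8"):
    """Return (offset, byte_value) of the first byte that fails to decode,
    or None when the whole buffer decodes cleanly."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the offset that shows up as "position N" in the log.
        return exc.start, data[exc.start]
    return None


# 0xb9 cannot start a UTF-8 sequence, so decoding stops at its offset.
blob = b"a" * 10 + b"\xb9" + b"b" * 5
print(first_undecodable_byte(blob))
```

Run against a read-only copy of the metadata file (opened in binary mode), this should report the same offset the broker logs, which at least tells where in the file the damage starts.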

Agent:
--------------------------------------------------------------------------------------------------------------------------------------------------
MainThread::INFO::2021-08-08 19:36:25,467::brokerlink::82::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor) Starting monitor network, options {'addr': '192.168.1.9', 'network_test': 'tcp', 'tcp_t_address': '192.168.1.2', 'tcp_t_port': '22'}
MainThread::ERROR::2021-08-08 19:36:25,468::hosted_engine::564::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker) Failed to start necessary monitors
MainThread::ERROR::2021-08-08 19:36:25,470::agent::143::ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 85, in start_monitor
    response = self._proxy.start_monitor(type, options)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1112, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1452, in __request
    verbose=self.__verbose
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1154, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1166, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1279, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib64/python3.6/xmlrpc/client.py", line 1309, in send_content
    connection.endheaders(request_body)
  File "/usr/lib64/python3.6/http/client.py", line 1264, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 978, in send
    self.connect()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/unixrpc.py", line 74, in connect
    self.sock.connect(base64.b16decode(self.host))
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 131, in _run_agent
    return action(he)
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 55, in action_proper
    return he.start_monitoring()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 437, in start_monitoring
    self._initialize_broker()
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 561, in _initialize_broker
    m.get('options', {}))
  File "/usr/lib/python3.6/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 91, in start_monitor
    ).format(t=type, o=options, e=e)
ovirt_hosted_engine_ha.lib.exceptions.RequestError: brokerlink - failed to start monitor via ovirt-ha-broker: [Errno 2] No such file or directory, [monitor: 'network', options: {'addr': '192.168.1.9', 'network_test': 'tcp', 'tcp_t_address': '192.168.1.2', 'tcp_t_port': '22'}]
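
Note that the innermost failure in the agent traceback is a FileNotFoundError raised by sock.connect() in unixrpc.py, i.e. a UNIX socket connect, not the TCP network test itself. A minimal Python sketch that shows how a missing socket file produces exactly that error; probe_unix_socket is a hypothetical helper, and the path to pass in (wherever the broker creates its socket on the host) is an assumption to verify locally:

```python
import socket


def probe_unix_socket(path):
    """Try to connect to a UNIX stream socket and classify the outcome."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except FileNotFoundError:
        return "missing"    # no socket file at that path (Errno 2)
    except ConnectionRefusedError:
        return "refused"    # file exists, but nothing is listening on it
    except OSError as exc:
        return "error: {}".format(exc)
    else:
        return "listening"
    finally:
        sock.close()
```

A "missing" result while the broker keeps restarting would be consistent with the agent failing because of the broker, rather than because of the network liveness check.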

>
>
>
> On Sunday, August 8, 2021, 20:06:46 GMT+3, Gilboa Davara <
> gilb...@gmail.com> wrote:
>
>
>
>
>
> On Sun, Aug 8, 2021 at 7:53 PM Gilboa Davara <gilb...@gmail.com> wrote:
> > Hello all,
> >
> > During the night, one of my (smaller) setups, a single-node self-hosted
> > engine (localhost NFS), crashed due to what looks like a massive disk
> > failure (software RAID6, with 10 drives + spare).
> > After a reboot, I let the RAID resync with a fresh drive and went on to
> > start oVirt.
> > However, no such luck.
> > Two issues:
> > 1. ovirt-ha-broker fails due to a broken hosted engine state (log
> > attached).
> > 2. ovirt-ha-agent fails due to the network test (tcp) even though both
> > the remote host and the DNS servers are active (log attached).
> >
> > Two questions:
> > 1. Can I somehow force the agent to disable the network liveness test?
> > 2. Can I somehow force the broker to rebuild / fix the hosted engine
> > state?
> >
> > - Gilboa
>
> FWIW, switching the agent network test to none (via hosted-engine
> --set-shared-config network_test none --type=he_local) doesn't seem to work
> (unless I'm missing the point and the agent is failing due to broker
> issues rather than a failed network liveness check).
>
>
> - Gilboa
>
>
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/OH4H5K2FZXO6YNVFU6W3XL7NHW6N5LAU/
>
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OIFK3ZYOL2DBEI62UTVJANZBHT76B5FP/