On Sun, Dec 26, 2021 at 12:24 PM <[email protected]> wrote:
>
> Hi list!
>
> on a hyperconverged cluster with three hosts I am unable to start the
> ovirt-ha-agent.
>
> The history:
>
> As all three hosts were running CentOS 8, I tried to upgrade host3 to CentOS
> Stream 8 first and left all VMs and host1 and host2 untouched, basically as a
> test. After all migrations of VMs to host3 failed with:
>
> ```
> qemu-kvm: error while loading state for instance 0x0 of device
> '0000:00:01.0/pcie-root-port'#0122021-12-24T00:56:49.428234Z
> qemu-kvm: load of migration failed: Invalid argument
> ```
IIRC something similar was reported on the lists - that you can't
(always? easily?) migrate VMs between CentOS Linux 8 (.3? .4? not
sure) and current Stream. Is live migration mandatory for you? If not,
you might test, on a test env, stopping/starting your VMs instead, and
decide whether that's good enough.
>
> and since I haven't had the time to dig into that, I decided to roll back the
> upgrade, rebooted host3 into CentOS 8 again, and re-installed host3 through
> the engine appliance. During that process (and the restart of host3), the
> engine appliance became unresponsive and crashed.
Perhaps provide more details, if you have them. Did you put host3 into
maintenance? Remove it? etc.
>
> The problem:
>
> Currently all ovirt-ha-agent services on all hosts fail with the following
> message in /var/log/ovirt-hosted-engine-ha/agent.log
>
> ```
> MainThread::INFO::2021-12-24
> 03:56:03,500::agent::67::ovirt_hosted_engine_ha.agent.agent.Agent::(run)
> ovirt-hosted-engine-ha agent 2.4.9 started
> MainThread::INFO::2021-12-24
> 03:56:03,516::hosted_engine::242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
> Certificate common name not found, using hostname to identify host
> MainThread::INFO::2021-12-24
> 03:56:03,575::hosted_engine::548::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
> Initializing ha-broker connection
> MainThread::INFO::2021-12-24
> 03:56:03,576::brokerlink::82::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
> Starting monitor network, options {'addr': 'GATEWAY_IP', 'network_test':
> 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}
Not sure where 'GATEWAY_IP' comes from, but it should be the actual IP
address, e.g.:
MainThread::INFO::2021-12-20
07:51:05,151::brokerlink::82::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
Starting monitor network, options {'addr': '192.168.201.1',
'network_test': 'dns', 'tcp_t_address': '', 'tcp_t_port': ''}
Check (and perhaps fix manually, if you can't or don't want to first
diagnose/fix your reinstallation) the 'gateway=' line in
/etc/ovirt-hosted-engine/hosted-engine.conf . Perhaps compare this
file to the one on your other hosts - only the 'host_id' line should
differ between them.
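Something like this could do the comparison - a minimal sketch only,
with 'host1'/'host2' as placeholder hostnames, assuming you have ssh
access between the hosts:

```shell
# Fetch hosted-engine.conf from two hosts and diff them, ignoring the
# host_id= line, which is expected to differ between hosts.
for h in host1 host2; do
  ssh "$h" cat /etc/ovirt-hosted-engine/hosted-engine.conf \
    | grep -v '^host_id=' > "/tmp/he-$h.conf"
done
# Any output here points at an unexpected difference (e.g. gateway=).
diff /tmp/he-host1.conf /tmp/he-host2.conf && echo "configs match"
```

After fixing the file you'd presumably want to restart the services
(ovirt-ha-broker, then ovirt-ha-agent) and watch agent.log again.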
> MainThread::ERROR::2021-12-24
> 03:56:03,577::hosted_engine::564::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_broker)
> Failed to start necessary monitors
> ```
>
> Now I've stumbled upon this one
> [1984262](https://bugzilla.redhat.com/show_bug.cgi?id=1984262) but it doesn't
> seem to apply. All hosts resolve properly, all have proper hostnames
> set, unique /etc/hosts entries, and proper A records (in the form
> hostname.subdomain.domain.tld).
>
> The versions involved are:
>
> ```
> [root@host2 ~]# rpm -qa ovirt*
> ovirt-hosted-engine-setup-2.5.4-2.el8.noarch
> ovirt-imageio-daemon-2.3.0-1.el8.x86_64
> ovirt-host-dependencies-4.4.9-2.el8.x86_64
> ovirt-vmconsole-1.0.9-1.el8.noarch
> ovirt-imageio-client-2.3.0-1.el8.x86_64
> ovirt-host-4.4.9-2.el8.x86_64
> ovirt-python-openvswitch-2.11-1.el8.noarch
> ovirt-openvswitch-ovn-host-2.11-1.el8.noarch
> ovirt-provider-ovn-driver-1.2.34-1.el8.noarch
> ovirt-openvswitch-ovn-2.11-1.el8.noarch
> ovirt-release44-4.4.9.2-1.el8.noarch
> ovirt-openvswitch-2.11-1.el8.noarch
> ovirt-ansible-collection-1.6.5-1.el8.noarch
> ovirt-openvswitch-ovn-common-2.11-1.el8.noarch
> ovirt-hosted-engine-ha-2.4.9-1.el8.noarch
> ovirt-vmconsole-host-1.0.9-1.el8.noarch
> ovirt-imageio-common-2.3.0-1.el8.x86_64
> ```
>
> Any hint on how to fix this is really appreciated. I'd like to get the engine
> appliance back, then remove host3 and re-initialize it, since this is a
> production cluster (with hosts 1 and 2 replicating the gluster storage and
> host3 acting as an arbiter).
OK. Good luck and best regards,
--
Didi
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct:
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/[email protected]/message/JEQ2HMLDKKUPB66EHKKW55FVUGRCPNTZ/