[ovirt-users] Re: Install of RHV 4.4 failing - "Host is not up, please check logs, perhaps also on the engine machine"
Needed to add the appropriate DNS entry / MAC for the VM.
It is a 192.168.2.122 typically*

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/WMLUHG67QN6CJZLNDYHIO7ZY7PJWRVRO/
[ovirt-users] Re: Install of RHV 4.4 failing - "Host is not up, please check logs, perhaps also on the engine machine"
On Tue, Jan 19, 2021 at 12:59 PM James Freeman wrote:
>
> So grateful for your help here - I ran tcpdump on the host, and I saw
> the connection requests to the host from the hosted-engine on 54321/tcp,
> so I was kind of getting there on the whole vdsm thing.
>
> The install just fell over again (same issue - the 120 second timeout
> you described). Taking a step back here, I think something is wrong very
> early on in my upgrade process. My environment is:
>
> 2 x RHEL based hosts (previously RHEL 7 - to be re-installed with RHEL 8
> as per install documentation)
> NFS based storage
> Self-hosted engine
>
> I have been following the documentation here:
>
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/upgrade_guide/index#SHE_Upgrading_from_4-3
>
> And specifically here:
>
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/upgrade_guide/index#Upgrading_the_Manager_to_4-4_4-3_SHE
>
> All pre-requisite steps are done - the 4.3 engine was upgraded to the
> latest version before the backup was taken and it was shut down.
>
> Now, I note that on my RHEL 8 host (newly installed), vdsmd is not
> configured or running. The deploy script is not opening the firewall for
> the temporary manager to talk to the host on 54321, but it wouldn't
> matter if it did - even if I were to open up the firewall, there's no
> configured vdsmd running for it to talk to anyway.
>
> I suddenly have the feeling that I've missed an important step that
> would have configured the freshly installed RHEL 8 host for the
> hosted-engine to be installed on - but I can't see what this might be.
> I've been back and forth through the documentation but I can't see where
> vdsmd would have been configured on the host.

This should happen automatically; it does not require a manual step on
your side.

> In short (ignoring all the failed attempts), my commands to install on
> a fresh RHEL 8 host have been:
>
> dnf module reset virt
> dnf module list virt
> dnf module enable virt:8.3
> dnf distro-sync --nobest
> dnf install rhvm-appliance
> reboot
> dnf install ovirt-hosted-engine-setup

Just to make sure, perhaps also try 'dnf install ovirt-host'. If this
does pull in additional requirements, perhaps that's a bug somewhere.
But I do not think this is what is failing you.

> dnf install firewalld
> systemctl status firewalld
> systemctl enable firewalld
> systemctl start firewalld

I do not think these are needed - the deploy process should do this.
Should be harmless, though.

> systemctl status firewalld
> hosted-engine --deploy --restore-from-file=backup.bck
>
> Am I missing something fundamental, or is there another step that's not
> working where vdsmd would have been configured?

Sorry, I ignored the fact that it's an upgrade/restore. In this case,
it's expected that the restored engine will not have access to all other
hosts during deploy, until it's started on the external network. So I
suggest ignoring most errors in engine.log and checking only those
related to the host you deploy on. And check the host-deploy/* logs.

For a general overview of the hosted-engine deploy process, you might
want to check 'Simone Tiraboschi - Hosted Engine 4.3 Deep Dive' in:

https://www.ovirt.org/community/archived_conferences_presentations.html

I think these are still the best presentation slides we have on this.

Good luck,

> Many thanks
>
> James
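
[Editor's sketch] To tell "firewall is blocking 54321" apart from "nothing is listening on 54321" on the host, one option is to look at the listening sockets directly. The helper below is only an illustration: the function name port_is_listening is made up here and is not part of oVirt or vdsm. It reads the output of 'ss -tln' on standard input and checks the Local Address:Port column; 54321 is the vdsm port discussed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of oVirt): succeeds if the given TCP port
# shows up as a local listening address in `ss -tln` output read from stdin.
port_is_listening() {
    awk -v p="$1" '
        # Skip the header row; column 4 is Local Address:Port,
        # e.g. 0.0.0.0:54321, *:54321 or [::]:54321.
        NR > 1 {
            n = split($4, parts, ":")
            if (parts[n] == p) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Typical use on the host (left commented out so that sourcing this file
# has no side effects):
#   systemctl is-active vdsmd
#   ss -tln | port_is_listening 54321 && echo "listening" || echo "nothing on 54321"
#   firewall-cmd --list-all
```

If nothing is listening, opening the firewall would not help, which matches the observation in this thread that vdsmd was not running at that point.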
[ovirt-users] Re: Install of RHV 4.4 failing - "Host is not up, please check logs, perhaps also on the engine machine"
So grateful for your help here - I ran tcpdump on the host, and I saw the connection requests to the host from the hosted-engine on 54321/tcp, so I was kind of getting there on the whole vdsm thing.

The install just fell over again (same issue - the 120 second timeout you described). Taking a step back here, I think something is wrong very early on in my upgrade process. My environment is:

2 x RHEL based hosts (previously RHEL 7 - to be re-installed with RHEL 8 as per install documentation)
NFS based storage
Self-hosted engine

I have been following the documentation here:

https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/upgrade_guide/index#SHE_Upgrading_from_4-3

And specifically here:

https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.4/html-single/upgrade_guide/index#Upgrading_the_Manager_to_4-4_4-3_SHE

All pre-requisite steps are done - the 4.3 engine was upgraded to the latest version before the backup was taken and it was shut down.

Now, I note that on my RHEL 8 host (newly installed), vdsmd is not configured or running. The deploy script is not opening the firewall for the temporary manager to talk to the host on 54321, but it wouldn't matter if it did - even if I were to open up the firewall, there's no configured vdsmd running for it to talk to anyway.

I suddenly have the feeling that I've missed an important step that would have configured the freshly installed RHEL 8 host for the hosted-engine to be installed on - but I can't see what this might be. I've been back and forth through the documentation but I can't see where vdsmd would have been configured on the host.

In short (ignoring all the failed attempts), my commands to install on a fresh RHEL 8 host have been:

dnf module reset virt
dnf module list virt
dnf module enable virt:8.3
dnf distro-sync --nobest
dnf install rhvm-appliance
reboot
dnf install ovirt-hosted-engine-setup
dnf install firewalld
systemctl status firewalld
systemctl enable firewalld
systemctl start firewalld
systemctl status firewalld
hosted-engine --deploy --restore-from-file=backup.bck

Am I missing something fundamental, or is there another step that's not working where vdsmd would have been configured?

Many thanks

James

Yedidyah Bar David wrote on 19/01/2021 10:36:
> On Tue, Jan 19, 2021 at 12:25 PM James Freeman wrote:
>> Thanks Didi
>>
>> Great pointer - I have just performed a fresh deploy - am in the
>> hosted-engine VM, and in /var/log/ovirt-engine/engine.log, I can see the
>> following 3 lines cycling over and over again:
>>
>> 2021-01-19 05:12:11,395-05 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to rhvh1.example.org/192.168.50.31
>> 2021-01-19 05:12:11,399-05 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96) [] Unable to RefreshCapabilities: ConnectException: Connection refused
>> 2021-01-19 05:12:11,401-05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = rhvh1.example.org, VdsIdAndVdsVDSCommandParametersBase:{hostId='12057f7e-a4cf-46ec-b563-c1037ba5c62d', vds='Host[rhvh1.example.org,12057f7e-a4cf-46ec-b563-c1037ba5c62d]'})' execution failed: java.net.ConnectException: Connection refused
>>
>> I can ping 192.168.50.31 and resolve rhvh1.example.org - however I note
>> that firewalld on the hypervisor host (192.168.50.31) hasn't had
>> anything allowed through it yet apart from SSH and Cockpit. Is this a
>> problem, or a red herring?
[ovirt-users] Re: Install of RHV 4.4 failing - "Host is not up, please check logs, perhaps also on the engine machine"
On Tue, Jan 19, 2021 at 12:25 PM James Freeman wrote:
>
> Thanks Didi
>
> Great pointer - I have just performed a fresh deploy - am in the
> hosted-engine VM, and in /var/log/ovirt-engine/engine.log, I can see the
> following 3 lines cycling over and over again:
>
> 2021-01-19 05:12:11,395-05 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to rhvh1.example.org/192.168.50.31
> 2021-01-19 05:12:11,399-05 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96) [] Unable to RefreshCapabilities: ConnectException: Connection refused
> 2021-01-19 05:12:11,401-05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = rhvh1.example.org, VdsIdAndVdsVDSCommandParametersBase:{hostId='12057f7e-a4cf-46ec-b563-c1037ba5c62d', vds='Host[rhvh1.example.org,12057f7e-a4cf-46ec-b563-c1037ba5c62d]'})' execution failed: java.net.ConnectException: Connection refused
>
> I can ping 192.168.50.31 and resolve rhvh1.example.org - however I note
> that firewalld on the hypervisor host (192.168.50.31) hasn't had
> anything allowed through it yet apart from SSH and Cockpit. Is this a
> problem, or a red herring?

Generally speaking, the deploy process connects first from the engine to
the host via ssh (22), then (also) configures firewalld to allow access to
vdsm (the oVirt host-side agent, port 54321), and later the engine normally
communicates with the host via vdsm. Whether or not all of this worked
depends on exactly how you configured your host's firewalld beforehand.
I suggest starting by not touching it, doing the deployment, then seeing
what it does/did (and that it worked), and then deciding how you are going
to adapt your policy/tooling/whatever for later deployments, assuming you
want to harden your hosts before deploying.

>
> It seems that the hosted-engine is coming up and being installed and
> configured ok. The engine health page looks ok (as validated by
> Ansible). It looks like the hosted-engine is waiting for something to
> happen on the host itself, but this never completed - which I suspect it
> never will given that it cannot connect to the host.

The deploy process runs on the host, connects to the engine, asks it to
add the host, then waits until it sees the host in the engine with status
'Up'. It indeed does not try to further diagnose failures, nor fail more
quickly - if it's 'Up' it's quick, if it's not, it will wait for a timeout
(120 times * 10 seconds = 20 minutes).

>
> Am I on the right track?

You are :-).

Good luck and best regards,

>
> Yedidyah Bar David wrote on 19/01/2021 10:06:
> > On Tue, Jan 19, 2021 at 11:44 AM wrote:
> >> Hi all
> >>
> >> I am in the process of migrating a RHV 4.3 setup to RHV 4.4 and struggling
> >> with the setup. I am installing on RHEL 8.3, using settings backed up from
> >> the RHV 4.3 install (via 'hosted-engine --deploy
> >> --restore-from-file=backup.bck').
> >>
> >> The install process always fails at the same point for me at the moment,
> >> and I can't figure out how to get past it. As far as install progress
> >> goes, the local hosted-engine comes up and runs on the node. I have been
> >> able to grep for local_vm_ip in the logs, and can SSH into it with the
> >> password I set during the setup phase.
> >>
> >> However the install playbooks always fail with:
> >> 2021-01-18 18:38:00,086-0500 ERROR otopi.plugins.gr_he_common.core.misc misc._terminate:167 Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
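
[Editor's sketch] The timeout quoted in this message is a retry count times a delay, not 120 seconds: 120 checks spaced 10 seconds apart come to 1200 seconds, i.e. 20 minutes. The snippet below only illustrates that arithmetic and the general shape of such a poll loop. Both function names are made up for this note, and the loop is not the actual hosted-engine deploy code.

```shell
#!/bin/sh
# Worst-case wait, in whole minutes, for a loop that makes `retries`
# attempts with `delay` seconds of sleep after each failed attempt.
total_wait_minutes() {
    echo $(( $1 * $2 / 60 ))
}

# Hypothetical generic poll loop (NOT the real deploy implementation):
# run the given command up to `retries` times, sleeping `delay` seconds
# after each failure. Returns 0 on the first success, 1 otherwise.
wait_until() {
    _wu_retries="$1"
    _wu_delay="$2"
    shift 2
    _wu_attempt=0
    while [ "$_wu_attempt" -lt "$_wu_retries" ]; do
        if "$@"; then
            return 0
        fi
        _wu_attempt=$(( _wu_attempt + 1 ))
        sleep "$_wu_delay"
    done
    return 1
}

# Example:
#   total_wait_minutes 120 10        # 120 * 10 s = 1200 s
#   wait_until 120 10 some_check     # some_check is a placeholder command
```

A loop of this shape returns quickly when the check succeeds and only burns the full 20 minutes when it never does, which is the behaviour described above.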
[ovirt-users] Re: Install of RHV 4.4 failing - "Host is not up, please check logs, perhaps also on the engine machine"
Thanks Didi

Great pointer - I have just performed a fresh deploy - am in the hosted-engine VM, and in /var/log/ovirt-engine/engine.log, I can see the following 3 lines cycling over and over again:

2021-01-19 05:12:11,395-05 INFO [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connecting to rhvh1.example.org/192.168.50.31
2021-01-19 05:12:11,399-05 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96) [] Unable to RefreshCapabilities: ConnectException: Connection refused
2021-01-19 05:12:11,401-05 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesAsyncVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-96) [] Command 'GetCapabilitiesAsyncVDSCommand(HostName = rhvh1.example.org, VdsIdAndVdsVDSCommandParametersBase:{hostId='12057f7e-a4cf-46ec-b563-c1037ba5c62d', vds='Host[rhvh1.example.org,12057f7e-a4cf-46ec-b563-c1037ba5c62d]'})' execution failed: java.net.ConnectException: Connection refused

I can ping 192.168.50.31 and resolve rhvh1.example.org - however I note that firewalld on the hypervisor host (192.168.50.31) hasn't had anything allowed through it yet apart from SSH and Cockpit. Is this a problem, or a red herring?

It seems that the hosted-engine is coming up and being installed and configured ok. The engine health page looks ok (as validated by Ansible). It looks like the hosted-engine is waiting for something to happen on the host itself, but this never completed - which I suspect it never will given that it cannot connect to the host.

Am I on the right track?

Yedidyah Bar David wrote on 19/01/2021 10:06:
> On Tue, Jan 19, 2021 at 11:44 AM wrote:
>> Hi all
>>
>> I am in the process of migrating a RHV 4.3 setup to RHV 4.4 and struggling
>> with the setup. I am installing on RHEL 8.3, using settings backed up from
>> the RHV 4.3 install (via 'hosted-engine --deploy
>> --restore-from-file=backup.bck').
[ovirt-users] Re: Install of RHV 4.4 failing - "Host is not up, please check logs, perhaps also on the engine machine"
On Tue, Jan 19, 2021 at 11:44 AM wrote:
>
> Hi all
>
> I am in the process of migrating a RHV 4.3 setup to RHV 4.4 and struggling
> with the setup. I am installing on RHEL 8.3, using settings backed up from
> the RHV 4.3 install (via 'hosted-engine --deploy
> --restore-from-file=backup.bck').
>
> The install process always fails at the same point for me at the moment, and
> I can't figure out how to get past it. As far as install progress goes, the
> local hosted-engine comes up and runs on the node. I have been able to grep
> for local_vm_ip in the logs, and can SSH into it with the password I set
> during the setup phase.
>
> However the install playbooks always fail with:
> 2021-01-18 18:38:00,086-0500 ERROR otopi.plugins.gr_he_common.core.misc misc._terminate:167 Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
>
> Earlier in the logs, I note the following:
> 2021-01-18 18:34:51,258-0500 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Host is not up, please check logs, perhaps also on the engine machine"}
> 2021-01-18 18:37:16,661-0500 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
> Traceback (most recent call last):
>   File "/usr/lib/python3.6/site-packages/otopi/context.py", line 132, in _executeMethod
>     method['method']()
>   File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-ansiblesetup/core/misc.py", line 435, in _closeup
>     raise RuntimeError(_('Failed executing ansible-playbook'))
> RuntimeError: Failed executing ansible-playbook
> 2021-01-18 18:37:18,996-0500 ERROR otopi.context context._executeMethod:154 Failed to execute stage 'Closing up': Failed executing ansible-playbook
> 2021-01-18 18:37:32,421-0500 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 fatal: [localhost]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host rhvm.example.org port 22: No route to host", "skip_reason": "Host localhost is unreachable", "unreachable": true}
>
> I find the unreachable message a bit odd, as at this stage all that has
> happened is that the local hosted-engine has been brought up to be
> configured, and so it is running on virbr0, not on my actual network. As a
> result, that DNS address will never resolve, and the IP it resolves to won't
> be up. I gave the installation script permission to modify the local
> /etc/hosts but this hasn't improved things.
>
> I presume I'm missing something in the install process, or earlier on in the
> logs, but I've been scanning for errors and possible clues to no avail.
>
> Any and all help greatly appreciated!

Please check/share, on the engine machine under /var/log/ovirt-engine, or,
if inaccessible, on the host, under
/var/log/ovirt-hosted-engine-setup/engine-logs-*:

engine.log
host-deploy/*

Good luck and best regards,
--
Didi

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/CORIDDYMUPDHGBUGL4DV5IZ4T5QZPJGL/
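
[Editor's sketch] When going through engine.log as suggested above, it can help to narrow the output to ERROR lines that name the host being deployed. The function below is a made-up illustration (errors_for_host is not an oVirt tool) that uses only grep with fixed-string matching; the log path and host name in the example are the ones that appear in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the ERROR lines of a log file that mention
# the given host name. Both patterns are matched as fixed strings.
errors_for_host() {
    grep -F ' ERROR ' "$1" | grep -F -- "$2"
}

# Example, run on the engine machine (commented out so that sourcing this
# file has no side effects):
#   errors_for_host /var/log/ovirt-engine/engine.log rhvh1.example.org
#   ls -lt /var/log/ovirt-engine/host-deploy/
```

Note that this filter misses error lines that do not name the host, such as an "Unable to RefreshCapabilities" line like the one quoted elsewhere in this thread, so it is a starting point and not a replacement for reading the surrounding log.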