Re: [ovirt-users] vdsm issues between engine and host
Hi Piotr,

Thanks for the reply. It all looks healthy now. Regarding DNS, we had some issues with it at the time. However, I think the main issue was NetworkManager shutting the interface down, seemingly at random. I had thought it had been disabled when I set the machine up about 5 months ago (and it had worked fine up until then). That, together with VDSM being enabled on the engine, I can't explain. The only change I had made was an attempt to set up a hosted engine, which I did incorrectly by not setting the host into maintenance and doing it there (I instead tried to set it up as a VM on the running cluster). I can't see why this would have made the changes above, but I can't say for certain. Anyway, I've now read the documentation more closely rather than hurrying through it.

Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: [1487620073.1858] device (enp5s0f0): state change: disconnected -> prepare (reason 'none') [30 40 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: [1487620073.1860] manager: NetworkManager state is now CONNECTING
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: [1487620073.1867] device (enp5s0f0): state change: prepare -> config (reason 'none') [40 50 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: [1487620073.2071] device (enp5s0f0): state change: config -> ip-config (reason 'none') [50 70 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: [1487620073.2120] device (enp5s0f0): state change: ip-config -> ip-check (reason 'none') [70 80 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: [1487620073.2160] device (enp5s0f0): state change: ip-check -> secondaries (reason 'none') [80 90 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: [1487620073.2164] device (enp5s0f0): state change: secondaries -> activated (reason 'none') [90 100 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: [1487620073.2166] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: [1487620073.2889] manager: NetworkManager state is now CONNECTED_GLOBAL

Thanks again and sorry to have wasted your time with this,

Cam

On Tue, Feb 21, 2017 at 8:59 AM, Piotr Kliczewski wrote:
> On Mon, Feb 20, 2017 at 9:47 PM, cmc wrote:
>> Hi,
>>
>> Due to networking and DNS issues, our engine was offlined (it is a
>> physical machine currently, will be converting it to a VM in the
>> future when time allows). When service was restored, I noticed that
>> all the VMs were listed as being in an unknown state on one host. The
>> VMs were fine, but the engine could not ascertain their status as the
>> host itself was in an unknown state. vdsm was reporting errors and was
>> not running on the engine (or at least was in status 'failed' in
>> systemd). I tried starting vdsmd on the engine but it would not start.
>> I decided to try to restart vdsmd on the host and that did allow the
>> state of the VMs to be discovered, and the engine listed the host as
>> up again. However, there are still errors with vdsmd on both the host
>> and the engine, and the engine cannot start vdsmd. I guess it is able
>> to monitor the hosts in a limited way as it says they are both up.
>> There are communication errors between one of the hosts and the
>> engine: the host is refusing connections by the look of it
>>
>> from the engine log:
>>
>> 2017-02-20 18:41:51,226Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='e050c27f-8709-404c-b03e-59c0167a824b', vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})' execution failed: java.net.ConnectException: Connection refused
>> 2017-02-20 18:41:51,226Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Failure to refresh host 'kvm-ldn-01' runtime info: java.net.ConnectException: Connection refused
>> 2017-02-20 18:41:52,772Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Command 'GetAllVmStatsVDSCommand(HostName = kvm-ldn-01, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='e050c27f-8709-404c-b03e-59c0167a824b'})' execution failed: VDSGenericException: VDSNetworkException: Connection reset by peer
>> 2017-02-20 18:41:54,256Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler7) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='e050c27f-8709-404c-b03e-59c0167a824b', vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})' execution failed: java.net.ConnectException: Connection refused
>>
>
> I checked your eng
Re: [ovirt-users] vdsm issues between engine and host
On Mon, Feb 20, 2017 at 9:47 PM, cmc wrote:
> Hi,
>
> Due to networking and DNS issues, our engine was offlined (it is a
> physical machine currently, will be converting it to a VM in the
> future when time allows). When service was restored, I noticed that
> all the VMs were listed as being in an unknown state on one host. The
> VMs were fine, but the engine could not ascertain their status as the
> host itself was in an unknown state. vdsm was reporting errors and was
> not running on the engine (or at least was in status 'failed' in
> systemd). I tried starting vdsmd on the engine but it would not start.
> I decided to try to restart vdsmd on the host and that did allow the
> state of the VMs to be discovered, and the engine listed the host as
> up again. However, there are still errors with vdsmd on both the host
> and the engine, and the engine cannot start vdsmd. I guess it is able
> to monitor the hosts in a limited way as it says they are both up.
> There are communication errors between one of the hosts and the
> engine: the host is refusing connections by the look of it
>
> from the engine log:
>
> 2017-02-20 18:41:51,226Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='e050c27f-8709-404c-b03e-59c0167a824b', vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})' execution failed: java.net.ConnectException: Connection refused
> 2017-02-20 18:41:51,226Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Failure to refresh host 'kvm-ldn-01' runtime info: java.net.ConnectException: Connection refused
> 2017-02-20 18:41:52,772Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Command 'GetAllVmStatsVDSCommand(HostName = kvm-ldn-01, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='e050c27f-8709-404c-b03e-59c0167a824b'})' execution failed: VDSGenericException: VDSNetworkException: Connection reset by peer
> 2017-02-20 18:41:54,256Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler7) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='e050c27f-8709-404c-b03e-59c0167a824b', vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})' execution failed: java.net.ConnectException: Connection refused
>

I checked your engine logs and I saw DNS issues much later than the errors above:

2017-02-20 19:47:56,516Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Failure to refresh host 'kvm-ldn-01' runtime info: java.net.UnknownHostException: kvm-ldn-01

> from the vdsm.log on the host:
>
>
> Feb 20 18:44:20 kvm-ldn-01 vdsm[42308]: vdsm vds.dispatcher ERROR SSL error receiving from (':::172.16.75.16', 38350, 0, 0) at 0x33b9bd8>: unexpected eof
> Feb 20 18:44:24 kvm-ldn-01 vdsm[42308]: vdsm jsonrpc.JsonRpcServer ERROR Internal server error
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 547, in _handle_request...
>
> Any ideas what might be going on here?

I see that ~13 VMs were moved to the Up state. Can you please say which host is causing the issues and provide its logs?

>
> Thanks,
>
> Cam
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
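The comparison made above, when each kind of failure first shows up in the engine log, could be sketched roughly as below. This is only an illustration, assuming unwrapped engine.log lines in the format quoted in this thread; the function name `first_seen` and the list of exception names are made up for the example and are not part of oVirt.

```python
import re

# Timestamp at line start, then ERROR, then the first known exception name.
# The exception names are the ones that appear in the log excerpts in this thread.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}Z? ERROR .*?"
    r"(UnknownHostException|ConnectException|VDSNetworkException)"
)


def first_seen(lines):
    """Map each exception name to the timestamp of its first occurrence."""
    seen = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            timestamp, name = match.groups()
            # setdefault keeps the earliest timestamp for each exception.
            seen.setdefault(name, timestamp)
    return seen
```

Given the lines quoted in this message, it would report ConnectException first at 18:41:51 and UnknownHostException first at 19:47:56, which is the gap pointed out above. Passing an open file handle for the engine log to `first_seen` should work the same way, since it only iterates over lines.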
Re: [ovirt-users] vdsm issues between engine and host
>
> VDSM should not be running on the engine.

Not sure why it was enabled in systemd... I have disabled it now. AFAIK it wasn't before.

> I decided to try to restart vdsmd on the host and that did allow the
> state of the VMs to be discovered, and the engine listed the host as
> up again. However, there are still errors with vdsmd on both the host
> and the engine, and the engine cannot start vdsmd. I guess it is able
> to monitor the hosts in a limited way as it says they are both up.
> There are communication errors between one of the hosts and the
> engine: the host is refusing connections by the look of it
>
>
> Is iptables / firewalld set up correctly?
> Y.

Ports 54321 and 22 are open from engine to host. Is there an easy way to check the config is valid? It certainly used to work OK.

Thanks,

C
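On the question of an easy check: a minimal sketch, assuming plain TCP reachability from the engine is what needs confirming. It only shows whether something accepts connections on a port; it does not validate the firewall rules themselves or VDSM's SSL setup. The function name `port_is_open` and the default timeout are illustrative choices; the ports are the ones mentioned above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Run on the engine, `port_is_open("kvm-ldn-01", 54321)` and `port_is_open("kvm-ldn-01", 22)` would show whether each port answers. A False result does not say whether the cause is the firewall, DNS, or vdsmd not listening, so the logs are still needed to tell those apart.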
Re: [ovirt-users] vdsm issues between engine and host
On Feb 20, 2017 10:48 PM, "cmc" wrote:

> Hi,
>
> Due to networking and DNS issues, our engine was offlined (it is a physical machine currently, will be converting it to a VM in the future when time allows). When service was restored, I noticed that all the VMs were listed as being in an unknown state on one host. The VMs were fine, but the engine could not ascertain their status as the host itself was in an unknown state. vdsm was reporting errors and was not running on the engine (or at least was in status 'failed' in systemd). I tried starting vdsmd on the engine but it would not start.

VDSM should not be running on the engine.

> I decided to try to restart vdsmd on the host and that did allow the state of the VMs to be discovered, and the engine listed the host as up again. However, there are still errors with vdsmd on both the host and the engine, and the engine cannot start vdsmd. I guess it is able to monitor the hosts in a limited way as it says they are both up. There are communication errors between one of the hosts and the engine: the host is refusing connections by the look of it

Is iptables / firewalld set up correctly?
Y.

> from the engine log:
>
> 2017-02-20 18:41:51,226Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='e050c27f-8709-404c-b03e-59c0167a824b', vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})' execution failed: java.net.ConnectException: Connection refused
> 2017-02-20 18:41:51,226Z ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Failure to refresh host 'kvm-ldn-01' runtime info: java.net.ConnectException: Connection refused
> 2017-02-20 18:41:52,772Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand] (DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Command 'GetAllVmStatsVDSCommand(HostName = kvm-ldn-01, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='e050c27f-8709-404c-b03e-59c0167a824b'})' execution failed: VDSGenericException: VDSNetworkException: Connection reset by peer
> 2017-02-20 18:41:54,256Z ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler7) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5] Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01, VdsIdAndVdsVDSCommandParametersBase:{runAsync='true', hostId='e050c27f-8709-404c-b03e-59c0167a824b', vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})' execution failed: java.net.ConnectException: Connection refused
>
> from the vdsm.log on the host:
>
> Feb 20 18:44:20 kvm-ldn-01 vdsm[42308]: vdsm vds.dispatcher ERROR SSL error receiving from : unexpected eof
> Feb 20 18:44:24 kvm-ldn-01 vdsm[42308]: vdsm jsonrpc.JsonRpcServer ERROR Internal server error
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 547, in _handle_request...
>
> Any ideas what might be going on here?
>
> Thanks,
>
> Cam