Re: [ovirt-users] vdsm issues between engine and host

2017-02-21 Thread cmc
Hi Piotr,

Thanks for the reply. It all looks healthy now. We did have some DNS
issues at the time, but I think the main problem was NetworkManager
shutting the interface down, seemingly at random. I thought I had
disabled it when I set the machine up about 5 months ago (and it had
worked fine up until then). I can't explain that, nor why VDSM was
enabled on the engine. The only change I had made was an attempt to
set up a hosted engine, which I did incorrectly: rather than putting
the host into maintenance and deploying from there, I tried to set it
up as a VM on the running cluster. I can't see how that would have
caused the changes above, though I wouldn't necessarily know.
Anyway, I've now read the documentation more closely rather than
hurrying through it.

Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: 
[1487620073.1858] device (enp5s0f0): state change: disconnected ->
prepare (reason 'none') [30 40 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: 
[1487620073.1860] manager: NetworkManager state is now CONNECTING
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: 
[1487620073.1867] device (enp5s0f0): state change: prepare -> config
(reason 'none') [40 50 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: 
[1487620073.2071] device (enp5s0f0): state change: config -> ip-config
(reason 'none') [50 70 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: 
[1487620073.2120] device (enp5s0f0): state change: ip-config ->
ip-check (reason 'none') [70 80 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: 
[1487620073.2160] device (enp5s0f0): state change: ip-check ->
secondaries (reason 'none') [80 90 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: 
[1487620073.2164] device (enp5s0f0): state change: secondaries ->
activated (reason 'none') [90 100 0]
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: 
[1487620073.2166] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 20 19:47:53 ovirt-engine NetworkManager[1061]: 
[1487620073.2889] manager: NetworkManager state is now
CONNECTED_GLOBAL
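
In case it helps anyone checking for the same thing, here is a rough sketch of how the device state changes could be pulled out of log entries like the ones above, to see whether NetworkManager is still managing an interface. It assumes each wrapped entry has first been joined back into a single line; the function names are made up for illustration and are not part of NetworkManager or oVirt.

```python
import re

# Matches the message part of a NetworkManager device state-change entry.
# Assumption: the wrapped log entry has already been joined into one line.
STATE_CHANGE = re.compile(
    r"device \((?P<dev>[^)]+)\): state change: "
    r"(?P<old>[\w-]+) -> (?P<new>[\w-]+) \(reason '(?P<reason>[^']*)'\)"
)


def parse_state_changes(lines):
    """Return (device, old_state, new_state, reason) for each matching line."""
    changes = []
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match:
            changes.append(
                (
                    match.group("dev"),
                    match.group("old"),
                    match.group("new"),
                    match.group("reason"),
                )
            )
    return changes


def managed_devices(lines):
    """Return the sorted names of devices that NetworkManager changed state on."""
    return sorted({dev for dev, _old, _new, _reason in parse_state_changes(lines)})
```

If `managed_devices()` returns a non-empty list for recent log lines, NetworkManager is evidently still touching those interfaces, which is worth knowing if it was meant to be disabled.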

Thanks again and sorry to have wasted your time with this,

Cam

On Tue, Feb 21, 2017 at 8:59 AM, Piotr Kliczewski wrote:
> On Mon, Feb 20, 2017 at 9:47 PM, cmc wrote:
>> Hi,
>>
>> Due to networking and DNS issues, our engine was offlined (it is a
>> physical machine currently, will be converting it to a VM in the
>> future when time allows). When service was restored, I noticed that
>> all the VMs were listed as being in an unknown state on one host. The
>> VMs were fine, but the engine could not ascertain their status as the
>> host itself was in an unknown state. vdsm was reporting errors and was
>> not running on the engine (or at least was in status 'failed' in
>> systemd). I tried starting vdsmd on the engine but it would not start.
>> I decided to try to restart vdsmd on the host and that did allow the
>> state of the VMs to be discovered, and the engine listed the host as
>> up again. However, there are still errors with vdsmd on both the host
>> and the engine, and the engine cannot start vdsmd. I guess it is able
>> to monitor the hosts in a limited way as it says they are both up.
>> There are communication errors between one of the hosts and the
>> engine: the host is refusing connections by the look of it
>>
>> from the engine log:
>>
>> 2017-02-20 18:41:51,226Z ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
>> (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
>> Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01,
>> VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
>> hostId='e050c27f-8709-404c-b03e-59c0167a824b',
>> vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})'
>> execution failed: java.net.ConnectException: Connection refused
>> 2017-02-20 18:41:51,226Z ERROR
>> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
>> (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
>> Failure to refresh host 'kvm-ldn-01' runtime info:
>> java.net.ConnectException: Connection refused
>> 2017-02-20 18:41:52,772Z ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
>> (DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
>> Command 'GetAllVmStatsVDSCommand(HostName = kvm-ldn-01,
>> VdsIdVDSCommandParametersBase:{runAsync='true',
>> hostId='e050c27f-8709-404c-b03e-59c0167a824b'})' execution failed:
>> VDSGenericException: VDSNetworkException: Connection reset by peer
>> 2017-02-20 18:41:54,256Z ERROR
>> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
>> (DefaultQuartzScheduler7) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
>> Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01,
>> VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
>> hostId='e050c27f-8709-404c-b03e-59c0167a824b',
>> vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})'
>> execution failed: 

Re: [ovirt-users] vdsm issues between engine and host

2017-02-21 Thread Piotr Kliczewski
On Mon, Feb 20, 2017 at 9:47 PM, cmc wrote:
> Hi,
>
> Due to networking and DNS issues, our engine was offlined (it is a
> physical machine currently, will be converting it to a VM in the
> future when time allows). When service was restored, I noticed that
> all the VMs were listed as being in an unknown state on one host. The
> VMs were fine, but the engine could not ascertain their status as the
> host itself was in an unknown state. vdsm was reporting errors and was
> not running on the engine (or at least was in status 'failed' in
> systemd). I tried starting vdsmd on the engine but it would not start.
> I decided to try to restart vdsmd on the host and that did allow the
> state of the VMs to be discovered, and the engine listed the host as
> up again. However, there are still errors with vdsmd on both the host
> and the engine, and the engine cannot start vdsmd. I guess it is able
> to monitor the hosts in a limited way as it says they are both up.
> There are communication errors between one of the hosts and the
> engine: the host is refusing connections by the look of it
>
> from the engine log:
>
> 2017-02-20 18:41:51,226Z ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
> (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01,
> VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
> hostId='e050c27f-8709-404c-b03e-59c0167a824b',
> vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})'
> execution failed: java.net.ConnectException: Connection refused
> 2017-02-20 18:41:51,226Z ERROR
> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
> (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Failure to refresh host 'kvm-ldn-01' runtime info:
> java.net.ConnectException: Connection refused
> 2017-02-20 18:41:52,772Z ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
> (DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Command 'GetAllVmStatsVDSCommand(HostName = kvm-ldn-01,
> VdsIdVDSCommandParametersBase:{runAsync='true',
> hostId='e050c27f-8709-404c-b03e-59c0167a824b'})' execution failed:
> VDSGenericException: VDSNetworkException: Connection reset by peer
> 2017-02-20 18:41:54,256Z ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
> (DefaultQuartzScheduler7) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01,
> VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
> hostId='e050c27f-8709-404c-b03e-59c0167a824b',
> vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})'
> execution failed: java.net.ConnectException: Connection refused
>

I checked your engine logs and saw DNS issues much later than the errors above:

2017-02-20 19:47:56,516Z ERROR
[org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
(DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
Failure to refresh host 'kvm-ldn-01' runtime info:
java.net.UnknownHostException: kvm-ldn-01
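
To make that ordering easier to see across a whole engine.log, something like the sketch below could record when each kind of exception first appears. It is only an illustration: it assumes every ERROR entry has been joined into a single line, it treats the last `...Exception` name in the entry as the cause, and `first_seen` is a made-up helper name, not part of the engine.

```python
import re

# Timestamp at the start of an engine.log ERROR entry, e.g.
# "2017-02-20 18:41:51,226Z ERROR ...".
ENTRY = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})Z ERROR")
# Any Java exception class name, without its package prefix.
EXCEPTION = re.compile(r"(\w+Exception)\b")


def first_seen(entries):
    """Map each exception name to the earliest timestamp it was logged at.

    Assumption: the last exception named in an entry is the actual cause.
    """
    earliest = {}
    for entry in entries:
        stamp = ENTRY.match(entry)
        names = EXCEPTION.findall(entry)
        if not stamp or not names:
            continue
        cause, when = names[-1], stamp.group(1)
        # Timestamps in this format sort correctly as plain strings.
        if cause not in earliest or when < earliest[cause]:
            earliest[cause] = when
    return earliest
```

Comparing the timestamps returned for `ConnectException` and `UnknownHostException` shows whether connections were already being refused before name resolution started failing.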

> from the vdsm.log on the host:
>
>
> Feb 20 18:44:20 kvm-ldn-01 vdsm[42308]: vdsm vds.dispatcher ERROR SSL
> error receiving from  (':::172.16.75.16', 38350, 0, 0) at 0x33b9bd8>: unexpected eof
> Feb 20 18:44:24 kvm-ldn-01 vdsm[42308]: vdsm jsonrpc.JsonRpcServer
> ERROR Internal server error
> Traceback (most recent call last):
>   File
> "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 547, in
> _handle_request...
>
> Any ideas what might be going on here?

I see that ~13 VMs were moved to the Up state.

Can you please say which host is causing issues, and provide its logs?

>
> Thanks,
>
> Cam
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>


Re: [ovirt-users] vdsm issues between engine and host

2017-02-20 Thread cmc
>
> VDSM should not be running on the engine.

Not sure why it was enabled in systemd...I have disabled it now. AFAIK
it wasn't before.

> I decided to try to restart vdsmd on the host and that did allow the
> state of the VMs to be discovered, and the engine listed the host as
> up again. However, there are still errors with vdsmd on both the host
> and the engine, and the engine cannot start vdsmd. I guess it is able
> to monitor the hosts in a limited way as it says they are both up.
> There are communication errors between one of the hosts and the
> engine: the host is refusing connections by the look of it
>
>
> Is iptables / firewalld set up correctly?
> Y.

Ports 54321 and 22 are open from engine to host. Is there an easy way to check
the config is valid? It certainly used to work ok.
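
For the basic reachability part, a quick check from the engine could look like the sketch below. It only confirms that the host name resolves and that a TCP connection can be opened; it says nothing about whether the SSL handshake with vdsm succeeds or whether the firewall rules are otherwise correct. The helper names are made up for illustration.

```python
import socket


def resolves(hostname):
    """Return True if the name can be resolved from this machine."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False
```

Run on the engine, `resolves("kvm-ldn-01")` followed by `port_open("kvm-ldn-01", 54321)` and `port_open("kvm-ldn-01", 22)` would cover the DNS lookup and the two ports mentioned above.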

Thanks,

C


Re: [ovirt-users] vdsm issues between engine and host

2017-02-20 Thread Yaniv Kaul
On Feb 20, 2017 10:48 PM, "cmc" wrote:

> Hi,
>
> Due to networking and DNS issues, our engine was offlined (it is a
> physical machine currently, will be converting it to a VM in the
> future when time allows). When service was restored, I noticed that
> all the VMs were listed as being in an unknown state on one host. The
> VMs were fine, but the engine could not ascertain their status as the
> host itself was in an unknown state. vdsm was reporting errors and was
> not running on the engine (or at least was in status 'failed' in
> systemd). I tried starting vdsmd on the engine but it would not start.


VDSM should not be running on the engine.

> I decided to try to restart vdsmd on the host and that did allow the
> state of the VMs to be discovered, and the engine listed the host as
> up again. However, there are still errors with vdsmd on both the host
> and the engine, and the engine cannot start vdsmd. I guess it is able
> to monitor the hosts in a limited way as it says they are both up.
> There are communication errors between one of the hosts and the
> engine: the host is refusing connections by the look of it


Is iptables / firewalld set up correctly?
Y.


> from the engine log:
>
> 2017-02-20 18:41:51,226Z ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
> (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01,
> VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
> hostId='e050c27f-8709-404c-b03e-59c0167a824b',
> vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})'
> execution failed: java.net.ConnectException: Connection refused
> 2017-02-20 18:41:51,226Z ERROR
> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring]
> (DefaultQuartzScheduler2) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Failure to refresh host 'kvm-ldn-01' runtime info:
> java.net.ConnectException: Connection refused
> 2017-02-20 18:41:52,772Z ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetAllVmStatsVDSCommand]
> (DefaultQuartzScheduler6) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Command 'GetAllVmStatsVDSCommand(HostName = kvm-ldn-01,
> VdsIdVDSCommandParametersBase:{runAsync='true',
> hostId='e050c27f-8709-404c-b03e-59c0167a824b'})' execution failed:
> VDSGenericException: VDSNetworkException: Connection reset by peer
> 2017-02-20 18:41:54,256Z ERROR
> [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand]
> (DefaultQuartzScheduler7) [f8aa18b3-97b9-48e2-a681-cf3aaed330a5]
> Command 'GetCapabilitiesVDSCommand(HostName = kvm-ldn-01,
> VdsIdAndVdsVDSCommandParametersBase:{runAsync='true',
> hostId='e050c27f-8709-404c-b03e-59c0167a824b',
> vds='Host[kvm-ldn-01,e050c27f-8709-404c-b03e-59c0167a824b]'})'
> execution failed: java.net.ConnectException: Connection refused
>
> from the vdsm.log on the host:
>
>
> Feb 20 18:44:20 kvm-ldn-01 vdsm[42308]: vdsm vds.dispatcher ERROR SSL
> error receiving from : unexpected eof
> Feb 20 18:44:24 kvm-ldn-01 vdsm[42308]: vdsm jsonrpc.JsonRpcServer
> ERROR Internal server error
> Traceback (most recent call last):
>   File
> "/usr/lib/python2.7/site-packages/yajsonrpc/__init__.py", line 547, in
> _handle_request...
>
> Any ideas what might be going on here?
>
> Thanks,
>
> Cam
