Thanks for the logs. I am checking them at the moment. So far I have noticed
that node14 is serving an NFS share which had been marked as problematic
(probably because of the downtime during the migration), but it has
recovered.

In the meantime, is it possible to get some meaningful results when
calling:
$ vdsm-client Host getStats
and
$ vdsm-client Host getCapabilities
on node14?
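
To tell a hang apart from a quick failure, the two calls can be wrapped in a
time limit. This is only a minimal sketch: it assumes GNU coreutils `timeout`
is available on node14, and the `probe` helper and the 30-second limit are
made up for illustration (they are not part of vdsm).

```shell
# Hypothetical helper: run a command with a time limit and report how it ended.
# Usage: probe <seconds> <command> [args...]
probe() {
    local limit="$1"
    shift
    # Output is discarded on purpose; only the way the call ended is reported.
    timeout "$limit" "$@" > /dev/null 2>&1
    local rc=$?
    case $rc in
        0)   echo "ok" ;;
        124) echo "timed out after ${limit}s" ;;   # 124 is timeout's own exit code
        *)   echo "failed (exit $rc)" ;;
    esac
    return "$rc"
}
```

Example use on node14:
$ probe 30 vdsm-client Host getStats
$ probe 30 vdsm-client Host getCapabilities
A "timed out" result would suggest the call hangs rather than fails outright.
To see the actual reply, run the command directly without the wrapper.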

What is the state of the vdsmd service when running systemctl status vdsmd?
One other thing to rule out is the networking/firewall. Here is the list of
ports that need to be open on the host (the documentation is for hosted engine,
but it applies to a standalone setup as well):
https://www.ovirt.org/documentation/installing_ovirt_as_a_self-hosted_engine_using_the_command_line/index.html#host-firewall-requirements_SHE_cli_deploy
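
Reachability can be tested quickly from the engine machine. The following is a
rough sketch, not an official tool: it assumes bash and coreutils `timeout`
are installed, `check_port` is a hypothetical helper, and the 3-second limit
is arbitrary. The ports in the usage lines (54321 for VDSM, 22 for SSH) are an
assumption and should be verified against the list linked above.

```shell
# Hypothetical helper: report whether a TCP port accepts connections.
# Usage: check_port <host> <port>
check_port() {
    local host="$1"
    local port="$2"
    # bash's /dev/tcp pseudo-device opens a TCP connection, so no extra
    # tools are needed; timeout gives up after 3 seconds.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "$host:$port reachable"
    else
        echo "$host:$port unreachable"
        return 1
    fi
}
```

Example use from the engine machine:
$ check_port node14.mydomain.lv 54321
$ check_port node14.mydomain.lv 22
"unreachable" covers both a refused connection and silently dropped packets,
so it does not by itself say whether the firewall or the service is at fault.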

btw. I have been hunting for a rare and hard-to-reproduce bug for quite a
long time (without success so far), so any reported connectivity issues between
the manager and hosts are super interesting to me.

Artur

On Mon, Aug 9, 2021 at 11:44 AM Andrei Verovski <andre...@starlett.lv>
wrote:

> Hi, Artur,
>
>
> Thanks for the assistance. Zipped engine.log, starting from the day of the
> upgrade, is attached.
> Restart via SSH from the oVirt Web GUI works.
> The oVirt engine runs on a dedicated server, not as a hosted engine.
>
>
>
>
> On 9 Aug 2021, at 11:24, Artur Socha <aso...@redhat.com> wrote:
>
> Hi Andrei,
> Could you also post a relevant piece of engine.log? I don't have high
> expectations of finding the answer there, but I just want to be sure of it.
> VDSM.log does not show any trace of an error from the vdsm point of view. For
> example, it looks like it started correctly and subscribed to receiving
> commands from the engine (yet that does not mean it connected to it - it is
> only in listening mode).
>
> Can you confirm that 'SSH restart' from UI works - by 'works' I mean the
> host is actually restarted after a few minutes and there are no ssh related
> (public key etc) errors in engine.log?
>
> Artur
>
> On Mon, Aug 9, 2021 at 9:55 AM Andrei Verovski <andre...@starlett.lv>
> wrote:
>
>> Hi,
>>
>> I have oVirt 4.4.7.6-1.el8 and one problematic node (HP ProLiant with
>> CentOS 8 stream).
>> After replacing the server rack router switch and restarting, I got this
>> error that I can’t recover from:
>>
>> VDSM node14 command Get Host Capabilities failed: Message timeout which
>> can be caused by communication issues
>>
>> vdsm-network is running fine, but vdsmd can’t start on node14 for whatever
>> reason. All other nodes are running fine.
>>
>> Aug 09 10:24:12 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm:
>> Running dummybr
>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm:
>> Running tune_system
>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm:
>> Running test_space
>> Aug 09 10:24:13 node14.mydomain.lv vdsmd_init_common.sh[4825]: vdsm:
>> Running test_lo
>> Aug 09 10:24:13 node14.mydomain.lv systemd[1]: Started Virtual Desktop
>> Server Manager.
>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]:
>> pam_systemd(sudo:session): Failed to create session: Start job for unit
>> user-0.slice failed with 'canceled'
>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session):
>> session opened for user root by (uid=0)
>> Aug 09 10:24:16 node14.mydomain.lv sudo[7721]: pam_unix(sudo:session):
>> session closed for user root
>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available.
>> Error: [Errno 2] No such file or directory
>> Aug 09 10:24:17 node14.mydomain.lv vdsm[6754]: WARN MOM not available,
>> KSM stats will be missing. Error:
>>
>>
>> In the web GUI -> Management I can’t do anything with the host except
>> restart. Stop aborts with an error, all other commands are grayed out.
>> Status is “Unassigned”. The host answers pings as usual.
>> vdsm.log (from node14) attached.
>>
>> Thanks in advance for any help.
>>
>>
>> _______________________________________________
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/55M65W57Z43ZVPOARDTK7HKHCAMAUGO5/
>>
>
>
> --
> Artur Socha
> Senior Software Engineer, RHV
> Red Hat
>
>
>

-- 
Artur Socha
Senior Software Engineer, RHV
Red Hat
