[ovirt-users] Re: possible actions on host remaining as nonresponsive
On Thu, Dec 23, 2021 at 7:44 PM Darrell Budic wrote:

> Try restarting libvirtd. It will also restart vdsmd; sometimes that fixes
> things for me when there has been a storage hiccup.

Thanks for the suggestion, but with the "ssh host restart" action the server
actually restarted completely, so I doubt it can be that.

> If it's an HA Engine host, I've also had to restart the ha-agent/ha-broker
> combo in some situations as well.

No, it's an external engine.

On the host:

[root@ov300 vdsm]# nodectl check
Status: OK
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK
vdsmd ... OK
[root@rhvh300 vdsm]#

I already had the idea of trying to restart the engine server, and then I
found this similar Red Hat solution (even if for an older release):

https://access.redhat.com/solutions/4222911
RHV host in "not responding" state until ovirt-engine service restarted

I rebooted the engine server (which is a VM inside a vSphere environment) and
everything came back to normal, with the host set as up together with the
other ones.

Gianluca
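For reference, the title of the linked solution suggests that restarting just
the ovirt-engine service can be enough, without rebooting the whole engine VM.
A minimal sketch of that lighter-weight alternative, run on the engine server
(standalone engine assumed):

  # restart the engine service and verify it comes back up
  systemctl restart ovirt-engine
  systemctl status ovirt-engine

  # follow engine.log to confirm host monitoring recovers
  tail -f /var/log/ovirt-engine/engine.log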
[ovirt-users] Re: possible actions on host remaining as nonresponsive
Try restarting libvirtd. It will also restart vdsmd; sometimes that fixes
things for me when there has been a storage hiccup.

If it's an HA Engine host, I've also had to restart the ha-agent/ha-broker
combo in some situations as well.

  -Darrell

> On Dec 23, 2021, at 12:00 PM, Gianluca Cecchi wrote:
>
> Hello,
> I have a 4.4.8 host that shows up as nonresponsive.
> The DC is FC-based.
> I tried to restart some daemons without effect (vdsmd, mom-vdsmd, wdmd),
> then I executed an ssh host reboot, but it seems the host stays this way
> after rebooting.
>
> From the storage and network point of view all seems OK on the host.
>
> In vdsm.log on the host I see every 5 seconds:
>
> 2021-12-23 18:54:53,053+0100 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList() from=internal, task_id=916bc455-ce37-4b50-9f38-b69e3b03807f (api:48)
> 2021-12-23 18:54:53,053+0100 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=916bc455-ce37-4b50-9f38-b69e3b03807f (api:54)
> 2021-12-23 18:54:53,053+0100 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:735)
> 2021-12-23 18:54:53,444+0100 INFO (periodic/0) [vdsm.api] START repoStats(domains=()) from=internal, task_id=eb5540e0-0f90-4996-bc9a-7c73949f390f (api:48)
> 2021-12-23 18:54:53,445+0100 INFO (periodic/0) [vdsm.api] FINISH repoStats return={} from=internal, task_id=eb5540e0-0f90-4996-bc9a-7c73949f390f (api:54)
>
> In engine.log:
>
> 2021-12-23 18:54:38,745+01 INFO [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'hostUpdatesChecker' is using 0 threads out of 5, 5 threads waiting for tasks.
> 2021-12-23 18:55:27,479+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-73) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov300 command Get Host Capabilities failed: Message timeout which can be caused by communication issues
> 2021-12-23 18:55:27,479+01 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-73) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues
>
> I would like to try to put the host into maintenance and then activate or
> reinstall it, but there is a power action still in place since 1 hour ago
> (when I executed the ssh host reboot attempt that got the host rebooted but
> apparently not reconnected) that prevents it... what is its timeout?
>
> What can I check to understand the source of these supposed communication
> problems?
>
> Thanks,
> Gianluca
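For reference, a minimal sketch of the restart sequence described above, run
on the nonresponsive host (assuming the standard oVirt service names; the
ha-agent/ha-broker pair only exists on hosted-engine hosts):

  # restarting libvirtd takes vdsmd down and up with it
  systemctl restart libvirtd
  systemctl status libvirtd vdsmd

  # on a hosted-engine host, also bounce the HA daemons
  systemctl restart ovirt-ha-broker ovirt-ha-agent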
[ovirt-users] possible actions on host remaining as nonresponsive
Hello,
I have a 4.4.8 host that shows up as nonresponsive.
The DC is FC-based.
I tried to restart some daemons without effect (vdsmd, mom-vdsmd, wdmd), then
I executed an ssh host reboot, but it seems the host stays this way after
rebooting.

From the storage and network point of view all seems OK on the host.

In vdsm.log on the host I see every 5 seconds:

2021-12-23 18:54:53,053+0100 INFO (vmrecovery) [vdsm.api] START getConnectedStoragePoolsList() from=internal, task_id=916bc455-ce37-4b50-9f38-b69e3b03807f (api:48)
2021-12-23 18:54:53,053+0100 INFO (vmrecovery) [vdsm.api] FINISH getConnectedStoragePoolsList return={'poollist': []} from=internal, task_id=916bc455-ce37-4b50-9f38-b69e3b03807f (api:54)
2021-12-23 18:54:53,053+0100 INFO (vmrecovery) [vds] recovery: waiting for storage pool to go up (clientIF:735)
2021-12-23 18:54:53,444+0100 INFO (periodic/0) [vdsm.api] START repoStats(domains=()) from=internal, task_id=eb5540e0-0f90-4996-bc9a-7c73949f390f (api:48)
2021-12-23 18:54:53,445+0100 INFO (periodic/0) [vdsm.api] FINISH repoStats return={} from=internal, task_id=eb5540e0-0f90-4996-bc9a-7c73949f390f (api:54)

In engine.log:

2021-12-23 18:54:38,745+01 INFO [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1) [] Thread pool 'hostUpdatesChecker' is using 0 threads out of 5, 5 threads waiting for tasks.
2021-12-23 18:55:27,479+01 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-73) [] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov300 command Get Host Capabilities failed: Message timeout which can be caused by communication issues
2021-12-23 18:55:27,479+01 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-73) [] Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: VDSNetworkException: Message timeout which can be caused by communication issues

I would like to try to put the host into maintenance and then activate or
reinstall it, but there is a power action still in place since 1 hour ago
(when I executed the ssh host reboot attempt that got the host rebooted but
apparently not reconnected) that prevents it... what is its timeout?

What can I check to understand the source of these supposed communication
problems?

Thanks,
Gianluca
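For reference, a few generic checks of the engine-to-host communication path
(a sketch, assuming the standard vdsm port 54321/tcp and default log
locations; adjust the host name for your environment):

  # from the engine server: is vdsm on the host reachable?
  ping -c3 ov300
  nc -zv ov300 54321

  # on the host: is vdsm actually listening?
  ss -tlnp | grep 54321

  # watch both sides while the engine retries
  tail -f /var/log/vdsm/vdsm.log             # on the host
  tail -f /var/log/ovirt-engine/engine.log   # on the engine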