[ovirt-users] Re: possible actions on host remaining as nonresponsive

2021-12-23 Thread Gianluca Cecchi
On Thu, Dec 23, 2021 at 7:44 PM Darrell Budic wrote:

> Try restarting libvirtd. It will also restart vdsmd; sometimes that fixes
> things for me when there has been a storage hiccup.
>

Thanks for the suggestion, but the "ssh host restart" action actually
rebooted the server completely, so I doubt that is the cause.


> If it’s a HA Engine host, I’ve also had to restart the ha-agent/ha-broker
> combo in some situations as well.
>

No, it's an external engine.

On the host:
[root@ov300 vdsm]# nodectl check
Status: OK
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK
vdsmd ... OK
[root@rhvh300 vdsm]#

I already had the idea to try restarting the engine server, and then I
found this similar Red Hat solution article (even if for an older release):
https://access.redhat.com/solutions/4222911
RHV host in "not responding" state until ovirt-engine service restarted

I rebooted the engine server (a VM inside a vSphere environment) and
everything came back fine: the host is now set as Up, together with the
other ones.
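
Since the host had rebooted cleanly and only the engine restart cleared the
state, one way to separate a real network problem from a stale engine-side
connection is to test, from the engine machine, whether the host's VDSM port
accepts TCP connections. The following is only a minimal sketch in Python, not
an oVirt tool: it assumes VDSM listens on its default port 54321, and the
function name vdsm_port_reachable is made up for illustration. A successful
connect shows only that the port is reachable; it says nothing about TLS or
the JSON-RPC layer above it.

```python
import socket


def vdsm_port_reachable(host: str, port: int = 54321, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    54321 is assumed to be the VDSM port (the default); adjust if it was changed.
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run on the engine, vdsm_port_reachable("ov300") returning True while the engine
still reports "Message timeout" would point at the engine side rather than at
the network, which is consistent with what the engine reboot fixed here.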

Gianluca
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/F4XG3ZJK6PXHOH5CVTWBWU3BYAMEYMOH/


[ovirt-users] Re: possible actions on host remaining as nonresponsive

2021-12-23 Thread Darrell Budic
Try restarting libvirtd. It will also restart vdsmd; sometimes that fixes
things for me when there has been a storage hiccup.

If it’s a HA Engine host, I’ve also had to restart the ha-agent/ha-broker combo
in some situations as well.

  -Darrell

> On Dec 23, 2021, at 12:00 PM, Gianluca Cecchi wrote:
> 
> Hello,
> I have a 4.4.8 host that shows as nonresponsive.
> The DC is FC based.
> I tried to restart some daemons without effect (vdsmd, mom-vdsmd, wdmd).
> Then I executed an ssh host reboot, but it seems the host stays this way
> after rebooting.
> 
> From a storage and network point of view, everything seems OK on the host.
> 
> In vdsm.log of the host I see every 5 seconds:
> 
> 2021-12-23 18:54:53,053+0100 INFO  (vmrecovery) [vdsm.api] START 
> getConnectedStoragePoolsList() from=internal, 
> task_id=916bc455-ce37-4b50-9f38-b69e3b03807f (api:48)
> 2021-12-23 18:54:53,053+0100 INFO  (vmrecovery) [vdsm.api] FINISH 
> getConnectedStoragePoolsList return={'poollist': []} from=internal, 
> task_id=916bc455-ce37-4b50-9f38-b69e3b03807f (api:54)
> 2021-12-23 18:54:53,053+0100 INFO  (vmrecovery) [vds] recovery: waiting for 
> storage pool to go up (clientIF:735)
> 2021-12-23 18:54:53,444+0100 INFO  (periodic/0) [vdsm.api] START 
> repoStats(domains=()) from=internal, 
> task_id=eb5540e0-0f90-4996-bc9a-7c73949f390f (api:48)
> 2021-12-23 18:54:53,445+0100 INFO  (periodic/0) [vdsm.api] FINISH repoStats 
> return={} from=internal, task_id=eb5540e0-0f90-4996-bc9a-7c73949f390f (api:54)
> 
> In engine.log
> 
> 2021-12-23 18:54:38,745+01 INFO  
> [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] 
> (EE-ManagedScheduledExecutorService-engineThreadMonitoringThreadPool-Thread-1)
>  [] Thread pool 'hostUpdatesChecker' is using 0 threads out of 5, 5 threads 
> waiting for tasks.
> 2021-12-23 18:55:27,479+01 ERROR 
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-73) [] 
> EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM ov300 command Get Host 
> Capabilities failed: Message timeout which can be caused by communication 
> issues
> 2021-12-23 18:55:27,479+01 ERROR 
> [org.ovirt.engine.core.vdsbroker.monitoring.HostMonitoring] 
> (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-73) [] 
> Unable to RefreshCapabilities: VDSNetworkException: VDSGenericException: 
> VDSNetworkException: Message timeout which can be caused by communication 
> issues
> 
> I would like to put the host into maintenance and then activate it, or
> reinstall it, but a power action has been in progress for 1 hour (since my
> ssh host reboot attempt, which rebooted the host but apparently left it not
> connected) and that prevents it... what is its timeout?
> 
> What can I check to understand the source of these supposed communication
> problems?
> 
> Thanks,
> Gianluca
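
One way to approach that last question is to tally the engine.log errors
quoted above per host and per command, to see whether the timeouts are limited
to one host. The sketch below is only an illustration under stated
assumptions: it matches the "VDSM <host> command <name> failed: <reason>"
wording of the single VDS_BROKER_COMMAND_FAILURE line quoted in this message,
it expects each log record on one line (the excerpt here was wrapped by the
mail client), and the function name and regular expression are hypothetical,
not part of oVirt.

```python
import re
from collections import Counter

# Pattern modelled on the one VDS_BROKER_COMMAND_FAILURE line quoted in this
# thread; other engine versions may word the message differently.
FAILURE_RE = re.compile(
    r"EVENT_ID: VDS_BROKER_COMMAND_FAILURE\([\d,]+\), "
    r"VDSM (?P<host>\S+) command (?P<command>.+?) failed: (?P<reason>.*)"
)


def count_broker_failures(lines):
    """Count (host, command) pairs among VDS_BROKER_COMMAND_FAILURE records."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[(match["host"], match["command"])] += 1
    return counts
```

Passing an open file object for the engine log to count_broker_failures is
enough, since it only iterates over lines. If every failure belongs to one
host while the others stay quiet, that narrows the search to that host's
connection rather than the engine as a whole.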