Hi,

Last night we had an incident with a failed host. The engine fenced the host
but did not restart the VMs that had been running on it on the other
operational hosts. I'd like to know whether this is normal or whether I can
tune it somehow.

Here are some relevant logs from engine:

2018-09-05 03:00:51,496+03 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] 
(EE-ManagedThreadFactory-engine-Thread-827644) [] Host 'v3' is not responding. 
It will stay in Connecting state for a grace period of 63 seconds and after 
that an attempt to fence the host will be issued.

2018-09-05 03:01:11,945+03 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.PollVmStatsRefresher] 
(EE-ManagedThreadFactory-engineScheduled-Thread-57) [] Failed to fetch vms info 
for host 'v3' - skipping VMs monitoring.

2018-09-05 03:01:48,028+03 WARN  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-827679) [] EVENT_ID: 
VM_SET_TO_UNKNOWN_STATUS(142), VM vm7 was set to the Unknown status.

2018-09-05 03:02:10,033+03 INFO  [org.ovirt.engine.core.bll.pm.StopVdsCommand] 
(EE-ManagedThreadFactory-engine-Thread-827680) [30369e01] Power-Management: 
STOP of host 'v3' initiated.

2018-09-05 03:02:55,935+03 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engine-Thread-827680) [3adcac38] EVENT_ID: 
VM_WAS_SET_DOWN_DUE_TO_HOST_REBOOT_OR_MANUAL_FENCE(143), Vm vm7 was shut down 
due to v3 host reboot or manual fence

2018-09-05 03:02:56,018+03 INFO  [org.ovirt.engine.core.bll.pm.StopVdsCommand] 
(EE-ManagedThreadFactory-engine-Thread-827680) [ea0f582] Power-Management: STOP 
host 'v3' succeeded.

2018-09-05 03:08:20,818+03 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedThreadFactory-engineScheduled-Thread-91) [326878] EVENT_ID: 
VDS_DETECTED(13), Status of host v3 was set to Up.

2018-09-05 03:08:23,391+03 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(EE-ManagedThreadFactory-engineScheduled-Thread-88) [] VM 
'3b1262ef-7fff-40af-b85e-9fd01a4f422b'(vm7) was unexpectedly detected as 'Down' 
on VDS '4970369d-21c2-467d-9247-c73ca2d71b3e'(v3) (expected on 'null')
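
To make the timeline easier to read, here is a minimal Python sketch that
computes how many seconds after the first "not responding" warning each event
happened. The timestamps are copied from the log excerpts above; the event
labels and the helper names (parse, offsets) are my own shorthand, not anything
from the engine.

```python
from datetime import datetime

# Timestamps copied from the engine log excerpts above.
# The labels are informal shorthand, not engine event names.
EVENTS = [
    ("host v3 not responding", "2018-09-05 03:00:51,496"),
    ("vm7 set to Unknown", "2018-09-05 03:01:48,028"),
    ("fence STOP initiated", "2018-09-05 03:02:10,033"),
    ("vm7 set Down", "2018-09-05 03:02:55,935"),
    ("fence STOP succeeded", "2018-09-05 03:02:56,018"),
    ("host v3 Up", "2018-09-05 03:08:20,818"),
]


def parse(ts):
    """Parse an engine log timestamp (timezone suffix already stripped)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def offsets(events):
    """Return (label, seconds since the first event) for every event."""
    start = parse(events[0][1])
    return [(label, (parse(ts) - start).total_seconds()) for label, ts in events]


for label, secs in offsets(EVENTS):
    print(f"{secs:8.3f}s  {label}")
```

By this calculation the fence starts roughly 79 seconds after the first
warning, a bit past the 63-second grace period, and the host is reported Up
again about seven and a half minutes after the first warning.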

As you can see, the engine fenced node v3, but vm7 and the other VMs that were
running on that node were not restarted.

Any tips?

The engine is ovirt-engine-4.2.5.3-1.el7.noarch and the host is
vdsm-4.20.35-1.el7.x86_64.

Best regards,

Giannis