I have an oVirt 4.1.1 environment with:

- the engine is a CentOS 7.3 VM running on vSphere, with its NIC on, say, vlan1
- 2 hosts (CentOS 7.3) with their ovirtmgmt network on a bond
(active-backup) on, say, vlan2

The network architecture puts hypervisors and management servers in
different VLANs.

Today the engine showed the 4 events below (listed newest first, numbered
in chronological order). The root cause was apparently a network routing
maintenance activity (it should have been transparent, the network guys
told us..., but that is another story ;-)
There were no alert messages inside the VMs.

4) May 23, 2017 1:43:58 PM Host ov300 power management was verified
3) May 23, 2017 1:43:58 PM Status of host ov300 was set to Up.
2) May 23, 2017 1:43:55 PM Executing power management status on Host ov300
using Proxy Host ov301 and Fence Agent ipmilan:
1) May 23, 2017 1:43:37 PM Host ov300 is not responding. It will stay in
Connecting state for a grace period of 61 seconds and after that an attempt
to fence the host will be issued.
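
To put the timing in perspective, here is a small Python sketch (my own
illustration, nothing taken from oVirt) that computes the offsets between
the events above. The labels and variable names are made up; only the
timestamps and the 61 seconds come from the event messages. It assumes an
English locale for parsing the month name.

```python
from datetime import datetime

# Timestamp format as shown in the engine events (assumes an English locale).
FMT = "%b %d, %Y %I:%M:%S %p"

# Labels are shorthand of mine; timestamps are copied from events 1) to 3).
events = [
    ("not responding", "May 23, 2017 1:43:37 PM"),
    ("power management status check", "May 23, 2017 1:43:55 PM"),
    ("status set to Up", "May 23, 2017 1:43:58 PM"),
]

# Grace period quoted in event 1).
GRACE_SECONDS = 61

# Seconds elapsed since the "not responding" event, per event.
start = datetime.strptime(events[0][1], FMT)
offsets = {
    label: int((datetime.strptime(ts, FMT) - start).total_seconds())
    for label, ts in events
}

for label, seconds in offsets.items():
    print(f"+{seconds:2d}s  {label}")

# True if the host came back before the grace period ran out.
recovered_in_time = offsets["status set to Up"] < GRACE_SECONDS
print("back Up within the grace period:", recovered_in_time)
```

So the power management status check ran 18 seconds after the "not
responding" event, and the host was back Up after 21 seconds, well inside
the 61 seconds mentioned in 1).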

Can anyone explain exactly what each of these lines means?
Is 1) raised because the engine, purely from a network point of view,
was not able to ping/reach the hostname of host ov300, or is "not
responding" the result of some more specific check?
Is the "61 seconds" delay tunable?
Is 2) an additional check to verify the status of ov300?
If the test in 2) had failed, would the fencing have been immediate, or
would the delay described in 1) still have applied?
Are the 3) and 4) messages independent of the engine being able to reach
ov300, or would the 61-second delay have applied anyway?

I hope this explains my doubts about events that could lead to the
fencing of an active node with its running VMs, when the "only" problem is
a temporary loss of connectivity between the engine and one of the nodes.

Thanks in advance,