Re: [ovirt-users] What network test validates a host?

2016-06-02 Thread Nicolas Ecarnot

Thank you Edward and Nir for your answers.

--
Nicolas ECARNOT
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] What network test validates a host?

2016-06-02 Thread Edward Haas
On Wed, Jun 1, 2016 at 2:27 PM, Nicolas Ecarnot wrote:

> Hello,
>
> Last week, one of our DCs went through a network crash and, surprisingly,
> most of our hosts survived it.
> Some of them lost their connectivity and were STONITHed.
>
> I'd like to be sure I understand which tests are run to declare a host
> valid:
>
> - On the storage part, I guess EVERY[1] host is doing a read+write test
> (using "dd") towards the storage domain(s), every... say 5 seconds (?)
> In case of failure, I guess a countdown is triggered until this host is
> shot.
>
> But the network failure we faced was not on the dedicated storage network,
> but purely on the "LAN" network (5 virtual networks).
>
> - What kind of test is done on each host to declare the connectivity is OK
> on every virtual network?
> I ask that because oVirt has no knowledge of any gateway it could ping,
> and in some cases, some virtual networks don't even have a gateway.
> Is it a ping towards the SPM?
> Towards the engine?
> Is it a ping?
>
> I ask that because I found out that some hosts restarted nicely and ran
> some VMs whose NICs were OK, but inside those guests we found evidence
> that they were not able to communicate with very simple networks usually
> provided by the host.
> So I'm trying to figure out whether a host could come back to life while
> being only partially sound.
>
> [1] Thus, I don't clearly see the benefit of the SPM concept...
>
> --
> Nicolas ECARNOT

Hello Nicolas,

In general, oVirt Engine frequently checks the host state by asking the
host to send a stats report.
As part of that report, the nic state is reported.
Engine will move the host to non-operational if the nic link of a
'required' network is down, or if it cannot reach the host through the
management network.

One can also use a VDSM hook to check against a reference IP for
connectivity and fake the nic state.
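
To make the reference-IP idea concrete, here is a minimal Python sketch of
the logic such a check might use. It is not the VDSM hook API: how a hook
receives and modifies the reported stats is not shown, and the function
names, the TCP-connect probe (used here instead of ICMP ping, which needs
raw sockets) and the "up"/"down" labels are all assumptions made for
illustration.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds in time.

    Hypothetical probe: a real check might ping the reference IP instead.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def reported_nic_state(link_up, reference_reachable):
    """Combine the physical link state with the reachability probe.

    The nic is reported "up" only if the link is up AND the reference
    address answered; otherwise it is reported "down", even when the
    physical link itself is fine.
    """
    return "up" if (link_up and reference_reachable) else "down"
```

With this, a nic whose link is up but whose reference address does not
answer would be reported as down, which is the "fake the nic state" effect
described above.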

In case storage domain connectivity fails (attempts to read fail), the host
reports this back to Engine through the stats report, and Engine will move
the host to non-operational after a few minutes.
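
The rules described in this reply can be summarised as a small decision
function. This is only a sketch of the described behaviour, not Engine
code: the HostReport fields, the status strings and the 300-second grace
period (standing in for "a few minutes") are assumptions.

```python
from dataclasses import dataclass


@dataclass
class HostReport:
    """Hypothetical summary of what Engine knows about one host."""
    reachable_via_management: bool
    nic_link_up: dict  # network name -> True if the nic link is up
    storage_failing_for_s: float = 0.0  # seconds storage reads have failed


def host_status(report, required_networks, storage_grace_s=300.0):
    """Apply the three conditions from the description above, in order."""
    # 1. Engine cannot reach the host through the management network.
    if not report.reachable_via_management:
        return "non-operational"
    # 2. The nic link of any 'required' network is down (or not reported).
    for net in required_networks:
        if not report.nic_link_up.get(net, False):
            return "non-operational"
    # 3. Storage reads have been failing for longer than the grace period.
    if report.storage_failing_for_s >= storage_grace_s:
        return "non-operational"
    return "up"
```

Note that only link state is checked for networks here, so a host whose
links are up but whose traffic goes nowhere would still be considered fine
unless something like a reference-IP check changes the reported nic state.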

Thanks,
Edy.


[ovirt-users] What network test validates a host?

2016-06-01 Thread Nicolas Ecarnot

Hello,

Last week, one of our DCs went through a network crash and, surprisingly,
most of our hosts survived it.

Some of them lost their connectivity and were STONITHed.

I'd like to be sure I understand which tests are run to declare a host
valid:


- On the storage part, I guess EVERY[1] host is doing a read+write test 
(using "dd") towards the storage domain(s), every... say 5 seconds (?)
In case of failure, I guess a countdown is triggered until this host is 
shot.


But the network failure we faced was not on the dedicated storage 
network, but purely on the "LAN" network (5 virtual networks).


- What kind of test is done on each host to declare the connectivity is 
OK on every virtual network?
I ask that because oVirt has no knowledge of any gateway it could ping, 
and in some cases, some virtual networks don't even have a gateway.

Is it a ping towards the SPM?
Towards the engine?
Is it a ping?

I ask that because I found out that some hosts restarted nicely and ran
some VMs whose NICs were OK, but inside those guests we found evidence
that they were not able to communicate with very simple networks usually
provided by the host.
So I'm trying to figure out whether a host could come back to life while
being only partially sound.


[1] Thus, I don't clearly see the benefit of the SPM concept...

--
Nicolas ECARNOT