[ovirt-users] Re: better understand ovirt-engine functions

2018-08-07 Thread stuartk
For future readers, here is supporting material:

An admin's view of the feature:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.5/html/Administration_Guide/sect-Cluster_Tasks.html

A developer’s view of the feature:
https://www.ovirt.org/develop/developer-guide/engine/automatic-fencing/

A discussion of how this works in practice (similar failure scenario):
http://users.ovirt.narkive.com/XWYhDO6R/ovirt-users-strange-fencing-behaviour-3-5-3
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AXTTCEGNLL5BGTOH73JVDY53NTAUBFUI/


[ovirt-users] Re: better understand ovirt-engine functions

2018-08-03 Thread stuartk
I have been reading about fencing:
https://www.ovirt.org/develop/developer-guide/engine/automatic-fencing/
https://www.slideshare.net/MartinPeina/host-fencing-in-ovirt-fixing-the-unknown-and-allowing-vms-to-be-highly-available

Looking at Edit Cluster, I see:
X Enable Fencing
 Skip fencing if host has live lease on storage
 Skip fencing on cluster connectivity issues 
Threshold 50
i.e. Fencing is enabled and neither of the two 'Skip' options is checked

Looking at Edit Host... Power Management, I see:
X Enable Power Management
X Kdump integration
Primary
[ ... and the remaining fields are populated with our ILO address & credential 
info ...]

OK, I get it now.  Here's the story:
- ovirt-engine runs on Cluster A in Data Center A
- When ovirt-engine is unable to reach Cluster B in Data Center B, given enough 
disruption, Fencing will kick in and try to mitigate the problem using various 
techniques, including (eventually) power cycling via the ILO
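
A rough sketch of that escalation, as a simplified model only. It follows my reading of the fencing material linked in this thread and is not the engine's actual code; the function name, the step labels, and the exact ordering are assumptions made for illustration.

```python
def fencing_plan(fencing_enabled, power_mgmt_enabled, kdump_integration):
    """Return the ordered steps assumed for a host the engine cannot reach.

    Hypothetical model: the real engine logic has more states and timers.
    """
    steps = ["wait for the grace period to expire"]
    if not fencing_enabled:
        # Assumption: with cluster fencing disabled, nothing further is tried.
        steps.append("leave host Non Responsive (fencing disabled)")
        return steps
    steps.append("soft fence: restart vdsmd over SSH")
    if power_mgmt_enabled:
        if kdump_integration:
            steps.append("wait if the host reports it is writing a kdump")
        steps.append("hard fence: power cycle through the fence agent (ILO here)")
    return steps


# Settings matching the Edit Cluster / Edit Host dialogs quoted above.
for step in fencing_plan(True, True, True):
    print(step)
```

With the settings shown above (fencing enabled, power management enabled, Kdump integration checked), the last step in this model is the power cycle, which would match a Host rebooting on its own.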

One path forward for me is to check:
 X Skip fencing on cluster connectivity issues 
Threshold 50
And tweak the Threshold to better suit my cluster:
 X Skip fencing on cluster connectivity issues 
Threshold 2
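
To make the Threshold concrete, here is a minimal sketch of the check as I understand it from the admin guide: fencing is skipped when the percentage of hosts in the cluster with connectivity issues is greater than or equal to the Threshold. The function name and the exact comparison are assumptions, not engine code.

```python
def skip_fencing(total_hosts, unreachable_hosts, threshold_percent):
    """Return True if fencing would be skipped under the assumed rule."""
    if total_hosts <= 0:
        raise ValueError("cluster must contain at least one host")
    unreachable_percent = 100 * unreachable_hosts / total_hosts
    # Assumption: 'greater than or equal to' the configured Threshold.
    return unreachable_percent >= threshold_percent


# Cluster B has 3 Hosts; try every possible number of unreachable Hosts.
for down in range(4):
    print(down, skip_fencing(3, down, 50))
```

Under this reading, with Threshold 50 a 3-Host cluster skips fencing only once 2 or more Hosts are unreachable. With a Threshold as low as 2, a single unreachable Host (33%) would already be enough.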

OK, I have a plausible model for understanding what has been happening.

Thank you for your assistance.

--sk
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GP4KCCM3CS3BW44U6UQQXJM3SPIUVG63/


[ovirt-users] Re: better understand ovirt-engine functions

2018-08-02 Thread stuartk
OK, I've spent time capturing traffic from the Hosts in Cluster B back to Data 
Center A.  I don't believe most of the traffic matters:  syslog, snmp, icmp, 
influxd (grafana), ssh, cfengine

After filtering out all that, I'm left with TCP 54321 -- netstat tells me that 
the Python interpreter owns this port -- I'm guessing that this daemon is 
talking with ovirt-engine down in Data Center A.

No sign of gluster-oriented traffic, e.g. TCP/UDP ports 24007/24008 nor 49152+  
(that's what I expected to see ... some sort of storage dependency between the 
two)
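
The filtering described above can be sketched as a small port classifier. The gluster ports are the ones named in this message. Labelling TCP 54321 as VDSM (the Python host agent the engine talks to) is my addition, based on that being VDSM's default port. The function and dictionary names are hypothetical.

```python
# Server-side ports of interest. 54321 -> vdsm is an assumption based on
# VDSM's default port; the gluster ports come from the message above.
SERVICES_BY_PORT = {
    54321: "vdsm",
    24007: "gluster management",
    24008: "gluster management",
}
GLUSTER_BRICK_BASE = 49152  # bricks listen on 49152 and up


def classify(port):
    """Label a server-side port seen in a capture."""
    if port in SERVICES_BY_PORT:
        return SERVICES_BY_PORT[port]
    if port >= GLUSTER_BRICK_BASE:
        # Only a hint: client ephemeral ports can fall in this range too.
        return "possible gluster brick"
    return "other"


def summarize(ports):
    """Count captured flows per label."""
    counts = {}
    for port in ports:
        label = classify(port)
        counts[label] = counts.get(label, 0) + 1
    return counts


print(summarize([54321, 54321, 22, 514]))
```

A capture with no "gluster management" or "possible gluster brick" entries would support the observation that only management traffic crosses between the two sites.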

So I'm back to wondering what happens when the conversation between 
ovirt-engine and KVM instances is disrupted?  Does it sound plausible that bad 
things happen?  Or would you say that this seems unlikely ... that management 
functions may be disrupted, but operational functions would be unaffected?

--sk
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ICED3LWK4UACXZJLGMGKHFR4BLXHV5W3/


[ovirt-users] Re: better understand ovirt-engine functions

2018-07-31 Thread stuartk
What do I mean by 'Hosts in Cluster B crashed':

I have (3) events (each one a window during which I tweaked the Data Center A 
network) during which I accumulated symptoms:
Event 1:  A handful of the several dozen VMs in Cluster B Paused due to Storage 
Issues.  Restarted the VMs to restore service.
Event 2:  Same
Event 3:  One of the (3) Hosts in Cluster B rebooted (that's what I mean by 
'crash').  gluster was unhappy also (two of the three Hosts also function as 
Gluster Bricks) ... but that could be a byproduct.

At the start of Event 3, I put hosted-engine into global maintenance mode:
hosted-engine --set-maintenance --mode=global
Why?  Because I was imagining that hosted-engine might perform some sort of 
connectivity checks with its local IP gateway ... and if it couldn't reach it, 
then emit some sort of 'shutdown' commands to *all* the KVM hosts it knows 
about (yes, I'm waving my hands a lot right here ... ergo my interest in 
reading about what kind of checks ovirt-engine performs and what kind of 
remedial action it might take based on the results of those checks).
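
For anyone scripting this around a maintenance window, a minimal sketch that builds the hosted-engine command lines without running them. The `--set-maintenance --mode=global` form is the one quoted above. The other two mode names (local, none) come from the hosted-engine tool as I know it. The helper name is hypothetical.

```python
VALID_MODES = ("global", "local", "none")


def maintenance_command(mode):
    """Build the argv list for `hosted-engine --set-maintenance`."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown maintenance mode: {mode!r}")
    return ["hosted-engine", "--set-maintenance", f"--mode={mode}"]


# Enter global maintenance before the window, leave it afterwards.
print(" ".join(maintenance_command("global")))
print(" ".join(maintenance_command("none")))
```

As far as I understand it, global maintenance only tells the HA agents to stop monitoring and restarting the engine VM. It does not pause the engine's own monitoring of Hosts, which would be consistent with Cluster B still being affected.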


You are suggesting that Cluster B depends, storage-wise, on Cluster A (or, more 
precisely, on Storage located at Cluster A's site).  That's where my thoughts 
turned immediately ... but thus far, I don't see it in the pcaps I've been gathering 
-- lots of ovirt-engine traffic, but nothing else.  More poking needed.

ovirt 3.5
glusterfs 3.7.6

I want to do more homework, to demonstrate that Cluster B has no storage 
dependency on Data Center A.

But back to my original question:  where might I go to better understand what 
kind of checks ovirt-engine performs on KVM hosts and what kind of remedial 
action it might take, based on the results of those checks?

--sk
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TOAV667Y3NOYPDKAIC3LL7PLT3TPP2FM/


[ovirt-users] better understand ovirt-engine functions

2018-07-23 Thread stuartk
I've been reading through documentation
https://www.ovirt.org/documentation/architecture/architecture/
https://www.ovirt.org/documentation/self-hosted/Self-Hosted_Engine_Guide/

But I am still struggling to understand the role ovirt-engine plays.  Would 
anyone have recommendations for additional reading?

The problem I'm tackling currently looks like this:
- We have (2) oVirt Data Centers, each populated by a single Cluster.  The Data 
Centers are physically & network-wise 'distant' from one another
- hosted-engine runs on (3) of the (4) Hosts in Data Center A / Cluster A.  
hosted-engine does not run on Data Center B / Cluster B
- When we disrupt network connectivity around Cluster A (yes, that's Cluster 
*A*), Hosts in Cluster B crash (requiring a power cycle) and Guests in Cluster 
B get stopped and paused

I'm struggling to understand why mussing with Cluster A affects Cluster B.  
From pcaps, I can see plenty of TLS traffic from Cluster A's Hosts -- 
presumably from ovirt-engine running on Cluster A -- exchanged with Cluster B.  
So, during my last maintenance window, I put hosted-engine into maintenance 
mode ... but Hosts/VMs in Cluster B were still affected.

Where do I go to better understand what ovirt-engine does when it is 'managing' 
Hosts & VMs?

--sk
___
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VQKE5JRT3DC5HQVZXP72FVC75RNAOAXM/