[ovirt-users] Re: better understand ovirt-engine functions
For future readers, here is supporting material:

An admin's view of the feature:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.5/html/Administration_Guide/sect-Cluster_Tasks.html

A developer’s view of the feature:
https://www.ovirt.org/develop/developer-guide/engine/automatic-fencing/

A discussion of how this works in practice (similar failure scenario):
http://users.ovirt.narkive.com/XWYhDO6R/ovirt-users-strange-fencing-behaviour-3-5-3

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/AXTTCEGNLL5BGTOH73JVDY53NTAUBFUI/
[ovirt-users] Re: better understand ovirt-engine functions
I have been reading about fencing:
https://www.ovirt.org/develop/developer-guide/engine/automatic-fencing/
https://www.slideshare.net/MartinPeina/host-fencing-in-ovirt-fixing-the-unknown-and-allowing-vms-to-be-highly-available

Looking at Edit Cluster, I see:

  X Enable Fencing
    Skip fencing if host has live lease on storage
    Skip fencing on cluster connectivity issues   Threshold 50

i.e. Fencing is enabled and neither of the two 'Skip' options is checked.

Looking at Edit Host... Power Management, I see:

  X Enable Power Management
  X Kdump integration
  Primary [ ... and the remaining fields are populated with our ILO address & credential info ...]

OK, I get it now. Here's the story:
- ovirt-engine runs on Cluster A in Data Center A
- When ovirt-engine is unable to reach Cluster B in Data Center B, given enough disruption, Fencing will kick in and try to mitigate the problem using various techniques, including (eventually) power cycling via the ILO

One path forward for me is to check:

  X Skip fencing on cluster connectivity issues   Threshold 50

and tweak the Threshold to be more suitable for my cluster:

  X Skip fencing on cluster connectivity issues   Threshold 2

OK, I have a plausible model for understanding what has been happening.

Thank you for your assistance.

--sk
___
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GP4KCCM3CS3BW44U6UQQXJM3SPIUVG63/
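
For anyone trying to picture what the 'Skip fencing on cluster connectivity issues' option does, here is a minimal Python sketch of the decision as I read it in the administration guide: fencing is skipped when the percentage of hosts in the cluster with connectivity issues is greater than or equal to the Threshold. The function name and the host dictionary are hypothetical and made up for illustration; this is not the engine's actual code.

```python
def should_skip_fencing(host_connected, threshold_percent):
    """Return True if fencing would be skipped for the whole cluster.

    host_connected: dict mapping host name -> True if the engine can reach it
                    (hypothetical structure, for illustration only).
    threshold_percent: the cluster's Threshold setting, e.g. 50.
    """
    if not host_connected:
        return False  # an empty cluster has nothing to skip
    unreachable = sum(1 for ok in host_connected.values() if not ok)
    percent_unreachable = 100.0 * unreachable / len(host_connected)
    return percent_unreachable >= threshold_percent


# Hypothetical three-host cluster where the engine has lost contact with
# every host at once, as happens when the engine's own site is cut off.
cluster_b = {"host1": False, "host2": False, "host3": False}
print(should_skip_fencing(cluster_b, 50))

# Only one of three hosts unreachable: below a 50% threshold.
cluster_b["host2"] = cluster_b["host3"] = True
print(should_skip_fencing(cluster_b, 50))
```

The point of the option, as far as I can tell, is the first case: when most of the cluster looks unreachable at the same moment, the likelier culprit is the network path from the engine, not the hosts themselves.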
[ovirt-users] Re: better understand ovirt-engine functions
On Thu, Aug 2, 2018 at 1:47 PM, wrote:

> OK, I've spent time capturing traffic from the Hosts in Cluster B back to
> Data Center A. I don't believe most of the traffic matters: syslog, snmp,
> icmp, influxd (grafana), ssh, cfengine
>
> After filtering out all that, I'm left with TCP 54321 -- netstat tells me
> that the Python interpreter owns this port -- I'm guessing that this daemon
> is talking with ovirt-engine down in Data Center A.
>
> No sign of gluster-oriented traffic, e.g. TCP/UDP ports 24007/24008 nor
> 49152+ (that's what I expected to see ... some sort of storage dependency
> between the two)
>
> So I'm back to wondering what happens when the conversation between
> ovirt-engine and KVM instances is disrupted? Does it sound plausible that
> bad things happen? Or would you say that this seems unlikely ... that
> management functions may be disrupted, but operational functions would be
> unaffected?
>
> --sk
>

https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/installation_guide/networking-requirements#host-firewall-requirements_RHV_install

Port 54321 is used by both the Manager and the hosts to communicate with one another: "VDSM communications with the Manager and other virtualization hosts."

If the engine in site A is not able to connect to vdsmd on a host in site B (did this happen? you only talk about disruption inside site A, but the kind of disruption is not clear), I think it should mark the host as not responsive and eventually fence it, so that the host releases the VM resources (if VMs are running on it) and the storage role (if it is the SPM) it is holding, and they can start on other hosts.

But if all Cluster B hosts become unresponsive from the engine's point of view, I don't know what the default action would be: perhaps freeze everything until something comes back?

Did you configure fencing in your clusters? If so, when you stop communication inside site A, could it affect your fencing configuration towards hosts in site B?

Gianluca
___
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/RH77HTNBGUFLORGIQ43ONBA43R6RCP7F/
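
To answer the "did this happen?" question with data, one could probe TCP 54321 on each Cluster B host from the engine's site during a maintenance window. The following is a minimal Python sketch under that assumption; the function name is hypothetical, and a plain TCP connect only shows that the port is reachable, not that the TLS/VDSM conversation itself is healthy.

```python
import socket

VDSM_PORT = 54321  # port named in the host firewall requirements


def can_reach(host, port=VDSM_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and connects; the 'with'
        # block closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False
```

Calling `can_reach("hostb1.example.com")` (a placeholder host name) in a loop with timestamps, from the machine where the engine runs, would show whether the engine-to-vdsmd path drops at the same moments the network in Data Center A is disturbed.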
[ovirt-users] Re: better understand ovirt-engine functions
OK, I've spent time capturing traffic from the Hosts in Cluster B back to Data Center A. I don't believe most of the traffic matters: syslog, snmp, icmp, influxd (grafana), ssh, cfengine

After filtering out all that, I'm left with TCP 54321 -- netstat tells me that the Python interpreter owns this port -- I'm guessing that this daemon is talking with ovirt-engine down in Data Center A.

No sign of gluster-oriented traffic, e.g. TCP/UDP ports 24007/24008 nor 49152+ (that's what I expected to see ... some sort of storage dependency between the two)

So I'm back to wondering what happens when the conversation between ovirt-engine and KVM instances is disrupted? Does it sound plausible that bad things happen? Or would you say that this seems unlikely ... that management functions may be disrupted, but operational functions would be unaffected?

--sk
___
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ICED3LWK4UACXZJLGMGKHFR4BLXHV5W3/
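
The port-based sorting described above can be sketched in a few lines of Python. This is only an illustration of the filtering logic: the port numbers are the ones mentioned in this message, while the function names, labels and sample port list are hypothetical. Ports at 49152 and above are also commonly used as ephemeral client ports, so a hit there only suggests gluster brick traffic when it is the server-side port.

```python
from collections import Counter

VDSM_PORT = 54321                  # vdsmd, as seen in the capture
GLUSTER_MGMT_PORTS = {24007, 24008}
GLUSTER_BRICK_PORT_START = 49152   # bricks listen at 49152 and up


def classify_port(port):
    """Map a server-side port number to a coarse traffic label."""
    if port == VDSM_PORT:
        return "vdsm"
    if port in GLUSTER_MGMT_PORTS:
        return "gluster-management"
    if port >= GLUSTER_BRICK_PORT_START:
        return "possible-gluster-brick"
    return "other"


def summarize(ports):
    """Count how many observed flows fall under each label."""
    return Counter(classify_port(p) for p in ports)


# Hypothetical list of server-side ports pulled out of a capture.
observed = [54321, 54321, 22, 514, 161, 54321]
for label, count in summarize(observed).items():
    print(label, count)
```

Note that 54321 is checked first, since it also lies above 49152 and would otherwise be counted as possible brick traffic.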
[ovirt-users] Re: better understand ovirt-engine functions
What do I mean by 'Hosts in Cluster B crashed':

I have (3) events (events during which I tinkered with the Data Center A network) during which I accumulated symptoms:

Event 1: A handful of the several dozen VMs in Cluster B Paused due to Storage Issues. Restarted the VMs to restore service.
Event 2: Same
Event 3: One of the (3) Hosts in Cluster B rebooted (that's what I mean by 'crash'). gluster was unhappy also (two of the three Hosts also function as Gluster Bricks) ... but that could be a byproduct.

At the start of Event 3, I put hosted-engine into global maintenance mode:

  hosted-engine --set-maintenance --mode=global

Why? Because I was imagining that hosted-engine might perform some sort of connectivity checks with its local IP gateway ... and if it couldn't reach it, then emit some sort of 'shutdown' commands to *all* the KVM hosts it knows about (yes, I'm waving my hands a lot right here ... ergo my interest in reading about what kind of checks ovirt-engine performs and what kind of remedial action it might take based on the results of those checks).

You are suggesting that Cluster B depends, storage-wise, on Cluster A (or, more precisely, on Storage located at Cluster A's site). That's where my thoughts turned immediately ... but thus far, I don't see it in the pcaps I've been gathering -- lots of ovirt-engine traffic, but nothing else. More poking needed.

ovirt 3.5
glusterfs 3.7.6

I want to do more homework, to demonstrate that Cluster B has no storage dependency on Data Center A. But back to my original question: where might I go to better understand what kind of checks ovirt-engine performs on KVM hosts and what kind of remedial action it might take, based on the results of those checks?

--sk
___
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/TOAV667Y3NOYPDKAIC3LL7PLT3TPP2FM/
[ovirt-users] Re: better understand ovirt-engine functions
On Mon, Jul 23, 2018 at 7:48 PM wrote:

> I've been reading through documentation
> https://www.ovirt.org/documentation/architecture/architecture/
> https://www.ovirt.org/documentation/self-hosted/Self-Hosted_Engine_Guide/
>
> But am struggling still to understand the role ovirt-engine plays. Would
> anyone have recommendations for additional reads?
>
> The problem I'm tackling currently looks like this:
> - We have (2) oVirt Data Centers, each populated by a single Cluster. The
> Data Centers are physically & network-wise 'distant' from one another
> - hosted-engine runs on (3) of the (4) Hosts in Data Center A / Cluster
> A. hosted-engine does not run on Data Center B / Cluster B
> - When we disrupt network connectivity around Cluster A (yes, that's
> Cluster *A*), Hosts in Cluster B crash (requiring a power cycle) and Guests
> in Cluster B get stopped and paused
>
> I'm struggling to understand why mussing with Cluster A affects Cluster
> B. From pcaps, I can see plenty of TLS traffic from Cluster A's Hosts --
> presumably from ovirt-engine running on Cluster A -- exchanged with Cluster
> B. So, during my last maintenance window, I put hosted-engine into
> maintenance mode ... but Hosts/VMs in Cluster B were still affected.
>

The engine is the brain of your system: it's a kind of orchestrator that starts different tasks on your hosts. You need an engine for new tasks, but existing tasks such as running a VM should not be affected at all by the lack of an engine.

I fear that your issue is somewhere else. What exactly do you mean by "Hosts in Cluster B crash"? Are you sure that your hosts in Data Center B are not consuming storage exposed by hosts in Data Center A?

>
> Where do I go to better understand what ovirt-engine does when it is
> 'managing' Hosts & VMs?
>
> --sk
> ___
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/VQKE5JRT3DC5HQVZXP72FVC75RNAOAXM/
>
___
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/5LH4NHBO6OEBMH3BCYSQXEEMRAQGL3N2/