[ovirt-users] Re: better understand ovirt-engine functions
For future readers, here is supporting material:

An admin's view of the feature:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.5/html/Administration_Guide/sect-Cluster_Tasks.html

A developer’s view of the feature:
https://www.ovirt.org/develop/developer-guide/engine/automatic-fencing/

A discussion of how this works in practice (similar failure scenario):
http://users.ovirt.narkive.com/XWYhDO6R/ovirt-users-strange-fencing-behaviour-3-5-3

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/AXTTCEGNLL5BGTOH73JVDY53NTAUBFUI/
[ovirt-users] Re: better understand ovirt-engine functions
I have been reading about fencing:

https://www.ovirt.org/develop/developer-guide/engine/automatic-fencing/
https://www.slideshare.net/MartinPeina/host-fencing-in-ovirt-fixing-the-unknown-and-allowing-vms-to-be-highly-available

Looking at Edit Cluster, I see:

  X Enable Fencing
    Skip fencing if host has live lease on storage
    Skip fencing on cluster connectivity issues   Threshold 50

i.e. Fencing is Enabled and neither of the two 'Skip' options is checked.

Looking at Edit Host... Power Management, I see:

  X Enable Power Management
  X Kdump integration
  Primary [ ... and the remaining fields are populated with our ILO address & credential info ...]

OK, I get it now. Here's the story:

- ovirt-engine runs on Cluster A in Data Center A
- When ovirt-engine is unable to reach Cluster B in Data Center B, given enough disruption, Fencing will kick in and try to mitigate the problem using various techniques, including (eventually) power cycling via the ILO

One path forward for me is to check:

  X Skip fencing on cluster connectivity issues   Threshold 50

And tweak the Threshold to be more suitable for my cluster:

  X Skip fencing on cluster connectivity issues   Threshold 2

OK, I have a plausible model for understanding what has been happening. Thank you for your assistance.

--sk

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GP4KCCM3CS3BW44U6UQQXJM3SPIUVG63/
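For future readers, here is a minimal sketch of how I understand the 'Skip fencing on cluster connectivity issues' threshold, based on my reading of the admin guide linked in this thread: fencing is skipped when the percentage of Hosts in the cluster with connectivity issues is greater than or equal to the threshold. The function and parameter names are hypothetical and this is not the engine's actual code. As far as I recall, the UI offers the threshold as a drop-down of 25, 50, 75 and 100, so an arbitrary value may not be selectable; treat that as an assumption to verify against your version.

```python
def should_skip_fencing(total_hosts: int,
                        hosts_with_connectivity_issues: int,
                        threshold_percent: int) -> bool:
    """Illustrative model only (assumed semantics, not engine code).

    Returns True when the share of Hosts in the cluster that the engine
    cannot reach is at or above the configured threshold, i.e. the
    situation in which fencing would be skipped.
    """
    if total_hosts <= 0:
        raise ValueError("total_hosts must be positive")
    affected_percent = 100.0 * hosts_with_connectivity_issues / total_hosts
    return affected_percent >= threshold_percent


if __name__ == "__main__":
    # A 3-Host cluster that becomes entirely unreachable from the engine.
    print(should_skip_fencing(3, 3, 50))
    # Only one of the three Hosts is unreachable.
    print(should_skip_fencing(3, 1, 50))
```

Under this model, a network disruption that cuts the engine off from every Host in a cluster looks like a cluster-wide connectivity issue, so fencing would be skipped, while a single unreachable Host out of three would still be fenced.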
[ovirt-users] Re: better understand ovirt-engine functions
OK, I've spent time capturing traffic from the Hosts in Cluster B back to Data Center A.

I don't believe most of the traffic matters: syslog, snmp, icmp, influxd (grafana), ssh, cfengine

After filtering out all that, I'm left with TCP 54321 -- netstat tells me that the Python interpreter owns this port -- I'm guessing that this daemon is talking with ovirt-engine down in Data Center A.

No sign of gluster-oriented traffic, e.g. TCP/UDP ports 24007/24008 or 49152+ (that's what I expected to see ... some sort of storage dependency between the two)

So I'm back to wondering what happens when the conversation between ovirt-engine and KVM instances is disrupted. Does it sound plausible that bad things happen? Or would you say that this seems unlikely ... that management functions may be disrupted, but operational functions would be unaffected?

--sk

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ICED3LWK4UACXZJLGMGKHFR4BLXHV5W3/
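The port filtering described above can be sketched roughly as follows. This is only an illustration: the port numbers are the ones named in the message (TCP 54321, which is the default VDSM port, and the gluster ports 24007/24008 and 49152+), while the function names and labels are hypothetical. Note that 49152 and up is also the usual ephemeral port range, so a match there is only a candidate for brick traffic, not proof of it.

```python
from collections import Counter
from typing import Dict, Iterable

VDSM_PORT = 54321                    # default VDSM port; engine talks to Hosts here
GLUSTER_MGMT_PORTS = {24007, 24008}  # gluster management ports named in the message
GLUSTER_BRICK_PORT_START = 49152     # bricks are assigned ports from here upward


def classify_port(port: int) -> str:
    """Map a server-side port number to a coarse traffic label."""
    if port == VDSM_PORT:
        return "vdsm"
    if port in GLUSTER_MGMT_PORTS:
        return "gluster-management"
    if port >= GLUSTER_BRICK_PORT_START:
        # Overlaps the ephemeral range, so this needs manual confirmation.
        return "gluster-brick-candidate"
    return "other"


def summarize(ports: Iterable[int]) -> Dict[str, int]:
    """Count how many observed flows fall under each label."""
    return dict(Counter(classify_port(p) for p in ports))


if __name__ == "__main__":
    # Hypothetical destination ports pulled from a capture.
    print(summarize([54321, 54321, 22, 514]))
```

If a capture summarized this way shows only "vdsm" and "other" between the two sites, that is consistent with (though not proof of) there being no gluster dependency across them.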
[ovirt-users] Re: better understand ovirt-engine functions
What do I mean by 'Hosts in Cluster B crashed':

I have (3) events (events during which I tweaked the Data Center A network) during which I accumulated symptoms:

Event 1: A handful of the several dozen VMs in Cluster B Paused due to Storage Issues. Restarted the VMs to restore service.
Event 2: Same
Event 3: One of the (3) Hosts in Cluster B rebooted (that's what I mean by 'crash'). gluster was unhappy also (two of the three Hosts also function as Gluster Bricks) ... but that could be a byproduct.

At the start of Event 3, I put hosted-engine into global maintenance mode:

  hosted-engine --set-maintenance --mode=global

Why? Because I was imagining that hosted-engine might perform some sort of connectivity checks with its local IP gateway ... and if it couldn't reach it, then emit some sort of 'shutdown' commands to *all* the KVM hosts it knows about (yes, I'm waving my hands a lot right here ... ergo my interest in reading about what kind of checks ovirt-engine performs and what kind of remedial action it might take based on the results of those checks).

You are suggesting that Cluster B depends, storage-wise, on Cluster A (or, more precisely, on Storage located at Cluster A's site). That's where my thoughts turned immediately ... but thus far, I don't see it in the pcaps I've been gathering -- lots of ovirt-engine traffic, but nothing else. More poking needed.

ovirt 3.5
glusterfs 3.7.6

I want to do more homework, to demonstrate that Cluster B has no storage dependency on Data Center A. But back to my original question: where might I go to better understand what kind of checks ovirt-engine performs on KVM hosts and what kind of remedial action it might take, based on the results of those checks?

--sk

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/TOAV667Y3NOYPDKAIC3LL7PLT3TPP2FM/
[ovirt-users] better understand ovirt-engine functions
I've been reading through documentation

https://www.ovirt.org/documentation/architecture/architecture/
https://www.ovirt.org/documentation/self-hosted/Self-Hosted_Engine_Guide/

but I am still struggling to understand the role ovirt-engine plays. Would anyone have recommendations for additional reading?

The problem I'm tackling currently looks like this:

- We have (2) oVirt Data Centers, each populated by a single Cluster. The Data Centers are physically & network-wise 'distant' from one another
- hosted-engine runs on (3) of the (4) Hosts in Data Center A / Cluster A. hosted-engine does not run on Data Center B / Cluster B
- When we disrupt network connectivity around Cluster A (yes, that's Cluster *A*), Hosts in Cluster B crash (requiring a power cycle) and Guests in Cluster B get stopped and paused

I'm struggling to understand why messing with Cluster A affects Cluster B. From pcaps, I can see plenty of TLS traffic from Cluster A's Hosts -- presumably from ovirt-engine running on Cluster A -- exchanged with Cluster B. So, during my last maintenance window, I put hosted-engine into maintenance mode ... but Hosts/VMs in Cluster B were still affected.

Where do I go to better understand what ovirt-engine does when it is 'managing' Hosts & VMs?

--sk

List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/VQKE5JRT3DC5HQVZXP72FVC75RNAOAXM/