It's important to understand the oVirt design philosophy. It may be somewhat understated in the documentation, because I'm afraid they copied it from VMware's vSphere, which might have copied it from Nutanix, which might have copied it from who knows whom else... which might explain why they are a little shy about it.
The basic truth is: HA is a chicken-and-egg issue. Having several management engines won't get you HA, because in a case of conflict these engines can't easily decide who is boss. Which is why oVirt (and vSphere/Nutanix, most likely) will concede defeat (or death) on every startup.

What that mostly means is that you don't really need to ensure that a smart and highly complex management machine, one that can juggle dozens of infrastructure pieces against an optimal plan of operations, is in fact at all times highly available. It's quite enough to have the infrastructure pieces ensure that the last plan this ME produced is faithfully executed.

So oVirt has a super-intelligent management engine build a plan. That plan is written to super-primitive but reliable storage. All hosts will faithfully (and without personal ambitions to improve it) execute that last plan, which includes launching the management engine... And that single newly started management engine can read the basic infrastructure data, as well as the latest plan, to hopefully create a better new plan before it dies...

And that's why, unless your ME always dies before a new plan can be created, you don't need HA for the ME: it's sufficient to have a good-enough plan for all hosts. Like far too many clusters, oVirt relegates HA to a passive storage device that is always highly available. With SANs and NFS filers, that's hopefully solved in hardware; with HCI Gluster, it's done with majority voting, hopefully.

All that said... I've rarely had all 3 nodes register just perfectly in a 3-node oVirt HCI cluster. I have no idea why that is the case in both 4.3 and 4.4. I have almost always had to add the two additional nodes via 'add host' to make them available both as compute nodes and as Gluster peers. On the other hand, it just works, and doing it twice or a hundred times won't break a thing.
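To make the pattern concrete, here is a minimal sketch (not oVirt's actual code; all names and the plan format are hypothetical) of the "plan on dumb reliable storage" idea: one engine atomically persists its latest plan, and stateless hosts simply execute whatever plan is there, including the step that relaunches the engine itself.

```python
# Illustrative sketch of the pattern described above, not oVirt internals.
# A single engine writes its plan to primitive-but-durable storage; hosts
# read and execute the last plan without any coordination among themselves.
import json
from pathlib import Path

PLAN_FILE = Path("/tmp/cluster-plan.json")  # stands in for SAN/NFS/Gluster

def engine_write_plan(assignments: dict) -> None:
    """The management engine persists its plan atomically, then may die."""
    tmp = PLAN_FILE.with_suffix(".tmp")
    tmp.write_text(json.dumps(assignments))
    tmp.replace(PLAN_FILE)  # atomic rename: hosts never see a half-written plan

def host_execute_plan(host: str) -> list:
    """Each host reads the last plan and runs only its own share of it."""
    plan = json.loads(PLAN_FILE.read_text())
    return plan.get(host, [])  # e.g. ["HostedEngine", "vm-db"]

# The engine plans and may then die; any surviving host still knows what
# to run, including relaunching the engine so a new plan can be made.
engine_write_plan({"host1": ["HostedEngine", "vm-db"], "host2": ["vm-web"]})
print(host_execute_plan("host1"))
```

The only "HA" ingredient here is the storage itself, which is exactly the point of the design: the clever part (planning) is allowed to die, and the durable part (the plan) is kept trivially simple.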
And that is true for almost every component of oVirt: practically all services can fail, or be restarted at any time, without causing a major disruption or outright failure. That's where I can't help but admire this fail-safe approach (which, somewhat unfortunately, might not have been invented at Red Hat, even if Moshe Bar most likely had a hand in it).

It never hurts to make sure you add those extra nodes with the ability to run the management engine, either, but that's also something you can always add later to any host (it just takes Ansible patience to do so). Today I just consider that one of dozens, if not hundreds, of quirks of oVirt that I find 'amazing' in a product also sold commercially.

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/AYAKEZMSUSPN5GIDA7HELHVRU4GFY36F/