Comments & motivational stuff were moved to the end...

Source/license:
Xen, the hypervisor, has moved to the Linux Foundation. Perpetually open 
source, free to use.
Xcp-ng is a Xen distribution produced by a small French company, currently 
using a Linux 4.19 LTS kernel and an EL7 userland frozen in July 2020: they 
promise it will remain open source and free to use forever.

Mode of operation:
You install Xcp-ng on your (bare-metal) hardware. You manage nodes via 
Xen Orchestra, which is a big Node.js application you can run in pretty much 
any way you want.

Business model:
You can buy support for Xcp-ng at different levels of quality.
Xen Orchestra as AN APPLIANCE exposes different levels of functionality 
depending on the level of support you buy.
But you get the full source code of the appliance and can compile it yourself 
to unlock the full feature set.
There is a script out there which lets you auto-generate the appliance with a 
single command.
In short, you are never forced to pay, but better help can be purchased.

How does it feel to a CentOS/RHEL user?
The userland on the nodes is EL7, but you shouldn't touch that.
The CLI is classic Xen, nothing like KVM or oVirt.
I guess libvirt and virsh should be similar, if they live up to their promise 
at all.
The standard userland on the Orchestra appliance is Debian, but you can build 
it on pretty much any Linux with that script: all interaction is meant to be 
done via the Web UI.
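To give a flavour of the "classic Xen" CLI on a node, here are a few `xe` commands as they work on an XCP-ng host (the VM name is a placeholder; these need a live host, so treat this as an illustrative sketch):

```shell
# List all VMs known to this host/pool
xe vm-list

# Show a few host details (name and total memory)
xe host-list params=name-label,memory-total

# Start a VM by its name-label ("my-test-vm" is a placeholder)
xe vm-start vm="my-test-vm"
```

Day-to-day you rarely need this, since everything is meant to go through the Web UI, but it is there when you want it.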

Installation/setup:
There is an image/ISO much like the oVirt node image. Based on a Linux 4.19 LTS 
kernel and an EL7 frozen userland from July 2020 and a freshly maintained Xen 
with tools.
Installation on bare metal or in VMs (e.g. for nested experimentation) is a 
snap, though the HCL isn't extraordinary. I'm still fighting to get the 
2.5/5GBit USB3 NIC working that I like to use for my smallest test systems.

A single command on one node will download the "free" Orchestra appliance 
(aka Xoa) and install it as a VM on that node. It's set to auto-launch; just 
point your browser at its IP to start with the GUI.
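For reference, the quick-deploy one-liner as documented by the project (run as root on an XCP-ng host; the URL may change over time, so check the official docs):

```shell
# Fetches and runs the XOA deployment script on the host;
# it imports the appliance as a VM and configures it to auto-boot.
bash -c "$(wget -qO- https://xoa.io/deploy)"
```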

There are various other ways to build or run the GUI, which can be run on 
anything remotely Linux, within the nodes or outside: more on this in the next 
section.

The management appliance (Xoa) will run with only 2GB of RAM and 10GB of disk 
for a couple of hosts. If you grow to dozens of hosts, give it a little more 
RAM and it will be fine. Compared to the oVirt management engine it's very, 
very light and seems to have vastly fewer parts that can break.

And if it does break, that doesn't matter much, because it is pretty much 
stateless. E.g. pool membership and configuration live on the nodes, so if you 
connect from another Xoa they will just carry over. Ditto storage: the 
configuration which oVirt keeps in the management engine's Postgres database 
is on the nodes in Xcp and can be changed by any connected Xoa.

Operation:
Xen nodes are much more autonomous than oVirt hosts. They use whatever storage 
they might have locally, or attached via SAN/NAS/Gluster[!!!] and others. They 
will operate without a management engine, much like "single node HCI oVirt", 
or they can be joined into a pool, which opens up live migration and HA. A 
pool is created by telling a node that it's the master now and then adding 
other nodes to join in. The master can be changed and nodes can be moved to 
other pools. Adding and removing nodes to a pool is very quick and easy, and 
it's the same for additional storage repositories: any shared storage added to 
any node is immediately visible to the pool, and disks can be flipped between 
local and shared storage very easily (I haven't tried live disk moves, but 
they could work).
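The pool and storage steps above can be sketched with `xe` (all addresses, credentials and paths are placeholders; you'd normally do this from the GUI anyway):

```shell
# On each additional node, join it to the designated pool master:
xe pool-join master-address=10.0.0.1 master-username=root master-password=secret

# Add an NFS shared storage repository from any pool member;
# once created it is visible to the whole pool:
xe sr-create name-label="shared-nfs" shared=true content-type=user \
    type=nfs device-config:server=10.0.0.100 \
    device-config:serverpath=/export/xcp
```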

Having nodes in a pool qualifies them for live migration (CPU architecture 
caveats apply). If storage is local, it will move with the VM; if storage is 
shared, only RAM will move.
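A live migration within a pool is a one-liner on the CLI as well (VM and host names are placeholders):

```shell
# Live-migrate a running VM to another pool member;
# with shared storage only the RAM state moves.
xe vm-migrate vm="my-test-vm" host="node2" live=true
```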

You can also move VMs not sharing a pool and even across different x86 
variants (e.g. AMD and Intel), when the VMs are down. If you've ever dabbled 
with "export domains" or "backup domains" in oVirt, you just can't believe how 
quick and easy these things are in Xcp-ng. VMs and their disks can be moved, 
copied, cloned, backed up and restored with a minimum of fuss, including 
continuous backups on running machines.

You can label machines as "HA" so they'll always be restarted elsewhere, should 
a host go down. You can define policies for how to balance workloads across 
hosts and ensure that HA pairs won't share a host, pretty similar to oVirt.

The "free" Xoa has plenty of "upgrade!" buttons all over the place. So I went 
ahead and built an appliance from source that doesn't have these restrictions, 
just to see what that would get me.

With this script here: https://github.com/ronivay/XenOrchestraInstallerUpdater 
you can build the Xoa on any machine/VM you happen to be running with one of 
the many supported Linux variants.
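Building with that script boils down to something like the following (based on the repo's README at the time of writing; check it for the current file names and options):

```shell
# Clone the community installer and run it;
# the config file controls ports, branch, features etc.
git clone https://github.com/ronivay/XenOrchestraInstallerUpdater.git
cd XenOrchestraInstallerUpdater
cp sample.xo-install.cfg xo-install.cfg   # adjust to taste
sudo ./xo-install.sh
```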

I built one variant to run as a VM on Xcp-ng and I used another to run on an 
external machine: all three Xoas don't step on each other's toes more than you 
make them: very, very cool!

The built-in hypervisor interfaces and paravirtualized I/O drivers of almost 
any modern 64-bit Linux make VMs very easy. For Windows guests there are 
drivers available which make things as easy as with KVM. Unfortunately they 
don't seem to be exactly the same, so you may do better to remove guest 
drivers on machines you want to move. BTW, good luck trying to import OVAs 
exported from oVirt: those contain tags no other hypervisor wants to accept, 
sometimes not even oVirt itself. Articles on the forum suggest running 
CloneZilla at both ends for a storage-less migration of VMs.

I have not yet tested nested virtualization inside Xcp-ng (Xcp-ng works quite 
well in a nested environment), nor have I done tests with pass-through devices 
and GPUs. All of that is officially supported, with the typical caveats in case 
of older Nvidia drivers.

So far every operation was quick and every button did what I expected it to do. 
When things failed, I didn't have to go through endless log files to find out 
what was wrong: the error messages were helpful enough to find the issues so 
far.

Caveats:
I haven't done extensive failure testing yet, except one scenario: I've been 
running four host nodes as VMs on VMware Workstation, with XCP-ng VMs nested 
inside. At one point I managed to run out of storage and had to shut down all 
"bare metal VMs" hard. Everything came back clean; only the migration project 
that had tripped the storage failure had to be restarted, after I created some 
space.

There is no VDO support in the kernel provided. It should be easy enough to add 
given the sources and build scripts. But the benefit and the future of VDO are 
increasingly debated.

The VMs use the VDI disk format, which is less powerful/flexible than QCOW2 
with regards to thin allocation and trimming. Improvements are on the roadmap, 
but not yet in the product.

Hyperconverged Storage:
Xcp-ng so far doesn't come with a hyperconverged storage solution built in. 
Like so many clusters, it moves the responsibility to your storage layer 
(which could be Gluster...).

They are in the process of making LINSTOR a directly supported HCI option, 
even with such features as dispersed (erasure-coded) volumes to manage the 
write amplification/resilience trade-off as node counts increase beyond three. 
That's not ready today and they seem to be taking their sweet time about it. 
They label it "XOSTOR" and want to make it a paid option. But again, the full 
source code is there, and with the self-compiled Xoa appliance you should be 
able to use it for free.

The current beta release only supports replicated mode and only up to four 
nodes. But it seems to work reliably. Write amplification is 4x, so bandwidth 
drops to 25% and is limited to the network speed, but reads will go to the 
local node at storage hardware bandwidths.

The 2, 3 and 4 node replicated setup works today with the scripts they provide. 
That's not quite as efficient as the 2R+1A setup in oVirt, but seems rock solid 
and just works, which was never that easy in oVirt.

Hyperconverged is very attractive at 3 nodes, because good fault resilience 
can't be had any cheaper. But the industry agrees that it tends to lose 
financial attraction when you grow to dozens of machines, and nobody in his 
right mind would operate a real cloud using HCI. But going from 3 to, say, two 
dozen should be doable and easy: it never was for oVirt.

XCP-ng, or rather LINSTOR, won't support any number of storage nodes or easy 
linear growth like Gluster could (in theory).
So far it's only much, much better at getting things going.

Forum and community:
The documentation is rather good, but can be light on things which are 
"native Xen", as that would repeat a lot of the effort already expended by 
Citrix. It helps that I've been around the block a couple of times since 
VM/370, but there are holes or missing details when you need to ask questions.

The community isn't giant, but comfortably big enough. The forum's user 
interface is vastly better than this one, but then I haven't seen anything as 
slow as ovirt.org in a long time.

Technical questions are answered extremely quickly, mostly by the staff of the 
small French company themselves. But usually it's even easier to find answers 
to questions already asked, which is the most typical case.

The general impression is that there are much fewer moving parts and fewer 
things that can go wrong. There is no Ansible, not a single daemon on the 
nodes, and a management engine that seems very light, with minimal state that 
doesn't need a DBA to hold and manage it. The Xen hypervisor seems much 
smarter than KVM on its own, and the Xoa both has a rich API to do things and 
offers an API to next-level management.

It may have fewer features overall than oVirt, but I haven't found anything 
that I really missed. It's much easier and quicker to install and operate with 
nothing but the GUI, which is a godsend: I want to use the farm, not spend my 
time managing it.

Motivational rant originally at the top... tl;dr

My original priorities for choosing oVirt were:
1. CentOS as RHEL downstream -> stable platform, full vendor vulnerability 
management included, big benefit in compliance
2. Integrated HCI -> Just slap a couple (3/6/9) of leftover servers together 
for something fault resilient, no other components required, quick start out of 
the box solution with a few clicks in a GUI
3. Fully open source (& logs), can always read the source to understand what's 
going on, better than your typical support engineer
4. No support or license contract, unless you want/need it, but the ability to 
switch that on when it paid for itself

The more famous competitors, vSphere and Nutanix didn't offer any of that.

(Citrix) Xen I excluded because a) Xen seemed "old school + niche" compared to 
KVM, and b) Citrix reduced "free" to "useless".

I fell in love with Gluster from its design: it felt like a really smart 
solution.
I fell out with Gluster over its operation and performance: I can't count how 
many times I had to restart daemons and issue "gluster heal" commands to 
resettle things after little more than a node update.

I rediscovered Xcp-ng when I learned that HCI and RHV had been EOL'd.
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/H4O4MHM5MVTHKR7ARO3APAT7SMBMYZN6/
