Hey guys,

I have tried asking about this issue on the IRC channel, but I'm not sure 
whether I'm doing it wrong or there just aren't many people watching the 
#ovirt channel.

We have deployed a 6-node oVirt cluster and have moved roughly 100 VMs onto 
it. We started the cluster on NFS storage while we were building and testing 
our Ceph cluster. Ceph has been ready for a few months now, and we have 
tested running oVirt VMs on it using RBD, CephFS, and the NFS Ganesha server 
that can be enabled on Ceph.

Initially I tested with RBD using the cinderlib functionality in oVirt. This 
appears to work fine, and I can live with the fact that live storage 
migrations from POSIX storage are not possible. The biggest hurdle we 
encountered is that the oVirt backup APIs cannot be used to create and 
download backups. This breaks our RHV Veeam backup solution, which we have 
grown quite fond of. Even my earlier homegrown backup solutions don't work, 
as they use the same APIs.
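
For reference, this is the kind of backup flow that fails for us on 
cinderlib/Managed Block Storage disks. A minimal sketch using the ovirtsdk4 
Python SDK; the engine URL, credentials, and VM name are placeholders:

    # Minimal sketch of the oVirt backup API flow that breaks for
    # cinderlib/Managed Block Storage disks. Engine URL, credentials
    # and the VM name are placeholders.
    import time
    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url='https://engine.example.com/ovirt-engine/api',
        username='admin@internal',
        password='secret',
        ca_file='ca.pem',
    )
    vms_service = connection.system_service().vms_service()
    vm = vms_service.list(search='name=myvm')[0]
    vm_service = vms_service.vm_service(vm.id)

    # Collect the VM's disk IDs and start a full backup of them.
    attachments = vm_service.disk_attachments_service().list()
    backups_service = vm_service.backups_service()
    backup = backups_service.add(
        types.Backup(disks=[types.Disk(id=a.disk.id) for a in attachments])
    )

    # Wait until the backup is ready; at that point the disks could be
    # downloaded via the image transfer API, and finalize releases it.
    backup_service = backups_service.backup_service(backup.id)
    while backup_service.get().phase != types.BackupPhase.READY:
        time.sleep(5)
    backup_service.finalize()
    connection.close()
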
For this reason we have now changed to CephFS. It has a different set of 
problems: we can only supply one monitor when mounting the CephFS storage, 
making it less robust than it should be. It also makes metadata operations 
dependent on the MDS; as far as I understand, data access still goes 
directly to the OSDs where the data resides. I have multiple MDS containers 
on the Ceph cluster for load balancing and redundancy, but it still feels 
less tidy than RBD with native Ceph support. The good thing is that, because 
CephFS is a POSIX filesystem, the backup APIs work and so does the Veeam 
backup.
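
To illustrate the robustness point: outside of oVirt's POSIX domain dialog, 
the kernel CephFS client accepts a comma-separated monitor list, so the 
mount survives a single monitor going down. A hedged sketch; the hostnames, 
mount point, and secret file below are placeholders:

    # Hedged illustration: the kernel CephFS client accepts several
    # monitors in the source spec, which the oVirt POSIX domain form
    # does not let me enter. All names and paths are placeholders.
    import subprocess

    monitors = 'mon1.example.com,mon2.example.com,mon3.example.com'
    subprocess.run(
        [
            'mount', '-t', 'ceph',
            monitors + ':/',                   # any listed MON can answer
            '/rhev/data-center/mnt/cephfs',    # assumed mount point
            '-o', 'name=ovirt,secretfile=/etc/ceph/ovirt.secret',
        ],
        check=True,
    )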

The biggest problem I am struggling with is the unexplained pausing of VMs, 
affecting exclusively those running Windows. This is why I didn't notice the 
issue initially, as I had been testing the storage with Linux VMs. The 
machines pause at a random moment with a "Storage Error". I cannot resume a 
paused machine, not even using virsh on the host it is running on. The only 
thing I can do is power it off and reboot, and in many cases it then pauses 
again during the boot process. I have been watching the VDSM logs and 
couldn't work out why this happens. When I move the disk to plain NFS 
storage (not on Ceph), it never happens. As a result, most Windows-based VMs 
have not been moved to CephFS yet. I have zero occurrences of this with 
Linux VMs, of which I have many (roughly 80 Linux VMs vs 20 Windows VMs).
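
One diagnostic step that might narrow this down is asking libvirt directly 
why it paused the guest; an oVirt "Storage Error" pause should show up as an 
I/O error reason at that level. A sketch assuming libvirt-python on the 
host; the domain name is a placeholder:

    # Sketch: query libvirt for the pause reason of a stuck guest.
    # Assumes libvirt-python on the host; the domain name is a placeholder.
    import libvirt

    conn = libvirt.openReadOnly('qemu:///system')
    dom = conn.lookupByName('my-windows-vm')
    state, reason = dom.state()
    if state == libvirt.VIR_DOMAIN_PAUSED:
        reasons = {
            libvirt.VIR_DOMAIN_PAUSED_IOERROR: 'I/O error (storage)',
            libvirt.VIR_DOMAIN_PAUSED_USER: 'paused by user/admin',
        }
        print('Pause reason:', reasons.get(reason, reason))
    conn.close()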

The Ceph cluster does not show any problems before, during, or after this 
happens. Is there anyone with experience with oVirt and Ceph who can share 
their experiences or help me find the root cause of this problem? I have a 
couple of things I could try:

1. Move a couple of Windows VMs to RBD and install a Veeam agent on the 
servers instead. This opens a new can of worms, as it requires work on each 
server every time I add a new one, and I also need to open network paths 
from the servers to the backup repositories. It is just a little less neat 
than the RHV Veeam backup solution that works through the hypervisor.

2. Move a couple of Windows VMs to the NFS Ganesha implementation. This 
means ALL traffic goes through the NFS containers created on the Ceph 
cluster, and I lose the distributed nature of the oVirt hosts talking 
directly to the OSDs on the Ceph cluster. If I were to go this way, I should 
probably create dedicated NFS Ganesha servers that connect to Ceph natively 
on one end and provide NFS services to oVirt on the other.

Both tests would still exercise Ceph, just through an alternative access 
method to CephFS. My preferred solution is really option 1, were it not for 
the backup APIs being rendered useless. Is development still being carried 
out in these areas, or has oVirt/Ceph development ceased?

Thoughts and comments are welcome. Looking forward to sparring with someone 
who has experience with this :-)

With kind regards
Jelle