On 2020-10-01 04:33, Jeremey Wise wrote:
>
> I have for many years used gluster because..well.  3 nodes.. and so
> long as I can pull a drive out.. I can get my data.. and with three
> copies.. I have a much higher chance of getting it.
>
> Downsides to gluster: Slower (it's my home..meh... and I have SSD to
> avoid MTBF issues) and with VDO.. and thin provisioning.. I've not had issues.
>
> BUT....  gluster seems to be falling out of favor.  Especially as I
> move towards OCP.
>
> So..  CEPH.  I have one SSD in each of the three servers.  so I have
> some space to play.
>
> I googled around.. and found no clean deployment notes or guides on
> CEPH + oVirt.
>

Greetings,

First, the legalese...the below is all personal views/experiences...I am
not speaking on my employer's behalf on any of it, I just happen to have
gotten most of this experience through my work for them...yadda yadda
yadda...blah blah blah...and so on and so forth. This is all just me and
my thoughts/opinions. :-D

*sigh* It sucks we are in a world where that garbage has to be said when
talking about work-related experiences....Anyway...



We've been running CephFS since Firefly in 2014. Yeah, I know. We were
crazy, but the trade-off between risk of data loss and speed was within
our threshold for what we were trying to do.

Fast-forward six years and we've got two CephFS clusters as primary
storage for High Performance Computing clusters where we very much care
about both performance AND the risk of data loss. We've also got two
deployments of oVirt with CephFS as the backing filesystem. In other
words, I've got some experience with this setup and we are /very/ happy
with it. :-)

I'm so happy with it, that it is easier/faster for me to list the bad
than to list the good.

1. Red Hat (our OS to satisfy the "enterprise" check-box for the
audit-heads) and I have gone round and round multiple times over the
years. In short, don't expect excellent support for Ceph with oVirt.
Want to use Ceph via iSCSI or Cinder? Whooo boy do I have some horror
stories for you! That's one of the many reasons we prefer CephFS. But
say that to their support and you get blank looks until the ticket has
been escalated sufficiently high up the chain, and even then it's not
reassuring...

However, if you present CephFS to oVirt the way you would an NFS share,
it works...but from the oVirt side you don't get the high-availability
or high-performance benefit of scaling your metadata nodes. You _SHOULD_
scale your metadata nodes (as with everything in Ceph, scaling in threes
is best), but oVirt won't let you mount "cephmds01,cephmds02,cephmds03".
It will gladly tell you that it works, but the moment you start a VM on
it oVirt freaks out, and it has ever since I reported it years ago (I
recently confirmed this again on 4.4 with CentOS 8). But if you just
mount "cephmds01" and then hack around on your IP routes in your switch
to handle the distribution of the data, it's fine. Honestly, even if you
just mount a single host and you /know/ that and you _plan_
upgrades/failures/etc. around that, it's still fine (see the rough
sketch below). It just really sucks that RH pushes Ceph and claims it's
a valued FS, but then doesn't really support anything but their cloud
variations of Ceph, and if you step out of their very narrow definitions
you get a *shrug*. Sigh...anyway...digressing from that as this isn't
the time/place for my rants. :-D
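
To make that concrete, here's roughly what the two mount specs look like
if you do them by hand with the kernel client. The host names are just
the made-up ones from above (in a real mount these would be your monitor
addresses), and the cephx user "ovirt" and the secretfile path are
placeholders for whatever you set up:

  # Comma-separated monitor list - what you want for HA, and what the
  # kernel client itself is happy with, but oVirt chokes on as a
  # storage domain path:
  mount -t ceph cephmds01:6789,cephmds02:6789,cephmds03:6789:/ /mnt/cephfs \
        -o name=ovirt,secretfile=/etc/ceph/ovirt.secret

  # Single-host spec - what actually behaves as an oVirt storage domain,
  # so plan your upgrades/failures around that one host:
  mount -t ceph cephmds01:6789:/ /mnt/cephfs \
        -o name=ovirt,secretfile=/etc/ceph/ovirt.secret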

Point being, if you are going RH, don't expect to use any of their
helper scripts or minimal install builds or anything like that. Do a
minimal OS install, add the CephFS client drivers, then install oVirt
(or...I forget what they call it..) and configure Ceph like you would
NFS. Should be fine afterwards. But I've rarely found significant
differences between the community version of oVirt and the RH version
(when comparing same/similar versions), including the support for Ceph.
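
For what it's worth, the prep on each hypervisor is small. Something
along these lines (the package name assumes the Ceph repos are already
set up for your release; adjust as needed):

  # Minimal OS install, then pull in the CephFS client bits
  # (ceph-common carries the ceph CLI and the CephFS mount helper):
  dnf install -y ceph-common

  # Copy ceph.conf and a client keyring into /etc/ceph/, test a mount by
  # hand (as in the sketch above), then add the storage domain through
  # the oVirt admin UI the same way you would an NFS domain, using the
  # single-host spec.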

2. We get incredible performance out of Ceph, but it does require
tuning. Ceph crushes the pre-packaged vendor systems we ran tests
against. But part of the reason is that it is flexible enough that we
can swap out the bits that we need to scale - and we can do that FAR
cheaper than the pre-packaged solutions allow. Yes, in threes for the
servers. Three metadata servers, three monitors (we double up those two
services on the same hosts), and storage in blocks of three. If your
SSDs are fast enough, one SSD for every two spinning disks is a great
ratio. And rebuild times across the cluster are only as fast as your
back-plane, so you should have a dedicated back-plane network in
addition to your primary network (rough sketch below). Everyone wants
their primary network fast, but your back-plane should be equally fast
if not faster (and no, don't add just one "fast" network - it should be
two). So you are going to need to plan and tweak your install. Just
throwing parts at Ceph and expecting it to work will get you mixed
results at best.
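
As a rough sketch of what I mean by the two networks - the subnets and
values below are placeholders, not our config - the relevant ceph.conf
knobs are public_network and cluster_network, plus the replication
defaults:

  [global]
  # Client-facing "primary" network
  public_network  = 10.0.10.0/24
  # Dedicated back-plane for OSD replication and recovery traffic
  cluster_network = 10.0.20.0/24

  # Replicate in threes; stay writable as long as two copies remain
  osd_pool_default_size     = 3
  osd_pool_default_min_size = 2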

3. I'd never run it at my home. My home oVirt system mounts NFS to a ZFS
filesystem. Nothing fancy either. Striped mirrors give good read/write
speed with good fault tolerance. I threw in two cheap SSDs as a log
drive and a cache drive (those two SSDs made a HUGE performance
difference for oVirt VMs) and it's been smooth sailing since (rough
sketch below). It's trivial to manage/upgrade and FAR less overhead than
Ceph.
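
If it helps, the home pool is conceptually just something like this
(device and pool names are placeholders; the striped-mirrors-plus-SSD
log/cache shape is the part that matters):

  # Two mirrored pairs striped together, one cheap SSD as the log (SLOG),
  # one as the L2ARC read cache, then export over NFS for oVirt:
  zpool create tank \
        mirror /dev/sda /dev/sdb \
        mirror /dev/sdc /dev/sdd \
        log /dev/sde \
        cache /dev/sdf
  zfs set sharenfs=on tank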

Those are really the only warnings I've got for you. I'm a HUGE fan of
oVirt; we've done some pretty nutty stuff with it in testing, and I
trust it in multiple environments where we throw some pretty heavy loads
at it. I've got TONS of praise for oVirt and the whole team that backs
it. It's fantastic.

And I do love Ceph (and specifically CephFS) and we get incredible
performance that I could gush over all day long. If you are planning on
building Ceph on the cheap, plan replication in sets of three, and
prepare for lots of tweaking and tuning. If you are in a position to
buy, I *HIGHLY* recommend at least talking to https://softiron.com (I do
not work for them, I do not get any kick-back from them, I'm just very
pleased with their product). They focus on Ceph and they do it well, but
they still let you tweak as needed. And since they build on Arm
processors, all the power and heat come from the drives...these things
run super-cool. Loads more efficient than the home-built stuff we ran
for years.
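
And when I say replication in sets of three, concretely that's just the
pool size. The defaults above cover new pools; for an existing pool
(the pool name here is a placeholder) it's along the lines of:

  # Check, and if needed set, 3 copies with a 2-copy minimum so the pool
  # stays writable during a failure:
  ceph osd pool get cephfs_data size
  ceph osd pool set cephfs_data size 3
  ceph osd pool set cephfs_data min_size 2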

I'm even a huge fan of running oVirt with CephFS storage! I _REALLY_
wish the combo were treated better. But most of my frustrations are many
years old at this point, and we've figured out workarounds in the
meantime. It's too much for me to want to mess with at home, but so long
as you plan out your Ceph install and you are prepared to be the
odd-ball using CephFS+oVirt, workarounds included, it's a great setup.

I absolutely believe that we've gotten a HUGE return on our investment
in Ceph...but I'm also using it for high-speed data computations in a
big cluster. The oVirt + CephFS setup is an add-on to the HPC + CephFS
setup. The ROI on oVirt is also huge because we were never satisfied
with other virtualization solutions, and while OpenStack worked for us,
it was FAR more overhead than we needed or could support with a team as
small as ours. So I'm a big believer that our specific use case for both
is a massive ROI win.

Should you decide to move forward with CephFS + oVirt and you have
questions, feel free to reach out to me. No promises that your problems
will be the same as mine, but I can at least share some
experiences/config-settings with you.

Good luck!
~Stack~
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/YSMPMBZ435SK6UHYSWHLQLG4YRO5LAQ3/
