Hi Jim,

I don't have any targeted suggestions, because there isn't much to latch on to. 
I can say that Gluster replica three (no arbiters) on dedicated servers, serving 
a couple of oVirt VM clusters here, has not had these sorts of issues. 

I suspect your long heal times (and the resulting long periods of high load) 
are at least partly related to 1G networking. That is just a matter of I/O: 
heals of VM images involve moving a lot of bits. My cluster uses bonded 10G 
NICs on the Gluster and oVirt boxes for storage traffic and separate bonded 1G 
for ovirtmgmt and communication with other machines/people, and we're 
occasionally hitting the bandwidth ceiling on the storage network. I'm starting 
to think about 40/100G, different ways of splitting up intensive systems, and 
considering iSCSI for specific volumes, although I really don't want to go 
there.
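Since it sounds like you're flying somewhat blind, here's roughly how I'd check 
whether heal traffic is what's saturating the link. The volume name (data-hdd) 
is taken from your mail and the NIC name (em2) is a placeholder; the `info 
summary` form needs a reasonably recent gluster. None of these commands change 
cluster state:

```shell
# Pending heal entries, at a glance and as a running count
gluster volume heal data-hdd info summary
gluster volume heal data-hdd statistics heal-count

# Per-NIC throughput; a 1G link tops out around 940 Mbit/s of payload
bwm-ng -I em2 -u bits
```

If the heal count barely moves while the storage NIC sits at the ceiling, the 
network is your bottleneck, not gluster itself.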

I don't run FreeNAS[1], but I do run FreeBSD as storage servers for its 
excellent ZFS implementation, mostly for backups. ZFS will make your `heal` 
problem go away, but not your bandwidth problem, which gets worse (because 
fewer NICs are pushing the traffic). 10G hardware is not exactly impulse-buy 
territory, but if you can, I'd recommend doing some testing with it. I think 
at least some of your problems are bandwidth-related.
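For reference, the FreeBSD/ZFS side of what I run is nothing exotic. A minimal 
sketch, where the pool, device, and network names are illustrative rather than 
a recommendation:

```shell
# Mirrored pool plus a compressed dataset exported over NFS
zpool create tank mirror da1 da2
zfs create -o compression=lz4 tank/backups
zfs set sharenfs="-maproot=root -network 10.0.0.0/24" tank/backups
```

That gives you checksumming and scrubs instead of gluster heals, but it's still 
one box pushing all the traffic.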

If that's not possible, my next stop would be optimizing everything I could 
about sharding and healing, and tuning the shard size for serving, to squeeze 
as much performance out of 1G as I could, but that will only go so far.
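By "optimizing" I mean knobs along these lines (volume name from your mail; the 
values are starting points to experiment with, not known-good numbers for your 
workload, and note that changing the shard size only affects newly created 
files):

```shell
# More self-heal daemon threads, and full-file copies for the small shards
gluster volume set data-hdd cluster.shd-max-threads 4
gluster volume set data-hdd cluster.data-self-heal-algorithm full
gluster volume set data-hdd features.shard-block-size 64MB
```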

-j

[1] FreeNAS is just a storage-tuned FreeBSD with a GUI.

> On Jul 6, 2018, at 1:19 PM, Jim Kusznir <[email protected]> wrote:
> 
> hi all:
> 
> Once again my production ovirt cluster is collapsing in on itself.  My 
> servers are intermittently unavailable or degrading, customers are noticing 
> and calling in.  This seems to be yet another gluster failure that I haven't 
> been able to pin down.
> 
> I posted about this a while ago, but didn't get anywhere (no replies that I 
> found).  The problem started out as a glusterfsd process consuming large 
> amounts of ram (up to the point where ram and swap were exhausted and the 
> kernel OOM killer killed off the glusterfsd process).  For reasons not clear 
> to me at this time, that resulted in any VMs running on that host and that 
> gluster volume being paused with I/O error (the glusterfs process is usually 
> unharmed; why it didn't continue I/O with the other servers is confusing to me).
> 
> I have 3 servers and a total of 4 gluster volumes (engine, iso, data, and 
> data-hdd).  The first 3 are replica 2+arb; the 4th (data-hdd) is replica 3.  
> The first 3 are backed by an LVM partition (some thin provisioned) on an SSD; 
> the 4th is on a Seagate hybrid disk (HDD + some internal flash for 
> acceleration).  data-hdd is the only thing on the disk.  Servers are Dell 
> R610 with the PERC/6i raid card, with the disks individually passed through 
> to the OS (no raid enabled).
> 
> The above RAM usage issue came from the data-hdd volume.  Yesterday, I caught 
> one of the glusterfsd high-RAM-usage episodes before the OOM killer had to run.  I was 
> able to migrate the VMs off the machine and for good measure, reboot the 
> entire machine (after taking this opportunity to run the software updates 
> that ovirt said were pending).  Upon booting back up, the necessary volume 
> healing began.  However, this time, the healing caused all three servers to 
> go to very, very high load averages (I saw just under 200 on one server; 
> typically they've been 40-70) with top reporting IO Wait at 7-20%.  Network 
> for this volume is a dedicated gig network.  According to bwm-ng, initially 
> the network bandwidth would hit 50MB/s (yes, bytes), but tailed off to mostly 
> in the kB/s for a while.  All machines' load averages were still 40+ and 
> gluster volume heal data-hdd info reported 5 items needing healing.  Servers 
> were intermittently experiencing IO issues, even on the 3 gluster volumes 
> that appeared largely unaffected.  Even OS activities on the hosts themselves 
> (logging in, running commands) would often be very delayed.  The ovirt engine 
> was seemingly randomly throwing engine down / engine up / engine failed 
> notifications.  Responsiveness on ANY VM was horrific most of the time, with 
> random VMs being inaccessible.
> 
> I let the gluster heal run overnight.  By morning, there were still 5 items 
> needing healing, all three servers were still experiencing high load, and 
> servers were still largely unstable.
> 
> I've noticed that all of my ovirt outages (and I've had a lot, way more than 
> is acceptable for a production cluster) have come from gluster.  I still have 
> 3 VMs whose hard disk images have become corrupted by my last gluster crash 
> that I haven't had time to repair / rebuild yet (I believe this crash was 
> caused by the OOM issue previously mentioned, but I didn't know it at the 
> time).
> 
> Is gluster really ready for production yet?  It seems so unstable to me....  
> I'm looking at replacing gluster with a dedicated NFS server, likely FreeNAS.  
> Any suggestions?  What is the "right" way to do production storage on this 
> (3-node) cluster?  Can I get this gluster volume stable enough to get my VMs to 
> run reliably again until I can deploy another storage solution?
> 
> --Jim
> _______________________________________________
> Users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/[email protected]/message/YQX3LQFQQPW4JTCB7B6FY2LLR6NA2CB3/
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/4N2M35IHEWNTQA4MBGWYGU5BK2I4RPTK/