Hi Jim,

I don't have any targeted suggestions, because there isn't much to latch onto. I can say that Gluster replica 3 (no arbiters) on dedicated servers, serving a couple of oVirt VM clusters here, has not had these sorts of issues.
I suspect your long heal times (and the resulting long periods of high load) are at least partly related to 1G networking. That is simply a matter of I/O: healing VMs involves moving a lot of bits. My cluster uses bonded 10G NICs on the Gluster and oVirt boxes for storage traffic and separate bonded 1G for ovirtmgmt and communication with other machines/people, and we occasionally hit the bandwidth ceiling on the storage network. I'm starting to think about 40/100G and different ways of splitting up intensive systems, and I'm considering iSCSI for specific volumes, although I really don't want to go there.

I don't run FreeNAS[1], but I do run FreeBSD storage servers for their excellent ZFS implementation, mostly for backups. ZFS will make your `heal` problem go away, but not your bandwidth problems, which will get worse (because fewer NICs are pushing traffic).

10G hardware is not exactly impulse-buy territory, but if you can, I'd recommend doing some testing with it. I think at least some of your problems are related to it. If that's not possible, my next step would be to tune everything I could about sharding and healing, including the shard size for the workload being served, to squeeze as much performance out of 1G as possible, but that will only go so far.

-j

[1] FreeNAS is just a storage-tuned FreeBSD with a GUI.

> On Jul 6, 2018, at 1:19 PM, Jim Kusznir <[email protected]> wrote:
>
> hi all:
>
> Once again my production ovirt cluster is collapsing in on itself. My
> servers are intermittently unavailable or degrading, customers are noticing
> and calling in. This seems to be yet another gluster failure that I haven't
> been able to pin down.
>
> I posted about this a while ago, but didn't get anywhere (no replies that I
> found). The problem started out as a glusterfsd process consuming large
> amounts of ram (up to the point where ram and swap were exhausted and the
> kernel OOM killer killed off the glusterfsd process).
> For reasons not clear to me at this time, that resulted in any VMs running
> on that host and that gluster volume being paused with an I/O error (the
> glusterfs process is usually unharmed; why it didn't continue I/O with the
> other servers is confusing to me).
>
> I have 3 servers and a total of 4 gluster volumes (engine, iso, data, and
> data-hdd). The first 3 are replica 2+arb; the 4th (data-hdd) is replica 3.
> The first 3 are backed by an LVM partition (some thin provisioned) on an SSD;
> the 4th is on a seagate hybrid disk (hdd + some internal flash for
> acceleration). data-hdd is the only thing on the disk. Servers are Dell
> R610 with the PERC/6i raid card, with the disks individually passed through
> to the OS (no raid enabled).
>
> The above RAM usage issue came from the data-hdd volume. Yesterday, I caught
> one of the glusterfsd high ram usage before the OOM-Killer had to run. I was
> able to migrate the VMs off the machine and for good measure, reboot the
> entire machine (after taking this opportunity to run the software updates
> that ovirt said were pending). Upon booting back up, the necessary volume
> healing began. However, this time, the healing caused all three servers to
> go to very, very high load averages (I saw just under 200 on one server;
> typically they've been 40-70) with top reporting IO Wait at 7-20%. Network
> for this volume is a dedicated gig network. According to bwm-ng, initially
> the network bandwidth would hit 50MB/s (yes, bytes), but tailed off to mostly
> in the kB/s for a while. All machines' load averages were still 40+ and
> gluster volume heal data-hdd info reported 5 items needing healing. Servers
> were intermittently experiencing IO issues, even on the 3 gluster volumes
> that appeared largely unaffected. Even the OS activities on the hosts itself
> (logging in, running commands) would often be very delayed. The ovirt engine
> was seemingly randomly throwing engine down / engine up / engine failed
> notifications.
> Responsiveness on ANY VM was horrific most of the time, with
> random VMs being inaccessible.
>
> I let the gluster heal run overnight. By morning, there were still 5 items
> needing healing, all three servers were still experiencing high load, and
> servers were still largely unstable.
>
> I've noticed that all of my ovirt outages (and I've had a lot, way more than
> is acceptable for a production cluster) have come from gluster. I still have
> 3 VMs whose hard disk images have become corrupted by my last gluster crash
> that I haven't had time to repair / rebuild yet (I believe this crash was
> caused by the OOM issue previously mentioned, but I didn't know it at the
> time).
>
> Is gluster really ready for production yet? It seems so unstable to me....
> I'm looking at replacing gluster with a dedicated NFS server, likely FreeNAS.
> Any suggestions? What is the "right" way to do production storage on this (3
> node cluster)? Can I get this gluster volume stable enough to get my VMs to
> run reliably again until I can deploy another storage solution?
>
> --Jim
> _______________________________________________
> Users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/[email protected]/message/YQX3LQFQQPW4JTCB7B6FY2LLR6NA2CB3/
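To put rough numbers on the 1G point, here is a back-of-envelope sketch in Python. It is plain arithmetic, not a Gluster tool. The 500 GiB image, the 200 dirty shards, the 64 MiB shard size and the simplification that a heal copies each dirty shard in full are all hypothetical assumptions for illustration; the 40% efficiency figure is derived from the 50MB/s Jim saw on a gig link, whose line rate tops out at 125 MB/s.

```python
# Back-of-envelope heal transfer time (a sketch, not a Gluster API).
# Assumptions (hypothetical, for illustration only):
#   - link speeds are nominal line rates in gigabits per second
#   - "efficiency" is the fraction of line rate the heal actually achieves
#   - a heal copies every dirty shard in full

GIB = 1024 ** 3


def link_bytes_per_sec(gbit_per_sec: float, efficiency: float = 1.0) -> float:
    """Convert a nominal link rate in Gbit/s to usable bytes per second."""
    return gbit_per_sec * 1e9 / 8 * efficiency


def heal_seconds(bytes_to_move: float, gbit_per_sec: float,
                 efficiency: float = 1.0) -> float:
    """Seconds needed to push bytes_to_move over one link at the given rate."""
    return bytes_to_move / link_bytes_per_sec(gbit_per_sec, efficiency)


def dirty_shard_bytes(dirty_shards: int, shard_mib: int = 64) -> int:
    """Bytes to copy if each dirty shard (assumed shard_mib MiB) moves in full."""
    return dirty_shards * shard_mib * 1024 ** 2


if __name__ == "__main__":
    # 1 Gbit/s is at most 125 MB/s, so 50 MB/s is about 40% of line rate.
    print(f"1G line rate: {link_bytes_per_sec(1) / 1e6:.0f} MB/s")
    print(f"50 MB/s as a fraction of 1G: {50e6 / link_bytes_per_sec(1):.0%}")

    # Hypothetical 500 GiB VM image copied in full at 40% efficiency.
    image = 500 * GIB
    for gbit in (1, 10):
        hours = heal_seconds(image, gbit, efficiency=0.4) / 3600
        print(f"{gbit:>2}G at 40% efficiency: {hours:.1f} h for 500 GiB")

    # With sharding, only dirty shards move (hypothetical 200 shards).
    moved = dirty_shard_bytes(200)
    minutes = heal_seconds(moved, 1, efficiency=0.4) / 60
    print(f"200 dirty 64 MiB shards: {moved / GIB:.1f} GiB, "
          f"{minutes:.1f} min on 1G")
```

The absolute numbers mean little on their own; the point is that transfer time scales inversely with link speed, and that healing at shard granularity moves far less data than copying whole images.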

