What version of ovirt and gluster? Sounds like something I just saw with
gluster 3.12.x. Are you using libgfapi or just fuse mounts?
> From: Sahina Bose <sab...@redhat.com>
> Subject: Re: [ovirt-users] gluster self-heal takes cluster offline
> Date: March 23, 2018 at 1:26:01 AM CDT
> To: Jim Kusznir
> Cc: Ravishankar Narayanankutty; users
>
>
>
> On Fri, Mar 16, 2018 at 2:45 AM, Jim Kusznir <j...@palousetech.com
> <mailto:j...@palousetech.com>> wrote:
> Hi all:
>
> I'm trying to understand why/how (and most importantly, how to fix) a
> substantial issue I had last night. This happened one other time, but I
> didn't know/understand all the parts involved until last night.
>
> I have a 3-node hyperconverged (self-hosted engine, Gluster on each node)
> cluster. Gluster is replica 2 + arbiter. The current network configuration is
> 2x GigE in a load-balanced bond ("LAG Group" on the switch), plus one GigE
> from each server on a separate VLAN, intended for Gluster (but not used).
> Server hardware is Dell R610's; each server has an SSD in it. Servers 1 and 2
> hold the full replicas, and server 3 is the arbiter.
>
> I put server 2 into maintenance so I could work on the hardware, including
> turning it off and such. In the course of the work, I found that I needed to
> reconfigure the SSD's partitioning somewhat, and that resulted in wiping the
> data partition (storing VM images). I figured it was no big deal; gluster
> would rebuild it in short order. I did take care of the extended attribute
> settings and the like, and when I booted it up, gluster came up as expected
> and began rebuilding the disk.
>
> How big was the data on this partition? What was the shard size set on the
> gluster volume?
> Out of curiosity, how long did it take to heal and come back to operational?
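For reference, both answers can be read from any gluster node; a quick sketch, with "data" as a placeholder for the actual volume name:

```shell
# Shard size configured on the volume (only meaningful if sharding is enabled):
gluster volume get data features.shard-block-size

# Number of entries still pending heal, per brick -- watching this over time
# gives a rough idea how long the rebuild will take:
gluster volume heal data statistics heal-count

# Full list of files/shards awaiting heal:
gluster volume heal data info
```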
>
>
> The problem is that suddenly my entire cluster got very sluggish. The engine
> was marking nodes and VMs as failed and then unfailing them throughout the
> system, fairly randomly. It didn't matter which node the engine or VM was on.
> At one point, it power cycled server 1 as "non-responsive" (even though
> everything was running on it, and the gluster rebuild was working on it). As
> a result of this, about 6 VMs were killed and my entire gluster system went
> down hard (suspending all remaining VMs and the engine), as there were no
> remaining full copies of the data. After several minutes (these are Dell
> servers, after all...), server 1 came back up, gluster resumed the rebuild,
> and the node came back online in the cluster. I had to manually unpause the
> engine (with a virsh command), and then struggle through getting critical VMs
> back up. Everything was super slow, and load averages on the servers were
> often seen in excess of 80 (these are 8-core / 16-thread boxes). Actual CPU
> usage (reported by top) was rarely above 40% (across all CPUs) for any one
> server. Glusterfs was often seen using 180%-350% of a CPU on servers 1 and 2.
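A sketch of the manual-unpause step, assuming you can authenticate to libvirt on the host (oVirt hosts restrict virsh access; a SASL user may first need to be created with `saslpasswd2 -a libvirt <user>`). "HostedEngine" is the usual hosted-engine domain name:

```shell
# Read-only listing works without credentials; find the paused domain:
virsh -r list --all

# Resume the paused engine VM (prompts for the libvirt SASL credentials):
virsh resume HostedEngine
```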
>
> I ended up putting the cluster into global HA maintenance mode and disabling
> power fencing on the nodes until the process finished. On at least two
> occasions, it appeared that a functional node was marked bad; had fencing not
> been disabled, a node would have been rebooted, further exacerbating the
> problem.
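For anyone following along, the global maintenance toggle is done from any hosted-engine host; it stops the HA agents from migrating or restarting the engine VM while you work:

```shell
# Enter global maintenance (HA agents stop acting on the engine VM):
hosted-engine --set-maintenance --mode=global

# Verify -- the output should flag that global maintenance is active:
hosted-engine --vm-status

# Return to normal operation once healing completes:
hosted-engine --set-maintenance --mode=none
```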
>
> It's clear that the gluster rebuild overloaded things and caused the problem.
> I don't know why the load was so high (even iowait was low), but load
> averages were definitely tied to the glusterfs CPU utilization %. At no
> point did I have any problems pinging any machine (host or VM) unless the
> engine decided it was dead and killed it.
>
> Why did my system bite it so hard during the rebuild? I babied it along until
> the rebuild was complete, after which it returned to normal operation.
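One hedged mitigation for future heals, assuming gluster 3.8 or later: the self-heal daemon can be throttled so a full-brick rebuild competes less with VM I/O. "data" is again a placeholder volume name:

```shell
# Check how many parallel heal threads the self-heal daemon may use
# (the default is 1; higher values heal faster but load the node harder):
gluster volume get data cluster.shd-max-threads

# Keep heal single-threaded per brick during production hours:
gluster volume set data cluster.shd-max-threads 1

# Smaller heal window = smaller chunks healed per file per pass:
gluster volume set data cluster.self-heal-window-size 1
```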
>
> As of this event, all networking (host/engine management, gluster, and the VM
> network) was on the same VLAN. I'd love to move things off, but so far any
> attempt to do so breaks my cluster. How can I move my management interfaces
> to a separate VLAN/IP space? I also want to move Gluster to its own private
> space, but it seems that if I change anything in the peers file, the entire
> gluster cluster goes down. The dedicated gluster network is already listed as
> a secondary hostname for all peers.
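On that point: rather than editing the files under /var/lib/glusterd/peers by hand (which glusterd is sensitive to), recent gluster releases let you attach an additional address to an existing peer by probing it again from another node. A sketch, with placeholder hostnames:

```shell
# From server 1, probe server 2's gluster-network hostname; if server 2 is
# already a peer, this adds the address to its existing peer entry rather
# than creating a new peer:
gluster peer probe server2-gluster.example.com

# The peer should now list both hostnames:
gluster peer status
```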
>
> Will the above network reconfigurations be enough? I got the impression that
> the issue may not have been purely network based, but possibly server IO
> overload. Is this likely / right?
>
> I appreciate input. I don't think gluster's recovery is supposed to do as
> much damage as it did the last two or three times any healing was required.
>
> Thanks!