I’ve also encountered something similar on my setup: oVirt 3.1.9 with a Gluster 
3.12.3 storage cluster. All the storage domains in question are set up as 
sharded gluster volumes, and I’ve enabled libgfapi support in the engine. 
It’s happened primarily to VMs that haven’t yet been restarted to switch to 
gfapi (these still use FUSE mounts), but also to one or two VMs that have 
already been switched to gfapi mounts.
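For reference, the difference is visible in the VM’s libvirt disk definition 
(`virsh -r dumpxml <vm>` on the host shows it). A FUSE-mounted disk looks 
roughly like the first snippet and a gfapi disk like the second; the paths, 
hostnames, and UUID placeholders here are illustrative, not from my cluster:

```xml
<!-- FUSE: the image is a plain file on the gluster fuse mount -->
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/rhev/data-center/mnt/glusterSD/ovirt0:_data/IMAGE-UUID/VOLUME-UUID'/>
</disk>

<!-- gfapi: qemu talks to the gluster servers directly over the network -->
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source protocol='gluster' name='data/IMAGE-UUID/VOLUME-UUID'>
    <host name='ovirt0' port='24007'/>
  </source>
</disk>
```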

I started updating the storage cluster to Gluster 3.12.6 yesterday and saw more 
annoying/bad behavior as well. Many high-disk-use VMs experienced hangs, but 
not as storage-related pauses: instead they simply hung, and their watchdogs 
eventually reported CPU hangs. All did eventually resume normal operation, but 
it was annoying, to be sure. The oVirt engine also lost contact with all of my 
VMs (unknown status, “?” in the GUI), even though it still had contact with 
the hosts. My gluster cluster reported no errors: volume status was normal, 
and all peers and bricks were connected. I didn’t see anything in the gluster 
logs that indicated problems, but there were reports of failed heals that 
eventually went away. 

It seems like something in vdsm and/or libgfapi isn’t handling the gfapi 
mounts well during healing and the related locks, but I can’t tell what it is. 
I still have two more servers in the cluster to upgrade to 3.12.6, so I’ll 
keep an eye on the logs while I’m doing it and report back once I have more 
info.
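In the meantime, here’s the kind of quick script I’m using to watch pending 
heals per brick during the upgrades. It just parses `gluster volume heal 
<volume> info` output; the sample output embedded below is illustrative, not 
from my cluster:

```python
import re

def heal_counts(output: str) -> dict:
    """Parse `gluster volume heal VOL info` output into {brick: entry count}."""
    counts = {}
    brick = None
    for line in output.splitlines():
        m = re.match(r"Brick (\S+)", line)
        if m:
            brick = m.group(1)
            continue
        m = re.match(r"Number of entries: (\d+)", line)
        if m and brick:
            counts[brick] = int(m.group(1))
    return counts

# Illustrative sample of `gluster volume heal data info` output:
sample = """\
Brick ovirt0:/gluster/brick3/data
Status: Connected
Number of entries: 0

Brick ovirt2:/gluster/brick3/data
Status: Connected
Number of entries: 2
"""
print(heal_counts(sample))
```

In practice you’d feed it live output, e.g. 
`subprocess.check_output(["gluster", "volume", "heal", "data", "info"], text=True)`, 
and alarm on any nonzero count that doesn’t drain.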

  -Darrell
> From: Sahina Bose <sab...@redhat.com>
> Subject: Re: [ovirt-users] Ovirt vm's paused due to storage error
> Date: March 22, 2018 at 4:56:13 AM CDT
> To: Endre Karlson
> Cc: users
> 
> Can you provide "gluster volume info" and the mount logs of the data volume 
> (I assume that this hosts the vdisks for the VMs with the storage error)?
> 
> Also vdsm.log at the corresponding time.
> 
> On Fri, Mar 16, 2018 at 3:45 AM, Endre Karlson <endre.karl...@gmail.com> wrote:
> Hi, this issue is here again, and we are getting several VMs going into 
> storage error in our 4-node cluster running on CentOS 7.4 with Gluster and 
> oVirt 4.2.1.
> 
> Gluster version: 3.12.6
> 
> volume status
> [root@ovirt3 ~]# gluster volume status
> Status of volume: data
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick ovirt0:/gluster/brick3/data           49152     0          Y       9102 
> Brick ovirt2:/gluster/brick3/data           49152     0          Y       28063
> Brick ovirt3:/gluster/brick3/data           49152     0          Y       28379
> Brick ovirt0:/gluster/brick4/data           49153     0          Y       9111 
> Brick ovirt2:/gluster/brick4/data           49153     0          Y       28069
> Brick ovirt3:/gluster/brick4/data           49153     0          Y       28388
> Brick ovirt0:/gluster/brick5/data           49154     0          Y       9120 
> Brick ovirt2:/gluster/brick5/data           49154     0          Y       28075
> Brick ovirt3:/gluster/brick5/data           49154     0          Y       28397
> Brick ovirt0:/gluster/brick6/data           49155     0          Y       9129 
> Brick ovirt2:/gluster/brick6_1/data         49155     0          Y       28081
> Brick ovirt3:/gluster/brick6/data           49155     0          Y       28404
> Brick ovirt0:/gluster/brick7/data           49156     0          Y       9138 
> Brick ovirt2:/gluster/brick7/data           49156     0          Y       28089
> Brick ovirt3:/gluster/brick7/data           49156     0          Y       28411
> Brick ovirt0:/gluster/brick8/data           49157     0          Y       9145 
> Brick ovirt2:/gluster/brick8/data           49157     0          Y       28095
> Brick ovirt3:/gluster/brick8/data           49157     0          Y       28418
> Brick ovirt1:/gluster/brick3/data           49152     0          Y       23139
> Brick ovirt1:/gluster/brick4/data           49153     0          Y       23145
> Brick ovirt1:/gluster/brick5/data           49154     0          Y       23152
> Brick ovirt1:/gluster/brick6/data           49155     0          Y       23159
> Brick ovirt1:/gluster/brick7/data           49156     0          Y       23166
> Brick ovirt1:/gluster/brick8/data           49157     0          Y       23173
> Self-heal Daemon on localhost               N/A       N/A        Y       7757 
> Bitrot Daemon on localhost                  N/A       N/A        Y       7766 
> Scrubber Daemon on localhost                N/A       N/A        Y       7785 
> Self-heal Daemon on ovirt2                  N/A       N/A        Y       8205 
> Bitrot Daemon on ovirt2                     N/A       N/A        Y       8216 
> Scrubber Daemon on ovirt2                   N/A       N/A        Y       8227 
> Self-heal Daemon on ovirt0                  N/A       N/A        Y       32665
> Bitrot Daemon on ovirt0                     N/A       N/A        Y       32674
> Scrubber Daemon on ovirt0                   N/A       N/A        Y       32712
> Self-heal Daemon on ovirt1                  N/A       N/A        Y       31759
> Bitrot Daemon on ovirt1                     N/A       N/A        Y       31768
> Scrubber Daemon on ovirt1                   N/A       N/A        Y       31790
>  
> Task Status of Volume data
> ------------------------------------------------------------------------------
> Task                 : Rebalance           
> ID                   : 62942ba3-db9e-4604-aa03-4970767f4d67
> Status               : completed           
>  
> Status of volume: engine
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick ovirt0:/gluster/brick1/engine         49158     0          Y       9155 
> Brick ovirt2:/gluster/brick1/engine         49158     0          Y       28107
> Brick ovirt3:/gluster/brick1/engine         49158     0          Y       28427
> Self-heal Daemon on localhost               N/A       N/A        Y       7757 
> Self-heal Daemon on ovirt1                  N/A       N/A        Y       31759
> Self-heal Daemon on ovirt0                  N/A       N/A        Y       32665
> Self-heal Daemon on ovirt2                  N/A       N/A        Y       8205 
>  
> Task Status of Volume engine
> ------------------------------------------------------------------------------
> There are no active volume tasks
>  
> Status of volume: iso
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick ovirt0:/gluster/brick2/iso            49159     0          Y       9164 
> Brick ovirt2:/gluster/brick2/iso            49159     0          Y       28116
> Brick ovirt3:/gluster/brick2/iso            49159     0          Y       28436
> NFS Server on localhost                     2049      0          Y       7746 
> Self-heal Daemon on localhost               N/A       N/A        Y       7757 
> NFS Server on ovirt1                        2049      0          Y       31748
> Self-heal Daemon on ovirt1                  N/A       N/A        Y       31759
> NFS Server on ovirt0                        2049      0          Y       32656
> Self-heal Daemon on ovirt0                  N/A       N/A        Y       32665
> NFS Server on ovirt2                        2049      0          Y       8194 
> Self-heal Daemon on ovirt2                  N/A       N/A        Y       8205 
>  
> Task Status of Volume iso
> ------------------------------------------------------------------------------
> There are no active volume tasks
> 
> 
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
> 

