[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-30 Thread Johan Bernhardsson
Is storage working as it should? Does the gluster mount point respond as it should? Can you write files to it? Do the physical drives report that they are OK? Can you write to the physical drives (you shouldn't bypass the gluster mount point, but you need to test the drives)? For me this sounds

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-30 Thread Jim Kusznir
At the moment, it is responding like I would expect. I do know I have one failed drive on one brick (hardware failure, OS removed drive completely; the underlying /dev/sdb is gone). I have a new disk on order (overnight), but that is also one brick of one volume that is replica 3, so I would

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-30 Thread Sahina Bose
On Wed, May 30, 2018 at 10:42 AM, Jim Kusznir wrote:
> hosted-engine --deploy failed (would not come up on my existing gluster
> storage). However, I realized no changes were written to my existing
> storage. So, I went back to trying to get my old engine running.
>
> hosted-engine --vm-status

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-30 Thread Jim Kusznir
hosted-engine --deploy failed (would not come up on my existing gluster storage). However, I realized no changes were written to my existing storage. So, I went back to trying to get my old engine running. hosted-engine --vm-status is now taking a very long time (5+ minutes) to return, and it

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Jim Kusznir
Well, things went from bad to very, very bad. It appears that during one of the 2-minute lockups, the fencing agents decided that another node in the cluster was down. As a result, 2 of the 3 nodes were simultaneously reset with fencing agent reboot. After the nodes came back up, the engine

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Krutika Dhananjay
Adding Ravi to look into the heal issue. As for the fsync hang and subsequent IO errors, it seems a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1497156 and Paolo Bonzini from qemu had pointed out that this would be fixed by the following commit: commit

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Jim Kusznir
I also finally found the following in my system log on one server:
[10679.524491] INFO: task glusterclogro:14933 blocked for more than 120 seconds.
[10679.525826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10679.527144] glusterclogro D 97209832bf40 0

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Jim Kusznir
I think this is the profile information for one of the volumes that lives on the SSDs and is fully operational with no down/problem disks:
[root@ovirt2 yum.repos.d]# gluster volume profile data info
Brick: ovirt2.nwfiber.com:/gluster/brick2/data
--

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Jim Kusznir
Thank you for your response. I have 4 gluster volumes. 3 are replica 2 + arbitrator. replica bricks are on ovirt1 and ovirt2, arbitrator on ovirt3. The 4th volume is replica 3, with a brick on all three ovirt machines. The first 3 volumes are on an SSD disk; the 4th is on a Seagate SSHD (same

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Jim Kusznir
Due to the cluster spiraling downward and increasing customer complaints, I went ahead and finished the upgrade of the nodes to ovirt 4.2 and gluster 3.12. It didn't seem to help at all. I DO have one brick down on ONE of my 4 gluster filesystems/exports/whatever. The other 3 are fully

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Alex K
I would check disk status and accessibility of the mount points where your gluster volumes reside.
On Tue, May 29, 2018, 22:28 Jim Kusznir wrote:
> On one ovirt server, I'm now seeing these messages:
> [56474.239725] blk_update_request: 63 callbacks suppressed
> [56474.239732] blk_update_request:

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Jim Kusznir
On one ovirt server, I'm now seeing these messages:
[56474.239725] blk_update_request: 63 callbacks suppressed
[56474.239732] blk_update_request: I/O error, dev dm-2, sector 0
[56474.240602] blk_update_request: I/O error, dev dm-2, sector 3905945472
[56474.241346] blk_update_request: I/O error,
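
For anyone triaging similar output, here is a minimal sketch (not something posted in the thread) that tallies blk_update_request I/O errors per device from kernel log lines. The function name count_io_errors and the sample list are hypothetical; the regular expression assumes the message format shown in the snippet above, which may differ on other kernel versions.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   [56474.239732] blk_update_request: I/O error, dev dm-2, sector 0
# The format is taken from the messages quoted in the thread (an assumption
# for other kernels).
IO_ERROR_RE = re.compile(
    r"blk_update_request: I/O error, dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)"
)


def count_io_errors(lines):
    """Return a Counter mapping device name -> number of I/O error lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Sample input copied from the complete lines in the message above.
sample = [
    "[56474.239725] blk_update_request: 63 callbacks suppressed",
    "[56474.239732] blk_update_request: I/O error, dev dm-2, sector 0",
    "[56474.240602] blk_update_request: I/O error, dev dm-2, sector 3905945472",
]

if __name__ == "__main__":
    for dev, n in count_io_errors(sample).items():
        print(f"{dev}: {n} I/O error line(s)")
```

The "callbacks suppressed" line means the kernel rate-limited further messages, so any count obtained this way is only a lower bound on the real number of errors.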

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Jim Kusznir
I see in messages on ovirt3 (my 3rd machine, the one upgraded to 4.2):
May 29 11:54:41 ovirt3 ovs-vsctl: ovs|1|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
May 29 11:54:51 ovirt3 ovs-vsctl:

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Sahina Bose
Do you see errors reported in the mount logs for the volume? If so, could you attach the logs? Are there any issues with your underlying disks? Can you also attach the output of volume profiling?
On Wed, May 30, 2018 at 12:13 AM, Jim Kusznir wrote:
> Ok, things have gotten MUCH worse this morning. I'm

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Jim Kusznir
Ok, things have gotten MUCH worse this morning. I'm getting random errors from VMs; right now, about a third of my VMs have been paused due to storage issues, and most of the remaining VMs are not performing well. At this point, I am in full EMERGENCY mode, as my production services are now

[ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-28 Thread Sahina Bose
[Adding gluster-users to look at the heal issue]
On Tue, May 29, 2018 at 9:17 AM, Jim Kusznir wrote:
> Hello:
>
> I've been having some cluster and gluster performance issues lately. I
> also found that my cluster was out of date, and was trying to apply updates
> (hoping to fix some of