Is storage working as it should? Does the gluster mount point respond as
it should? Can you write files to it? Do the physical drives report that
they are OK? Can you write to the physical drives (you shouldn't bypass
the gluster mount point, but you still need to test the drives)?
For me this sounds
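A minimal sketch of those checks (the mount path, volume name, and device
names here are illustrative, not taken from this cluster):

    # Write through the gluster mount point, the supported path for I/O:
    echo ok > /rhev/data-center/mnt/glusterSD/ovirt1:_data/.health-check
    cat /rhev/data-center/mnt/glusterSD/ovirt1:_data/.health-check
    rm /rhev/data-center/mnt/glusterSD/ovirt1:_data/.health-check
    # Test the physical drive without writing behind gluster's back:
    # a read-only pass plus SMART health (assumes smartmontools is installed):
    dd if=/dev/sdb of=/dev/null bs=1M count=100 iflag=direct
    smartctl -H /dev/sdb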
At the moment, it is responding like I would expect. I do know I have one
failed drive on one brick (hardware failure; the OS removed the drive
completely, and the underlying /dev/sdb is gone). I have a new disk on order
(overnight), but that is also one brick of one volume that is replica 3, so I would
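Once the replacement disk is in place and formatted, a replica 3 volume lets
you swap the dead brick out in one step and let self-heal repopulate it; a
sketch with illustrative volume and brick names:

    # Swap the failed brick for the new one (names are placeholders):
    gluster volume replace-brick data \
        ovirt1:/gluster/brick2/data ovirt1:/gluster/brick2-new/data \
        commit force
    # Then watch self-heal catch the new brick up:
    gluster volume heal data info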
hosted-engine --deploy failed (would not come up on my existing gluster
storage). However, I realized no changes were written to my existing
storage. So, I went back to trying to get my old engine running.
hosted-engine --vm-status is now taking a very long time (5+ minutes) to
return, and it
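When --vm-status stalls like that, the HA agent and broker logs on the host
usually show what it is blocking on; a sketch, assuming the default oVirt
log locations:

    # Hosted-engine HA agent and broker logs (default paths):
    tail -n 50 /var/log/ovirt-hosted-engine-ha/agent.log
    tail -n 50 /var/log/ovirt-hosted-engine-ha/broker.log
    # Are the HA services themselves healthy?
    systemctl status ovirt-ha-agent ovirt-ha-broker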
Well, things went from bad to very, very bad
It appears that during one of the 2-minute lockups, the fencing agents
decided that another node in the cluster was down. As a result, 2 of the 3
nodes were simultaneously reset via a fencing-agent reboot. After the nodes
came back up, the engine
Adding Ravi to look into the heal issue.
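For the heal side, the usual first look is the per-volume heal status (the
volume name 'data' is illustrative):

    # Entries still pending heal, listed per brick:
    gluster volume heal data info
    # Entries gluster considers split-brain, if any:
    gluster volume heal data info split-brain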
As for the fsync hang and subsequent I/O errors, it seems a lot like
https://bugzilla.redhat.com/show_bug.cgi?id=1497156 and Paolo Bonzini from
qemu had pointed out that this would be fixed by the following commit:
commit
I also finally found the following in my system log on one server:
[10679.524491] INFO: task glusterclogro:14933 blocked for more than 120
seconds.
[10679.525826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[10679.527144] glusterclogro D 97209832bf40 0
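For context, that warning fires when a task sits in uninterruptible sleep
(the 'D' state shown above) longer than the hung-task watchdog allows; the
threshold is a sysctl:

    # Current hung-task timeout, 120 seconds by default:
    sysctl kernel.hung_task_timeout_secs
    # Setting it to 0 disables the warning (not advisable mid-debugging):
    # sysctl -w kernel.hung_task_timeout_secs=0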
I think this is the profile information for one of the volumes that lives
on the SSDs and is fully operational with no down/problem disks:
[root@ovirt2 yum.repos.d]# gluster volume profile data info
Brick: ovirt2.nwfiber.com:/gluster/brick2/data
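For anyone reproducing this: profiling has to be started on the volume
before 'profile ... info' returns useful counters; a sketch for the same
volume:

    # Begin collecting per-brick FOP latency counters:
    gluster volume profile data start
    # ...let the workload run for a while, then dump the stats:
    gluster volume profile data info
    # Stop collection when finished to avoid the overhead:
    gluster volume profile data stop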
Thank you for your response.
I have 4 gluster volumes. 3 are replica 2 + arbiter; the replica bricks
are on ovirt1 and ovirt2, with the arbiter on ovirt3. The 4th volume is
replica 3, with a brick on all three ovirt machines.
The first 3 volumes are on an SSD disk; the 4th is on a Seagate SSHD (same
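For reference, gluster builds that replica 2 + arbiter layout with the
'replica 3 arbiter 1' syntax; a sketch with illustrative brick paths:

    # Two data bricks plus one metadata-only arbiter brick:
    gluster volume create data replica 3 arbiter 1 \
        ovirt1:/gluster/brick2/data \
        ovirt2:/gluster/brick2/data \
        ovirt3:/gluster/brick2/data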
Due to the cluster spiraling downward and increasing customer complaints, I
went ahead and finished the upgrade of the nodes to ovirt 4.2 and gluster
3.12. It didn't seem to help at all.
I DO have one brick down on ONE of my 4 gluster
filesystems/exports/whatever. The other 3 are fully
I would check disk status and the accessibility of the mount points where
your gluster volumes reside.
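A quick pass over that, with device and brick paths as placeholders:

    # Are the brick filesystems mounted, writable, and not full?
    df -h /gluster/brick*
    mount | grep -i brick
    # Any I/O errors or read-only remounts logged by the kernel?
    dmesg | grep -iE 'i/o error|read-only'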
On one ovirt server, I'm now seeing these messages:
[56474.239725] blk_update_request: 63 callbacks suppressed
[56474.239732] blk_update_request: I/O error, dev dm-2, sector 0
[56474.240602] blk_update_request: I/O error, dev dm-2, sector 3905945472
[56474.241346] blk_update_request: I/O error,
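Those errors are against a device-mapper node, so the first step is mapping
dm-2 back to whatever LV or multipath device it represents:

    # /dev/mapper names are symlinks to the dm-N kernel nodes:
    ls -l /dev/mapper | grep dm-2
    # Or view the whole block stack with kernel names side by side:
    lsblk -o NAME,KNAME,TYPE,SIZE,MOUNTPOINT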
I see in messages on ovirt3 (my 3rd machine, the one upgraded to 4.2):
May 29 11:54:41 ovirt3 ovs-vsctl:
ovs|1|db_ctl_base|ERR|unix:/var/run/openvswitch/db.sock: database
connection failed (No such file or directory)
May 29 11:54:51 ovirt3 ovs-vsctl:
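That ovs-vsctl error just means the ovsdb-server socket is missing, i.e.
openvswitch is not running on that host; a quick check:

    # Is the service up?
    systemctl status openvswitch
    # The database socket it should have created:
    ls -l /var/run/openvswitch/db.sock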
Do you see errors reported in the mount logs for the volume? If so, could
you attach the logs?
Any issues with your underlying disks? Can you also attach the output of
volume profiling?
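The fuse mount logs live under /var/log/glusterfs/ on each client, named
after the mount path (the exact filename below is illustrative):

    # One log per gluster mount on the host:
    ls /var/log/glusterfs/
    # Gluster log lines carry a severity letter; ' E ' marks errors:
    grep ' E ' /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log | tail -n 20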
Ok, things have gotten MUCH worse this morning. I'm getting random errors
from VMs; right now, about a third of my VMs have been paused due to
storage issues, and most of the remaining VMs are not performing well.
At this point, I am in full EMERGENCY mode, as my production services are
now
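VMs that qemu pauses on I/O errors stay paused until storage recovers and
they are resumed; on a host, the read-only libvirt view is enough to
enumerate them (read-only mode avoids oVirt's libvirt authentication):

    # List all domains, including paused ones:
    virsh -r list --all
    # Why a given VM is paused:
    virsh -r domstate --reason <vm-name>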
[Adding gluster-users to look at the heal issue]
On Tue, May 29, 2018 at 9:17 AM, Jim Kusznir wrote:
> Hello:
>
> I've been having some cluster and gluster performance issues lately. I
> also found that my cluster was out of date, and was trying to apply updates
> (hoping to fix some of