Hello everyone,

Any help with the following problem would be greatly appreciated.

In my lab, the day before yesterday, we had power issues: a UPS went off-line, 
which was followed by a power outage of the NFS/DNS server I have set up to 
serve oVirt with ISOs and to act as a DNS server (our other DNS servers run as 
VMs within the oVirt environment). We also found a broadcast storm on the 
switch that the oVirt nodes are connected to (caused by a faulty NIC on the 
aforementioned UPS), and later on we had to re-establish several of the virtual 
connections as well. All of this led to one of the hosts becoming 
NonResponsive, two machines becoming unresponsive and three VMs shutting down. 

The oVirt environment, version 4.3.5.2, is a replica 2 + arbiter 1 environment 
and runs GlusterFS with the recommended volumes of data, engine and vmstore.

Until now, whenever there was some kind of problem, oVirt was usually able to 
resolve it on its own.

This time, however, after we recovered from the above state, the data and 
vmstore volumes healed successfully, but the engine volume became stuck in the 
healing process (Up, unsynced entries, needs healing). In the web GUI I see 
that the HostedEngine VM is paused due to a storage I/O error, while the output 
of the virsh list --all command shows that the HostedEngine is running. How is 
that possible?

I tried to trigger the healing process for the volume manually with 
gluster volume heal engine
but nothing changed.

The command 
gluster volume heal engine info 
shows the following 

[root@ov-no3 ~]# gluster volume heal engine info
Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
Status: Connected
Number of entries: 1

Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
Status: Connected
Number of entries: 1
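
To make the output above easier to compare across bricks, here is a minimal 
Python sketch that groups the pending entries by brick. It is only an 
illustration and not part of any Gluster tooling: the names parse_heal_info 
and SAMPLE are made up for this example, and the parsing assumes the plain-text 
layout shown above (a "Brick" line, then the entries, then "Status:" and 
"Number of entries:" lines).

```python
from typing import Dict, List, Optional


def parse_heal_info(output: str) -> Dict[str, List[str]]:
    """Map each brick to the entries it lists as still needing heal.

    Assumes the plain-text layout of `gluster volume heal <vol> info`
    as pasted in this post; other layouts are not handled.
    """
    pending: Dict[str, List[str]] = {}
    brick: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            # Start of a new brick section.
            brick = line[len("Brick "):]
            pending[brick] = []
        elif brick is None or not line:
            # Shell prompt before the first brick, or a blank line.
            continue
        elif line.startswith(("Status:", "Number of entries:")):
            # Summary lines; the entries themselves are counted instead.
            continue
        else:
            pending[brick].append(line)
    return pending


# The output from this post, copied verbatim.
SAMPLE = """\
[root@ov-no3 ~]# gluster volume heal engine info
Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7

Status: Connected
Number of entries: 1

Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7

Status: Connected
Number of entries: 1
"""

if __name__ == "__main__":
    for name, entries in parse_heal_info(SAMPLE).items():
        print(f"{name}: {len(entries)} pending")
```

Run against the output above, this makes it easy to see that ov-no2 and ov-no3 
both list the same single image file as pending, while ov-no1 lists nothing.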

This morning I came upon this Reddit post 
https://www.reddit.com/r/gluster/comments/fl3yb7/entries_stuck_in_heal_pending/ 
where it seems that, after a graceful reboot of one of the oVirt hosts, Gluster 
came back online once it had completed the appropriate healing processes. The 
thing is, from what I have read, a host cannot be put into maintenance mode (so 
that it can be rebooted) while there are unsynced entries in Gluster. Is that 
correct?

Should I try to restart the glusterd service?

Could someone tell me what I should do?

Thank you all for your time and help,
Maria Souvalioti