Hello everyone, any help with the following problem would be greatly appreciated.
The day before yesterday we had power issues in my lab: a UPS went offline, followed by a power outage of the NFS/DNS server I have set up to serve oVirt with ISOs and to act as a DNS server (our other DNS servers are VMs inside the oVirt environment). We then found a broadcast storm on the switch that the oVirt nodes are connected to (caused by a faulty NIC on the aforementioned UPS), and later on we had to re-establish several of the virtual connections as well. All of this led to one of the hosts becoming NonResponsive, two machines becoming unresponsive and three VMs shutting down.

The oVirt environment, version 4.3.5.2, is a replica 2 + arbiter 1 setup and runs GlusterFS with the recommended volumes: data, engine and vmstore. Until now, whenever there was some kind of problem, oVirt was usually able to resolve it on its own. This time, however, after we recovered from the state described above, the data and vmstore volumes healed successfully, but the engine volume got stuck in the healing process (Up, unsynced entries, needs healing). In the web GUI I see that the HostedEngine VM is paused due to a storage I/O error, while the output of virsh list --all shows the HostedEngine as running. How can that be?
I tried to manually trigger the healing process for the volume with gluster volume heal engine, but nothing changed. The command gluster volume heal engine info shows the following:

    [root@ov-no3 ~]# gluster volume heal engine info
    Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine
    Status: Connected
    Number of entries: 0

    Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
    /80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
    Status: Connected
    Number of entries: 1

    Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
    /80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
    Status: Connected
    Number of entries: 1

This morning I came upon this Reddit post, https://www.reddit.com/r/gluster/comments/fl3yb7/entries_stuck_in_heal_pending/, where it seems that after a graceful reboot of one of the oVirt hosts, Gluster came back online once it had completed the appropriate healing processes. The thing is, from what I have read, when there are unsynced entries in Gluster a host cannot be put into maintenance mode so that it can be rebooted. Is that correct? Should I try to restart the glusterd service instead? Could someone tell me what I should do?

Thank you all for your time and help,
Maria Souvalioti
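
P.S. For anyone who wants to watch the pending heal counts per brick while troubleshooting, here is a minimal sketch that summarizes the text printed by gluster volume heal engine info. It only assumes the line layout shown in the output above; the function name pending_heal_entries is hypothetical, and the script parses text that you paste in or capture yourself rather than calling Gluster directly.

```python
def pending_heal_entries(heal_info_text):
    """Return {brick: pending entry count} from 'gluster volume heal <vol> info' text.

    Assumes each brick section starts with a 'Brick <host>:<path>' line and
    contains a 'Number of entries: <n>' line, as in the output quoted above.
    """
    counts = {}
    brick = None
    for raw_line in heal_info_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Brick "):
            # Remember which brick the following lines belong to.
            brick = line[len("Brick "):]
        elif line.startswith("Number of entries:") and brick is not None:
            value = line.split(":", 1)[1].strip()
            # Anything non-numeric is stored as None instead of guessing.
            counts[brick] = int(value) if value.isdigit() else None
            brick = None
    return counts


# Sample text copied from the heal info output in this message.
sample = """\
Brick ov-no1.ariadne-t.local:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0

Brick ov-no2.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
Status: Connected
Number of entries: 1

Brick ov-no3.ariadne-t.local:/gluster_bricks/engine/engine
/80f6e393-9718-4738-a14a-64cf43c3d8c2/images/d5de54b6-9f8e-4fba-819b-ebf6780757d2/a48555f4-be23-4467-8a54-400ae7baf9d7
Status: Connected
Number of entries: 1
"""

if __name__ == "__main__":
    for brick_name, pending in pending_heal_entries(sample).items():
        print(brick_name, pending)
```

Running it repeatedly against fresh heal info output would show whether the counts on ov-no2 and ov-no3 ever drop to zero.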