[ovirt-users] Gluster volumes not healing (perhaps after host maintenance?)

David White via Users Sat, 24 Apr 2021 04:37:49 -0700

I discovered that the servers I purchased did not come with 10Gbps network 
cards, as I thought they did. So my storage network has been running on a 
1Gbps connection since I deployed the servers into the datacenter a little 
over a week ago. I purchased 10Gbps cards and put one of my hosts into 
maintenance mode yesterday before replacing the daughter card. 
It is now back online and running fine on the 10Gbps card.


All VMs seem to be working, even when I migrate them onto cha2, which is the 
host I did maintenance on yesterday morning.
The other two hosts are still running on the 1Gbps connection, but I plan to do 
maintenance on them next week.

The oVirt manager shows that all 3 hosts are up, and that all of my volumes 
and all of my bricks are up. However, every time I look at the storage, the 
self-heal info for one of the volumes shows 10 minutes, and the self-heal 
info for another volume shows 50+ minutes.

This morning is the first time in the last couple of days that I've paid close 
attention to the numbers, but I don't see them going down.

When I log into each of the hosts, I see that everything is connected in 
gluster. Interestingly, though, gluster on cha3 lists cha1 by its IP address 
(10.1.0.10) rather than by its hostname.
The host that I did the maintenance on is cha2.

[root@cha3-storage dwhite]# gluster peer status
Number of Peers: 2

Hostname: 10.1.0.10
Uuid: 87a4f344-321a-48b9-adfb-e3d2b56b8e7b
State: Peer in Cluster (Connected)

Hostname: cha2-storage.mgt.barredowlweb.com
Uuid: 93e12dee-c37d-43aa-a9e9-f4740b9cab14
State: Peer in Cluster (Connected)
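The IP-versus-hostname observation can also be checked mechanically on each host. The following is only a rough sketch: the function name `peers_listed_by_ip` is made up for illustration, and it assumes the usual layout of `gluster peer status`, with one `Hostname:` field per line.

```shell
# Hypothetical helper: print every peer that `gluster peer status`
# identifies by an IPv4 address instead of a hostname.
# Assumes one "Hostname: <value>" field per line on stdin.
peers_listed_by_ip() {
    # Keep only the value of each Hostname field...
    sed -n 's/^Hostname: //p' \
        | grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'   # ...that looks like dotted-quad IPv4
}

# Usage on a host:
#   gluster peer status | peers_listed_by_ip
```

Run against the output above, this would be expected to print only 10.1.0.10.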

When I run `gluster volume heal data`, I see the following:
[root@cha3-storage dwhite]# gluster volume heal data
Launching heal operation to perform index self heal on volume data has been unsuccessful:
Commit failed on cha2-storage.mgt.barredowlweb.com. Please check log file for details.

I get the same results if I run the command on cha2, for any volume:
[root@cha2-storage dwhite]# gluster volume heal data
Launching heal operation to perform index self heal on volume data has been unsuccessful:
Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log file for details.
[root@cha2-storage dwhite]# gluster volume heal vmstore
Launching heal operation to perform index self heal on volume vmstore has been unsuccessful:
Glusterd Syncop Mgmt brick op 'Heal' failed. Please check glustershd log file for details.

I see a lot of stuff like this on cha2 /var/log/glusterfs/glustershd.log:
[2021-04-24 11:33:01.319888] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 2-engine-client-0: changing port to 49153 (from 0)
[2021-04-24 11:33:01.329463] I [MSGID: 114057] [client-handshake.c:1128:select_server_supported_programs] 2-engine-client-0: Using Program [{Program-name=GlusterFS 4.x v1}, {Num=1298437}, {Version=400}]
[2021-04-24 11:33:01.330075] W [MSGID: 114043] [client-handshake.c:727:client_setvolume_cbk] 2-engine-client-0: failed to set the volume [{errno=2}, {error=No such file or directory}]
[2021-04-24 11:33:01.330116] W [MSGID: 114007] [client-handshake.c:752:client_setvolume_cbk] 2-engine-client-0: failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid argument}]
[2021-04-24 11:33:01.330140] E [MSGID: 114044] [client-handshake.c:757:client_setvolume_cbk] 2-engine-client-0: SETVOLUME on remote-host failed [{remote-error=Brick not found}, {errno=2}, {error=No such file or directory}]
[2021-04-24 11:33:01.330155] I [MSGID: 114051] [client-handshake.c:879:client_setvolume_cbk] 2-engine-client-0: sending CHILD_CONNECTING event []
[2021-04-24 11:33:01.640480] I [rpc-clnt.c:1975:rpc_clnt_reconfig] 3-vmstore-client-0: changing port to 49154 (from 0)
The message "W [MSGID: 114007] [client-handshake.c:752:client_setvolume_cbk] 3-vmstore-client-0: failed to get from reply dict [{process-uuid}, {errno=22}, {error=Invalid argument}]" repeated 4 times between [2021-04-24 11:32:49.602164] and [2021-04-24 11:33:01.649850]
[2021-04-24 11:33:01.649867] E [MSGID: 114044] [client-handshake.c:757:client_setvolume_cbk] 3-vmstore-client-0: SETVOLUME on remote-host failed [{remote-error=Brick not found}, {errno=2}, {error=No such file or directory}]
[2021-04-24 11:33:01.649969] I [MSGID: 114051] [client-handshake.c:879:client_setvolume_cbk] 3-vmstore-client-0: sending CHILD_CONNECTING event []
[2021-04-24 11:33:01.650095] I [MSGID: 114018] [client.c:2225:client_rpc_notify] 3-vmstore-client-0: disconnected from client, process will keep trying to connect glusterd until brick's port is available [{conn-name=vmstore-client-0}]
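
Since the log is noisy, the entries that matter here are the "Brick not found" errors. As a rough sketch of how they could be pulled out and counted per client (the function name `brick_not_found_clients` is hypothetical, and this assumes one log entry per line, as in the file on disk):

```shell
# Hypothetical helper: list each client translator that logged
# "remote-error=Brick not found", followed by the number of such entries.
# Reads glustershd.log text on stdin, one log entry per line.
brick_not_found_clients() {
    grep -F 'remote-error=Brick not found' \
        | grep -oE '[0-9]+-[A-Za-z0-9_-]+-client-[0-9]+' \
        | sed -E 's/^[0-9]+-//' \
        | sort | uniq -c | awk '{print $2, $1}'
    # grep -o extracts names such as "2-engine-client-0";
    # sed drops the leading graph-generation number ("2-").
}

# Usage (log path as given above):
#   brick_not_found_clients < /var/log/glusterfs/glustershd.log
```

Both engine-client-0 and vmstore-client-0 appear in the excerpt above, so both would be expected in the output.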

How do I further troubleshoot?
