Re: [Gluster-users] Issues with glustershd with release 8.4 and 9.1

2021-05-18 Thread Ravishankar N
On Mon, May 17, 2021 at 4:22 PM Marco Fais wrote:

> Hi,
>
> I am having significant issues with glustershd with releases 8.4 and 9.1.
>
> My oVirt clusters are using gluster storage backends, and were running
> fine with Gluster 7.x (shipped with earlier versions of oVirt Node 4.4.x).
> Recently the oVirt project moved to Gluster 8.4 for the nodes, and hence I
> have moved to this release when upgrading my clusters.
>
> Since then I have been having issues whenever one of the nodes is brought
> down; when the node comes back online the bricks are typically back up and
> working, but some (random) glustershd processes on the various nodes seem
> to have issues connecting to some of them.
>
>
When the issue happens, can you check whether the TCP port number of the
brick (glusterfsd) processes displayed in `gluster volume status` matches
the actual port number observed (i.e. the --brick-port argument) when you
run `ps aux | grep glusterfsd`? If they don't match, then glusterd has
incorrect brick port information in its memory and is serving it to
glustershd. Restarting glusterd, rather than killing the bricks and running
`volume start force`, should fix it, although we still need to find out why
glusterd serves incorrect port numbers.
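
For example (a rough sketch; 'myvol' is just a placeholder volume name),
the two sets of port numbers can be compared like this:

    # Ports glusterd reports for each brick of the volume
    gluster volume status myvol | grep -E '^Brick'

    # Ports the brick processes were actually started with
    ps aux | grep '[g]lusterfsd' | grep -o -- '--brick-port [0-9]*'

If a brick shows, say, 49153 in `gluster volume status` but was started
with --brick-port 49155, glustershd will be given the wrong port to
connect to.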

If they do match, then can you take a statedump of glustershd to check
whether it is indeed disconnected from the bricks? You will need to verify
whether 'connected=1' is set for each brick in the statedump. See the
"Self-heal is stuck/ not getting completed." section in
https://docs.gluster.org/en/latest/Troubleshooting/troubleshooting-afr/.
A statedump can be taken with `kill -SIGUSR1 $pid-of-glustershd`; it will
be generated in the /var/run/gluster/ directory.
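
As a concrete sketch of those steps (assuming a single glustershd process
on the node; the exact dump file name can vary):

    pid=$(pgrep -f glustershd)   # or take the PID from 'gluster volume status'
    kill -SIGUSR1 "$pid"         # ask glustershd to write a statedump
    grep 'connected' /var/run/gluster/glusterdump."$pid".dump.*

Each brick's client section should show connected=1; connected=0 means
glustershd has lost its connection to that brick.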

Regards,
Ravi






[Gluster-users] Messy rpm upgrade

2021-05-18 Thread Zenon Panoussis


This morning I found my gluster volume broken. Whatever
'gluster volume x gv0' commands I tried, they timed out.
The logs were not very helpful. Restarting gluster on all
nodes did not help. Actually nothing helped, and I didn't
even know what to look for, or where. The volume spans
three nodes of one brick each, running CentOS 7 and 8.

Eventually I realised that the 9.2 rpms were released last
night, and yum-cron had upgraded my 9.1 to 9.2 on two of
the nodes. The third one was still running 9.1. I stopped
gluster on all of them, downgraded the two nodes back to
9.1, and the problem was solved; the volume came back up
just fine.

Subsequently I stopped gluster on all nodes, upgraded all
of them to 9.2, and restarted; the volume came back up just
fine again.
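
For reference, the sequence that worked is roughly the following, run on
every node (a sketch only; note that stopping glusterd does not by itself
stop brick or self-heal processes, so those may need to be killed too):

    systemctl stop glusterd            # on every node
    pkill glusterfs; pkill glusterfsd  # make sure bricks/daemons are down as well
    yum upgrade 'glusterfs*'           # bring every node to the same release (9.2 here)
    systemctl start glusterd           # then check 'gluster peer status' and 'gluster volume status'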

Conclusion: something in gluster doesn't like version
mismatches. On systems that run automatic updates, gluster
should be excluded and only be upgraded manually at the same
time across the entire cluster.
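
One way to do that on CentOS (a sketch; adjust to whatever update
mechanism is actually in use) is to exclude the gluster packages from
automatic updates in the package manager configuration:

    # /etc/yum.conf on CentOS 7, or /etc/dnf/dnf.conf on CentOS 8, [main] section
    exclude=glusterfs*

The packages can then still be upgraded deliberately on all nodes at once
with `yum upgrade --disableexcludes=main 'glusterfs*'`.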

One question remains: how can a cluster be upgraded without
taking it down? Stopping, upgrading, and restarting one node
at a time doesn't seem to work.



