That is not normal. Which version are you using?
Can you provide the output from all bricks (including the arbiter):
getfattr -d -m . -e hex /BRICK/PATH/TO/output_21
Troubleshooting and restoring the files should be your secondary task; you
should focus on stabilizing the cluster first.
First,
My volume is replica 3 arbiter 1, maybe that makes a difference?
Brick processes tend to die quite often (I have to restart glusterd at
least once a day because "gluster v info | grep ' N '" reports at least
one missing brick; sometimes even if all bricks are reported up I have
to kill all
The second error indicates conflicts between the nodes. The only way that could
happen on replica 3 is a gfid conflict (a file/dir was renamed or recreated).
Are you sure that all bricks are online? Usually 'Transport endpoint is not
connected' indicates a brick down situation.
First start with all
The contents do not match exactly, but the only difference is the
"option shared-brick-count" line that sometimes is 0 and sometimes 1.
The command you gave could be useful for the files that still need
healing with the source still present, but the files related to the
stale gfids have been
I'm not sure if the md5sum has to match, but at least the content should.
In modern versions of GlusterFS the client-side healing is disabled, but it's
worth trying.
You will need to enable cluster.metadata-self-heal, cluster.data-self-heal and
cluster.entry-self-heal and then create a
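For reference, turning those three options on might look like the following. This is a sketch: "myvol" is a placeholder volume name, and the commands would be run on any node in the trusted storage pool.

```shell
# Hypothetical volume name "myvol" -- substitute your own volume.
# Enable client-side self-heal for metadata, data, and entries.
gluster volume set myvol cluster.metadata-self-heal on
gluster volume set myvol cluster.data-self-heal on
gluster volume set myvol cluster.entry-self-heal on

# Verify the current values afterwards.
gluster volume get myvol cluster.metadata-self-heal
gluster volume get myvol cluster.data-self-heal
gluster volume get myvol cluster.entry-self-heal
```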
Oops... Re-including the list that got excluded in my previous answer :(
I generated md5sums of all files in vols/ on clustor02 and compared to
the other nodes (clustor00 and clustor01).
There are differences in volfiles (shouldn't it always be 1, since every
data brick is on its own fs? quorum
Any issues reported in /var/log/glusterfs/glfsheal-*.log ?
The easiest way to identify the affected entries is to run:
find /FULL/PATH/TO/BRICK/ -samefile
/FULL/PATH/TO/BRICK/.glusterfs/57/e4/57e428c7-6bed-4eb3-b9bd-02ca4c46657a
Best Regards,
Strahil Nikolov
On Tuesday, 31 January 2023
Hello all.
I've had one of the 3 nodes serving a "replica 3 arbiter 1" down for
some days (apparently RAM issues, but actually failing mobo).
The other nodes have had some issues (RAM exhaustion, old problem
already ticketed but still no solution) and some brick processes
coredumped.