Re: [Gluster-users] Directory in split brain does not heal - Gfs 9.2

2022-08-31 Thread Ilias Chasapakis forumZFD

Hi all,

so we went further and deleted the entries (data and gfid). The split-brain 
is now gone, but when we triggered a heal again (simple and full), many 
entries got stuck in healing (no split-brain items). They have been there 
for days/weeks and are still appearing.


We would like to heal single files, but since they are not in split-brain I 
guess this is not possible, right? The "source-brick" technique only works 
in that case, I think?


A concrete example of one of the files stuck in the healing queue: I checked 
the attributes with getfattr and saw that one of the nodes has neither the 
data file nor the gfid; it is missing completely. How could I trigger 
replication from the "good copy" to the gluster node that does not have the 
file? Is that possible for entries *not* in split-brain? Doing a listing 
(ls) of the affected directory on the mount side did not seem to trigger a 
heal.
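
For reference, this is roughly how I checked (the brick path, mount point 
and file name below are placeholders, not our real layout):

# on each node, dump the replication metadata directly from the brick
getfattr -d -m . -e hex /data/brick1/gfsVol/path/to/stuck-file
# relevant keys: trusted.gfid and the trusted.afr.* pending counters;
# on the affected node the command just reports "No such file or directory"

# explicit lookup from a client mount, which should queue the entry for self-heal
stat /mnt/gfsVol/path/to/stuck-file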


Also, the shd logs have some entries that are ambiguous to me. The sinks 
value is empty; shouldn't it contain a number indicating the brick being healed?


[2022-08-28 17:22:11.098604 +] I [MSGID: 108026] 
[afr-self-heal-common.c:1742:afr_log_selfheal] 0-vol-replicate-0: 
Completed metadata selfheal on 94503c97-7731-4aa1-8a14-2c6ea5a84a15. 
sources=1 [2]  sinks=
[2022-08-28 17:22:16.227091 +] I [MSGID: 108026] 
[afr-self-heal-common.c:1742:afr_log_selfheal] 0-gv-ho-replicate-0: 
Completed metadata selfheal on 94503c97-7731-4aa1-8a14-2c6ea5a84a15. 
sources=1 [2]  sinks= 


I have tried to use the guide here:

https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-afr/#ii-self-heal-is-stuck-not-getting-completed

but find it difficult to apply.

Do you have any suggestions on how to "unblock" these stuck entries, and 
what would be a methodical approach to troubleshooting this situation?
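
For context, these are the checks I keep repeating while waiting (a minimal 
sketch, using our volume name gfsVol):

# entries still pending heal, per brick
gluster volume heal gfsVol info
gluster volume heal gfsVol info summary
# pending-entry counters only
gluster volume heal gfsVol statistics heal-count
# confirm every brick and self-heal daemon is online and connected
gluster volume status gfsVol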


Finally, I would like to ask whether updating the gluster packages (we have 
pending updates now) would be too risky without fixing the unhealed entries 
first. Our hope is that an update might eventually fix the problems.


Best regards.
Ilias


On 18.08.22 at 23:38, Strahil Nikolov wrote:
If you refer to 
//.glusterfs///gfid 
- it's a hard link to the file on the brick.

Directories in the .glusterfs are just symbolic links.
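
Roughly, for a regular file it looks like this (brick path and gfid are 
made-up examples):

# read the gfid from the brick
getfattr -n trusted.gfid -e hex /data/brick1/vol/dir/file
# trusted.gfid=0xd0c8f1aa12344321aabbccddeeff0011
# -> /data/brick1/vol/.glusterfs/d0/c8/d0c8f1aa-1234-4321-aabb-ccddeeff0011

# same inode and a link count of 2 -> it is a hard link to the data file
stat -c '%i %h %n' /data/brick1/vol/dir/file \
  /data/brick1/vol/.glusterfs/d0/c8/d0c8f1aa-1234-4321-aabb-ccddeeff0011

# for a directory, ls -l on the .glusterfs entry shows a symlink instead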

Can you clarify what you are planning to delete?

Best Regards,
Strahil Nikolov

On Wed, Aug 17, 2022 at 14:35, Ilias Chasapakis forumZFD
 wrote:

Hi Thomas,

Thanks again for your replies and patience :)

We have also offline backups of the files.

So, just to verify I understood this correctly: deleting a
.glusterfs-gfid file doesn't inherently carry the risk of losing
the complete brick, right?

I saw you already applied this for your purposes so it worked for
you... But just as a confirmation. Of course it is fully
understood that the operational risk is on our side.

It is just an "information-wise" question :)

Best regards
Ilias

On 17.08.22 at 12:47, Thomas Bätzler wrote:


Hello Ilias,

Please note that you can and should back up all of the file(s)
involved in the split-brain by accessing them over the brick root
instead of the gluster mount. That is also the reason why you're
not in danger of a failure cascade wiping out your data.

Be careful when replacing bricks, though. You want that heal to
go in the right direction 

Kind regards,

i.A. Thomas Bätzler

-- 


BRINGE Informationstechnik GmbH

Zur Seeplatte 12

D-76228 Karlsruhe

Germany

Fon: +49 721 94246-0

Fon: +49 171 5438457

Fax: +49 721 94246-66

Web: http://www.bringe.de/

Geschäftsführer: Dipl.-Ing. (FH) Martin Bringe

Ust.Id: DE812936645, HRB 108943 Mannheim

*From:* Gluster-users *On behalf of* Ilias
Chasapakis forumZFD
*Sent:* Wednesday, 17 August 2022 11:18
*To:* gluster-users@gluster.org
*Subject:* Re: [Gluster-users] Directory in split brain does not
heal - Gfs 9.2

Thanks for the suggestions. My question is whether the risk is
limited to losing the file/dir, or whether it could create
inconsistencies that span across the bricks and "break everything".
Of course we have to take action anyway so that this does not spread
(as we already have a second entry that developed an "unhealable"
directory split-brain), so it is just a question of evaluating the
situation before acting.

On 12.08.22 at 18:12, Thomas Bätzler wrote:

On 12.08.2022 at 17:12, Ilias Chasapakis forumZFD wrote:

Dear fellow gluster users,

we are facing a problem with our replica 3 setup.
Glusterfs version is 9.2.

We have a problem with a directory that is in split-brain
and we cannot manage to heal with:

 

Re: [Gluster-users] Directory in split brain does not heal - Gfs 9.2

2022-08-17 Thread Ilias Chasapakis forumZFD

Hi Thomas,

Thanks again for your replies and patience :)

We have also offline backups of the files.

So, just to verify I understood this correctly: deleting a 
.glusterfs-gfid file doesn't inherently carry the risk of losing 
the complete brick, right?


I saw you already applied this for your purposes so it worked for you... 
But just as a confirmation. Of course it is fully understood that the 
operational risk is on our side.


It is just an "information-wise" question :)

Best regards
Ilias

On 17.08.22 at 12:47, Thomas Bätzler wrote:


Hello Ilias,

Please note that you can and should back up all of the file(s) involved 
in the split-brain by accessing them over the brick root instead of 
the gluster mount. That is also the reason why you're not in danger of 
a failure cascade wiping out your data.


Be careful when replacing bricks, though. You want that heal to go in 
the right direction 


Kind regards,

i.A. Thomas Bätzler

--

BRINGE Informationstechnik GmbH

Zur Seeplatte 12

D-76228 Karlsruhe

Germany

Fon: +49 721 94246-0

Fon: +49 171 5438457

Fax: +49 721 94246-66

Web: http://www.bringe.de/

Geschäftsführer: Dipl.-Ing. (FH) Martin Bringe

Ust.Id: DE812936645, HRB 108943 Mannheim

*From:* Gluster-users *On behalf of* Ilias Chasapakis forumZFD

*Sent:* Wednesday, 17 August 2022 11:18
*To:* gluster-users@gluster.org
*Subject:* Re: [Gluster-users] Directory in split brain does not heal 
- Gfs 9.2


Thanks for the suggestions. My question is whether the risk is actually 
limited to losing the file/dir, or whether it could create 
inconsistencies that span across the bricks and "break everything".
Of course we have to take action anyway so that this does not spread (as 
we already have a second entry that developed an "unhealable" directory 
split-brain), so it is just a question of evaluating the situation 
before acting.


On 12.08.22 at 18:12, Thomas Bätzler wrote:

On 12.08.2022 at 17:12, Ilias Chasapakis forumZFD wrote:

Dear fellow gluster users,

we are facing a problem with our replica 3 setup. Glusterfs
version is 9.2.

We have a problem with a directory that is in split-brain and
we cannot manage to heal with:

gluster volume heal gfsVol split-brain latest-mtime /folder

The command throws the following error: "failed:Transport
endpoint is not connected."

So the split-brain directory entry remains, the whole healing
process does not complete, and other entries get stuck.

I saw there is a python script available
https://github.com/joejulian/glusterfs-splitbrain Would that
be a good solution to try? To be honest we are a bit concerned
with deleting the gfid and the files from the brick manually
as it seems it can create inconsistencies and break things...
I can of course give you more information about our setup and
situation, but if you already have some tip, that would be
fantastic.

You could at least verify what's going on: Go to your brick roots
and list /folder from each. You have 3n bricks with n replica
sets. Find the replica set where you can spot a difference. It's
most likely a file or directory that's missing or different. If
it's a file, do a ls -ain on the file on each brick in the replica
set. It'll report an inode number. Do a find .glusterfs -inum from
the brick root. You'll likely see that you have different gfid-files.

To fix the problem, you have to help gluster along by cleaning up
the mess. This is completely "do it at your own risk, it worked
for me, ymmv": cp (not mv!) a copy of the file you want to keep.
On each brick in the replica-set, delete the gfid-file and the
datafile. Try a heal on the volume and verify that you can access
the path in question using the glusterfs mount. Copy back your
salvaged file using the glusterfs mount.

We had this happening quite often on a heavily loaded glusterfs
shared filesystem that held a mail-spool. There would be parallel
accesses trying to mv files and sometimes we'd end up with
mismatched data on the bricks of the replica set. I've reported
this on github, but apparently it wasn't seen as a serious
problem. We've moved on to ceph FS now. That sure has bugs, too,
but hopefully not as aggravating.

Kind regards,

i.A. Thomas Bätzler

-- 


BRINGE Informationstechnik GmbH

Zur Seeplatte 12

D-76228 Karlsruhe

Germany

Fon: +49 721 94246-0

Fon: +49 171 5438457

Fax: +49 721 94246-66

Web:http://www.bringe.de/

Geschäftsführer: Dipl.-Ing. (FH) Martin Bringe

Ust.Id: DE812936645, HRB 108943 Mannheim






Re: [Gluster-users] Directory in split brain does not heal - Gfs 9.2

2022-08-17 Thread Ilias Chasapakis forumZFD
Thanks for the suggestions. My question is whether the risk is actually 
limited to losing the file/dir, or whether it could create inconsistencies 
that span across the bricks and "break everything".
Of course we have to take action anyway so that this does not spread (as we 
already have a second entry that developed an "unhealable" directory 
split-brain), so it is just a question of evaluating the situation before 
acting.


On 12.08.22 at 18:12, Thomas Bätzler wrote:

On 12.08.2022 at 17:12, Ilias Chasapakis forumZFD wrote:


Dear fellow gluster users,

we are facing a problem with our replica 3 setup. Glusterfs version 
is 9.2.


We have a problem with a directory that is in split-brain and we 
cannot manage to heal with:



gluster volume heal gfsVol split-brain latest-mtime /folder

The command throws the following error: "failed:Transport endpoint is 
not connected."


So the split-brain directory entry remains, the whole healing process 
does not complete, and other entries get stuck.


I saw there is a python script available 
https://github.com/joejulian/glusterfs-splitbrain 
 Would that be a 
good solution to try? To be honest we are a bit concerned with 
deleting the gfid and the files from the brick manually as it seems 
it can create inconsistencies and break things... I can of course 
give you more information about our setup and situation, but if you 
already have some tip, that would be fantastic.


You could at least verify what's going on: Go to your brick roots and 
list /folder from each. You have 3n bricks with n replica sets. Find 
the replica set where you can spot a difference. It's most likely a 
file or directory that's missing or different. If it's a file, do a ls 
-ain on the file on each brick in the replica set. It'll report an 
inode number. Do a find .glusterfs -inum from the brick root. You'll 
likely see that you have different gfid-files.
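
A rough sketch of those checks (brick root and file name are only examples):

# run this on each brick of the replica set
cd /data/brickN/vol
ls -ain folder/odd-file                    # note the inode number on this brick
find .glusterfs -inum 1234567              # 1234567 = the inode number from above
getfattr -d -m . -e hex folder/odd-file    # compare trusted.gfid across the bricks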


To fix the problem, you have to help gluster along by cleaning up the 
mess. This is completely "do it at your own risk, it worked for me, 
ymmv": cp (not mv!) a copy of the file you want to keep. On each brick 
in the replica-set, delete the gfid-file and the datafile. Try a heal 
on the volume and verify that you can access the path in question 
using the glusterfs mount. Copy back your salvaged file using the 
glusterfs mount.
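
Roughly, that sequence might look like this (every path below is a 
placeholder; adapt and double-check before deleting anything):

# 1. salvage the copy you want to keep, to somewhere outside the brick
cp -a /data/brickN/vol/folder/odd-file /root/salvage/odd-file

# 2. on each brick of the replica set, remove the data file and its gfid hard link
rm /data/brickN/vol/folder/odd-file
rm /data/brickN/vol/.glusterfs/d0/c8/d0c8f1aa-1234-4321-aabb-ccddeeff0011

# 3. trigger a heal and check the path is reachable again through the mount
gluster volume heal gfsVol
ls -l /mnt/gfsVol/folder/

# 4. copy the salvaged file back in via the glusterfs mount
cp -a /root/salvage/odd-file /mnt/gfsVol/folder/odd-file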


We had this happening quite often on a heavily loaded glusterfs shared 
filesystem that held a mail-spool. There would be parallel accesses 
trying to mv files and sometimes we'd end up with mismatched data on 
the bricks of the replica set. I've reported this on github, but 
apparently it wasn't seen as a serious problem. We've moved on to ceph 
FS now. That sure has bugs, too, but hopefully not as aggravating.


Kind regards,
i.A. Thomas Bätzler
--
BRINGE Informationstechnik GmbH
Zur Seeplatte 12
D-76228 Karlsruhe
Germany

Fon: +49 721 94246-0
Fon: +49 171 5438457
Fax: +49 721 94246-66
Web:http://www.bringe.de/

Geschäftsführer: Dipl.-Ing. (FH) Martin Bringe
Ust.Id: DE812936645, HRB 108943 Mannheim







--
forumZFD
Entschieden für Frieden | Committed to Peace

Ilias Chasapakis
Referent IT | IT Consultant

Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service
Am Kölner Brett 8 | 50825 Köln | Germany

Tel 0221 91273243 | Fax 0221 91273299 |http://www.forumZFD.de

Vorstand nach § 26 BGB, einzelvertretungsberechtigt | Executive Board:
Oliver Knabe (Vorsitz | Chair), Jens von Bargen, Alexander Mauz
VR 17651 Amtsgericht Köln

Spenden | Donations: IBAN DE37 3702 0500 0008 2401 01 BIC BFSWDE33XXX









Re: [Gluster-users] Directory in split brain does not heal - Gfs 9.2

2022-08-12 Thread Joe Julian
Interesting. I've never seen that. It's always been gfid mismatches for 
me since they first started adding gfids to directories.


On 8/12/22 1:27 PM, Strahil Nikolov wrote:

Usually dirs are in split-brain due to content mismatch.
For example, if a file inside a dir can't be healed automatically, 
then the dir is "stuck".


Check the files inside the dir for mismatches - fix those first and 
only then try to heal the dir.
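
For example, something along these lines (volume name and brick paths are 
placeholders):

# what gluster itself still considers pending or in split-brain
gluster volume heal gfsVol info
gluster volume heal gfsVol info split-brain

# then compare the directory contents and gfids brick by brick
ls -la /data/brickN/vol/folder/
getfattr -n trusted.gfid -e hex /data/brickN/vol/folder/*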



Best Regards,
Strahil Nikolov

On Fri, Aug 12, 2022 at 20:26, Péter Károly JUHÁSZ
 wrote:






Re: [Gluster-users] Directory in split brain does not heal - Gfs 9.2

2022-08-12 Thread Péter Károly JUHÁSZ
This has always helped me in this kind of situation:
http://docs.gluster.org/Troubleshooting/resolving-splitbrain/
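
The CLI policies from that guide look roughly like this (volume name, file 
path and brick are placeholders):

# keep the bigger copy
gluster volume heal gfsVol split-brain bigger-file /folder/file
# keep the copy with the newest mtime
gluster volume heal gfsVol split-brain latest-mtime /folder/file
# pick one brick explicitly as the source
gluster volume heal gfsVol split-brain source-brick server1:/data/brick1/vol /folder/file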

Joe Julian wrote on Fri, 12 Aug 2022 at 18:33:

> It could work, but I never imagined, back then, that *directories* could
> get in split-brain.
>
> The most likely reason for that split is that there's a gfid mismatch on
> one of the replicas. I'd go to the brick with the odd gfid, move that
> directory out of the brick path, then do a "find folder" on the client
> mount to rebuild the directory structure. Check the directory to make sure
> all the files are right before deleting the moved odd one.
>
> If you need to fix anything, just copy from the moved directory to a
> client-mount on the same machine.
> On 8/12/22 8:12 AM, Ilias Chasapakis forumZFD wrote:
>
> Dear fellow gluster users,
>
> we are facing a problem with our replica 3 setup. Glusterfs version is 9.2.
>
> We have a problem with a directory that is in split-brain and we cannot
> manage to heal with:
>
> gluster volume heal gfsVol split-brain latest-mtime /folder
>
> The command throws the following error: "failed:Transport endpoint is not
> connected."
>
> So the split-brain directory entry remains, the whole healing process does
> not complete, and other entries get stuck.
>
> I saw there is a python script available
> https://github.com/joejulian/glusterfs-splitbrain Would that be a good
> solution to try? To be honest we are a bit concerned with deleting the gfid
> and the files from the brick manually as it seems it can create
> inconsistencies and break things... I can of course give you more
> information about our setup and situation, but if you already have some
> tip, that would be fantastic.
>
> Best regards
>
> Ilias
>
> --
> forumZFD
> Entschieden für Frieden | Committed to Peace
>
> Ilias Chasapakis
> Referent IT | IT Consultant
>
> Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service
> Am Kölner Brett 8 | 50825 Köln | Germany
>
> Tel 0221 91273243 | Fax 0221 91273299 | http://www.forumZFD.de
>
> Vorstand nach § 26 BGB, einzelvertretungsberechtigt | Executive Board:
> Oliver Knabe (Vorsitz | Chair), Jens von Bargen, Alexander Mauz
> VR 17651 Amtsgericht Köln
>
> Spenden | Donations: IBAN DE37 3702 0500 0008 2401 01 BIC BFSWDE33XXX
>
>
> 
>
>
>
>






Re: [Gluster-users] Directory in split brain does not heal - Gfs 9.2

2022-08-12 Thread Joe Julian
It could work, but I never imagined, back then, that *directories* could 
get in split-brain.


The most likely reason for that split is that there's a gfid mismatch on 
one of the replicas. I'd go to the brick with the odd gfid, move that 
directory out of the brick path, then do a "find folder" on the client 
mount to rebuild the directory structure. Check the directory to make 
sure all the files are right before deleting the moved odd one.
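
As a rough sketch (brick root, directory name and mount point are 
placeholders):

# on the brick whose gfid disagrees with the other replicas
cd /data/brickN/vol
getfattr -n trusted.gfid -e hex folder     # compare with the other bricks
mv folder /root/odd-folder-backup          # move it out of the brick path

# from a client mount, walk the tree so the directory structure gets rebuilt
find /mnt/vol/folder > /dev/null

# compare before deleting the moved copy
diff -r /root/odd-folder-backup /mnt/vol/folder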


If you need to fix anything, just copy from the moved directory to a 
client-mount on the same machine.


On 8/12/22 8:12 AM, Ilias Chasapakis forumZFD wrote:


Dear fellow gluster users,

we are facing a problem with our replica 3 setup. Glusterfs version is 
9.2.


We have a problem with a directory that is in split-brain and we 
cannot manage to heal with:



gluster volume heal gfsVol split-brain latest-mtime /folder

The command throws the following error: "failed:Transport endpoint is 
not connected."


So the split-brain directory entry remains, the whole healing process 
does not complete, and other entries get stuck.


I saw there is a python script available 
https://github.com/joejulian/glusterfs-splitbrain 
 Would that be a 
good solution to try? To be honest we are a bit concerned with 
deleting the gfid and the files from the brick manually as it seems it 
can create inconsistencies and break things... I can of course give 
you more information about our setup and situation, but if you already 
have some tip, that would be fantastic.


Best regards

Ilias

--
forumZFD
Entschieden für Frieden | Committed to Peace

Ilias Chasapakis
Referent IT | IT Consultant

Forum Ziviler Friedensdienst e.V. | Forum Civil Peace Service
Am Kölner Brett 8 | 50825 Köln | Germany

Tel 0221 91273243 | Fax 0221 91273299 |http://www.forumZFD.de

Vorstand nach § 26 BGB, einzelvertretungsberechtigt | Executive Board:
Oliver Knabe (Vorsitz | Chair), Jens von Bargen, Alexander Mauz
VR 17651 Amtsgericht Köln

Spenden | Donations: IBAN DE37 3702 0500 0008 2401 01 BIC BFSWDE33XXX








Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users