Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
I am on 7.9-ubuntu1~focal1 amd64. ;)

A.

"Strahil Nikolov" hunter86...@yahoo.com – 1. Juni 2021 14:56
> Glad to hear that.
>
> What version are you using ?
> It's interesting to find out the reason behind that defunct status.
>
> Best Regards,
> Strahil Nikolov
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Strahil,

I was able to resolve the issue!

On node1 I found a defunct [glusterfsd] process. I first put the failing arbiters into reset-brick, then formatted them anew. I killed the zombie process on node1, then stopped glusterd gracefully, then killed all remaining gluster* processes, then started glusterd again, and finally re-added the arbiters with reset-brick commit force.

Now the arbiters have started and have been populated correctly. :)

Thanks for the support!

Best
A.

"Strahil Nikolov" hunter86...@yahoo.com – 31. Mai 2021 21:03
> I would avoid shrinking the volume. An oVirt user reported issues after
> volume shrinking.
>
> Did you try to format the arbiter brick and 'replace-brick' ?
>
> Best Regards,
> Strahil Nikolov
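The recovery steps described above can be sketched as a shell sequence. This is an illustrative reconstruction, not a tested runbook: the volume name gv0 and the arbiter brick path come from the thread, while the filesystem re-creation step (device name, mount point) is an assumption and left commented out.

```shell
#!/bin/sh
# Sketch of the recovery sequence described above. Adapt names before use.
VOL=gv0
BRICK=192.168.0.80:/var/bricks/arb_0/brick

# 1. Take the failing arbiter brick out of service.
gluster volume reset-brick "$VOL" "$BRICK" start

# 2. Recreate the brick filesystem (device and mount point are assumptions):
# mkfs.xfs -f /dev/md50 && mount /var/bricks/arb_0

# 3. Clean up the defunct brick process, then restart glusterd.
systemctl stop glusterd
pkill -f glusterfsd
systemctl start glusterd

# 4. Re-add the brick in place and let self-heal repopulate it.
gluster volume reset-brick "$VOL" "$BRICK" "$BRICK" commit force
gluster volume heal "$VOL" full
```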
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
I would avoid shrinking the volume. An oVirt user reported issues after volume shrinking.

Did you try to format the arbiter brick and 'replace-brick' ?

Best Regards,
Strahil Nikolov

> I can't find anything suspicious in the brick logs other than authentication
> refused to clients trying to mount a dir that does not exist on the arb_n,
> because the self-heal isn't working.
> I tried to add another node and replace-brick a faulty arbiter; however,
> this new arbiter sees the same error.
>
> Last idea is to completely remove the first subvolume, then re-add it as
> new, hoping it will work.
>
> A.
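Strahil's "format and replace-brick" suggestion would look roughly like the following. A sketch only: the volume name and old brick path are taken from the thread, while the new brick path is a hypothetical placeholder.

```shell
# Sketch of the format-and-replace-brick approach suggested above.
VOL=gv0
OLD=192.168.0.80:/var/bricks/arb_0/brick
NEW=192.168.0.80:/var/bricks/arb_0_new/brick   # hypothetical fresh path

# Replace the faulty arbiter with a freshly formatted brick in one step.
gluster volume replace-brick "$VOL" "$OLD" "$NEW" commit force

# Then watch the pending heal entries drain as self-heal runs.
gluster volume heal "$VOL" info
```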
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Hm, I tried format and reset-brick on node2: no success. I tried a new brick on a new node3 and replace-brick: no success, as the new arbiter is created wrongly and self-heal does not work. I also restarted all nodes in turn, without any improvement.

If shrinking the volume is not recommended: is converting it back to replica 2 possible and, if successful, another try?

Thanks
A.

31.05.2021 21:03:01 Strahil Nikolov :
> I would avoid shrinking the volume. An oVirt user reported issues after
> volume shrinking.
>
> Did you try to format the arbiter brick and 'replace-brick' ?
>
> Best Regards,
> Strahil Nikolov
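For reference, the "convert back to replica 2" idea raised above would amount to removing all six arbiter bricks in one remove-brick call. This is a hypothetical sketch assembled from the volume info quoted later in the thread; note that shrinking was explicitly advised against here.

```shell
# Hypothetical sketch only: converting gv0 back to replica 2 would mean
# removing the six arbiter bricks (one per subvolume) in a single call.
# Untested, and volume shrinking was advised against in this thread.
gluster volume remove-brick gv0 replica 2 \
  192.168.0.80:/var/bricks/arb_0/brick \
  192.168.0.41:/var/bricks/arb_1/brick \
  192.168.0.80:/var/bricks/arb_1/brick \
  192.168.0.41:/var/bricks/arb_0/brick \
  192.168.0.40:/var/bricks/arb_0/brick \
  192.168.0.40:/var/bricks/arb_1/brick \
  force
```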
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
I can't find anything suspicious in the brick logs other than authentication refused to clients trying to mount a dir that does not exist on the arb_n, because the self-heal isn't working.
I tried to add another node and replace-brick a faulty arbiter; however, this new arbiter sees the same error.

Last idea is to completely remove the first subvolume, then re-add it as new, hoping it will work.

A.

"a.schwi...@gmx.net" a.schwi...@gmx.net – 31. Mai 2021 13:44
> Ok, will do.
>
> working arbiter:
>
> ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 13 33 33 146 Mai 29 22:38 brick
> ls -lna /var/bricks/arb_0/brick >>> drw--- 262 0 0 8192 Mai 29 22:38 .glusterfs
> + all data-brick dirs ...
>
> affected arbiter:
>
> ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 3 0 0 24 Mai 30 16:23 brick
> ls -lna /var/bricks/arb_0/brick >>> drw--- 7 0 0 99 Mai 30 16:23 .glusterfs
> nothing else here
>
> find /var/bricks/arb_0/brick -not -user 33 -print
>
> /var/bricks/arb_0/brick/.glusterfs
> /var/bricks/arb_0/brick/.glusterfs/indices
> /var/bricks/arb_0/brick/.glusterfs/indices/xattrop
> /var/bricks/arb_0/brick/.glusterfs/indices/dirty
> /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
> /var/bricks/arb_0/brick/.glusterfs/changelogs
> /var/bricks/arb_0/brick/.glusterfs/changelogs/htime
> /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
> /var/bricks/arb_0/brick/.glusterfs/00
> /var/bricks/arb_0/brick/.glusterfs/00/00
> /var/bricks/arb_0/brick/.glusterfs/00/00/----0001
> /var/bricks/arb_0/brick/.glusterfs/landfill
> /var/bricks/arb_0/brick/.glusterfs/unlink
> /var/bricks/arb_0/brick/.glusterfs/health_check
>
> find /var/bricks/arb_0/brick -not -group 33 -print
>
> /var/bricks/arb_0/brick/.glusterfs
> /var/bricks/arb_0/brick/.glusterfs/indices
> /var/bricks/arb_0/brick/.glusterfs/indices/xattrop
> /var/bricks/arb_0/brick/.glusterfs/indices/dirty
> /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
> /var/bricks/arb_0/brick/.glusterfs/changelogs
> /var/bricks/arb_0/brick/.glusterfs/changelogs/htime
> /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
> /var/bricks/arb_0/brick/.glusterfs/00
> /var/bricks/arb_0/brick/.glusterfs/00/00
> /var/bricks/arb_0/brick/.glusterfs/00/00/----0001
> /var/bricks/arb_0/brick/.glusterfs/landfill
> /var/bricks/arb_0/brick/.glusterfs/unlink
> /var/bricks/arb_0/brick/.glusterfs/health_check
>
> Output is identical to the user:group 36 variant, as all of these have
> UID:GID 0:0; but these files have 0:0 on the working arbiters as well.
> And this is all the files/dirs that exist on the affected arbs. Nothing
> more on them. There should be much more, but that is the missing self-heal.
>
> Thanks.
>
> A.
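The ownership check pattern used above can be reproduced against a scratch directory. UID/GID 33 is www-data on Debian/Ubuntu (the brick owner in this thread); the scratch path /tmp/arb_check is an arbitrary stand-in for a brick root.

```shell
# Recreate the ownership check from the thread against a scratch directory.
mkdir -p /tmp/arb_check/.glusterfs/indices
touch /tmp/arb_check/.glusterfs/health_check

# Everything NOT owned by UID 33: on a healthy brick this list should be
# short; here it prints every path just created (owned by the caller).
find /tmp/arb_check -not -user 33 -print

# Companion check for group ownership.
find /tmp/arb_check -not -group 33 -print
```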
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Hi,

I think that the best way is to go through the logs on the affected arbiter brick (maybe even temporarily increase the log level).

What is the output of:

find /var/brick/arb_0/brick -not -user 36 -print
find /var/brick/arb_0/brick -not -group 36 -print

Maybe there are some files/dirs with wrong ownership.

Best Regards,
Strahil Nikolov

> Thanks Strahil,
>
> unfortunately I cannot connect, as the mount is denied as in the mount.log
> provided.
> IPs > n.n.n..100 are clients and simply cannot mount the volume. When
> killing the arb pids on node2, new clients can mount the volume. When
> bringing them up again I experience the same problem.
>
> I wonder why the root dir on the arb bricks has the wrong UID:GID.
> I added regular data bricks on node2 before without any problems.
>
> Also, when executing "watch df" I see:
>
> /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
> ..
> /dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0
> ..
> /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
>
> So the heal daemon might be trying to do something which isn't working.
> Thus I chowned the UID:GID of ../arb_0/brick manually to match, but it
> did not work either.
>
> As I added all 6 arbs at once and 4 are working as expected, I really
> don't get what's wrong with these...
>
> A.
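Raising the brick log level, as suggested above, is a pair of volume-set calls. A sketch assuming the volume name gv0 from the thread; the log file path shown is a typical default location, not confirmed for this setup.

```shell
# Sketch: temporarily raise the brick log level (volume gv0 from the thread).
gluster volume set gv0 diagnostics.brick-log-level DEBUG

# Reproduce the failed heal, then inspect the brick log; path is a typical
# default, not confirmed for this setup:
# tail -f /var/log/glusterfs/bricks/var-bricks-arb_0-brick.log

# Restore the default level afterwards.
gluster volume set gv0 diagnostics.brick-log-level INFO
```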
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
For the arb_0 I see only 8 clients, while there should be 12 clients:

Brick : 192.168.0.40:/var/bricks/0/brick
Clients connected : 12

Brick : 192.168.0.41:/var/bricks/0/brick
Clients connected : 12

Brick : 192.168.0.80:/var/bricks/arb_0/brick
Clients connected : 8

Can you try to reconnect them? The simplest way is to kill the arbiter process and run 'gluster volume start force', but always verify that both data bricks are up and running.

Yet, this doesn't explain why the heal daemon is not able to replicate properly.

Best Regards,
Strahil Nikolov
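Strahil's reconnect suggestion can be sketched as a short shell snippet. This is only an illustration built from the paths in this thread, not a command from the mail: it prints the two commands so they can be reviewed before running them on the affected node. The `pkill -f` pattern works because a brick's glusterfsd process carries the brick path in its command line.

```shell
# Sketch only: volume name and brick path are the ones from this thread;
# adjust for the brick you are reconnecting, and confirm both data bricks
# of that replica are online ('gluster volume status gv0') before killing.
VOLUME=gv0
ARB_BRICK=/var/bricks/arb_0/brick

# Kill only the arbiter brick process, then let start-force respawn it.
KILL_CMD="pkill -f glusterfsd.*${ARB_BRICK}"
START_CMD="gluster volume start ${VOLUME} force"

echo "$KILL_CMD"
echo "$START_CMD"
```

Note that `gluster volume start <vol> force` only spawns bricks that are down; healthy bricks keep running, so this is safe for the rest of the volume.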
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Ok, will do.

working arbiter:

ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 13 33 33 146 Mai 29 22:38 brick
ls -lna /var/bricks/arb_0/brick >>> drw--- 262 0 0 8192 Mai 29 22:38 .glusterfs
+ all data-brick dirs ...

affected arbiter:

ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 3 0 0 24 Mai 30 16:23 brick
ls -lna /var/bricks/arb_0/brick >>> drw--- 7 0 0 99 Mai 30 16:23 .glusterfs
nothing else here

find /var/bricks/arb_0/brick -not -user 33 -print

/var/bricks/arb_0/brick/.glusterfs
/var/bricks/arb_0/brick/.glusterfs/indices
/var/bricks/arb_0/brick/.glusterfs/indices/xattrop
/var/bricks/arb_0/brick/.glusterfs/indices/dirty
/var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
/var/bricks/arb_0/brick/.glusterfs/changelogs
/var/bricks/arb_0/brick/.glusterfs/changelogs/htime
/var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
/var/bricks/arb_0/brick/.glusterfs/00
/var/bricks/arb_0/brick/.glusterfs/00/00
/var/bricks/arb_0/brick/.glusterfs/00/00/----0001
/var/bricks/arb_0/brick/.glusterfs/landfill
/var/bricks/arb_0/brick/.glusterfs/unlink
/var/bricks/arb_0/brick/.glusterfs/health_check

The output is identical for the user and the group check, as all of these entries have UID:GID 0:0; however, these files have 0:0 on the working arbiters as well. And these are all the files/dirs that exist on the affected arbs; nothing more is on them.
There should be much more, but this seems to be the missing self-heal.

Thanks.

A.

"Strahil Nikolov" hunter86...@yahoo.com – 31. Mai 2021 13:11
> Hi,
>
> I think that the best way is to go through the logs on the affected arbiter
> brick (maybe even temporarily increase the log level).
>
> What is the output of:
>
> find /var/brick/arb_0/brick -not -user 36 -print
> find /var/brick/arb_0/brick -not -group 36 -print
>
> Maybe there are some files/dirs that are with wrong ownership.
>
> Best Regards,
> Strahil Nikolov
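Strahil's hint about temporarily raising the log level can be done per volume. The snippet below is a sketch, not from the thread: it prints the gluster commands to raise the brick log level for gv0 while debugging and to restore the default afterwards.

```shell
# Sketch: temporarily raise brick logging for the volume from this thread,
# and revert it once the arbiter issue has been captured in the logs.
VOLUME=gv0
RAISE_CMD="gluster volume set ${VOLUME} diagnostics.brick-log-level DEBUG"
RESET_CMD="gluster volume reset ${VOLUME} diagnostics.brick-log-level"

echo "$RAISE_CMD"
echo "$RESET_CMD"
```

DEBUG is very chatty on busy bricks, so it is worth reverting with `gluster volume reset` as soon as the relevant heal attempt has been logged.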
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Thanks Strahil,

unfortunately I cannot connect, as the mount is denied as shown in the mount.log provided. IPs > n.n.n..100 are clients and simply cannot mount the volume. When killing the arb pids on node2, new clients can mount the volume. When bringing them up again, I experience the same problem.

I wonder why the root dir on the arb bricks has the wrong UID:GID. I added regular data bricks on node2 before without any problems.

Also, when executing "watch df" I see

/dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
..
/dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0
..
/dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0

So the heal daemon might be trying to do something which isn't working. Thus I chowned UID:GID of ../arb_0/brick manually to match, but that did not work either.

As I added all 6 arbs at once and 4 are working as expected, I really don't get what's wrong with these...

A.
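To chase the UID:GID question raised above, a quick way to compare brick-root ownership across nodes is numeric `stat`. This is a sketch using the arbiter paths from this thread; run it on each node and compare the output.

```shell
# Sketch: print numeric owner:group for a brick root, or flag it as
# missing. Paths are the arbiter bricks discussed in this thread.
brick_owner() {
  stat -c '%u:%g %n' "$1" 2>/dev/null || echo "missing: $1"
}

brick_owner /var/bricks/arb_0/brick
brick_owner /var/bricks/arb_1/brick
```

Numeric output (`%u:%g`) avoids confusion when the same UID maps to different user names on different nodes.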
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Meanwhile I tried reset-brick on one of the failing arbiters on node2, but with the same results. The behaviour is reproducible; the arbiter stays empty.

node0: 192.168.0.40
node1: 192.168.0.41
node2: 192.168.0.80

volume info:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (2 + 1) = 18
Transport-type: tcp
Bricks:
Brick1: 192.168.0.40:/var/bricks/0/brick
Brick2: 192.168.0.41:/var/bricks/0/brick
Brick3: 192.168.0.80:/var/bricks/arb_0/brick (arbiter)
Brick4: 192.168.0.40:/var/bricks/2/brick
Brick5: 192.168.0.80:/var/bricks/2/brick
Brick6: 192.168.0.41:/var/bricks/arb_1/brick (arbiter)
Brick7: 192.168.0.40:/var/bricks/1/brick
Brick8: 192.168.0.41:/var/bricks/1/brick
Brick9: 192.168.0.80:/var/bricks/arb_1/brick (arbiter)
Brick10: 192.168.0.40:/var/bricks/3/brick
Brick11: 192.168.0.80:/var/bricks/3/brick
Brick12: 192.168.0.41:/var/bricks/arb_0/brick (arbiter)
Brick13: 192.168.0.41:/var/bricks/3/brick
Brick14: 192.168.0.80:/var/bricks/4/brick
Brick15: 192.168.0.40:/var/bricks/arb_0/brick (arbiter)
Brick16: 192.168.0.41:/var/bricks/2/brick
Brick17: 192.168.0.80:/var/bricks/5/brick
Brick18: 192.168.0.40:/var/bricks/arb_1/brick (arbiter)
Options Reconfigured:
cluster.min-free-inodes: 6%
cluster.min-free-disk: 2%
performance.md-cache-timeout: 600
cluster.rebal-throttle: lazy
features.scrub-freq: monthly
features.scrub-throttle: lazy
features.scrub: Inactive
features.bitrot: off
cluster.server-quorum-type: none
performance.cache-refresh-timeout: 10
performance.cache-max-file-size: 64MB
performance.cache-size: 781901824
auth.allow: /(192.168.0.*),/usr/andreas(192.168.0.120),/usr/otis(192.168.0.168),/usr/otis(192.168.0.111),/usr/otis(192.168.0.249),/media(192.168.0.*),/virt(192.168.0.*),/cloud(192.168.0.247),/zm(192.168.0.136)
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
cluster.quorum-type: auto
features.cache-invalidation: on
nfs.disable: on
transport.address-family: inet
cluster.self-heal-daemon: on
cluster.server-quorum-ratio: 51%

volume status:

Status of volume: gv0
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.0.40:/var/bricks/0/brick       49155     0          Y       713066
Brick 192.168.0.41:/var/bricks/0/brick       49152     0          Y       2082
Brick 192.168.0.80:/var/bricks/arb_0/brick   49152     0          Y       26186
Brick 192.168.0.40:/var/bricks/2/brick       49156     0          Y       713075
Brick 192.168.0.80:/var/bricks/2/brick       49154     0          Y       325
Brick 192.168.0.41:/var/bricks/arb_1/brick   49157     0          Y       1746903
Brick 192.168.0.40:/var/bricks/1/brick       49157     0          Y       713084
Brick 192.168.0.41:/var/bricks/1/brick       49153     0          Y       14104
Brick 192.168.0.80:/var/bricks/arb_1/brick   49159     0          Y       2314
Brick 192.168.0.40:/var/bricks/3/brick       49153     0          Y       2978692
Brick 192.168.0.80:/var/bricks/3/brick       49155     0          Y       23269
Brick 192.168.0.41:/var/bricks/arb_0/brick   49158     0          Y       1746942
Brick 192.168.0.41:/var/bricks/3/brick       49155     0          Y       897058
Brick 192.168.0.80:/var/bricks/4/brick       49156     0          Y       27433
Brick 192.168.0.40:/var/bricks/arb_0/brick   49152     0          Y       3561115
Brick 192.168.0.41:/var/bricks/2/brick       49156     0          Y       902602
Brick 192.168.0.80:/var/bricks/5/brick       49157     0          Y       29522
Brick 192.168.0.40:/var/bricks/arb_1/brick   49154     0          Y       3561159
Self-heal Daemon on localhost                N/A       N/A        Y       26199
Self-heal Daemon on 192.168.0.41             N/A       N/A        Y       2240635
Self-heal Daemon on 192.168.0.40             N/A       N/A        Y       3912810

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

volume heal info summary:

Brick 192.168.0.40:/var/bricks/0/brick <--- contains 100177 files in 25015 dirs
Status: Connected
Total Number of entries: 1006
Number of entries in heal pending: 1006
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.41:/var/bricks/0/brick
Status: Connected
Total Number of entries: 1006
Number of entries in heal pending: 1006
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.80:/var/bricks/arb_0/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.40:/var/bricks/2/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.80:/var/bricks/2/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.41:/var/bricks/arb_1/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.40:/var/bricks/1/brick
Status: Connected
Total Number of entries: 1006
Number of entries in heal pending: 1006
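For reference, the reset-brick attempt mentioned at the top of this message follows the documented two-step sequence. The snippet below is a sketch using the node2 arbiter brick as the example; it prints the commands rather than running them, so they can be checked first.

```shell
# Sketch of the reset-brick sequence: 'start' takes the brick offline,
# 'commit force' brings the (possibly reformatted) brick back into the
# volume. Brick is the node2 arbiter from this thread.
VOLUME=gv0
BRICK=192.168.0.80:/var/bricks/arb_0/brick

START_CMD="gluster volume reset-brick ${VOLUME} ${BRICK} start"
COMMIT_CMD="gluster volume reset-brick ${VOLUME} ${BRICK} ${BRICK} commit force"

echo "$START_CMD"
echo "$COMMIT_CMD"
```

Between 'start' and 'commit force' the brick directory can be wiped and recreated, which is the point of the procedure when an arbiter refuses to populate.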