Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-06-01 Thread a . schwibbe
I am on 7.9-ubuntu1~focal1 amd64 A. ;) "Strahil Nikolov" hunter86...@yahoo.com – 1 June 2021 14:56 > Glad to hear that. > What version are you using? > It's interesting to find out the reason behind that defunct status. > Best Regards, > Strahil Nikolov
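For reference, a quick way to confirm the running version (a minimal sketch, assuming the Ubuntu/Debian packaging mentioned above):

# the gluster binaries report the GlusterFS release
gluster --version
# installed package revision on Ubuntu/Debian
dpkg -l | grep glusterfs-server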

Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-06-01 Thread a . schwibbe
Strahil, I was able to resolve the issue! On node1 I found a defunct [glusterfsd] process. I first put the failing arbiter bricks into reset-brick, then formatted them anew, killed the zombie process on node1, then stopped glusterd gracefully, then killed all remaining gluster* processes, then started glusterd again, and
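Pieced together, the recovery sequence described above would look roughly like the following sketch, not a verbatim transcript. The volume name gv0 and the arbiter brick path are taken from the volume info later in the thread; the arbiter host, device name and mount point are assumptions.

# take the failing arbiter brick out of service
gluster volume reset-brick gv0 192.168.0.80:/var/bricks/arb_0/brick start
# re-create an empty brick (device /dev/sdX is a placeholder; a plain
# rm -rf of the brick directory works if it is not its own filesystem)
mkfs.xfs -f /dev/sdX
mount /dev/sdX /var/bricks/arb_0
mkdir -p /var/bricks/arb_0/brick
# on the node with the defunct process: stop glusterd, clean up leftover
# gluster* processes, then start glusterd again
systemctl stop glusterd
pkill glusterfsd; pkill glusterfs
systemctl start glusterd
# put the empty brick back into the volume and let self-heal repopulate it
gluster volume reset-brick gv0 192.168.0.80:/var/bricks/arb_0/brick 192.168.0.80:/var/bricks/arb_0/brick commit force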

Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-05-31 Thread Strahil Nikolov
I would avoid shrinking the volume. An oVirt user reported issues after volume shrinking. Did you try to format the arbiter brick and 'replace-brick'? Best Regards, Strahil Nikolov > I can't find anything suspicious in the brick logs other than authentication refused to clients trying to mount
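Formatting the arbiter and swapping it in with 'replace-brick', as suggested here, would look something like this sketch (the new brick path arb_0_new is hypothetical):

# move the faulty arbiter onto a freshly formatted brick directory
gluster volume replace-brick gv0 \
  192.168.0.80:/var/bricks/arb_0/brick \
  192.168.0.80:/var/bricks/arb_0_new/brick \
  commit force
# then watch the arbiter being repopulated
gluster volume heal gv0 info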

Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-05-31 Thread Andreas Schwibbe
Hm, I tried format and reset-brick on node2 - no success. I tried a new brick on a new node3 and replace-brick - no success, as the new arbiter is created wrongly and self-heal does not work. I also restarted all nodes in turn without any improvement. If shrinking the volume is not
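When a reset or replaced arbiter stays empty like this, the heal state is worth checking explicitly (sketch, assuming volume name gv0):

# pending heal entries per brick
gluster volume heal gv0 info
# trigger a full self-heal crawl
gluster volume heal gv0 full
# the plain status output also shows whether the self-heal daemon is online on each node
gluster volume status gv0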

Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-05-31 Thread a . schwibbe
I can't find anything suspicious in the brick logs other than authentication refused to clients trying to mount a dir that does not exist on the arb_n, because the self-heal isn't working. I tried to add another node and replace-brick a faulty arbiter, however this new arbiter sees the same
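Adding a fresh peer and moving the arbiter onto it, as attempted here, would roughly be (sketch; 192.168.0.81 stands in for the new node and is purely illustrative):

gluster peer probe 192.168.0.81
gluster peer status
gluster volume replace-brick gv0 \
  192.168.0.80:/var/bricks/arb_0/brick \
  192.168.0.81:/var/bricks/arb_0/brick \
  commit force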

Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-05-31 Thread Strahil Nikolov
Hi, I think that the best way is to go through the logs on the affected arbiter brick (maybe even temporarily increase the log level). What is the output of: find /var/bricks/arb_0/brick -not -user 36 -print and find /var/bricks/arb_0/brick -not -group 36 -print ? Maybe there are some files/dirs that are
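Side note: uid/gid 36 in those find commands is the oVirt vdsm/kvm convention; on a non-oVirt setup the expected owner may differ. Temporarily raising the log level can be done per volume (sketch; DEBUG is verbose, so revert it afterwards):

gluster volume set gv0 diagnostics.brick-log-level DEBUG
# inspect /var/log/glusterfs/bricks/ on the arbiter node, then revert
gluster volume set gv0 diagnostics.brick-log-level INFO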

Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-05-31 Thread Strahil Nikolov
For the arb_0 I see only 8 clients, while there should be 12 clients:
Brick : 192.168.0.40:/var/bricks/0/brick  Clients connected : 12
Brick : 192.168.0.41:/var/bricks/0/brick  Clients connected : 12
Brick : 192.168.0.80:/var/bricks/arb_0/brick  Clients connected : 8
Can you try to reconnect them. The
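The per-brick client counts above come from the clients view of the volume status; the same check can be repeated after reconnecting (sketch):

# list the clients connected to each brick of gv0
gluster volume status gv0 clients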

Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-05-31 Thread a . schwibbe
Ok, will do.
working arbiter:
ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 13 33 33 146 May 29 22:38 brick
ls -lna /var/bricks/arb_0/brick >>> drw--- 262 0 0 8192 May 29 22:38 .glusterfs + all data-brick dirs ...
affected arbiter:
ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 3 0 0 24 May 30
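Besides ownership and timestamps, the replica metadata on each brick root can be compared through its extended attributes (sketch; run on both a working node and the affected arbiter node):

# volume-id and AFR changelog xattrs on the brick root
getfattr -d -m . -e hex /var/bricks/arb_0/brick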

Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-05-31 Thread a . schwibbe
Thanks Strahil, unfortunately I cannot reconnect them, as the mount is denied (see the mount.log provided). IPs > n.n.n.100 are clients and simply cannot mount the volume. When I kill the arbiter PIDs on node2, new clients can mount the volume; when I bring them up again, I experience the same problem. I
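For reference, the arbiter brick PIDs mentioned here can be read from the volume status and stopped individually; killing bricks leaves that replica set without its arbiter, so it is only a temporary workaround (sketch):

# the PID column lists the glusterfsd process of each brick
gluster volume status gv0
# stop just the arbiter brick process on this node (PID is a placeholder)
kill <arbiter-brick-pid>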

Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-05-30 Thread a . schwibbe
Meanwhile I tried reset-brick on one of the failing arbiters on node2, but with the same result. The behaviour is reproducible: the arbiter stays empty.
node0: 192.168.0.40
node1: 192.168.0.41
node3: 192.168.0.80
volume info:
Volume Name: gv0
Type: Distributed-Replicate
Volume ID:
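The truncated volume info above can be dumped in full for comparison with the expected replica 3 arbiter 1 layout (sketch):

gluster volume info gv0
gluster volume status gv0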