Hm,
I tried format and reset-brick on node2 - no success.
I tried a new brick on a new node3 and replace-brick - no success, as the new
arbiter is created incorrectly and self-heal does not work.
I also restarted all nodes one by one without any improvement.
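
For reference, the reset-brick cycle I mean is roughly the following; the
volume name "gvol0" is a placeholder, not the real name:

gluster volume reset-brick gvol0 node2:/var/bricks/arb_0/brick start
# reformat the brick filesystem here, then re-use the same path
gluster volume reset-brick gvol0 node2:/var/bricks/arb_0/brick \
    node2:/var/bricks/arb_0/brick commit force
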
If shrinking the volume is not recommended, is converting it back to replica 2
possible and, if that succeeds, worth another try at adding the arbiter?
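
What I have in mind for that is roughly the following (volume name and hosts
are placeholders, not the real ones):

# drop the arbiter brick, going back to plain replica 2
gluster volume remove-brick gvol0 replica 2 node2:/var/bricks/arb_0/brick force
# later, re-add a freshly formatted arbiter brick
gluster volume add-brick gvol0 replica 3 arbiter 1 node3:/var/bricks/arb_0/brick
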
Thanks
A.
31.05.2021 21:03:01 Strahil Nikolov :
> I would avoid shrinking the volume. An oVirt user reported issues after
> volume shrinking.
>
> Did you try to format the arbiter brick and 'replace-brick' ?
>
> Best Regards,
> Strahil Nikolov
>>
>> I can't find anything suspicious in the brick logs other than authentication
>> refused to clients trying to mount a dir that does not exist on the arb_n,
>> because the self-heal isn't working.
>> I tried to add another node and replace-brick the faulty arbiter; however,
>> this new arbiter shows the same error.
>>
>> My last idea is to completely remove the first subvolume and then re-add it
>> as new, hoping it will work.
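>>
>> Roughly something like this, with the volume name and the data-brick paths
>> as placeholders for this sketch:
>>
>> gluster volume remove-brick gvol0 node1:/var/bricks/0/brick \
>>     node2:/var/bricks/0/brick node2:/var/bricks/arb_0/brick start
>> # wait until "remove-brick ... status" reports completed, then
>> gluster volume remove-brick gvol0 node1:/var/bricks/0/brick \
>>     node2:/var/bricks/0/brick node2:/var/bricks/arb_0/brick commit
>> # and afterwards add the three bricks back with add-brick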
>>
>>
>> A.
>>
>>
>> "a.schwi...@gmx.net" a.schwi...@gmx.net – 31. Mai 2021 13:44
>>> Ok, will do.
>>>
>>>
>>> working arbiter:
>>>
>>> ls -ln /var/bricks/arb_0/
>>> drwxr-xr-x 13 33 33 146 Mai 29 22:38 brick
>>>
>>> ls -lna /var/bricks/arb_0/brick
>>> drw--- 262 0 0 8192 Mai 29 22:38 .glusterfs
>>> + all data-brick dirs ...
>>>
>>>
>>> affected arbiter:
>>>
>>> ls -ln /var/bricks/arb_0/
>>> drwxr-xr-x 3 0 0 24 Mai 30 16:23 brick
>>> ls -lna /var/bricks/arb_0/brick
>>> drw--- 7 0 0 99 Mai 30 16:23 .glusterfs
>>> nothing else here
>>>
>>>
>>> find /var/bricks/arb_0/brick -not -user 33 -print
>>>
>>> /var/bricks/arb_0/brick/.glusterfs
>>> /var/bricks/arb_0/brick/.glusterfs/indices
>>> /var/bricks/arb_0/brick/.glusterfs/indices/xattrop
>>> /var/bricks/arb_0/brick/.glusterfs/indices/dirty
>>> /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
>>> /var/bricks/arb_0/brick/.glusterfs/changelogs
>>> /var/bricks/arb_0/brick/.glusterfs/changelogs/htime
>>> /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
>>> /var/bricks/arb_0/brick/.glusterfs/00
>>> /var/bricks/arb_0/brick/.glusterfs/00/00
>>> /var/bricks/arb_0/brick/.glusterfs/00/00/----0001
>>> /var/bricks/arb_0/brick/.glusterfs/landfill
>>> /var/bricks/arb_0/brick/.glusterfs/unlink
>>> /var/bricks/arb_0/brick/.glusterfs/health_check
>>>
>>> find /var/bricks/arb_0/brick -not -group 33 -print
>>>
>>> /var/bricks/arb_0/brick/.glusterfs
>>> /var/bricks/arb_0/brick/.glusterfs/indices
>>> /var/bricks/arb_0/brick/.glusterfs/indices/xattrop
>>> /var/bricks/arb_0/brick/.glusterfs/indices/dirty
>>> /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
>>> /var/bricks/arb_0/brick/.glusterfs/changelogs
>>> /var/bricks/arb_0/brick/.glusterfs/changelogs/htime
>>> /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
>>> /var/bricks/arb_0/brick/.glusterfs/00
>>> /var/bricks/arb_0/brick/.glusterfs/00/00
>>> /var/bricks/arb_0/brick/.glusterfs/00/00/----0001
>>> /var/bricks/arb_0/brick/.glusterfs/landfill
>>> /var/bricks/arb_0/brick/.glusterfs/unlink
>>> /var/bricks/arb_0/brick/.glusterfs/health_check
>>>
>>> The output is identical for user:group 36, as all of these have UID:GID 0:0,
>>> but these files have 0:0 on the working arbiters as well.
>>> And these are all the files/dirs that exist on the affected arbiters; nothing
>>> more is on them. There should be much more, but that seems to be the missing
>>> self-heal.
>>>
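>>> To check whether the heal daemon actually has anything queued for these
>>> bricks, something like this could be used (volume name is a placeholder):
>>>
>>> gluster volume heal gvol0 info
>>> gluster volume heal gvol0 statistics heal-count
>>>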
>>> Thanks.
>>>
>>> A.
>>>
>>>
>>> "Strahil Nikolov" hunter86...@yahoo.com – 31. Mai 2021 13:11
>>> > Hi,
>>> >
>>> > I think that the best way is to go through the logs on the affected
>>> > arbiter brick (maybe even temporarily increase the log level).
>>> >
>>> > What is the output of:
>>> >
>>> > find /var/brick/arb_0/brick -not -user 36 -print
>>> > find /var/brick/arb_0/brick -not -group 36 -print
>>> >
>>> > Maybe there are some files/dirs that are with wrong ownership.
>>> >
>>> > Best Regards,
>>> > Strahil Nikolov
>>> >
>>> > >
>>> > > Thanks Strahil,
>>> > >
>>> > > unfortunately I cannot connect, as the mount is denied, as shown in the
>>> > > mount.log provided.
>>> > > IPs > n.n.n..100 are clients and simply cannot mount the volume. When I
>>> > > kill the arbiter brick PIDs on node2, new clients can mount the volume;
>>> > > when I bring them back up, I see the same problem.
>>> > >
>>> > > I wonder why the root dir on the arbiter bricks has the wrong UID:GID.
>>> > > I had added regular data bricks on node2 before without any problems.
>>> > >
>>> > > Also when executing "watch df"
>>> > >
>>> > > I see
>>> > >
>>> > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
>>> > > ..
>>> > >
>>> > > /dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0
>>> > >
>>> > > ..
>>> > >
>>> > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
>>> > >
>>> > > So the heal daemon might be trying to do something which isn't working.
>>> > > Thus I manually chowned the UID:GID of ../arb_0/brick to match,