Re: [Gluster-users] Replica bricks fungible?

2022-03-01 Thread Andreas Schwibbe
Confirmed for Gluster 7.9 on distributed-replicate and pure replicate
volumes.

One of my 3 nodes died :(
I removed all bricks from the dead node and added them to a new node.
I then started adding the arbiter bricks as well, since the
distributed-replicate volume is configured with 2 replicas plus 1 arbiter.
I made sure to use the exact same mount points and paths, and double/triple
checked that each brick had exactly the same file content in every directory
as the running brick it was about to be paired with again. Then I used the
replace-brick command to replace dead-node:brick0 with new-node:brick0,
and did this one by one for all bricks...
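
For reference, each replacement was a single command of roughly this form
(volume name and brick paths below are placeholders, not the real ones):

   gluster volume replace-brick myvol \
       dead-node:/var/bricks/brick0/brick \
       new-node:/var/bricks/brick0/brick \
       commit force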

It took a while to get the replacement node up and running, so the
cluster stayed operational and in use the whole time. When all bricks had
finally been moved, the self-heal daemon started healing several files.
Everything worked out perfectly and with no downtime.

Finally I detached the dead node.
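
That step was just the usual peer command (the hostname here is a placeholder):

   gluster peer detach dead-node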
Done.

A.

Am Mittwoch, dem 09.06.2021 um 15:17 +0200 schrieb Diego Zuccato:
> On 05/06/2021 14:36, Zenon Panoussis wrote:
>
> > > What I'm really asking is: can I physically move a brick
> > > from one server to another such as
> > I can now answer my own question: yes, replica bricks are
> > identical and can be physically moved or copied from one
> > server to another. I have now done it a few times without
> > any problems, though I made sure no healing was pending
> > before the moves.
> Well, if it's officially supported, that could be a really interesting
> option to quickly scale big storage systems.
> I'm thinking about our scenario: 3 servers, with 36 × 12 TB disks each. When
> adding a new server (or another pair of servers, to keep an odd number),
> it would require quite a lot of time to rebalance, with heavy
> implications for both the IB network and latency for the users. If we could
> simply swap some disks around, it could be a lot faster.
> Have you documented the procedure you followed?
>









Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-05-31 Thread Andreas Schwibbe
Hm,

I tried formatting the brick and reset-brick on node2 - no success.
I tried a new brick on a new node3 and replace-brick - no success, as the new
arbiter is created wrongly and self-heal does not work.
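
For the record, the reset-brick attempt followed the usual start/commit
sequence, roughly as below (volume name and brick path are placeholders):

   gluster volume reset-brick myvol node2:/var/bricks/arb_0/brick start
   # wipe and re-create the empty brick directory, then:
   gluster volume reset-brick myvol node2:/var/bricks/arb_0/brick \
       node2:/var/bricks/arb_0/brick commit force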

I also restarted all nodes in turn, without any improvement.

If shrinking the volume is not recommended, would converting it back to
replica 2 be possible and, if successful, worth another try?
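
If so, I assume the conversion back would be something along these lines,
listing one arbiter brick per replica subvolume (volume name and brick paths
are placeholders):

   gluster volume remove-brick myvol replica 2 \
       node2:/var/bricks/arb_0/brick \
       node2:/var/bricks/arb_1/brick \
       force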

Thanks
A.



31.05.2021 21:03:01 Strahil Nikolov wrote:

> I would avoid shrinking the volume. An oVirt user reported issues after 
> volume shrinking.
> 
> Did you try to format the arbiter brick and 'replace-brick'?
> 
> Best Regards,
> Strahil Nikolov
>> 
>> I can't find anything suspicious in the brick logs other than authentication
>> being refused to clients trying to mount a directory that does not exist on the
>> arb_n bricks, because the self-heal isn't working.
>> I tried to add another node and replace-brick a faulty arbiter; however, this
>> new arbiter shows the same error.
>>
>> My last idea is to completely remove the first subvolume, then re-add it as new,
>> hoping it will work.
>> 
>> 
>> A.
>> 
>> 
>> "a.schwi...@gmx.net" a.schwi...@gmx.net – 31 May 2021 13:44
>>> Ok, will do.
>>>
>>>
>>> working arbiter:
>>>
>>> ls -ln /var/bricks/arb_0/
>>> drwxr-xr-x 13 33 33 146 Mai 29 22:38 brick
>>>
>>> ls -lna /var/bricks/arb_0/brick
>>> drw--- 262 0 0 8192 Mai 29 22:38 .glusterfs
>>> + all data-brick dirs ...
>>>
>>>
>>> affected arbiter:
>>>
>>> ls -ln /var/bricks/arb_0/
>>> drwxr-xr-x 3 0 0 24 Mai 30 16:23 brick
>>>
>>> ls -lna /var/bricks/arb_0/brick
>>> drw--- 7 0 0 99 Mai 30 16:23 .glusterfs
>>>
>>> nothing else here
>>>
>>>
>>> find /var/bricks/arb_0/brick -not -user 33 -print
>>>
>>> /var/bricks/arb_0/brick/.glusterfs
>>> /var/bricks/arb_0/brick/.glusterfs/indices
>>> /var/bricks/arb_0/brick/.glusterfs/indices/xattrop
>>> /var/bricks/arb_0/brick/.glusterfs/indices/dirty
>>> /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
>>> /var/bricks/arb_0/brick/.glusterfs/changelogs
>>> /var/bricks/arb_0/brick/.glusterfs/changelogs/htime
>>> /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
>>> /var/bricks/arb_0/brick/.glusterfs/00
>>> /var/bricks/arb_0/brick/.glusterfs/00/00
>>> /var/bricks/arb_0/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001
>>> /var/bricks/arb_0/brick/.glusterfs/landfill
>>> /var/bricks/arb_0/brick/.glusterfs/unlink
>>> /var/bricks/arb_0/brick/.glusterfs/health_check
>>>
>>> find /var/bricks/arb_0/brick -not -group 33 -print
>>>
>>> /var/bricks/arb_0/brick/.glusterfs
>>> /var/bricks/arb_0/brick/.glusterfs/indices
>>> /var/bricks/arb_0/brick/.glusterfs/indices/xattrop
>>> /var/bricks/arb_0/brick/.glusterfs/indices/dirty
>>> /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes
>>> /var/bricks/arb_0/brick/.glusterfs/changelogs
>>> /var/bricks/arb_0/brick/.glusterfs/changelogs/htime
>>> /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap
>>> /var/bricks/arb_0/brick/.glusterfs/00
>>> /var/bricks/arb_0/brick/.glusterfs/00/00
>>> /var/bricks/arb_0/brick/.glusterfs/00/00/00000000-0000-0000-0000-000000000001
>>> /var/bricks/arb_0/brick/.glusterfs/landfill
>>> /var/bricks/arb_0/brick/.glusterfs/unlink
>>> /var/bricks/arb_0/brick/.glusterfs/health_check
>>>
>>> The output is identical for user:group 36, as all of these entries have
>>> UID:GID 0:0 - but these files have 0:0 on the working arbiters as well.
>>> And these are all the files/dirs that exist on the affected arbiters; there is
>>> nothing more on them. There should be much more, but that seems to be due to
>>> the missing self-heal.
>>>
>>> Thanks.
>>>
>>> A.
>>>
>>>
>>> "Strahil Nikolov" hunter86...@yahoo.com – 31 May 2021 13:11
>>> > Hi,
>>> >
>>> > I think that the best way is to go through the logs on the affected 
>>> > arbiter brick (maybe even temporarily increase the log level).
>>> >
>>> > What is the output of:
>>> >
>>> > find /var/brick/arb_0/brick -not -user 36 -print
>>> > find /var/brick/arb_0/brick -not -group 36 -print
>>> >
>>> > Maybe there are some files/dirs that are with wrong ownership.
>>> >
>>> > Best Regards,
>>> > Strahil Nikolov
>>> >
>>> > >
>>> > > Thanks Strahil,
>>> > >
>>> > > unfortunately I cannot connect, as the mount is denied, as shown in the
>>> > > mount.log provided.
>>> > > IPs > n.n.n.100 are clients and simply cannot mount the volume. When I
>>> > > kill the arbiter pids on node2, new clients can mount the volume; when I
>>> > > bring them up again, I see the same problem.
>>> > >
>>> > > I wonder why the root dir on the arbiter bricks has the wrong UID:GID.
>>> > > I previously added regular data bricks on node2 without any problems.
>>> > >
>>> > > Also when executing "watch df"
>>> > >
>>> > > I see
>>> > >
>>> > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
>>> > > ..
>>> > >
>>> > > /dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0
>>> > >
>>> > > ..
>>> > >
>>> > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0
>>> > >
>>> > > So the heal daemon might be trying to do something, which isn't working. Thus I
>>> > > chowned the UID:GID of ../arb_0/brick manually to match,