Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-05-30 Thread a . schwibbe
Meanwhile I tried reset-brick on one of the failing arbiters on node2, but with the same result. The behaviour is reproducible: the arbiter stays empty.
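For reference, a sketch of the usual reset-brick sequence (shown for one of the failing arbiter bricks; whether the brick directory needs to be wiped between start and commit depends on its state, and the paths are from the volume info below):

```shell
# Take the failing arbiter brick offline for the reset
gluster volume reset-brick gv0 192.168.0.80:/var/bricks/arb_0/brick start

# Optionally re-create the brick directory so it starts empty
# (stale xattrs on the brick root would otherwise block the commit)
rm -rf /var/bricks/arb_0/brick
mkdir /var/bricks/arb_0/brick

# Re-add the same brick in place; self-heal should then repopulate it
gluster volume reset-brick gv0 192.168.0.80:/var/bricks/arb_0/brick \
    192.168.0.80:/var/bricks/arb_0/brick commit force
```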


node0: 192.168.0.40

node1: 192.168.0.41

node2: 192.168.0.80


volume info:

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559
Status: Started
Snapshot Count: 0
Number of Bricks: 6 x (2 + 1) = 18
Transport-type: tcp
Bricks:
Brick1: 192.168.0.40:/var/bricks/0/brick
Brick2: 192.168.0.41:/var/bricks/0/brick
Brick3: 192.168.0.80:/var/bricks/arb_0/brick (arbiter)
Brick4: 192.168.0.40:/var/bricks/2/brick
Brick5: 192.168.0.80:/var/bricks/2/brick
Brick6: 192.168.0.41:/var/bricks/arb_1/brick (arbiter)
Brick7: 192.168.0.40:/var/bricks/1/brick
Brick8: 192.168.0.41:/var/bricks/1/brick
Brick9: 192.168.0.80:/var/bricks/arb_1/brick (arbiter)
Brick10: 192.168.0.40:/var/bricks/3/brick
Brick11: 192.168.0.80:/var/bricks/3/brick
Brick12: 192.168.0.41:/var/bricks/arb_0/brick (arbiter)
Brick13: 192.168.0.41:/var/bricks/3/brick
Brick14: 192.168.0.80:/var/bricks/4/brick
Brick15: 192.168.0.40:/var/bricks/arb_0/brick (arbiter)
Brick16: 192.168.0.41:/var/bricks/2/brick
Brick17: 192.168.0.80:/var/bricks/5/brick
Brick18: 192.168.0.40:/var/bricks/arb_1/brick (arbiter)
Options Reconfigured:
cluster.min-free-inodes: 6%
cluster.min-free-disk: 2%
performance.md-cache-timeout: 600
cluster.rebal-throttle: lazy
features.scrub-freq: monthly
features.scrub-throttle: lazy
features.scrub: Inactive
features.bitrot: off
cluster.server-quorum-type: none
performance.cache-refresh-timeout: 10
performance.cache-max-file-size: 64MB
performance.cache-size: 781901824
auth.allow: /(192.168.0.*),/usr/andreas(192.168.0.120),/usr/otis(192.168.0.168),/usr/otis(192.168.0.111),/usr/otis(192.168.0.249),/media(192.168.0.*),/virt(192.168.0.*),/cloud(192.168.0.247),/zm(192.168.0.136)
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
cluster.quorum-type: auto
features.cache-invalidation: on
nfs.disable: on
transport.address-family: inet
cluster.self-heal-daemon: on
cluster.server-quorum-ratio: 51%

volume status:

Status of volume: gv0
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------
Brick 192.168.0.40:/var/bricks/0/brick 49155 0 Y 713066
Brick 192.168.0.41:/var/bricks/0/brick 49152 0 Y 2082
Brick 192.168.0.80:/var/bricks/arb_0/brick 49152 0 Y 26186
Brick 192.168.0.40:/var/bricks/2/brick 49156 0 Y 713075
Brick 192.168.0.80:/var/bricks/2/brick 49154 0 Y 325
Brick 192.168.0.41:/var/bricks/arb_1/brick 49157 0 Y 1746903
Brick 192.168.0.40:/var/bricks/1/brick 49157 0 Y 713084
Brick 192.168.0.41:/var/bricks/1/brick 49153 0 Y 14104
Brick 192.168.0.80:/var/bricks/arb_1/brick 49159 0 Y 2314
Brick 192.168.0.40:/var/bricks/3/brick 49153 0 Y 2978692
Brick 192.168.0.80:/var/bricks/3/brick 49155 0 Y 23269
Brick 192.168.0.41:/var/bricks/arb_0/brick 49158 0 Y 1746942
Brick 192.168.0.41:/var/bricks/3/brick 49155 0 Y 897058
Brick 192.168.0.80:/var/bricks/4/brick 49156 0 Y 27433
Brick 192.168.0.40:/var/bricks/arb_0/brick 49152 0 Y 3561115
Brick 192.168.0.41:/var/bricks/2/brick 49156 0 Y 902602
Brick 192.168.0.80:/var/bricks/5/brick 49157 0 Y 29522
Brick 192.168.0.40:/var/bricks/arb_1/brick 49154 0 Y 3561159
Self-heal Daemon on localhost N/A N/A Y 26199
Self-heal Daemon on 192.168.0.41 N/A N/A Y 2240635
Self-heal Daemon on 192.168.0.40 N/A N/A Y 3912810

Task Status of Volume gv0
------------------------------------------------------------
There are no active volume tasks

volume heal info summary:

Brick 192.168.0.40:/var/bricks/0/brick <--- contains 100177 files in 25015 dirs
Status: Connected
Total Number of entries: 1006
Number of entries in heal pending: 1006
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.41:/var/bricks/0/brick
Status: Connected
Total Number of entries: 1006
Number of entries in heal pending: 1006
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.80:/var/bricks/arb_0/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.40:/var/bricks/2/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.80:/var/bricks/2/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.41:/var/bricks/arb_1/brick
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick 192.168.0.40:/var/bricks/1/brick
Status: Connected
Total Number of entries: 1006
Number of entries in heal pending: 1006
Number of 

[Gluster-users] Geo-replication adding new master node

2021-05-30 Thread David Cunningham
Hello,

We have a GlusterFS configuration with mirrored nodes on the master side
geo-replicating to mirrored nodes on the secondary side.

When geo-replication is initially created it seems to automatically add all
the mirrored nodes on the master side as geo-replication master nodes,
which is fine. My first question is: if we add a new master-side node, how
can we add it as a geo-replication master?
This doesn't seem to happen automatically, according to the output of
"gluster volume geo-replication gvol0 secondary::gvol0 status". If we use
the normal "gluster volume geo-replication gvol0 secondary::slave-vol
create push-pem force", it says that the secondary side volume is not empty,
which is true, because we're adding a master node to an existing
geo-replication.
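The commands referred to above, as a sketch (session names as in our setup; "secondary" stands for the secondary-side host):

```shell
# Shows the existing session's master nodes; a node added to the
# master cluster later does not appear here automatically
gluster volume geo-replication gvol0 secondary::gvol0 status

# Re-running create for the expanded master cluster fails with
# "secondary volume is not empty"
gluster volume geo-replication gvol0 secondary::gvol0 create push-pem force
```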

My second question is whether we can geo-replicate to multiple nodes on the
secondary side? Ideally we would normally have something like:
master A -> secondary A
master B -> secondary B
master C -> secondary C
so that any master or secondary node could go offline but geo-replication
would keep working.

Thank you very much in advance.

-- 
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed

2021-05-30 Thread a . schwibbe
I am seeking help here after looking for solutions on the web for my
distributed-replicated volume.
My volume has been in operation since v3.10; I upgraded through to 7.9, replaced
nodes, and replaced bricks without a problem. I love it.
Finally I wanted to extend my 6x2 distributed-replicated volume with arbiters
for better split-brain protection.

So I ran add-brick with replica 3 arbiter 1 (as I had a 6x2 volume, I obviously added 6
arbiter bricks), it successfully converted to 6 x (2 + 1), and self-heal
immediately started. Looking good.
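A sketch of that add-brick call, with the brick order reconstructed from the volume info (one new arbiter per existing replica pair, assigned in the order given):

```shell
# Convert 6x2 to 6x(2+1): replica count goes to 3 with 1 arbiter,
# and one arbiter brick is appended per replica pair
gluster volume add-brick gv0 replica 3 arbiter 1 \
    192.168.0.80:/var/bricks/arb_0/brick \
    192.168.0.41:/var/bricks/arb_1/brick \
    192.168.0.80:/var/bricks/arb_1/brick \
    192.168.0.41:/var/bricks/arb_0/brick \
    192.168.0.40:/var/bricks/arb_0/brick \
    192.168.0.40:/var/bricks/arb_1/brick
```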


Version: 7.9

Number of Bricks: 6 x (2 + 1) = 18

cluster.max-op-version: 70200

Peers: 3 (node[0..2])

Layout

|node0  |node1  |node2
|brick0 |brick0 |arbit0
|arbit1 |brick1 |brick1



I then recognized that the arbiter bricks on node0 & node1 have been healed
successfully.
Unfortunately, none of the arbiter bricks on node2 have been healed!
I noticed that the top-level dir on my arbiter mount point has been created (mount point
/var/bricks/arb_0 now contains the dir "brick"); however, this dir is owned by
numeric ID 33 on _all_ other bricks, but by 0 on this one. The brick dir on the faulty
arbiter bricks does contain ".glusterfs", but it has only very few entries.
Other than that, "brick" is empty.
At that point I changed the brick dir owner with chown to 33:33 and hoped for
self-heal to work. It did not.
I hoped a rebalance fix-layout would fix things. It did not.
I hoped a glusterd restart on node2 (as this is happening exclusively to both
arbiter bricks on this node) would help. It did not.
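For completeness, the heal commands used between those attempts (a sketch):

```shell
# Explicitly trigger a full heal crawl
gluster volume heal gv0 full

# Check progress and pending counts
gluster volume heal gv0 statistics heal-count
gluster volume heal gv0 info summary
```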

Active mount points via nfs-ganesha or fuse continue to work.
Existing clients cause errors in the arb-brick logs on node2 for missing files
or dirs, but the clients seem unaffected; r/w operations work.

New clients are not able to fuse-mount the volume due to an "authentication error".

heal statistics heal-count shows several hundred files needing healing, and this
count is rising. Watching df on the arb-brick mount point on node2 shows a few
bytes written every now and then, but they are removed immediately afterwards.
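The failing mount and where its error shows up (a sketch; the mount point is arbitrary, and the client log file is named after it by convention):

```shell
# New fuse mount fails with an authentication error
mount -t glusterfs 192.168.0.40:/gv0 /mnt/gv0

# The client log records which subvolume rejected the handshake
tail -n 50 /var/log/glusterfs/mnt-gv0.log
```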

Any help/recommendation from you is highly appreciated.
Thank you!

A.



