Re: [Gluster-users] Replica bricks fungible?
Confirmed for gluster 7.9 on both a distributed-replicate and a pure replicate volume. One of my 3 nodes died :( I removed all bricks from the dead node and added them to a new node. I then started adding the arbiter bricks as well, since the distributed-replicate volume is configured with replica 2 arbiter 1. I made sure to use the exact same mount point and path, and double/triple checked that the bricks had exactly the same file content in every dir as the running bricks they were about to be paired with again. Then I used the replace-brick command to replace dead-node:brick0 with new-node:brick0, and did this one by one for all bricks... It took a while to get the replacement node up and running, so the cluster was still operational and in use. When all bricks had finally been moved, the self-heal daemon started healing several files. Everything worked out perfectly and with no downtime. Finally I detached the dead node. Done. A. On Wednesday, 9 June 2021 at 15:17 +0200, Diego Zuccato wrote: > On 05/06/2021 14:36, Zenon Panoussis wrote: > > > > What I'm really asking is: can I physically move a brick > > > from one server to another such as > > I can now answer my own question: yes, replica bricks are > > identical and can be physically moved or copied from one > > server to another. I have now done it a few times without > > any problems, though I made sure no healing was pending > > before the moves. > Well, if it's officially supported, that could be a really interesting > option to quickly scale big storage systems. > I'm thinking about our scenario: 3 servers, 36 12TB disks each. When > adding a new server (or another pair of servers, to keep an odd number) > it would require quite a lot of time to rebalance, with heavy > implications both for the IB network and for user-visible latency. If we could > simply swap around some disks it could be a lot faster. > Have you documented the procedure you followed?
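For anyone looking for the concrete steps, the sequence Andreas describes boils down to roughly the following. Treat it as a sketch only: the volume name, hostnames and brick path are placeholders (not taken from his setup), and the brick contents must already be in place on the new node, at the identical path, before the swap.

  # add the replacement node to the trusted pool
  gluster peer probe new-node
  # for each brick that lived on the dead node (data already copied/moved to
  # the same path on new-node), point the volume at the new location:
  gluster volume replace-brick myvol dead-node:/var/bricks/0/brick \
      new-node:/var/bricks/0/brick commit force
  # watch the self-heal daemon catch up
  gluster volume heal myvol info summary
  # once every brick has been moved, drop the dead peer
  gluster peer detach dead-node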
Re: [Gluster-users] glusterfs volume not mounted at boot time (RedHat 8.4)
Hey Dario, I also have libvirtd running. No problems on Ubuntu dists, everything is started/mounted in the correct order, but I can't say for RedHat. I'd look at the chkconfig man page; you could edit the boot order of your services there if necessary. Andreas "Dario Lesca" d.le...@solinos.it – 27 August 2021 17:12 > Thanks Andreas. > > I have tried removing only the "noauto" in my fstab line, but nothing changed. > > Then I followed your suggestion and tried leaving only "defaults,_netdev". > In this case the volume is mounted after a reboot. > Good! > > But my problem is that this volume must be mounted before libvirtd starts, or maybe > it is better to say, libvirtd must start after glusterd has started and the volume > is mounted. > This is why I added those x-systemd directives. > > Is there some solution to this issue? > > Many thanks > Dario > > > > On Fri, 27/08/2021 at 14.53 +, a.schwi...@gmx.net wrote: > > Dario, > > > > Your fstab line includes the mount option "noauto", so it won't automatically > > mount on boot!? > > > > Try removing noauto and reboot. > > > > I usually mount local gluster mount points with defaults,_netdev only. > > > > Cheers > > > > Andreas > > > > "Dario Lesca" d.le...@solinos.it – 27 August 2021 16:38 > > > Hello everybody. > > > > > > I have set up a glusterfs volume without problems; everything works fine. > > > > > > But if I reboot a node, the volume is not mounted when the node starts. > > > > > > If I access the node via SSH and run "mount /virt-gfs/", the volume > > > is mounted correctly. > > > > > > This is the /etc/fstab entry: > > > virt2.local:/gfsvol1 /virt-gfs glusterfs > > > defaults,_netdev,noauto,x-systemd.automount,x-systemd.device-timeout=120,x-systemd.requires=glusterd.service,x-systemd.before=libvirtd.service > > > 0 0 > > > > > > For testing, I have also set SELinux to "permissive", but nothing changed. > > > > > > Can someone help me? > > > > > > If you need some other info, let me know. > > > > > > Many thanks
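One way to get the ordering Dario needs is to make libvirtd itself wait for the mount, instead of (or in addition to) the x-systemd options in fstab. This is only a suggestion and has not been verified on RedHat 8.4; the drop-in file name is arbitrary and the mount point is taken from Dario's fstab entry:

  # /etc/systemd/system/libvirtd.service.d/wait-for-gluster.conf
  [Unit]
  # do not start libvirtd until /virt-gfs is mounted and glusterd is up
  RequiresMountsFor=/virt-gfs
  After=glusterd.service

  # reload systemd afterwards, then reboot to test
  systemctl daemon-reload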
Re: [Gluster-users] glusterfs volume not mounted at boot time (RedHat 8.4)
Dario, Your fstab line includes the mount option "noauto", so it won't automatically mount on boot!? Try removing noauto and reboot. I usually mount local gluster mount points with defaults,_netdev only. Cheers Andreas "Dario Lesca" d.le...@solinos.it – 27 August 2021 16:38 > Hello everybody. > > I have set up a glusterfs volume without problems; everything works fine. > But if I reboot a node, the volume is not mounted when the node starts. > If I access the node via SSH and run "mount /virt-gfs/", the volume > is mounted correctly. > > This is the /etc/fstab entry: > virt2.local:/gfsvol1 /virt-gfs glusterfs > defaults,_netdev,noauto,x-systemd.automount,x-systemd.device-timeout=120,x-systemd.requires=glusterd.service,x-systemd.before=libvirtd.service > 0 0 > > For testing, I have also set SELinux to "permissive", but nothing changed. > > Can someone help me? > > If you need some other info, let me know. > > Many thanks
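For reference, the fstab line Andreas suggests would look something like this (same export and mount point as in Dario's entry, with noauto and the x-systemd options dropped):

  virt2.local:/gfsvol1  /virt-gfs  glusterfs  defaults,_netdev  0 0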
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
I am on 7.9-ubuntu1~focal1 amd64 A. ;) "Strahil Nikolov" hunter86...@yahoo.com – 1 June 2021 14:56 > Glad to hear that. > > What version are you using ? > It's interesting to find out the reason behind that defunct status. > > Best Regards, > Strahil Nikolov > > > Strahil, > > > I was able to resolve the issue! > > > On node1 I found a defunct [glusterfsd] process. > > I first put the failing arbiters into reset-brick, then formatted them fresh, > > killed the zombie process on node1, then stopped glusterd gracefully, then killed all remaining gluster* processes, then started glusterd again, > > and finally re-added the arbiters with reset-brick commit force - > > now the arbiters started and have been populated correctly. :) > > > Thanks for the support! > > Best > > > A. > > > "Strahil Nikolov" hunter86...@yahoo.com – 31 May 2021 21:03 > > > I would avoid shrinking the volume. An oVirt user reported issues after > > > volume shrinking. > > > > > > Did you try to format the arbiter brick and 'replace-brick' ? > > > > > > Best Regards, > > > Strahil Nikolov > > > > > > > > I can't find anything suspicious in the brick logs other than > > > > authentication refused to clients trying to mount a dir that does not > > > > exist on the arb_n, because the self-heal isn't working. > > > > > > > > I tried to add another node and replace-brick a faulty arbiter, however > > > > this new arbiter sees the same error. > > > > > > > > Last idea is to completely remove the first subvolume, then re-add it as new, > > > > hoping it will work. > > > > > > > > A. > > > > > > > > "a.schwi...@gmx.net" a.schwi...@gmx.net – 31 May 2021 13:44 > > > > > Ok, will do. > > > > > > > > > > working arbiter: > > > > > > > > > > ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 13 33 33 146 Mai 29 22:38 brick > > > > > > > > > > ls -lna /var/bricks/arb_0/brick >>> drw--- 262 0 0 8192 Mai 29 22:38 .glusterfs > > > > > > > > > > + all data-brick dirs ... 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > affected arbiter: > > > > > > > > > > > > > > > > > > > > > > > > > > ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 3 0 0 24 Mai 30 16:23 brick > > > > > > > > > > > > > ls -lna /var/bricks/arb_0/brick >>> drw--- 7 0 0 99 Mai 30 16:23 > > > > > .glusterfs > > > > > > > > > > > > > nothing else here > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > find /var/bricks/arb_0/brick -not -user 33 -print > > > > > > > > > > > > > > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/indices > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/indices/xattrop > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/indices/dirty > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/changelogs > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/changelogs/htime > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/00 > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/00/00 > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/00/00/----0001 > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/landfill > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/unlink > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/health_check > > > > > > > > > > > > > > > > > > > > > > > > > > find /var/bricks/arb_0/brick -not -user 33 -print > > > > > > > > > > > > > > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/indices > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/indices/xattrop > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/indices/dirty > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/changelogs > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/changelogs/htime > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/00 > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/00/00 > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/00/00/----0001 > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/landfill > > > > > > > > > > > > > /var/bricks/arb_0/brick/.glusterfs/unlink > > > > > > > > > > > >
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Strahil, I was able to resolve the issue! On node1 I found a defunct [glusterfsd] process. I first put the failing arbiters into reset-brick, then formatted them fresh, killed the zombie process on node1, then stopped glusterd gracefully, then killed all remaining gluster* processes, then started glusterd again, and finally re-added the arbiters with reset-brick commit force - now the arbiters started and have been populated correctly. :) (A rough sketch of that command sequence follows at the end of this message.) Thanks for the support! Best A. "Strahil Nikolov" hunter86...@yahoo.com – 31 May 2021 21:03 > I would avoid shrinking the volume. An oVirt user reported issues after > volume shrinking. > > Did you try to format the arbiter brick and 'replace-brick' ? > > Best Regards, > Strahil Nikolov > > > > I can't find anything suspicious in the brick logs other than authentication > > refused to clients trying to mount a dir that does not exist on the arb_n, > > because the self-heal isn't working. > > > > I tried to add another node and replace-brick a faulty arbiter, however > > this new arbiter sees the same error. > > > > Last idea is to completely remove the first subvolume, then re-add it as new, > > hoping it will work. > > > > A. > > > > "a.schwi...@gmx.net" a.schwi...@gmx.net – 31 May 2021 13:44 > > > Ok, will do. > > > working arbiter: > > > ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 13 33 33 146 Mai 29 22:38 brick > > > ls -lna /var/bricks/arb_0/brick >>> drw--- 262 0 0 8192 Mai 29 22:38 .glusterfs > > > + all data-brick dirs ... > > > affected arbiter: > > > ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 3 0 0 24 Mai 30 16:23 brick > > > ls -lna /var/bricks/arb_0/brick >>> drw--- 7 0 0 99 Mai 30 16:23 .glusterfs > > > nothing else here > > > find /var/bricks/arb_0/brick -not -user 33 -print > > > /var/bricks/arb_0/brick/.glusterfs > > > /var/bricks/arb_0/brick/.glusterfs/indices > > > /var/bricks/arb_0/brick/.glusterfs/indices/xattrop > > > /var/bricks/arb_0/brick/.glusterfs/indices/dirty > > > /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes > > > /var/bricks/arb_0/brick/.glusterfs/changelogs > > > /var/bricks/arb_0/brick/.glusterfs/changelogs/htime > > > /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap > > > /var/bricks/arb_0/brick/.glusterfs/00 > > > /var/bricks/arb_0/brick/.glusterfs/00/00 > > > /var/bricks/arb_0/brick/.glusterfs/00/00/----0001 > > > /var/bricks/arb_0/brick/.glusterfs/landfill > > > /var/bricks/arb_0/brick/.glusterfs/unlink > > > /var/bricks/arb_0/brick/.glusterfs/health_check > > > find /var/bricks/arb_0/brick -not -user 33 -print > > > /var/bricks/arb_0/brick/.glusterfs > > > /var/bricks/arb_0/brick/.glusterfs/indices > > > /var/bricks/arb_0/brick/.glusterfs/indices/xattrop > > > /var/bricks/arb_0/brick/.glusterfs/indices/dirty > > > /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes > > > /var/bricks/arb_0/brick/.glusterfs/changelogs > > > /var/bricks/arb_0/brick/.glusterfs/changelogs/htime > > > /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap > > > /var/bricks/arb_0/brick/.glusterfs/00 > > > /var/bricks/arb_0/brick/.glusterfs/00/00 > > > /var/bricks/arb_0/brick/.glusterfs/00/00/----0001 > > > /var/bricks/arb_0/brick/.glusterfs/landfill > > > /var/bricks/arb_0/brick/.glusterfs/unlink > > > /var/bricks/arb_0/brick/.glusterfs/health_check
> > Output is identical to user:group 36 as all these have UID:GID 0:0, but > > > these files have 0:0 also on the working arbiters. > > > > > And this is all files/dirs that exist on the affected arbs. Nothing more > > > on it. There should be much more, but this seems to missing self heal. > > > > > > > > > > Thanks. > > > > > > > > > > A. > > > > > > > > > > > > > > > "Strahil Nikolov" hunter86...@yahoo.com – 31. Mai 2021 13:11 > > > > > > Hi, > > > > > > > > > > > > I think that the best way is to go through the logs on the affected > > > > arbiter brick (maybe even temporarily increase the log level). > > > > > > > > > > > > What is the output of: > > > > > > > > > > > > find /var/brick/arb_0/brick -not -user 36 -print > > > > > > find /var/brick/arb_0/brick -not group 36 -print > > > > > > > > > > > > Maybe there are some files/dirs that are with wrong ownership. > > > > > > > > > > > > Best Regards, > > > > > > Strahil Nikolov > > > > > > > > > > > > > > > > > > > > Thanks Strahil, > > > > > > > > > > > > > > unfortunately I cannot connect as the mount is denied as in mount.log > > > > > provided. > > > > > > > IPs > n.n.n..100 are clients and simply cannot mount the volume. When > > > > > killing the arb pids on node2 new clients can mount the volume. When > > > > > bringing them up again I experience the same problem. > >
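For readers skimming the thread: the fix described at the top of this message corresponds roughly to the following commands. This is a sketch only; the volume name, peer and brick path below are taken from the volume layout quoted elsewhere in the thread for illustration, not a verbatim record of what was actually run.

  # take the faulty arbiter brick offline and out of service
  gluster volume reset-brick gv0 192.168.0.80:/var/bricks/arb_0/brick start
  # wipe / re-create the filesystem behind the brick, clean up any defunct
  # glusterfsd processes on the affected peer (stop glusterd, kill leftover
  # gluster* processes, start glusterd again), then bring the same brick back:
  gluster volume reset-brick gv0 192.168.0.80:/var/bricks/arb_0/brick \
      192.168.0.80:/var/bricks/arb_0/brick commit force
  # afterwards the self-heal daemon should start populating the arbiter
  gluster volume heal gv0 info summary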
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Hm, I tried format and reset-brick on node2 - no success. I tried a new brick on a new node3 and replace-brick - no success, as the new arbiter is created wrongly and self-heal does not work. I also restarted all nodes turn by turn without any improvement. If shrinking the volume is not recommended, is converting it back to replica 2 possible and, if successful, worth another try? Thanks A. 31.05.2021 21:03:01 Strahil Nikolov : > I would avoid shrinking the volume. An oVirt user reported issues after > volume shrinking. > > Did you try to format the arbiter brick and 'replace-brick' ? > > Best Regards, > Strahil Nikolov >> >> I can't find anything suspicious in the brick logs other than authentication >> refused to clients trying to mount a dir that does not exist on the arb_n, >> because the self-heal isn't working. >> I tried to add another node and replace-brick a faulty arbiter, however this >> new arbiter sees the same error. >> >> Last idea is to completely remove the first subvolume, then re-add it as new, hoping >> it will work. >> >> >> A. >> >> >> "a.schwi...@gmx.net" a.schwi...@gmx.net – 31 May 2021 13:44 >>> Ok, will do. >>> >>> >>> working arbiter: >>> >>> ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 13 33 33 146 Mai 29 22:38 brick >>> >>> ls -lna /var/bricks/arb_0/brick >>> drw--- 262 0 0 8192 Mai 29 22:38 .glusterfs >>> + all data-brick dirs ... >>> >>> >>> affected arbiter: >>> >>> ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 3 0 0 24 Mai 30 16:23 brick >>> ls -lna /var/bricks/arb_0/brick >>> drw--- 7 0 0 99 Mai 30 16:23 .glusterfs >>> nothing else here >>> >>> >>> find /var/bricks/arb_0/brick -not -user 33 -print >>> >>> /var/bricks/arb_0/brick/.glusterfs >>> /var/bricks/arb_0/brick/.glusterfs/indices >>> /var/bricks/arb_0/brick/.glusterfs/indices/xattrop >>> /var/bricks/arb_0/brick/.glusterfs/indices/dirty >>> /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes >>> /var/bricks/arb_0/brick/.glusterfs/changelogs >>> /var/bricks/arb_0/brick/.glusterfs/changelogs/htime >>> /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap >>> /var/bricks/arb_0/brick/.glusterfs/00 >>> /var/bricks/arb_0/brick/.glusterfs/00/00 >>> /var/bricks/arb_0/brick/.glusterfs/00/00/----0001 >>> /var/bricks/arb_0/brick/.glusterfs/landfill >>> /var/bricks/arb_0/brick/.glusterfs/unlink >>> /var/bricks/arb_0/brick/.glusterfs/health_check >>> >>> find /var/bricks/arb_0/brick -not -user 33 -print >>> >>> /var/bricks/arb_0/brick/.glusterfs >>> /var/bricks/arb_0/brick/.glusterfs/indices >>> /var/bricks/arb_0/brick/.glusterfs/indices/xattrop >>> /var/bricks/arb_0/brick/.glusterfs/indices/dirty >>> /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes >>> /var/bricks/arb_0/brick/.glusterfs/changelogs >>> /var/bricks/arb_0/brick/.glusterfs/changelogs/htime >>> /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap >>> /var/bricks/arb_0/brick/.glusterfs/00 >>> /var/bricks/arb_0/brick/.glusterfs/00/00 >>> /var/bricks/arb_0/brick/.glusterfs/00/00/----0001 >>> /var/bricks/arb_0/brick/.glusterfs/landfill >>> /var/bricks/arb_0/brick/.glusterfs/unlink >>> /var/bricks/arb_0/brick/.glusterfs/health_check >>> >>> Output is identical to user:group 36, as all of these have UID:GID 0:0, but these files have 0:0 on the working arbiters as well. >>> And this is all the files/dirs that exist on the affected arbs. Nothing more on them. There should be much more, but that seems to be the missing self-heal. >>> >>> Thanks. >>> >>> A. >>> >>> >>> "Strahil Nikolov" hunter86...@yahoo.com – 31. 
Mai 2021 13:11 >>> > Hi, >>> > >>> > I think that the best way is to go through the logs on the affected >>> > arbiter brick (maybe even temporarily increase the log level). >>> > >>> > What is the output of: >>> > >>> > find /var/brick/arb_0/brick -not -user 36 -print >>> > find /var/brick/arb_0/brick -not group 36 -print >>> > >>> > Maybe there are some files/dirs that are with wrong ownership. >>> > >>> > Best Regards, >>> > Strahil Nikolov >>> > >>> > > >>> > > Thanks Strahil, >>> > > >>> > > unfortunately I cannot connect as the mount is denied as in mount.log >>> > > provided. >>> > > IPs > n.n.n..100 are clients and simply cannot mount the volume. When >>> > > killing the arb pids on node2 new clients can mount the volume. When >>> > > bringing them up again I experience the same problem. >>> > > >>> > > I wonder why the root dir on the arb bricks has wrong UID:GID. >>> > > I added regular data bricks before without any problems on node2. >>> > > >>> > > Also when executing "watch df" >>> > > >>> > > I see >>> > > >>> > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0 >>> > > .. >>> > > >>> > > /dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0 >>> > > >>> > > .. >>> > > >>> > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0 >>> > > >>> > > So heal daemon might try to do something, which isn't working. Thus I >>> > > chowned UID:GID of ../arb_0/brick manually to match,
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
I can't find anything suspicious in the brick logs other than authentication refused to clients trying to mount a dir that does not exist on the arb_n, because the self-heal isn't working. I tried to add another node and replace-brick a faulty arbiter, however this new arbiter sees the same error. Last idea is to completely remove the first subvolume, then re-add it as new, hoping it will work. A. "a.schwi...@gmx.net" a.schwi...@gmx.net – 31 May 2021 13:44 > Ok, will do. > > > working arbiter: > > ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 13 33 33 146 Mai 29 22:38 brick > > ls -lna /var/bricks/arb_0/brick >>> drw--- 262 0 0 8192 Mai 29 22:38 .glusterfs > + all data-brick dirs ... > > > affected arbiter: > > ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 3 0 0 24 Mai 30 16:23 brick > ls -lna /var/bricks/arb_0/brick >>> drw--- 7 0 0 99 Mai 30 16:23 .glusterfs > nothing else here > > > find /var/bricks/arb_0/brick -not -user 33 -print > > /var/bricks/arb_0/brick/.glusterfs > /var/bricks/arb_0/brick/.glusterfs/indices > /var/bricks/arb_0/brick/.glusterfs/indices/xattrop > /var/bricks/arb_0/brick/.glusterfs/indices/dirty > /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes > /var/bricks/arb_0/brick/.glusterfs/changelogs > /var/bricks/arb_0/brick/.glusterfs/changelogs/htime > /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap > /var/bricks/arb_0/brick/.glusterfs/00 > /var/bricks/arb_0/brick/.glusterfs/00/00 > /var/bricks/arb_0/brick/.glusterfs/00/00/----0001 > /var/bricks/arb_0/brick/.glusterfs/landfill > /var/bricks/arb_0/brick/.glusterfs/unlink > /var/bricks/arb_0/brick/.glusterfs/health_check > > find /var/bricks/arb_0/brick -not -user 33 -print > > /var/bricks/arb_0/brick/.glusterfs > /var/bricks/arb_0/brick/.glusterfs/indices > /var/bricks/arb_0/brick/.glusterfs/indices/xattrop > /var/bricks/arb_0/brick/.glusterfs/indices/dirty > /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes > /var/bricks/arb_0/brick/.glusterfs/changelogs > /var/bricks/arb_0/brick/.glusterfs/changelogs/htime > /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap > /var/bricks/arb_0/brick/.glusterfs/00 > /var/bricks/arb_0/brick/.glusterfs/00/00 > /var/bricks/arb_0/brick/.glusterfs/00/00/----0001 > /var/bricks/arb_0/brick/.glusterfs/landfill > /var/bricks/arb_0/brick/.glusterfs/unlink > /var/bricks/arb_0/brick/.glusterfs/health_check > > Output is identical to user:group 36, as all of these have UID:GID 0:0, but these files have 0:0 on the working arbiters as well. > And this is all the files/dirs that exist on the affected arbs. Nothing more on them. There should be much more, but that seems to be the missing self-heal. > > Thanks. > > A. > > "Strahil Nikolov" hunter86...@yahoo.com – 31 May 2021 13:11 > > Hi, > > > > I think that the best way is to go through the logs on the affected > > arbiter brick (maybe even temporarily increase the log level). > > > > What is the output of: > > > > find /var/brick/arb_0/brick -not -user 36 -print > > find /var/brick/arb_0/brick -not -group 36 -print > > > > Maybe there are some files/dirs with wrong ownership. > > > > Best Regards, > > Strahil Nikolov > > > > > > > > Thanks Strahil, > > > > > > unfortunately I cannot connect, as the mount is denied as in the mount.log > > > provided. > > > IPs > n.n.n.100 are clients and simply cannot mount the volume. When > > > killing the arb pids on node2, new clients can mount the volume. When > > > bringing them up again I experience the same problem. 
> > > I added regular data bricks before without any problems on node2. > > > > > > Also when executing "watch df" > > > > > > I see > > > > > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0 > > > .. > > > > > > /dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0 > > > > > > .. > > > > > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0 > > > > > > So heal daemon might try to do something, which isn't working. Thus I > > > chowned UID:GID of ../arb_0/brick manually to match, but it did not work > > > either. > > > > > > As I added all 6 arbs at once and 4 are working as expected I really > > > don't get what's wrong with these... > > > > > > A. > > > > > > "Strahil Nikolov" hunter86...@yahoo.com – 31. Mai 2021 11:12 > > > > For the arb_0 I seeonly 8 clients , while there should be 12 clients: > > > > Brick : 192.168.0.40:/var/bricks/0/brick > > > > Clients connected : 12 > > > > > > > > Brick : 192.168.0.41:/var/bricks/0/brick > > > > Clients connected : 12 > > > > > > > > Brick : 192.168.0.80:/var/bricks/arb_0/brick > > > > Clients connected : 8 > > > > > > > > Can you try to reconnect them. The most simple way is to kill the > > > > arbiter process and 'gluster volume start force' , but always verify > > > > that you have both data bricks up and running. > > > > > > > > > > > > > > > > Yet, this doesn't explain why the heal daemon is not
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Ok, will do. working arbiter: ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 13 33 33 146 Mai 29 22:38 brick ls -lna /var/bricks/arb_0/brick >>> drw--- 262 0 0 8192 Mai 29 22:38 .glusterfs + all data-brick dirs ... affected arbiter: ls -ln /var/bricks/arb_0/ >>> drwxr-xr-x 3 0 0 24 Mai 30 16:23 brick ls -lna /var/bricks/arb_0/brick >>> drw--- 7 0 0 99 Mai 30 16:23 .glusterfs nothing else here find /var/bricks/arb_0/brick -not -user 33 -print /var/bricks/arb_0/brick/.glusterfs /var/bricks/arb_0/brick/.glusterfs/indices /var/bricks/arb_0/brick/.glusterfs/indices/xattrop /var/bricks/arb_0/brick/.glusterfs/indices/dirty /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes /var/bricks/arb_0/brick/.glusterfs/changelogs /var/bricks/arb_0/brick/.glusterfs/changelogs/htime /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap /var/bricks/arb_0/brick/.glusterfs/00 /var/bricks/arb_0/brick/.glusterfs/00/00 /var/bricks/arb_0/brick/.glusterfs/00/00/----0001 /var/bricks/arb_0/brick/.glusterfs/landfill /var/bricks/arb_0/brick/.glusterfs/unlink /var/bricks/arb_0/brick/.glusterfs/health_check find /var/bricks/arb_0/brick -not -user 33 -print /var/bricks/arb_0/brick/.glusterfs /var/bricks/arb_0/brick/.glusterfs/indices /var/bricks/arb_0/brick/.glusterfs/indices/xattrop /var/bricks/arb_0/brick/.glusterfs/indices/dirty /var/bricks/arb_0/brick/.glusterfs/indices/entry-changes /var/bricks/arb_0/brick/.glusterfs/changelogs /var/bricks/arb_0/brick/.glusterfs/changelogs/htime /var/bricks/arb_0/brick/.glusterfs/changelogs/csnap /var/bricks/arb_0/brick/.glusterfs/00 /var/bricks/arb_0/brick/.glusterfs/00/00 /var/bricks/arb_0/brick/.glusterfs/00/00/----0001 /var/bricks/arb_0/brick/.glusterfs/landfill /var/bricks/arb_0/brick/.glusterfs/unlink /var/bricks/arb_0/brick/.glusterfs/health_check Output is identical to user:group 36, as all of these have UID:GID 0:0, but these files have 0:0 on the working arbiters as well. And this is all the files/dirs that exist on the affected arbs. Nothing more on them. There should be much more, but that seems to be the missing self-heal. Thanks. A. "Strahil Nikolov" hunter86...@yahoo.com – 31 May 2021 13:11 > Hi, > > I think that the best way is to go through the logs on the affected arbiter > brick (maybe even temporarily increase the log level). > > What is the output of: > > find /var/brick/arb_0/brick -not -user 36 -print > find /var/brick/arb_0/brick -not -group 36 -print > > Maybe there are some files/dirs with wrong ownership. > > Best Regards, > Strahil Nikolov > > > > Thanks Strahil, > > > > unfortunately I cannot connect, as the mount is denied as in the mount.log > > provided. > > IPs > n.n.n.100 are clients and simply cannot mount the volume. When > > killing the arb pids on node2, new clients can mount the volume. When > > bringing them up again I experience the same problem. > > > > I wonder why the root dir on the arb bricks has the wrong UID:GID. > > I added regular data bricks before without any problems on node2. > > > > Also when executing "watch df" > > > > I see > > > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0 > > .. > > > > /dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0 > > > > .. > > > > /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0 > > > > So the heal daemon might be trying to do something, which isn't working. Thus I > > chowned UID:GID of ../arb_0/brick manually to match, but it did not work > > either. > > > > As I added all 6 arbs at once and 4 are working as expected, I really don't > > get what's wrong with these... > > > > A. 
> > > > "Strahil Nikolov" hunter86...@yahoo.com – 31. Mai 2021 11:12 > > > For the arb_0 I seeonly 8 clients , while there should be 12 clients: > > > Brick : 192.168.0.40:/var/bricks/0/brick > > > Clients connected : 12 > > > > > > Brick : 192.168.0.41:/var/bricks/0/brick > > > Clients connected : 12 > > > > > > Brick : 192.168.0.80:/var/bricks/arb_0/brick > > > Clients connected : 8 > > > > > > Can you try to reconnect them. The most simple way is to kill the arbiter > > > process and 'gluster volume start force' , but always verify that you > > > have both data bricks up and running. > > > > > > > > > > > > Yet, this doesn't explain why the heal daemon is not able to replicate > > > properly. > > > > > > > > > > > > Best Regards, > > > Strahil Nikolov > > > > > > > > Meanwhile I tried reset-brick on one of the failing arbiters on node2, > > > > but with same results. The behaviour is reproducible, arbiter stays > > > > empty. > > > > > > > > node0: 192.168.0.40 > > > > > > > > node1: 192.168.0.41 > > > > > > > > node3: 192.168.0.80 > > > > > > > > volume info: > > > > > > > > Volume Name: gv0 > > > > Type: Distributed-Replicate > > > > Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559 > > > > Status: Started > > > > Snapshot Count: 0 > > > > Number of Bricks: 6 x (2 + 1) = 18 > > > > Transport-type: tcp > > > > Bricks: > > > > Brick1:
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Thanks Strahil, unfortunately I cannot connect as the mount is denied as in mount.log provided. IPs > n.n.n..100 are clients and simply cannot mount the volume. When killing the arb pids on node2 new clients can mount the volume. When bringing them up again I experience the same problem. I wonder why the root dir on the arb bricks has wrong UID:GID. I added regular data bricks before without any problems on node2. Also when executing "watch df" I see /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0 .. /dev/md50 11700224 33128 11667096 1% /var/bricks/arb_0 .. /dev/md50 11700224 33108 11667116 1% /var/bricks/arb_0 So heal daemon might try to do something, which isn't working. Thus I chowned UID:GID of ../arb_0/brick manually to match, but it did not work either. As I added all 6 arbs at once and 4 are working as expected I really don't get what's wrong with these... A. "Strahil Nikolov" hunter86...@yahoo.com – 31. Mai 2021 11:12 > For the arb_0 I seeonly 8 clients , while there should be 12 clients: > Brick : 192.168.0.40:/var/bricks/0/brick > Clients connected : 12 > > Brick : 192.168.0.41:/var/bricks/0/brick > Clients connected : 12 > > Brick : 192.168.0.80:/var/bricks/arb_0/brick > Clients connected : 8 > > Can you try to reconnect them. The most simple way is to kill the arbiter > process and 'gluster volume start force' , but always verify that you have > both data bricks up and running. > > > > Yet, this doesn't explain why the heal daemon is not able to replicate > properly. > > > > Best Regards, > Strahil Nikolov > > > > Meanwhile I tried reset-brick on one of the failing arbiters on node2, but > > with same results. The behaviour is reproducible, arbiter stays empty. > > > > node0: 192.168.0.40 > > > > node1: 192.168.0.41 > > > > node3: 192.168.0.80 > > > > volume info: > > > > Volume Name: gv0 > > Type: Distributed-Replicate > > Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 6 x (2 + 1) = 18 > > Transport-type: tcp > > Bricks: > > Brick1: 192.168.0.40:/var/bricks/0/brick > > Brick2: 192.168.0.41:/var/bricks/0/brick > > Brick3: 192.168.0.80:/var/bricks/arb_0/brick (arbiter) > > Brick4: 192.168.0.40:/var/bricks/2/brick > > Brick5: 192.168.0.80:/var/bricks/2/brick > > Brick6: 192.168.0.41:/var/bricks/arb_1/brick (arbiter) > > Brick7: 192.168.0.40:/var/bricks/1/brick > > Brick8: 192.168.0.41:/var/bricks/1/brick > > Brick9: 192.168.0.80:/var/bricks/arb_1/brick (arbiter) > > Brick10: 192.168.0.40:/var/bricks/3/brick > > Brick11: 192.168.0.80:/var/bricks/3/brick > > Brick12: 192.168.0.41:/var/bricks/arb_0/brick (arbiter) > > Brick13: 192.168.0.41:/var/bricks/3/brick > > Brick14: 192.168.0.80:/var/bricks/4/brick > > Brick15: 192.168.0.40:/var/bricks/arb_0/brick (arbiter) > > Brick16: 192.168.0.41:/var/bricks/2/brick > > Brick17: 192.168.0.80:/var/bricks/5/brick > > Brick18: 192.168.0.40:/var/bricks/arb_1/brick (arbiter) > > Options Reconfigured: > > cluster.min-free-inodes: 6% > > cluster.min-free-disk: 2% > > performance.md-cache-timeout: 600 > > cluster.rebal-throttle: lazy > > features.scrub-freq: monthly > > features.scrub-throttle: lazy > > features.scrub: Inactive > > features.bitrot: off > > cluster.server-quorum-type: none > > performance.cache-refresh-timeout: 10 > > performance.cache-max-file-size: 64MB > > performance.cache-size: 781901824 > > auth.allow: > > 
/(192.168.0.*),/usr/andreas(192.168.0.120),/usr/otis(192.168.0.168),/usr/otis(192.168.0.111),/usr/otis(192.168.0.249),/media(192.168.0.*),/virt(192.168.0.*),/cloud(192.168.0.247),/zm(192.168.0.136) > > performance.cache-invalidation: on > > performance.stat-prefetch: on > > features.cache-invalidation-timeout: 600 > > cluster.quorum-type: auto > > features.cache-invalidation: on > > nfs.disable: on > > transport.address-family: inet > > cluster.self-heal-daemon: on > > cluster.server-quorum-ratio: 51% > > > > volume status: > > > > Status of volume: gv0 > > Gluster process TCP Port RDMA Port Online Pid > > -- > > Brick 192.168.0.40:/var/bricks/0/brick 49155 0 Y 713066 > > Brick 192.168.0.41:/var/bricks/0/brick 49152 0 Y 2082 > > Brick 192.168.0.80:/var/bricks/arb_0/brick 49152 0 Y 26186 > > Brick 192.168.0.40:/var/bricks/2/brick 49156 0 Y 713075 > > Brick 192.168.0.80:/var/bricks/2/brick 49154 0 Y 325 > > Brick 192.168.0.41:/var/bricks/arb_1/brick 49157 0 Y 1746903 > > Brick 192.168.0.40:/var/bricks/1/brick 49157 0 Y 713084 > > Brick 192.168.0.41:/var/bricks/1/brick 49153 0 Y 14104 > > Brick 192.168.0.80:/var/bricks/arb_1/brick 49159 0 Y 2314 > > Brick 192.168.0.40:/var/bricks/3/brick 49153 0 Y 2978692 > > Brick 192.168.0.80:/var/bricks/3/brick 49155 0 Y 23269 > > Brick 192.168.0.41:/var/bricks/arb_0/brick 49158 0 Y 1746942 > > Brick 192.168.0.41:/var/bricks/3/brick 49155 0 Y 897058 > > Brick 192.168.0.80:/var/bricks/4/brick 49156 0 Y 27433 > > Brick
Re: [Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
Meanwhile I tried reset-brick on one of the failing arbiters on node2, but with same results. The behaviour is reproducible, arbiter stays empty. node0: 192.168.0.40 node1: 192.168.0.41 node3: 192.168.0.80 volume info: Volume Name: gv0 Type: Distributed-Replicate Volume ID: 9bafc4d2-d9b6-4b6d-a631-1cf42d1d2559 Status: Started Snapshot Count: 0 Number of Bricks: 6 x (2 + 1) = 18 Transport-type: tcp Bricks: Brick1: 192.168.0.40:/var/bricks/0/brick Brick2: 192.168.0.41:/var/bricks/0/brick Brick3: 192.168.0.80:/var/bricks/arb_0/brick (arbiter) Brick4: 192.168.0.40:/var/bricks/2/brick Brick5: 192.168.0.80:/var/bricks/2/brick Brick6: 192.168.0.41:/var/bricks/arb_1/brick (arbiter) Brick7: 192.168.0.40:/var/bricks/1/brick Brick8: 192.168.0.41:/var/bricks/1/brick Brick9: 192.168.0.80:/var/bricks/arb_1/brick (arbiter) Brick10: 192.168.0.40:/var/bricks/3/brick Brick11: 192.168.0.80:/var/bricks/3/brick Brick12: 192.168.0.41:/var/bricks/arb_0/brick (arbiter) Brick13: 192.168.0.41:/var/bricks/3/brick Brick14: 192.168.0.80:/var/bricks/4/brick Brick15: 192.168.0.40:/var/bricks/arb_0/brick (arbiter) Brick16: 192.168.0.41:/var/bricks/2/brick Brick17: 192.168.0.80:/var/bricks/5/brick Brick18: 192.168.0.40:/var/bricks/arb_1/brick (arbiter) Options Reconfigured: cluster.min-free-inodes: 6% cluster.min-free-disk: 2% performance.md-cache-timeout: 600 cluster.rebal-throttle: lazy features.scrub-freq: monthly features.scrub-throttle: lazy features.scrub: Inactive features.bitrot: off cluster.server-quorum-type: none performance.cache-refresh-timeout: 10 performance.cache-max-file-size: 64MB performance.cache-size: 781901824 auth.allow: /(192.168.0.*),/usr/andreas(192.168.0.120),/usr/otis(192.168.0.168),/usr/otis(192.168.0.111),/usr/otis(192.168.0.249),/media(192.168.0.*),/virt(192.168.0.*),/cloud(192.168.0.247),/zm(192.168.0.136) performance.cache-invalidation: on performance.stat-prefetch: on features.cache-invalidation-timeout: 600 cluster.quorum-type: auto features.cache-invalidation: on nfs.disable: on transport.address-family: inet cluster.self-heal-daemon: on cluster.server-quorum-ratio: 51% volume status: Status of volume: gv0 Gluster process TCP Port RDMA Port Online Pid -- Brick 192.168.0.40:/var/bricks/0/brick 49155 0 Y 713066 Brick 192.168.0.41:/var/bricks/0/brick 49152 0 Y 2082 Brick 192.168.0.80:/var/bricks/arb_0/brick 49152 0 Y 26186 Brick 192.168.0.40:/var/bricks/2/brick 49156 0 Y 713075 Brick 192.168.0.80:/var/bricks/2/brick 49154 0 Y 325 Brick 192.168.0.41:/var/bricks/arb_1/brick 49157 0 Y 1746903 Brick 192.168.0.40:/var/bricks/1/brick 49157 0 Y 713084 Brick 192.168.0.41:/var/bricks/1/brick 49153 0 Y 14104 Brick 192.168.0.80:/var/bricks/arb_1/brick 49159 0 Y 2314 Brick 192.168.0.40:/var/bricks/3/brick 49153 0 Y 2978692 Brick 192.168.0.80:/var/bricks/3/brick 49155 0 Y 23269 Brick 192.168.0.41:/var/bricks/arb_0/brick 49158 0 Y 1746942 Brick 192.168.0.41:/var/bricks/3/brick 49155 0 Y 897058 Brick 192.168.0.80:/var/bricks/4/brick 49156 0 Y 27433 Brick 192.168.0.40:/var/bricks/arb_0/brick 49152 0 Y 3561115 Brick 192.168.0.41:/var/bricks/2/brick 49156 0 Y 902602 Brick 192.168.0.80:/var/bricks/5/brick 49157 0 Y 29522 Brick 192.168.0.40:/var/bricks/arb_1/brick 49154 0 Y 3561159 Self-heal Daemon on localhost N/A N/A Y 26199 Self-heal Daemon on 192.168.0.41 N/A N/A Y 2240635 Self-heal Daemon on 192.168.0.40 N/A N/A Y 3912810 Task Status of Volume gv0 -- There are no active volume tasks volume heal info summary: Brick 192.168.0.40:/var/bricks/0/brick <--- contains 100177 files in 25015 dirs Status: Connected 
Total Number of entries: 1006 Number of entries in heal pending: 1006 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick 192.168.0.41:/var/bricks/0/brick Status: Connected Total Number of entries: 1006 Number of entries in heal pending: 1006 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick 192.168.0.80:/var/bricks/arb_0/brick Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick 192.168.0.40:/var/bricks/2/brick Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick 192.168.0.80:/var/bricks/2/brick Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick 192.168.0.41:/var/bricks/arb_1/brick Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick 192.168.0.40:/var/bricks/1/brick Status: Connected Total Number of entries: 1006 Number of entries in heal pending: 1006 Number of
[Gluster-users] Conv distr-repl 2 to repl 3 arb 1 now 2 of 6 arb bricks won't get healed
I am seeking help here after looking for solutions on the web for my distributed-replicated volume. My volume has been in operation since v3.10 and I upgraded through to 7.9, replaced nodes, replaced bricks without a problem. I love it. Finally I wanted to extend my 6x2 distributed-replicated volume with arbiters for better split-brain protection. So I ran add-brick with replica 3 arbiter 1 (as I had a 6x2 volume I obviously added 6 arbiter bricks) and it successfully converted to 6 x (2 + 1), and self-heal immediately started. Looking good. Version: 7.9 Number of Bricks: 6 x (2 + 1) = 18 cluster.max-op-version: 70200 Peers: 3 (node[0..2]) Layout:
|node0  |node1  |node2  |
|brick0 |brick0 |arbit0 |
|arbit1 |brick1 |brick1 |
I then recognized that the arbiter bricks on node0 & node1 have been healed successfully. Unfortunately, the arbiter bricks on node2 have not been healed! I realized that the main dir on my arb mount point has been added (mount point /var/brick/arb_0 now contains the dir "brick"), however while this dir on _all_ other bricks has numeric owner ID 33, on this one it has 0. The brick dir on the faulty arb bricks does contain ".glusterfs", however it has only very few entries. Other than that, "brick" is empty. At that point I changed the brick dir owner with chown to 33:33 and hoped for self-heal to work. It did not. I hoped a rebalance fix-layout would fix things. It did not. I hoped a glusterd restart on node2 (as this is happening exclusively to both arb bricks on this node) would help. It did not. Active mount points via nfs-ganesha or fuse continue to work. Existing clients cause errors in the arb-brick logs on node2 for missing files or dirs, but clients seem not affected; r/w operations work. New clients are not able to fuse-mount the volume due to an "authentication error". heal statistics heal-count shows several hundred files needing healing, and this count is rising. Watching df on the arb-brick mount point on node2 shows every now and then a few bytes written, but then removed immediately after that. Any help/recommendation from you is highly appreciated. Thank you! A.
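For context, the conversion step described above would have looked roughly like the following. The arbiter brick paths and their order are taken from the volume info quoted earlier in this thread; the exact command that was actually run is not shown there, so treat this purely as an illustration:

  # add one arbiter brick per existing replica-2 subvolume (6 in total)
  gluster volume add-brick gv0 replica 3 arbiter 1 \
      192.168.0.80:/var/bricks/arb_0/brick 192.168.0.41:/var/bricks/arb_1/brick \
      192.168.0.80:/var/bricks/arb_1/brick 192.168.0.41:/var/bricks/arb_0/brick \
      192.168.0.40:/var/bricks/arb_0/brick 192.168.0.40:/var/bricks/arb_1/brick
  # then watch the heal queue drain (the command referenced above)
  gluster volume heal gv0 statistics heal-count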