Re: [Gluster-users] Healing Delays
On 2/10/2016 12:48 AM, Lindsay Mathieson wrote:
> Only the heal count does not change, it just does not seem to start. It
> can take hours before it shifts, but once it does, it's quite rapid. Node 1
> has restarted and the heal count has been static at 511 shards for 45
> minutes now. Nodes 1 & 2 have low CPU load, node 3 has glusterfsd pegged
> at 800% CPU.

OK, I had a try at systematically reproducing it this morning and was actually unable to do so - quite weird. Testing was the same as last night - move all the VMs off a server, reboot it, and wait for the healing to finish. This time I tried it with various different settings.

Test 1
cluster.granular-entry-heal: no
cluster.locking-scheme: full
Shards / Min: 350 / 8

Test 2
cluster.granular-entry-heal: yes
cluster.locking-scheme: granular
Shards / Min: 391 / 10

Test 3
cluster.granular-entry-heal: yes
cluster.locking-scheme: granular
heal command issued
Shards / Min: 358 / 11

Test 4
cluster.granular-entry-heal: yes
cluster.locking-scheme: granular
heal full command issued
Shards / Min: 358 / 27

Best results were with cluster.granular-entry-heal=yes and cluster.locking-scheme=granular, but they were all quite good. I don't know why it was so much worse last night - I/O load, CPU and memory were the same. However, one thing that is different, and which I can't easily reproduce, is that the cluster had been running for several weeks, whereas last night I rebooted all nodes. Could gluster be developing an issue after running for some time?

--
Lindsay Mathieson

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
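[Editor's note] The per-setting heal-rate test described above can be sketched as a small script. The volume name "datastore4", the sample counts, and the 10-minute window are assumptions for illustration; the gluster CLI invocations are standard but only meaningful on a live cluster, so they are shown commented out.

```shell
# Sketch of one test iteration: apply a setting combination, trigger a
# heal, then compare two heal-count samples taken some minutes apart.
VOL=datastore4    # assumed volume name, from the thread below

# On a live cluster, one combination under test would be applied with:
#   gluster volume set "$VOL" cluster.granular-entry-heal on
#   gluster volume set "$VOL" cluster.locking-scheme granular
#   gluster volume heal "$VOL" full

# Helper (hypothetical name): shards healed per minute between two
# pending-count samples taken $3 minutes apart.
shards_per_min() {
    local before=$1 after=$2 minutes=$3
    echo $(( (before - after) / minutes ))
}

# On a live cluster the samples would come from:
#   gluster volume heal "$VOL" statistics heal-count
shards_per_min 511 161 10    # 511 -> 161 pending over 10 min; prints 35
```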
Re: [Gluster-users] Healing Delays
On 2/10/2016 12:48 AM, Lindsay Mathieson wrote:
> 511 shards for 45 minutes

At (roughly) the one hour mark it started ticking over; the heal completed at 1.5 hours.

--
Lindsay Mathieson
Re: [Gluster-users] Healing Delays
Any errors/warnings in the glustershd logs?

-Krutika

On Sat, Oct 1, 2016 at 8:18 PM, Lindsay Mathieson <lindsay.mathie...@gmail.com> wrote:
> This was raised earlier but I don't believe it was ever resolved, and it
> is becoming a serious issue for me.
>
> I'm doing rolling upgrades on our three node cluster (Replica 3, Sharded,
> VM Workload).
>
> I update one node, reboot it, wait for healing to complete, do the next
> one.
>
> Only the heal count does not change, it just does not seem to start. It
> can take hours before it shifts, but once it does, it's quite rapid. Node 1
> has restarted and the heal count has been static at 511 shards for 45
> minutes now. Nodes 1 & 2 have low CPU load, node 3 has glusterfsd pegged at
> 800% CPU.
>
> This was *not* the case in earlier versions of gluster (3.7.11 I think);
> healing would start almost right away. I think it started doing this when
> the afr locking improvements were made.
>
> I have experimented with full & diff heal modes; it doesn't make any
> difference.
> Current:
>
> Gluster Version 3.8.4
>
> Volume Name: datastore4
> Type: Replicate
> Volume ID: 0ba131ef-311d-4bb1-be46-596e83b2f6ce
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore4
> Brick2: vng.proxmox.softlog:/tank/vmdata/datastore4
> Brick3: vna.proxmox.softlog:/tank/vmdata/datastore4
> Options Reconfigured:
> cluster.self-heal-window-size: 1024
> cluster.locking-scheme: granular
> cluster.granular-entry-heal: on
> performance.readdir-ahead: on
> cluster.data-self-heal: on
> features.shard: on
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> nfs.disable: on
> nfs.addr-namelookup: off
> nfs.enable-ino32: off
> performance.strict-write-ordering: off
> performance.stat-prefetch: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> cluster.eager-lock: enable
> network.remote-dio: enable
> features.shard-block-size: 64MB
> cluster.background-self-heal-count: 16
>
> Thanks,
>
> --
> Lindsay Mathieson
[Gluster-users] Healing Delays
This was raised earlier but I don't believe it was ever resolved, and it is becoming a serious issue for me.

I'm doing rolling upgrades on our three node cluster (Replica 3, Sharded, VM Workload). I update one node, reboot it, wait for healing to complete, then do the next one.

Only the heal count does not change, it just does not seem to start. It can take hours before it shifts, but once it does, it's quite rapid. Node 1 has restarted and the heal count has been static at 511 shards for 45 minutes now. Nodes 1 & 2 have low CPU load, node 3 has glusterfsd pegged at 800% CPU.

This was *not* the case in earlier versions of gluster (3.7.11 I think); healing would start almost right away. I think it started doing this when the afr locking improvements were made.

I have experimented with full & diff heal modes; it doesn't make any difference.

Current:

Gluster Version 3.8.4

Volume Name: datastore4
Type: Replicate
Volume ID: 0ba131ef-311d-4bb1-be46-596e83b2f6ce
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore4
Brick2: vng.proxmox.softlog:/tank/vmdata/datastore4
Brick3: vna.proxmox.softlog:/tank/vmdata/datastore4
Options Reconfigured:
cluster.self-heal-window-size: 1024
cluster.locking-scheme: granular
cluster.granular-entry-heal: on
performance.readdir-ahead: on
cluster.data-self-heal: on
features.shard: on
cluster.quorum-type: auto
cluster.server-quorum-type: server
nfs.disable: on
nfs.addr-namelookup: off
nfs.enable-ino32: off
performance.strict-write-ordering: off
performance.stat-prefetch: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
cluster.eager-lock: enable
network.remote-dio: enable
features.shard-block-size: 64MB
cluster.background-self-heal-count: 16

Thanks,

--
Lindsay Mathieson
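[Editor's note] The "wait for healing to complete" step of the rolling upgrade above can be automated by polling heal info. This is a sketch under assumptions: the parsing helper pending_entries is a hypothetical name, and the live gluster invocation is shown commented out since it requires a real cluster.

```shell
# Sum the "Number of entries:" counts from `gluster volume heal <vol> info`
# output read on stdin. pending_entries is a hypothetical helper name.
pending_entries() {
    awk '/^Number of entries:/ { total += $4 } END { print total + 0 }'
}

# On a live cluster, one might wait between upgrade steps like this:
#   while [ "$(gluster volume heal datastore4 info | pending_entries)" -gt 0 ]; do
#       sleep 60
#   done

# Self-contained check against sample heal-info output:
printf 'Brick vnb.proxmox.softlog:/tank/vmdata/datastore4\nNumber of entries: 511\nBrick vng.proxmox.softlog:/tank/vmdata/datastore4\nNumber of entries: 0\n' \
    | pending_entries    # prints 511
```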
[Gluster-users] Gluster Community Newsletter, September 2016
Important happenings for Gluster this month:

GlusterFS-3.9rc1 is out for testing!

Gluster 3.8.4 is released; users are advised to update:
http://blog.gluster.org/2016/09/glusterfs-3-8-4-is-available-gluster-users-are-advised-to-update/

Gluster Developer Summit: Next week, the Gluster Developer Summit happens in Berlin from October 6th through 7th. Our schedule: https://www.gluster.org/events/schedule2016/ We will be recording scheduled talks and posting them to our YouTube channel!

Gluster-users:

glusterfs-3.9rc1 has been released for testing
Amudhan P tests the 3.8.3 Bitrot signature process: http://www.gluster.org/pipermail/gluster-users/2016-September/028354.html
Pranith calls for volunteers for a FreeBSD port of Gluster: http://www.gluster.org/pipermail/gluster-users/2016-September/028295.html
Pranith calls for volunteers for a Mac OS X port of Gluster: http://www.gluster.org/pipermail/gluster-users/2016-September/028296.html
Gandalf asks about production cluster planning: http://www.gluster.org/pipermail/gluster-users/2016-September/028371.html
Kaushal posts an update on GlusterD-2.0: http://www.gluster.org/pipermail/gluster-users/2016-September/028420.html
Pranith asks which application workloads are too slow on gluster: http://www.gluster.org/pipermail/gluster-users/2016-September/028448.html
Sahina Bose updates on an integrated solution for disaster recovery with oVirt and Gluster: http://www.gluster.org/pipermail/gluster-users/2016-September/028499.html
Kaleb updates the Community Gluster Package Matrix: http://www.gluster.org/pipermail/gluster-users/2016-September/028511.html

Gluster-devel:

Kelviw asks about the FUSE kernel cache for dentries and inodes: http://www.gluster.org/pipermail/gluster-devel/2016-September/050722.html
Mohit Agrawal asks about xattr healing in DHT: http://www.gluster.org/pipermail/gluster-devel/2016-September/050825.html
Rajesh provides a POC for documentation in AsciiDoc: http://www.gluster.org/pipermail/gluster-devel/2016-September/050995.html
Jeff Darcy starts a conversation about libunwind: http://www.gluster.org/pipermail/gluster-devel/2016-September/050839.html
Pranith updates on the status of block and object storage on gluster: http://www.gluster.org/pipermail/gluster-devel/2016-September/050848.html
Nithya discusses debugging on FreeBSD: http://www.gluster.org/pipermail/gluster-devel/2016-September/050852.html
Luis Pabon follows up with block-store-related API design: http://www.gluster.org/pipermail/gluster-devel/2016-September/050875.html
Luis Pabon brings up the Kubernetes Dynamic Provisioner for Gluster: http://www.gluster.org/pipermail/gluster-devel/2016-September/050879.html
Avra Sengupta follows up with questions on merging ZFS snapshot support into GlusterFS: http://www.gluster.org/pipermail/gluster-devel/2016-September/050882.html
Poornima asks for reviews of 3.9 patches: http://www.gluster.org/pipermail/gluster-devel/2016-September/050917.html
Jeff Darcy has good news, bad news, and a plea for help on multiplexing: http://www.gluster.org/pipermail/gluster-devel/2016-September/050928.html
Mrugesh Karnik introduces Tendrl: http://www.gluster.org/pipermail/gluster-devel/2016-September/050945.html

Gluster-infra:

Nigel Babu proposes Zuul: http://www.gluster.org/pipermail/gluster-infra/2016-September/002727.html
Nigel Babu reviews August's Gluster infra updates: http://www.gluster.org/pipermail/gluster-infra/2016-September/002781.html
Michael Scherer proposes switching build.gluster.org to Ansible: http://www.gluster.org/pipermail/gluster-infra/2016-September/002778.html
Nigel Babu posts of a migration for formicary.gluster.org: http://www.gluster.org/pipermail/gluster-infra/2016-September/002829.html
Michael Scherer posts that the Salt-to-Ansible migration is complete: http://www.gluster.org/pipermail/gluster-infra/2016-September/002867.html

Top 5 contributors: Niels de Vos, Kaleb Keithley, Nigel Babu, Atin Mukherjee, Aravinda VK

Upcoming CFPs:

FOSDEM: https://fosdem.org/2017/news/2016-07-20-call-for-participation/ - February 4-5, 2017
DevConf: http://www.devconf.cz/ - January 27-29

--
Amye Scavarda | a...@redhat.com | Gluster Community Lead
[Gluster-users] Some files are not healed (but not in split-brain), manual fix with setfattr does not work
Hi all,

I noticed that I have two files which are not healed:

root@giant5:~# gluster volume heal gv0 info
Gathering Heal info on volume gv0 has been successful

Brick giant1:/gluster/sdc/gv0
Number of entries: 1
/holicki/lqcd/slurm-7251.out

Brick giant2:/gluster/sdc/gv0
Number of entries: 1
/holicki/lqcd/slurm-7251.out

Brick giant3:/gluster/sdc/gv0
Number of entries: 1
/holicki/lqcd/slurm-7249.out

Brick giant4:/gluster/sdc/gv0
Number of entries: 1
/holicki/lqcd/slurm-7249.out

Brick giant5:/gluster/sdc/gv0
Number of entries: 1

Brick giant6:/gluster/sdc/gv0
Number of entries: 1

Brick giant1:/gluster/sdd/gv0
Number of entries: 1
/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out

Brick giant2:/gluster/sdd/gv0
Number of entries: 1
/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out

Brick giant3:/gluster/sdd/gv0
Number of entries: 0

Brick giant4:/gluster/sdd/gv0
Number of entries: 0

Brick giant5:/gluster/sdd/gv0
Number of entries: 0

Brick giant6:/gluster/sdd/gv0
Number of entries: 0

(Disregard the file "slurm-7251.out"; this is/was IO in progress.)

The logs are filled with entries like this:

[2016-09-30 12:45:26.611375] I [afr-self-heal-data.c:655:afr_sh_data_fix] 0-gv0-replicate-3: no active sinks for performing self-heal on file /jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
[2016-09-30 12:45:36.874802] I [afr-self-heal-data.c:655:afr_sh_data_fix] 0-gv0-replicate-3: no active sinks for performing self-heal on file /jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
[2016-09-30 12:45:53.701884] I [afr-self-heal-data.c:655:afr_sh_data_fix] 0-gv0-replicate-3: no active sinks for performing self-heal on file /jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out

I checked with md5sum that both copies of the file are identical.
Then I used setfattr as proposed in an older thread on this mailing list:

setfattr -n trusted.afr.gv0-client-7 -v 0x /gluster/sdd/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out

I did this on both nodes for both clients, so it now looks like this (on both nodes/bricks):

getfattr -d -m . -e hex /gluster/sdd/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
getfattr: Removing leading '/' from absolute path names
# file: gluster/sdd/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu1.000/slurm-5660.out
trusted.afr.gv0-client-6=0x
trusted.afr.gv0-client-7=0x
trusted.gfid=0xcb7978fa42e74a0b97928a87126338ac

I triggered a heal, but the files do not disappear from heal info. They are also not listed under split-brain or heal-failed.

I used gfid-resolver.sh for the other file:

e9793d5e-7174-49b0-9fa9-90f8c35948e7 == File: /gluster/sdc/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu0.800/slurm-5663.out

This file is also marked as dirty:

root@giant5:/var/log/glusterfs# getfattr -d -m . -e hex /gluster/sdc/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu0.800/slurm-5663.out
getfattr: Removing leading '/' from absolute path names
# file: gluster/sdc/gv0/jwilhelm/CUDA/Ns16_Nt32_m0.0100_beta1.7000_lambda0.0050/mu0.800/slurm-5663.out
trusted.afr.gv0-client-4=0x0001
trusted.afr.gv0-client-5=0x0001
trusted.gfid=0xe9793d5e717449b09fa990f8c35948e7

How can I fix this, i.e. get the files healed? I'm using gluster 3.4.2 on Ubuntu 14.04.3. I have also thought about scheduling a downtime and upgrading gluster, but I don't know if I can do this as long as there are files to be healed.

Thanks for any advice.
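[Editor's note] As background for reading the xattrs above: a trusted.afr.* value packs three big-endian 32-bit pending counters, conventionally in the order data, metadata, entry. The hex values in this message are truncated in the archive, so a complete 24-hex-digit example value is assumed below; this is a bash sketch, not an official tool.

```shell
# Decode a trusted.afr.* xattr value into its three pending counters.
# Layout assumption: three big-endian 32-bit counters, in the order
# data / metadata / entry.
decode_afr() {
    local hex=${1#0x}
    printf 'data=%d metadata=%d entry=%d\n' \
        "$((16#${hex:0:8}))" "$((16#${hex:8:8}))" "$((16#${hex:16:8}))"
}

# Hypothetical full value with data and metadata operations pending:
decode_afr 0x000000010000000100000000    # prints: data=1 metadata=1 entry=0
```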