[Gluster-users] Monitoring gluster 3.6.1
Hi, I am monitoring gluster with scripts that launch other scripts. Everything is funnelled through one script that checks whether any gluster process is already active and, only if the response is false, launches the checks. The checks are:
- gluster volume info volname
- gluster volume heal volname info
- gluster volume heal volname info split-brain
- gluster volume status volname detail
- gluster volume volname statistics
Since I enabled this monitoring on our pre-production gluster, gluster has gone down twice. We suspect the monitoring is overloading it, although it should not. The question is: is there any other way to check those states? Thanks ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
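For illustration, a minimal sketch of a wrapper that serializes those checks so overlapping runs cannot pile up on glusterd (the volume name, lock file path and timeout are placeholders, not part of the original setup):

    #!/bin/bash
    # Serialize the gluster checks: if the previous run has not finished yet,
    # skip this one instead of stacking more CLI calls onto glusterd.
    VOL=volname                        # placeholder volume name
    LOCK=/var/run/gluster-checks.lock  # placeholder lock file

    (
        flock -n 9 || exit 0           # previous check still running -> skip
        timeout 60 gluster volume info "$VOL"
        timeout 60 gluster volume heal "$VOL" info
        timeout 60 gluster volume heal "$VOL" info split-brain
        timeout 60 gluster volume status "$VOL" detail
    ) 9>"$LOCK"

This is only a sketch of the approach described above; whether it helps depends on why glusterd is going down in the first place.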
[Gluster-users] Client load high (300) using fuse mount
Hi! I am trying to set up a Wordpress cluster using GlusterFS for storage. Web nodes will access the same Wordpress install on a volume mounted via FUSE from a 3-peer GlusterFS TSP. I started with one web node and Wordpress on local storage. The load average was constantly about 5, and iotop showed about 300kB/s disk reads or less; the load average stayed below 6. When I mounted the GlusterFS volume on the web node, the 1min load average went over 300. Each of the 3 peers is transmitting about 10MB/s to my web node regardless of the load. TSP peers are on 10Gbit NICs and the web node is on a 1Gbit NIC. I'm out of ideas here... Could it be the network? What should I look at for optimizing the network stack on the client? Options set on TSP:
Options Reconfigured:
performance.cache-size: 4GB
network.ping-timeout: 15
cluster.quorum-type: auto
network.remote-dio: on
cluster.eager-lock: on
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.cache-refresh-timeout: 4
performance.io-thread-count: 32
nfs.disable: on
Regards, Mitja -- -- Mitja Mihelič ARNES, Tehnološki park 18, p.p. 7, SI-1001 Ljubljana, Slovenia tel: +386 1 479 8877, fax: +386 1 479 88 78 ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster-users Digest, Vol 86, Issue 1 - Message 5: client load high using FUSE mount
- Original Message - From: gluster-users-requ...@gluster.org To: gluster-users@gluster.org Sent: Monday, June 1, 2015 8:00:01 AM Subject: Gluster-users Digest, Vol 86, Issue 1 Message: 5 Date: Mon, 01 Jun 2015 13:11:13 +0200 From: Mitja Mihelič mitja.mihe...@arnes.si To: gluster-users@gluster.org Subject: [Gluster-users] Client load high (300) using fuse mount Message-ID: 556c3dd1.1080...@arnes.si Content-Type: text/plain; charset=utf-8; format=flowed
Hi! I am trying to set up a Wordpress cluster using GlusterFS for storage. Web nodes will access the same Wordpress install on a volume mounted via FUSE from a 3-peer GlusterFS TSP. I started with one web node and Wordpress on local storage. The load average was constantly about 5. iotop showed about 300kB/s disk reads or less. The load average was below 6. When I mounted the GlusterFS volume to the web node the 1min load average went over 300. Each of the 3 peers is transmitting about 10MB/s to my web node regardless of the load. TSP peers are on 10Gbit NICs and the web node is on a 1Gbit NIC.
30 MB/s is about 1/3 line speed for a 1-Gbps NIC port. Sounds like network latency and lack of client-side caching might be your bottleneck; you might want to put a 10-Gbps NIC port on your client. You did disable client-side caching (md-cache and io-cache translators) below, was that your intent? Also, the defaults for these translators are very conservative; if there is only 1 client you may want to increase the time that data is cached (in the client) using the FUSE mount options entry-timeout=30 and attribute-timeout=30. Unlike non-distributed Linux filesystems, Gluster is very conservative about client side caching to avoid cache coherency issues.
I'm out of ideas here... Could it be the network? What should I look at for optimizing the network stack on the client? Options set on TSP:
Options Reconfigured:
performance.cache-size: 4GB
network.ping-timeout: 15
cluster.quorum-type: auto
network.remote-dio: on
cluster.eager-lock: on
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.cache-refresh-timeout: 4
performance.io-thread-count: 32
nfs.disable: on
Too many tunings; what are these intended to do? The gluster volume reset command allows you to undo this. In Gluster 3.7, the gluster volume get your-volume all command lets you see what the defaults are.
Regards, Mitja -- -- Mitja Mihelič ARNES, Tehnološki park 18, p.p. 7, SI-1001 Ljubljana, Slovenia tel: +386 1 479 8877, fax: +386 1 479 88 78 ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
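To make the two suggestions above concrete, a minimal sketch (server name, volume name and mount point are placeholders; the gluster commands are the standard CLI forms):

    # FUSE mount with longer entry/attribute caching on the client
    mount -t glusterfs -o entry-timeout=30,attribute-timeout=30 \
        server1:/wpvol /var/www/wordpress

    # Back out a single tuned option, or all of them, on the servers
    gluster volume reset wpvol performance.cache-size
    gluster volume reset wpvol        # resets all reconfigured options

    # Gluster 3.7 and later: list every option with its current value
    gluster volume get wpvol all

Longer FUSE timeouts trade cache coherency for fewer lookups, so they are only appropriate when a single client (or read-mostly clients) access the volume.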
Re: [Gluster-users] Features - Object Count
- Original Message - From: M S Vishwanath Bhat msvb...@gmail.com To: aasenov1989 aasenov1...@gmail.com Cc: Gluster-users@gluster.org List gluster-users@gluster.org Sent: Monday, June 1, 2015 3:02:08 PM Subject: Re: [Gluster-users] Features - Object Count On 29 May 2015 at 18:11, aasenov1989 aasenov1...@gmail.com wrote: Hi, So is there a way to find how many files I have on each brick of the volume? I don't think gluster provides a way to exactly get the number of files in a brick or volume. Sorry if my solution is very obvious. But I generally use find to get the number of files in a particular brick. find /brick/path ! -path /brick/path/.glusterfs* | wc -l Hi, You can also do getfattr -d -m . -e hex brick_path This command is to get the extended attributes of a directory. When you issue this command after enabling quota then you can see an extended attribute with name trusted.glusterfs.quota.size That basically holds the size, file count and directory count. The extended attribute consists of 48 hexadecimal numbers. First 16 will give you the size, next 16 the file count and last 16 the directory count. Hope this helps. Thanks, Sachin Pandit. Best Regards, Vishwanath Regards, Asen Asenov On Fri, May 29, 2015 at 3:33 PM, Atin Mukherjee atin.mukherje...@gmail.com wrote: Sent from Samsung Galaxy S4 On 29 May 2015 17:59, aasenov1989 aasenov1...@gmail.com wrote: Hi, Thnaks for the help. I was able to retrieve number of objects for entire volume. But I didn't figure out how to set quota for particular brick. I have replicated volume with 2 bricks on 2 nodes: Bricks: Brick1: host1:/dataDir Brick2: host2:/dataDir Both bricks are up and files are replicated. But when I try to set quota on a particular brick: IIUC, You won't be able to set quota at brick level as multiple bricks comprise a volume which is exposed to the user. Quota team can correct me if I am wrong. gluster volume quota TestVolume limit-objects /dataDir/ 9223372036854775807 quota command failed : Failed to get trusted.gfid attribute on path /dataDir/. Reason : No such file or directory please enter the path relative to the volume What should be the path to brick directories relative to the volume? Regards, Asen Asenov On Fri, May 29, 2015 at 12:35 PM, Sachin Pandit span...@redhat.com wrote: - Original Message - From: aasenov1989 aasenov1...@gmail.com To: Humble Devassy Chirammal humble.deva...@gmail.com Cc: Gluster-users@gluster.org List gluster-users@gluster.org Sent: Friday, May 29, 2015 12:22:43 AM Subject: Re: [Gluster-users] Features - Object Count Thanks Humble, But as far as I understand the object count is connected with the quotas set per folders. What I want is to get number of files I have in entire volume - even when volume is distributed across multiple computers. I think the purpose of this feature: http://gluster.readthedocs.org/en/latest/Feature%20Planning/GlusterFS%203.7/Object%20Count/ Hi, You are absolutely correct. You can retrieve number of files in the entire volume if you have the limit-objects set on the root. If limit-objects is set on the directory present in a mount point then it will only show the number of files and directories of that particular directory. In your case, if you want to retrieve number of files and directories present in the entire volume then you might have to set the object limit on the root. Thanks, Sachin Pandit. is to provide such functionality. Am I right or there is no way to retrieve number of files for entire volume? 
Regards, Asen Asenov On Thu, May 28, 2015 at 8:09 PM, Humble Devassy Chirammal humble.deva...@gmail.com wrote: Hi Asen, https://gluster.readthedocs.org/en/latest/Features/quota-object-count/ , hope this helps. --Humble On Thu, May 28, 2015 at 8:38 PM, aasenov1989 aasenov1...@gmail.com wrote: Hi, I wanted to ask how to use this feature in gluster 3.7.0, as I was unable to find anything. How can I retrieve number of objects in volume and number of objects in particular brick? Thanks in advance. Regards, Asen Asenov ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
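Building on Sachin's description above of trusted.glusterfs.quota.size (48 hex digits: size, then file count, then directory count), a small sketch of decoding it on a brick. The brick path is a placeholder, and the attribute is only present once quota/object-count is enabled:

    #!/bin/bash
    # Split trusted.glusterfs.quota.size into size / file count / dir count.
    BRICK=/bricks/brick1    # placeholder brick (or directory) path

    RAW=$(getfattr -n trusted.glusterfs.quota.size -e hex "$BRICK" 2>/dev/null \
          | grep '=' | cut -d= -f2)
    HEX=${RAW#0x}

    echo "size:        $((16#${HEX:0:16})) bytes"
    echo "files:       $((16#${HEX:16:16}))"
    echo "directories: $((16#${HEX:32:16}))"

This follows the field layout given above; if the attribute on your version has a different length or layout, the offsets would need adjusting.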
Re: [Gluster-users] Gluster 3.6.1
On 05/29/2015 01:29 PM, Félix de Lelelis wrote: Hi, I have a cluster with 3 nodes on pre-production. Yesterday, one node was down. The errror that I have seen is that: [2015-05-28 19:04:27.305560] E [glusterd-syncop.c:1578:gd_sync_task_begin] 0-management: Unable to acquire lock for cfe-gv1 The message I [MSGID: 106006] [glusterd-handler.c:4257:__glusterd_nodesvc_rpc_notify] 0-management: nfs has disconnected from glusterd. repeated 5 times between [2015-05-28 19:04:09.346088] and [2015-05-28 19:04:24.349191] pending frames: frame : type(0) op(0) patchset: git://git.gluster.com/glusterfs.git signal received: 11 time of crash: 2015-05-28 19:04:27 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 3.6.1 /usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7fd86e2f1232] /usr/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7fd86e30871d] /usr/lib64/libc.so.6(+0x35640)[0x7fd86d30c640] /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_remove_pending_entry+0x2c)[0x7fd85f52450c] /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(+0x5ae28)[0x7fd85f511e28] /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_op_sm+0x237)[0x7fd85f50f027] /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(__glusterd_brick_op_cbk+0x2fe)[0x7fd85f53be5e] /usr/lib64/glusterfs/3.6.1/xlator/mgmt/glusterd.so(glusterd_big_locked_cbk+0x4c)[0x7fd85f53d48c] /usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7fd86e0c50b0] /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x171)[0x7fd86e0c5321] /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fd86e0c1273] /usr/lib64/glusterfs/3.6.1/rpc-transport/socket.so(+0x8530)[0x7fd85d17d530] /usr/lib64/glusterfs/3.6.1/rpc-transport/socket.so(+0xace4)[0x7fd85d17fce4] /usr/lib64/libglusterfs.so.0(+0x76322)[0x7fd86e346322] /usr/sbin/glusterd(main+0x502)[0x7fd86e79afb2] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd86d2f8af5] /usr/sbin/glusterd(+0x6351)[0x7fd86e79b351] - That is a problem with software? is a bug ? The problem what I see here is concurrent volume status transactions were run at a given point of time (From the cmd log history in BZ 1226254). 3.6.1 has some fixes missing to take care of these issues identified on the same line. If you upgrade your cluster to 3.6.3 problem will go away. However 3.6.3 still misses one more fix http://review.gluster.org/#/c/10023/ which will be released in 3.6.4. I would request you to upgrade your cluster to 3.6.3 if not 3.7. Thanks. ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users -- ~Atin ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Features - Object Count
Hi, Thanks for the reply. I understand what you're saying, but then can you give me an idea how to solve my problem. My setup is as follows: I have 1 volume comprised of 2 nodes, each node having 2 bricks(total 4 bricks). The volume is replicated in such a way that each brick from node1 replicate files to corresponding brick in node2. When I upload, lets say 1 million files to node1, I want to find out when those files are replicated to the second node. My idea was to set inode quota on a volume (to check number of files in the volume) and on a brick(to check number of files in a brick) and to verify that total number of files in bricks in each node is equal to number of files in entire volume. This way I can be sure that the files are replicated correctly. But, as far as I understand, what I can do is to set quota on entire volume to keep track of total number of files. Then to check each brick and count number of files in that brick. But I have to perform this operation on both nodes, as I can't check number of files in brick that is not in local node. Also it's not really efficient every time to count millions of files So is there a way to perform such a check? Thanks in advance. Regards, Asen Asenov On Mon, Jun 1, 2015 at 7:31 AM, Sachin Pandit span...@redhat.com wrote: - Original Message - From: aasenov1989 aasenov1...@gmail.com To: Atin Mukherjee atin.mukherje...@gmail.com Cc: Gluster-users@gluster.org List gluster-users@gluster.org, Sachin Pandit span...@redhat.com Sent: Friday, May 29, 2015 6:11:36 PM Subject: Re: [Gluster-users] Features - Object Count Hi, So is there a way to find how many files I have on each brick of the volume? Hi, Quota limit can only be set on the volume level and directories which is present in it. We don't have a gluster command which lists out the number of files present in a brick, as linux commands can take care of that very well. Thanks, Sachin Pandit. Regards, Asen Asenov On Fri, May 29, 2015 at 3:33 PM, Atin Mukherjee atin.mukherje...@gmail.com wrote: Sent from Samsung Galaxy S4 On 29 May 2015 17:59, aasenov1989 aasenov1...@gmail.com wrote: Hi, Thnaks for the help. I was able to retrieve number of objects for entire volume. But I didn't figure out how to set quota for particular brick. I have replicated volume with 2 bricks on 2 nodes: Bricks: Brick1: host1:/dataDir Brick2: host2:/dataDir Both bricks are up and files are replicated. But when I try to set quota on a particular brick: IIUC, You won't be able to set quota at brick level as multiple bricks comprise a volume which is exposed to the user. Quota team can correct me if I am wrong. gluster volume quota TestVolume limit-objects /dataDir/ 9223372036854775807 quota command failed : Failed to get trusted.gfid attribute on path /dataDir/. Reason : No such file or directory please enter the path relative to the volume What should be the path to brick directories relative to the volume? Regards, Asen Asenov On Fri, May 29, 2015 at 12:35 PM, Sachin Pandit span...@redhat.com wrote: - Original Message - From: aasenov1989 aasenov1...@gmail.com To: Humble Devassy Chirammal humble.deva...@gmail.com Cc: Gluster-users@gluster.org List gluster-users@gluster.org Sent: Friday, May 29, 2015 12:22:43 AM Subject: Re: [Gluster-users] Features - Object Count Thanks Humble, But as far as I understand the object count is connected with the quotas set per folders. What I want is to get number of files I have in entire volume - even when volume is distributed across multiple computers. 
I think the purpose of this feature: http://gluster.readthedocs.org/en/latest/Feature%20Planning/GlusterFS%203.7/Object%20Count/ Hi, You are absolutely correct. You can retrieve number of files in the entire volume if you have the limit-objects set on the root. If limit-objects is set on the directory present in a mount point then it will only show the number of files and directories of that particular directory. In your case, if you want to retrieve number of files and directories present in the entire volume then you might have to set the object limit on the root. Thanks, Sachin Pandit. is to provide such functionality. Am I right or there is no way to retrieve number of files for entire volume? Regards, Asen Asenov On Thu, May 28, 2015 at 8:09 PM, Humble Devassy Chirammal humble.deva...@gmail.com wrote: Hi Asen, https://gluster.readthedocs.org/en/latest/Features/quota-object-count/ , hope this helps. --Humble On Thu, May 28, 2015 at 8:38 PM, aasenov1989 aasenov1...@gmail.com wrote:
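As a rough sketch of the check Asen describes earlier in this thread (counting files per brick on each node and comparing), assuming passwordless ssh to both nodes; hostnames and brick paths are placeholders, and as noted this is expensive for millions of files:

    #!/bin/bash
    # Count regular files on every brick (excluding .glusterfs) on both nodes.
    for node in node1 node2; do                  # placeholder hostnames
        for brick in /bricks/b1 /bricks/b2; do   # placeholder brick paths
            count=$(ssh "$node" "find $brick ! -path '$brick/.glusterfs*' -type f | wc -l")
            echo "$node:$brick  $count files"
        done
    done

If the counts of corresponding replica bricks match, replication has at least caught up in terms of file count; it says nothing about file contents.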
Re: [Gluster-users] Features - Object Count
On 29 May 2015 at 18:11, aasenov1989 aasenov1...@gmail.com wrote: Hi, So is there a way to find how many files I have on each brick of the volume? I don't think gluster provides a way to exactly get the number of files in a brick or volume. Sorry if my solution is very obvious. But I generally use find to get the number of files in a particular brick. find /brick/path ! -path /brick/path/.glusterfs* | wc -l Best Regards, Vishwanath Regards, Asen Asenov On Fri, May 29, 2015 at 3:33 PM, Atin Mukherjee atin.mukherje...@gmail.com wrote: Sent from Samsung Galaxy S4 On 29 May 2015 17:59, aasenov1989 aasenov1...@gmail.com wrote: Hi, Thnaks for the help. I was able to retrieve number of objects for entire volume. But I didn't figure out how to set quota for particular brick. I have replicated volume with 2 bricks on 2 nodes: Bricks: Brick1: host1:/dataDir Brick2: host2:/dataDir Both bricks are up and files are replicated. But when I try to set quota on a particular brick: IIUC, You won't be able to set quota at brick level as multiple bricks comprise a volume which is exposed to the user. Quota team can correct me if I am wrong. gluster volume quota TestVolume limit-objects /dataDir/ 9223372036854775807 quota command failed : Failed to get trusted.gfid attribute on path /dataDir/. Reason : No such file or directory please enter the path relative to the volume What should be the path to brick directories relative to the volume? Regards, Asen Asenov On Fri, May 29, 2015 at 12:35 PM, Sachin Pandit span...@redhat.com wrote: - Original Message - From: aasenov1989 aasenov1...@gmail.com To: Humble Devassy Chirammal humble.deva...@gmail.com Cc: Gluster-users@gluster.org List gluster-users@gluster.org Sent: Friday, May 29, 2015 12:22:43 AM Subject: Re: [Gluster-users] Features - Object Count Thanks Humble, But as far as I understand the object count is connected with the quotas set per folders. What I want is to get number of files I have in entire volume - even when volume is distributed across multiple computers. I think the purpose of this feature: http://gluster.readthedocs.org/en/latest/Feature%20Planning/GlusterFS%203.7/Object%20Count/ Hi, You are absolutely correct. You can retrieve number of files in the entire volume if you have the limit-objects set on the root. If limit-objects is set on the directory present in a mount point then it will only show the number of files and directories of that particular directory. In your case, if you want to retrieve number of files and directories present in the entire volume then you might have to set the object limit on the root. Thanks, Sachin Pandit. is to provide such functionality. Am I right or there is no way to retrieve number of files for entire volume? Regards, Asen Asenov On Thu, May 28, 2015 at 8:09 PM, Humble Devassy Chirammal humble.deva...@gmail.com wrote: Hi Asen, https://gluster.readthedocs.org/en/latest/Features/quota-object-count/ , hope this helps. --Humble On Thu, May 28, 2015 at 8:38 PM, aasenov1989 aasenov1...@gmail.com wrote: Hi, I wanted to ask how to use this feature in gluster 3.7.0, as I was unable to find anything. How can I retrieve number of objects in volume and number of objects in particular brick? Thanks in advance. 
Regards, Asen Asenov ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] 3.6.3 split brain on web browser cache dir w. replica 3 volume
I have a replica 3 volume I am using to serve my home directory. I have noticed a couple of split-brains recently on files used by browsers (for the most recent see below; I had an earlier one on .config/google-chrome/Default/Session Storage/). When I was running replica 2 I don't recall seeing more than two entries of the form trusted.afr.volname-client-?. I did have two other servers that I removed from service recently, but I am curious to know if there is some way to map what the server reports as trusted.afr.volname-client-? to a hostname? Thanks, Alastair
# gluster volume heal homes info
Brick gluster-2:/export/brick2/home/
/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain
Number of entries: 1
Brick gluster1:/export/brick2/home/
/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain
Number of entries: 1
Brick gluster0:/export/brick2/home/
/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain
Number of entries: 1
# getfattr -d -m . -e hex /export/brick2/home/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair
getfattr: Removing leading '/' from absolute path names
# file: export/brick2/home/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair
security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.dirty=0x
trusted.afr.homes-client-0=0x
trusted.afr.homes-client-1=0x
trusted.afr.homes-client-2=0x
trusted.afr.homes-client-3=0x0002
trusted.afr.homes-client-4=0x
trusted.gfid=0x3ae398227cea4f208d7652dbfb93e3e5
trusted.glusterfs.dht=0x0001
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.edf41dc8-2122-4aa3-bc20-29225564ca8c.contri=0x162d2200
trusted.glusterfs.quota.size=0x162d2200
___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync
Some news, Looks like changelog is not working anymore. When I touch a file in master it doesnt propagate to slave… .processing folder contain a thousand of changelog not processed. I had to stop the geo-rep, reset changelog.changelog to the volume and restart the geo-rep. It’s now sending missing files using hybrid crawl. So geo-repo is not working as expected. Another thing, we use symlink to point to latest release build, and it seems that symlinks are not synced when they change from master to slave. Any idea on how I can debug this ? -- Cyril Peponnet On May 29, 2015, at 3:01 AM, Kotresh Hiremath Ravishankar khire...@redhat.commailto:khire...@redhat.com wrote: Yes, geo-rep internally uses fuse mount. I will explore further and get back to you if there is a way. Thanks and Regards, Kotresh H R - Original Message - From: Cyril N PEPONNET (Cyril) cyril.pepon...@alcatel-lucent.commailto:cyril.pepon...@alcatel-lucent.com To: Kotresh Hiremath Ravishankar khire...@redhat.commailto:khire...@redhat.com Cc: gluster-users gluster-users@gluster.orgmailto:gluster-users@gluster.org Sent: Thursday, May 28, 2015 10:12:57 PM Subject: Re: [Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync One more thing: nfs.volume-access read-only works only for nfs clients, glusterfs client have still write access features.read-only on need a vol restart and set RO for everyone but in this case, geo-rep goes faulty. [2015-05-28 09:42:27.917897] E [repce(/export/raid/usr_global):188:__call__] RepceClient: call 8739:139858642609920:1432831347.73 (keep_alive) failed on peer with OSError [2015-05-28 09:42:27.918102] E [syncdutils(/export/raid/usr_global):240:log_raise_exception] top: FAIL: Traceback (most recent call last): File /usr/libexec/glusterfs/python/syncdaemon/syncdutils.py, line 266, in twrap tf(*aa) File /usr/libexec/glusterfs/python/syncdaemon/master.py, line 391, in keep_alive cls.slave.server.keep_alive(vi) File /usr/libexec/glusterfs/python/syncdaemon/repce.py, line 204, in __call__ return self.ins(self.meth, *a) File /usr/libexec/glusterfs/python/syncdaemon/repce.py, line 189, in __call__ raise res OSError: [Errno 30] Read- So there is no proper way to protect the salve against write. -- Cyril Peponnet On May 28, 2015, at 8:54 AM, Cyril Peponnet cyril.pepon...@alcatel-lucent.commailto:cyril.pepon...@alcatel-lucent.commailto:cyril.pepon...@alcatel-lucent.com wrote: Hi Kotresh, Inline. Again, thank for you time. -- Cyril Peponnet On May 27, 2015, at 10:47 PM, Kotresh Hiremath Ravishankar khire...@redhat.commailto:khire...@redhat.commailto:khire...@redhat.com wrote: Hi Cyril, Replies inline. Thanks and Regards, Kotresh H R - Original Message - From: Cyril N PEPONNET (Cyril) cyril.pepon...@alcatel-lucent.commailto:cyril.pepon...@alcatel-lucent.commailto:cyril.pepon...@alcatel-lucent.com To: Kotresh Hiremath Ravishankar khire...@redhat.commailto:khire...@redhat.commailto:khire...@redhat.com Cc: gluster-users gluster-users@gluster.orgmailto:gluster-users@gluster.orgmailto:gluster-users@gluster.org Sent: Wednesday, May 27, 2015 9:28:00 PM Subject: Re: [Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync Hi and thanks again for those explanation. Due to lot of missing files and not up to date (with gfid mismatch some time), I reset the index (or I think I do) by: deleting the geo-reop, reset geo-replication.indexing (set it to off does not work for me), and recreate it again. 
Resetting the index does not initiate geo-replication from the version where changelog was introduced. It works only for versions prior to it. NOTE 1: Recreation of the geo-rep session will work only if the slave doesn't contain files with mismatching gfids. If there are, the slave should be cleaned up before recreating. I started it again to transfer the missing files; I'll take care of the gfid mismatches afterward. Our volume is almost 5TB and it took almost 2 months to crawl to the slave, so I didn't want to start over :/ NOTE 2: Another method exists now to initiate a full sync. It also expects that slave files are not in a gfid-mismatch state (meaning the slave volume should not be written to by any means other than geo-replication). The method is to reset stime on all the bricks of the master. Following are the steps to trigger a full sync. Let me know if any comments/doubts. 1. Stop geo-replication 2. Remove the stime extended attribute on all the master brick roots using the following command: setfattr -x trusted.glusterfs.MASTER_VOL_UUID.SLAVE_VOL_UUID.stime brick-root NOTE: 1. If AFR is set up, do this for all replica sets 2. The above mentioned stime key can be got as follows: Using 'gluster volume info mastervol', get all brick paths
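Sketching step 2 from above as concrete commands (volume name and brick path are placeholders; the two UUIDs are the master and slave volume IDs, which 'gluster volume info' prints as 'Volume ID'):

    # 1. Stop geo-replication first (as in step 1 above).

    # 2. On each master node, for every brick root of the master volume:
    MASTER_ID=$(gluster volume info mastervol | awk '/^Volume ID/ {print $3}')
    SLAVE_ID=00000000-0000-0000-0000-000000000000   # placeholder: slave volume ID,
                                                    # taken from 'gluster volume info' on the slave
    setfattr -x trusted.glusterfs.$MASTER_ID.$SLAVE_ID.stime /export/brick1   # placeholder brick root

This is only an illustration of the procedure described above; the exact attribute name should be confirmed against what 'getfattr -d -m . -e hex <brick-root>' shows on your bricks.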
[Gluster-users] GlusterFS 3.7 - slow/poor performances
Dear all, I have a crash test cluster where I’ve tested the new version of GlusterFS (v3.7) before upgrading my HPC cluster in production. But… all my tests show very, very low performance. For my benches, as you can read below, I do some actions (untar, du, find, tar, rm) with the Linux kernel sources, dropping caches, each on distributed, replicated, distributed-replicated, single (single brick) volumes and the native FS of one brick.
# time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/|wc -l; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
And here are the process times:
---------------------------------------------------------
|             | UNTAR  | DU    | FIND   | TAR    | RM     |
---------------------------------------------------------
| single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
| replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
| distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
| dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
| native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
---------------------------------------------------------
I get the same results whether with default or custom configurations. If I look at the output of the ifstat command, I can note my IO write processes never exceed 3MB/s... The EXT4 native FS seems to be faster (roughly 15-20%, but no more) than the XFS one. My [test] storage cluster is composed of 2 identical servers (bi-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb ethernet). My volume settings:
single: 1 server, 1 brick
replicated: 2 servers, 1 brick each
distributed: 2 servers, 2 bricks each
dist-repl: 2 bricks in the same server and replica 2
All seems to be OK in the gluster status command line. Do you have an idea why I obtain such bad results? Thanks in advance. Geoffrey --- Geoffrey Letessier Responsable informatique ingénieur système CNRS - UPR 9080 - Laboratoire de Biochimie Théorique Institut de Biologie Physico-Chimique 13, rue Pierre et Marie Curie - 75005 Paris Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@cnrs.fr ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
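For anyone wanting to reproduce the comparison, the same sequence can be wrapped in a small harness (the mount point is a placeholder; the commands are exactly the ones listed above):

    #!/bin/bash
    # Run the untar/du/find/tar/rm sequence with cold caches and print timings.
    cd /mnt/gluster-test || exit 1      # placeholder: volume mount point (or brick FS)
    SRC=~/linux-4.1-rc5.tar.xz

    drop() { sync; echo 3 > /proc/sys/vm/drop_caches; }

    drop; time tar xJf "$SRC"
    drop; time du -sh linux-4.1-rc5/
    drop; time find linux-4.1-rc5/ | wc -l
    drop; time tar czf linux-4.1-rc5.tgz linux-4.1-rc5/
    drop; time rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/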
Re: [Gluster-users] 3.6.3 split brain on web browser cache dir w. replica 3 volume
On 06/01/2015 08:15 PM, Alastair Neil wrote: I have a replica 3 volume I am using to serve my home directory. I have notices a couple of split-brains recently on files used by browsers(for the most recent see below, I had an earlier one on .config/google-chrome/Default/Session Storage/) . When I was running replica 2 I don't recall seeing more than two entries of the form: trusted.afr.volname.client-?. I did have two other servers that I have removed from service recently but I am curious to know if there is some way to map what the server reports as trusted.afr.volname-client-? to a hostname? Your volfile (/var/lib/glusterd/vols/volname/trusted-volname.tcp-fuse.vol) should contain which brick (remote-subvolume + remote-host) a given trusted.afr* maps to. Hope that helps, Ravi Thanks, Alastair # gluster volume heal homes info Brick gluster-2:/export/brick2/home/ /a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain Number of entries: 1 Brick gluster1:/export/brick2/home/ /a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain Number of entries: 1 Brick gluster0:/export/brick2/home/ /a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair - Is in split-brain Number of entries: 1 # getfattr -d -m . -e hex /export/brick2/home/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair getfattr: Removing leading '/' from absolute path names # file: export/brick2/home/a/n/aneil2/.cache/mozilla/firefox/xecgwc8s.Alastair security.selinux=0x73797374656d5f753a6f626a6563745f723a66696c655f743a733000 trusted.afr.dirty=0x trusted.afr.homes-client-0=0x trusted.afr.homes-client-1=0x trusted.afr.homes-client-2=0x trusted.afr.homes-client-3=0x0002 trusted.afr.homes-client-4=0x trusted.gfid=0x3ae398227cea4f208d7652dbfb93e3e5 trusted.glusterfs.dht=0x0001 trusted.glusterfs.quota.dirty=0x3000 trusted.glusterfs.quota.edf41dc8-2122-4aa3-bc20-29225564ca8c.contri=0x162d2200 trusted.glusterfs.quota.size=0x162d2200 ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
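As a concrete way to do that lookup (the volume name and volfile path follow the pattern Ravi gives; the exact filename may differ slightly by version and transport), each trusted.afr.homes-client-N corresponds to the Nth protocol/client subvolume in the volfile:

    # Show which remote-host / remote-subvolume each homes-client-N points to
    grep -A 10 'volume homes-client-' /var/lib/glusterd/vols/homes/trusted-homes.tcp-fuse.vol \
        | egrep 'volume homes-client-|remote-host|remote-subvolume'

From the heal output above, client-0/1/2 would then map to gluster-2, gluster1 and gluster0 in whatever order the volfile lists them.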
Re: [Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync
Hi Cyril, Could you please attach the geo-replication logs? Thanks and Regards, Kotresh H R - Original Message - From: Cyril N PEPONNET (Cyril) cyril.pepon...@alcatel-lucent.com To: Kotresh Hiremath Ravishankar khire...@redhat.com Cc: gluster-users gluster-users@gluster.org Sent: Monday, June 1, 2015 10:34:42 PM Subject: Re: [Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync Some news, Looks like changelog is not working anymore. When I touch a file in master it doesnt propagate to slave… .processing folder contain a thousand of changelog not processed. I had to stop the geo-rep, reset changelog.changelog to the volume and restart the geo-rep. It’s now sending missing files using hybrid crawl. So geo-repo is not working as expected. Another thing, we use symlink to point to latest release build, and it seems that symlinks are not synced when they change from master to slave. Any idea on how I can debug this ? -- Cyril Peponnet On May 29, 2015, at 3:01 AM, Kotresh Hiremath Ravishankar khire...@redhat.commailto:khire...@redhat.com wrote: Yes, geo-rep internally uses fuse mount. I will explore further and get back to you if there is a way. Thanks and Regards, Kotresh H R - Original Message - From: Cyril N PEPONNET (Cyril) cyril.pepon...@alcatel-lucent.commailto:cyril.pepon...@alcatel-lucent.com To: Kotresh Hiremath Ravishankar khire...@redhat.commailto:khire...@redhat.com Cc: gluster-users gluster-users@gluster.orgmailto:gluster-users@gluster.org Sent: Thursday, May 28, 2015 10:12:57 PM Subject: Re: [Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync One more thing: nfs.volume-access read-only works only for nfs clients, glusterfs client have still write access features.read-only on need a vol restart and set RO for everyone but in this case, geo-rep goes faulty. [2015-05-28 09:42:27.917897] E [repce(/export/raid/usr_global):188:__call__] RepceClient: call 8739:139858642609920:1432831347.73 (keep_alive) failed on peer with OSError [2015-05-28 09:42:27.918102] E [syncdutils(/export/raid/usr_global):240:log_raise_exception] top: FAIL: Traceback (most recent call last): File /usr/libexec/glusterfs/python/syncdaemon/syncdutils.py, line 266, in twrap tf(*aa) File /usr/libexec/glusterfs/python/syncdaemon/master.py, line 391, in keep_alive cls.slave.server.keep_alive(vi) File /usr/libexec/glusterfs/python/syncdaemon/repce.py, line 204, in __call__ return self.ins(self.meth, *a) File /usr/libexec/glusterfs/python/syncdaemon/repce.py, line 189, in __call__ raise res OSError: [Errno 30] Read- So there is no proper way to protect the salve against write. -- Cyril Peponnet On May 28, 2015, at 8:54 AM, Cyril Peponnet cyril.pepon...@alcatel-lucent.commailto:cyril.pepon...@alcatel-lucent.commailto:cyril.pepon...@alcatel-lucent.com wrote: Hi Kotresh, Inline. Again, thank for you time. -- Cyril Peponnet On May 27, 2015, at 10:47 PM, Kotresh Hiremath Ravishankar khire...@redhat.commailto:khire...@redhat.commailto:khire...@redhat.com wrote: Hi Cyril, Replies inline. 
Thanks and Regards, Kotresh H R - Original Message - From: Cyril N PEPONNET (Cyril) cyril.pepon...@alcatel-lucent.commailto:cyril.pepon...@alcatel-lucent.commailto:cyril.pepon...@alcatel-lucent.com To: Kotresh Hiremath Ravishankar khire...@redhat.commailto:khire...@redhat.commailto:khire...@redhat.com Cc: gluster-users gluster-users@gluster.orgmailto:gluster-users@gluster.orgmailto:gluster-users@gluster.org Sent: Wednesday, May 27, 2015 9:28:00 PM Subject: Re: [Gluster-users] Geo-Replication - Changelog socket is not present - Falling back to xsync Hi and thanks again for those explanation. Due to lot of missing files and not up to date (with gfid mismatch some time), I reset the index (or I think I do) by: deleting the geo-reop, reset geo-replication.indexing (set it to off does not work for me), and recreate it again. Resetting index does not initiate geo-replication from the version changelog is introduced. It works only for the versions prior to it. NOTE 1: Recreation of geo-rep session will work only if slave doesn't contain file with mismatch gfids. If there are, slave should be cleaned up before recreating. I started it again to transfert missing files Ill take of gfid missmatch afterward. Our vol is almost 5TB and it took almost 2 month to crawl to the slave I did’nt want to start over :/ NOTE 2: Another method exists now to initiate a full sync. It also expects slave files should not be in gfid mismatch state (meaning, slave volume should not written by any other means other than geo-replication). The method is to reset stime on all the bricks of master. Following are the steps to trigger full
Re: [Gluster-users] split brain on / just after installation
On 06/02/2015 09:10 AM, Carl L Hoffman wrote: Hello - I was wondering if someone could please help me. I've just setup Gluster 3.6 on two Ubuntu 14.04 hosts. Gluster is setup to replicate two volumes (prod-volume, dev-volume) between the two hosts. Replication is working fine. The glustershd.log shows: Are you sure you are running gluster 3.6? The 'afr_sh_print_split_brain_log' message appears only in gluster 3.5 or lower. [2015-06-02 03:28:04.495162] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-prod-volume-replicate-0: Unable to self-heal contents of 'gfid:----0001' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ] and the prod-volume logs shows: [2015-06-02 02:54:28.286268] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-prod-volume-replicate-0: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ] [2015-06-02 02:54:28.287476] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-prod-volume-replicate-0: background meta-data self-heal failed on / I've checked against https://github.com/gluster/glusterfs/blob/6c578c03f0d44913d264494de5df004544c96271/doc/features/heal-info-and-split-brain-resolution.md but I can't see any scenario that covers mine. The output of bluster volume heal prod-volume info is: Is the metadata same on both bricks on the root? (Compare `ls -ld /export/prodvol/brick` and `getfattr -d -m . -e hex /export/prodvol/brick` on both servers to see if anything is mismatching). -Ravi Gathering Heal info on volume prod-volume has been successful Brick server1:/export/prodvol/brick Number of entries: 1 / Brick server2 Number of entries: 1 / and doesn't show anything in split-brain. But the output of gluster volume heal prod-volume info split brain shows: Gathering Heal info on volume prod-volume has been successful Brick server1:/export/prodvol/brick Number of entries: 6 atpath on brick --- 2015-06-02 03:28:04 / 2015-06-02 03:18:04 / 2015-06-02 03:08:04 / 2015-06-02 02:58:04 / 2015-06-02 02:48:04 / 2015-06-02 02:48:04 / Brick server2:/export/prodvol/brick Number of entries: 5 atpath on brick --- 2015-06-02 03:28:00 / 2015-06-02 03:18:00 / 2015-06-02 03:08:00 / 2015-06-02 02:58:00 / 2015-06-02 02:48:04 / And the number continues to grow. The count on server2 is always one behind server1. Could someone please help? Cheers, ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
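For the two checks suggested above, the concrete commands would be (brick path taken from the heal output; run them on both server1 and server2 and compare):

    # Confirm which gluster version is actually installed
    glusterfs --version | head -1

    # Compare the root-directory metadata of the two bricks
    ls -ld /export/prodvol/brick
    getfattr -d -m . -e hex /export/prodvol/brick

If the owner, mode or the trusted.afr.* values differ between the two servers, that is the metadata mismatch being reported on '/'.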
Re: [Gluster-users] Gluster peer rejected and failed to start
On 06/02/2015 10:00 AM, vyyy杨雨阳 wrote: Hi, We have a gluster (version 3.6.3) cluster with 6 nodes. I tried to add 4 more nodes, but they ended up ‘Peer Rejected’. I then tried to resolve it by dumping /var/lib/glusterd and probing again, without success; that is one question. But the stranger thing is: a node already in the cluster is also shown as “Peer Rejected”. I tried to restart glusterd and it failed. I found that /var/lib/glusterd/peers is empty; I copied the files from other nodes, but still can’t start glusterd. It seems like you are trying to peer probe nodes which are either part of some other clusters (uncleaned nodes). Could you check whether the nodes which you are adding have an empty /var/lib/glusterd? If not, clean them and retry. ~Atin etc-glusterfs-glusterd.vol.log shows the cluster member as an “unknown peer”: [2015-06-02 01:52:14.650635] C [glusterd-handler.c:2369:__glusterd_handle_friend_update] 0-: Received friend update request from unknown peer 04f22ee8-8e00-4c32-a924-b40a0e413aa6 [2015-06-02 01:52:14.650786] C [glusterd-handler.c:2369:__glusterd_handle_friend_update] 0-: Received friend update request from unknown peer 674a78b5-0590-48d4-8752-d4608832ed1d [2015-06-02 01:52:14.657881] C [glusterd-handler.c:2369:__glusterd_handle_friend_update] 0-: Received friend update request from unknown peer 83e1a9db-3134-45e4-acd2-387b12b5b207 [2015-06-02 01:52:17.747865] W [glusterd-handler.c:697:__glusterd_handle_cluster_lock] 0-management: 04f22ee8-8e00-4c32-a924-b40a0e413aa6 doesn't belong to the cluster. Ignoring request. [2015-06-02 01:52:17.747908] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully [2015-06-02 01:52:40.338885] W [glusterd-handler.c:697:__glusterd_handle_cluster_lock] 0-management: 674a78b5-0590-48d4-8752-d4608832ed1d doesn't belong to the cluster. Ignoring request. [2015-06-02 01:52:40.338929] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully [2015-06-02 01:52:41.310451] W [glusterd-handler.c:697:__glusterd_handle_cluster_lock] 0-management: 674a78b5-0590-48d4-8752-d4608832ed1d doesn't belong to the cluster. Ignoring request. [2015-06-02 01:52:41.310486] E [rpcsvc.c:544:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully Debug info is as follows: /usr/sbin/glusterd [root@SH02SVR5951 peers]# /usr/sbin/glusterd --debug [2015-06-02 04:09:24.626690] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.3 (args: /usr/sbin/glusterd --debug) [2015-06-02 04:09:24.626739] D [logging.c:1763:__gf_log_inject_timer_event] 0-logging-infra: Starting timer now. Timeout = 120, current buf size = 5 [2015-06-02 04:09:24.627052] D [MSGID: 0] [glusterfsd.c:613:get_volfp] 0-glusterfsd: loading volume file /etc/glusterfs/glusterd.vol [2015-06-02 04:09:24.629683] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536 [2015-06-02 04:09:24.629706] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory [2015-06-02 04:09:24.629764] D [glusterd.c:391:glusterd_rpcsvc_options_build] 0-: listen-backlog value: 128 [2015-06-02 04:09:24.629895] D [rpcsvc.c:2198:rpcsvc_init] 0-rpc-service: RPC service inited.
[2015-06-02 04:09:24.629904] D [rpcsvc.c:1801:rpcsvc_program_register] 0-rpc-service: New program registered: GF-DUMP, Num: 123451501, Ver: 1, Port: 0 [2015-06-02 04:09:24.629930] D [rpc-transport.c:262:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.6.3/rpc-transport/socket.so [2015-06-02 04:09:24.631989] D [socket.c:3807:socket_init] 0-socket.management: SSL support on the I/O path is NOT enabled [2015-06-02 04:09:24.632005] D [socket.c:3810:socket_init] 0-socket.management: SSL support for glusterd is NOT enabled [2015-06-02 04:09:24.632013] D [socket.c:3827:socket_init] 0-socket.management: using system polling thread [2015-06-02 04:09:24.632024] D [name.c:550:server_fill_address_family] 0-socket.management: option address-family not specified, defaulting to inet [2015-06-02 04:09:24.632072] D [rpc-transport.c:262:rpc_transport_load] 0-rpc-transport: attempt to load file /usr/lib64/glusterfs/3.6.3/rpc-transport/rdma.so [2015-06-02 04:09:24.632102] E [rpc-transport.c:266:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.6.3/rpc-transport/rdma.so: cannot open shared object file: No such file or directory [2015-06-02 04:09:24.632112] W [rpc-transport.c:270:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine [2015-06-02 04:09:24.632122] W [rpcsvc.c:1524:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed [2015-06-02 04:09:24.632132] D [rpcsvc.c:1801:rpcsvc_program_register] 0-rpc-service: New
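Regarding Atin's suggestion earlier in this message (check that the nodes being probed have an empty /var/lib/glusterd), one way to check and clean such a node is sketched below. Only do this on a node that is not yet part of any cluster and serves no bricks, since it wipes its glusterd state; hostnames and the init-system commands are placeholders:

    # On the node that keeps getting rejected / is about to be probed:
    service glusterd stop                        # or: systemctl stop glusterd
    ls -la /var/lib/glusterd/                    # look for leftover peers/, vols/, etc.
    mv /var/lib/glusterd /var/lib/glusterd.bak   # keep a copy rather than deleting
    mkdir /var/lib/glusterd
    service glusterd start
    # Then, from a node that is already a healthy member of the cluster:
    gluster peer probe newnode01                 # placeholder hostname

This is a sketch of the cleanup being suggested, not an official procedure.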
Re: [Gluster-users] Gluster 3.7.0 released
On 06/01/2015 09:01 PM, Ted Miller wrote: On 5/27/2015 1:17 PM, Atin Mukherjee wrote: On 05/27/2015 07:33 PM, Ted Miller wrote: responses below Ted Miller On 5/26/2015 12:01 AM, Atin Mukherjee wrote: On 05/26/2015 03:12 AM, Ted Miller wrote: From: Niels de Vos nde...@redhat.com Sent: Monday, May 25, 2015 4:44 PM On Mon, May 25, 2015 at 06:49:26PM +, Ted Miller wrote: From: Humble Devassy Chirammal humble.deva...@gmail.com Sent: Monday, May 18, 2015 9:37 AM Hi All, GlusterFS 3.7.0 RPMs for RHEL, CentOS, Fedora and packages for Debian are available at download.gluster.orghttp://download.gluster.org [1]. [1] http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.0/ --Humble On Thu, May 14, 2015 at 2:49 PM, Vijay Bellur vbel...@redhat.commailto:vbel...@redhat.com wrote: Hi All, I am happy to announce that Gluster 3.7.0 is now generally available. 3.7.0 contains several [snip] Cheers, Vijay [snip] [snip] I have no idea about the problem below, it sounds like something the GlusterD developers could help with. Niels Command 'gluster volume status' on the C5 machine makes everything look fine: Status of volume: ISO2 Gluster process Port Online Pid -- Brick 10.x.x.2:/bricks/01/iso249162 Y 4679 Brick 10.x.x.4:/bricks/01/iso249183 Y 6447 Brick 10.x.x.9:/bricks/01/iso249169 Y 1985 But the same command on either of the C6 machines shows the C5 machine (10.x.x.2) missing in action (though it does recognize that there are NFS and heal daemons there): Status of volume: ISO2 Gluster process TCP Port RDMA Port Online Pid -- Brick 10.41.65.4:/bricks/01/iso249183 0 Y 6447 Brick 10.41.65.9:/bricks/01/iso249169 0 Y 1985 NFS Server on localhost 2049 0 Y 2279 Self-heal Daemon on localhost N/A N/A Y 2754 NFS Server on 10.41.65.22049 0 Y 4757 Self-heal Daemon on 10.41.65.2 N/A N/A Y 4764 NFS Server on 10.41.65.42049 0 Y 6543 Self-heal Daemon on 10.41.65.4 N/A N/A Y 6551 So, is this just an oversight (I hope), or has support for C5 been dropped? If support for C5 is gone, how do I downgrade my Centos6 machines back to 3.6.x? (I know how to change the repo, but the actual sequence of yum commands and gluster commands is unknown to me). Could you attach the glusterd log file of 10.x.x.2 machine attached as etc-glusterfs-glusterd.vol.log.newer.2, starting from last machine reboot and the node from where you triggered volume status. attached as etc-glusterfs-glusterd.vol.log.newer4 starting same time as .2 log Could you also share gluster volume info output of all the nodes? 
I have several volumes, so I chose the one that shows up first on the listings: *from 10.41.65.2:* [root@office2 /var/log/glusterfs]$ gluster volume info Volume Name: ISO2 Type: Replicate Volume ID: 090da4b3-c666-41fe-8283-2c029228b3f7 Status: Started Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: 10.41.65.2:/bricks/01/iso2 Brick2: 10.41.65.4:/bricks/01/iso2 Brick3: 10.41.65.9:/bricks/01/iso2 [root@office2 /var/log/glusterfs]$ gluster volume status ISO2 Status of volume: ISO2 Gluster process PortOnline Pid -- Brick 10.41.65.2:/bricks/01/iso2 49162 Y 4463 Brick 10.41.65.4:/bricks/01/iso2 49183 Y 6447 Brick 10.41.65.9:/bricks/01/iso2 49169 Y 1985 NFS Server on localhost 2049Y 4536 Self-heal Daemon on localhost N/A Y 4543 NFS Server on 10.41.65.9 2049Y 2279 Self-heal Daemon on 10.41.65.9 N/A Y 2754 NFS Server on 10.41.65.4 2049Y 6543 Self-heal Daemon on 10.41.65.4 N/A Y 6551 Task Status of Volume ISO2 -- There are no active volume tasks [root@office2 ~]$ gluster peer status Number of Peers: 2 Hostname: 10.41.65.9 Uuid: cf2ae9c7-833e-4a73-a996-e72158011c69 State: Peer in Cluster (Connected) Hostname: 10.41.65.4 Uuid: bd3ca8b7-f2da-44ce-8739-c0db5e40158c State: Peer in Cluster (Connected) *from 10.41.65.4:* [root@office4b ~]# gluster volume info ISO2 Volume Name: ISO2 Type: Replicate Volume ID: 090da4b3-c666-41fe-8283-2c029228b3f7 Status: Started Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: 10.41.65.2:/bricks/01/iso2 Brick2: 10.41.65.4:/bricks/01/iso2 Brick3: 10.41.65.9:/bricks/01/iso2 [root@office4b ~]#
[Gluster-users] split brain on / just after installation
Hello - I was wondering if someone could please help me. I've just set up Gluster 3.6 on two Ubuntu 14.04 hosts. Gluster is set up to replicate two volumes (prod-volume, dev-volume) between the two hosts. Replication is working fine. The glustershd.log shows:
[2015-06-02 03:28:04.495162] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-prod-volume-replicate-0: Unable to self-heal contents of 'gfid:----0001' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ]
and the prod-volume log shows:
[2015-06-02 02:54:28.286268] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-prod-volume-replicate-0: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ]
[2015-06-02 02:54:28.287476] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-prod-volume-replicate-0: background meta-data self-heal failed on /
I've checked against https://github.com/gluster/glusterfs/blob/6c578c03f0d44913d264494de5df004544c96271/doc/features/heal-info-and-split-brain-resolution.md but I can't see any scenario that covers mine. The output of gluster volume heal prod-volume info is:
Gathering Heal info on volume prod-volume has been successful
Brick server1:/export/prodvol/brick
Number of entries: 1
/
Brick server2
Number of entries: 1
/
and doesn't show anything in split-brain. But the output of gluster volume heal prod-volume info split-brain shows:
Gathering Heal info on volume prod-volume has been successful
Brick server1:/export/prodvol/brick
Number of entries: 6
at  path on brick
---
2015-06-02 03:28:04 /
2015-06-02 03:18:04 /
2015-06-02 03:08:04 /
2015-06-02 02:58:04 /
2015-06-02 02:48:04 /
2015-06-02 02:48:04 /
Brick server2:/export/prodvol/brick
Number of entries: 5
at  path on brick
---
2015-06-02 03:28:00 /
2015-06-02 03:18:00 /
2015-06-02 03:08:00 /
2015-06-02 02:58:00 /
2015-06-02 02:48:04 /
And the number continues to grow. The count on server2 is always one behind server1. Could someone please help? Cheers, ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Client load high (300) using fuse mount
hi Mitja, Could you please give the output of the following commands:
1) gluster volume info
2) gluster volume profile volname start
3) Wait while the CPU is high for 5-10 minutes
4) gluster volume profile volname info > output-you-need-to-attach-to-this-mail.txt
The 4th command tells us which operations are being issued a lot. Pranith On 06/01/2015 04:41 PM, Mitja Mihelič wrote: Hi! I am trying to set up a Wordpress cluster using GlusterFS for storage. Web nodes will access the same Wordpress install on a volume mounted via FUSE from a 3-peer GlusterFS TSP. I started with one web node and Wordpress on local storage. The load average was constantly about 5. iotop showed about 300kB/s disk reads or less. The load average was below 6. When I mounted the GlusterFS volume to the web node the 1min load average went over 300. Each of the 3 peers is transmitting about 10MB/s to my web node regardless of the load. TSP peers are on 10Gbit NICs and the web node is on a 1Gbit NIC. I'm out of ideas here... Could it be the network? What should I look at for optimizing the network stack on the client? Options set on TSP:
Options Reconfigured:
performance.cache-size: 4GB
network.ping-timeout: 15
cluster.quorum-type: auto
network.remote-dio: on
cluster.eager-lock: on
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.cache-refresh-timeout: 4
performance.io-thread-count: 32
nfs.disable: on
Regards, Mitja ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
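The four steps above as a single runnable sequence (volume name and output path are placeholders):

    #!/bin/bash
    # Capture a profile of the volume while the client load is high.
    VOL=wpvol                         # placeholder volume name
    OUT=/tmp/${VOL}-profile.txt

    gluster volume info "$VOL" > "$OUT"
    gluster volume profile "$VOL" start
    sleep 600                         # let it run for 5-10 minutes under load
    gluster volume profile "$VOL" info >> "$OUT"
    echo "attach $OUT to the reply"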