Re: [Gluster-users] Please advise for our file server cluster
- Original Message - From: Gao g...@pztop.com To: gluster-users@gluster.org Sent: Monday, June 8, 2015 12:58:56 PM Subject: Re: [Gluster-users] Please advise for our file server cluster

On 15-06-05 04:30 PM, Gao wrote: Hi, We are a small business and now we are planning to build a new file server system. I did some research and I decided to use GlusterFS as the cluster system to build a 2-node system. Our goals are to minimize downtime and to avoid a single point of failure. Meanwhile, I need to keep an eye on the budget. In our office we have 20+ computers running Ubuntu. A few (6) machines use Windows 8. We use a SAMBA server to take care of file sharing.

What file sizes / access patterns are you planning on using? Small-file and stat / metadata operations on Windows / Samba will be much slower than using glusterfs or NFS mounts. Be sure to clearly identify your performance requirements before you go to size your HW.

I did some research and here are some main components I selected for the system:
M/B: Asus P9D-E/4L (It has 6 SATA ports so I can use softRAID5 for data storage. 4 NIC ports so I can do link aggregation)
CPU: XEON E3-1220v3 3.1GHz (is this overkill? the MB also supports i3 though.)
Memory: 4x8GB ECC DDR3
SSD: 120 GB for OS
Hard Drive: 4 (or 5) 3TB 7200RPM drives to form soft RAID5
10GbE card: Intel X540-T1

Seems reasonable. I would expect 40-60 MB / sec writes and 80-100 MB / sec reads over gigabit with sequential workloads. Over 10G I would expect ~200-400 MB / sec for sequential reads and writes. Glusterfs and NFS mounts will perform better, but it sounds like you need Samba for your Windows hosts.

About the hardware I am not confident. One thing is the 10GbE card. Is it sufficient? I chose this because it's less expensive, but I don't want it to drag the system down once I build them. Also, if I only need 2 nodes, can I just use a CAT6 cable to link them together? Or do I have to use a 10GbE switch?

It all depends on your performance requirements. You will need a 10G switch if you want the clients to access the servers over 10G. If you don't need more than 120 MB / sec you can use gigabit, but if you need more then you will have to go to the 10G NICs.

Could someone give me some advice? Thanks. Gao Any help? Please. -- ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
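For readers following along, a minimal sketch of the kind of 2-node setup discussed in this thread, assuming hypothetical hostnames node1/node2 and a brick path of /data/brick1/gv0 (none of these names come from the original mail):

gluster peer probe node2
gluster volume create gv0 replica 2 node1:/data/brick1/gv0 node2:/data/brick1/gv0
gluster volume start gv0
mount -t glusterfs node1:/gv0 /mnt/gv0   # native mount on the Ubuntu clients; Samba would re-export this path for the Windows machines

With only two replicas it is also worth reading up on quorum and split-brain handling before putting such a volume into production.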
Re: [Gluster-users] Questions on ganesha HA and shared storage size
OK, I found at least one of the bugs. The /usr/libexec/ganesha/ganesha.sh has the following lines:

if [ -e /etc/os-release ]; then
    RHEL6_PCS_CNAME_OPTION=""
fi

This is OK for RHEL < 7, but does not work for >= 7. I have changed it to the following, to make it work:

if [ -e /etc/os-release ]; then
    eval $(grep -F REDHAT_SUPPORT_PRODUCT= /etc/os-release)
    [ $REDHAT_SUPPORT_PRODUCT == Fedora ] && RHEL6_PCS_CNAME_OPTION=""
fi

Apart from that, the VIP_<node> names I was using were wrong, and I should have converted all the "-" to underscores; maybe this could be mentioned in the documentation when you have it ready. Now the cluster starts, but apparently the VIPs do not:

Online: [ atlas-node1 atlas-node2 ]
Full list of resources:
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ atlas-node1 atlas-node2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ atlas-node1 atlas-node2 ]
 atlas-node1-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
 atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
 atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr): Stopped
 atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
 atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
 atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2

PCSD Status:
 atlas-node1: Online
 atlas-node2: Online

Daemon Status:
 corosync: active/disabled
 pacemaker: active/disabled
 pcsd: active/enabled

But the issue that is puzzling me more is the following:

# showmount -e localhost
rpc mount export: RPC: Timed out

And when I try to enable the ganesha exports on a volume I get this error:

# gluster volume set atlas-home-01 ganesha.enable on
volume set: failed: Failed to create NFS-Ganesha export config file.

But I see the file created in /etc/ganesha/exports/*.conf. Still, showmount hangs and times out. Any help? Thanks, Alessandro

Il giorno 08/giu/2015, alle ore 20:00, Alessandro De Salvo alessandro.desa...@roma1.infn.it ha scritto: Hi, indeed, it does not work :-) OK, this is what I did, with 2 machines, running CentOS 7.1, Glusterfs 3.7.1 and nfs-ganesha 2.2.0:
1) ensured that the machines are able to resolve their IPs (but this was already true since they were in the DNS);
2) disabled NetworkManager and enabled network on both machines;
3) created a gluster shared volume 'gluster_shared_storage' and mounted it on '/run/gluster/shared_storage' on all the cluster nodes using the glusterfs native mount (on CentOS 7.1 there is a link by default /var/run -> ../run);
4) created an empty /etc/ganesha/ganesha.conf;
5) installed pacemaker pcs resource-agents corosync on all cluster machines;
6) set the 'hacluster' user the same password on all machines;
7) pcs cluster auth <hostname> -u hacluster -p <pass> on all the nodes (on both nodes I issued the commands for both nodes);
8) IPv6 is configured by default on all nodes, although the infrastructure is not ready for IPv6;
9) enabled pcsd and started it on all nodes;
10) populated /etc/ganesha/ganesha-ha.conf with the following contents, one per machine:

=== atlas-node1
# Name of the HA cluster created.
HA_NAME=ATLAS_GANESHA_01
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# Virtual IPs of each of the nodes specified above.
VIP_atlas-node1="x.x.x.1"
VIP_atlas-node2="x.x.x.2"

=== atlas-node2
# Name of the HA cluster created.
HA_NAME=ATLAS_GANESHA_01
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node2"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# Virtual IPs of each of the nodes specified above.
VIP_atlas-node1="x.x.x.1"
VIP_atlas-node2="x.x.x.2"

11) issued gluster nfs-ganesha enable, but it fails with a cryptic message:

# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha. Please check the log file for details

Looking at the logs I found nothing really special but this:

== /var/log/glusterfs/etc-glusterfs-glusterd.vol.log ==
[2015-06-08 17:57:15.672844] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2015-06-08 17:57:15.675395] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
[2015-06-08 17:57:15.720692] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
[2015-06-08 17:57:15.721161] I [glusterd-ganesha.c:335:is_ganesha_host]
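Based on Alessandro's note earlier in this message about converting the dashes to underscores, a hedged sketch of how the VIP entries would presumably need to read for dashed hostnames (only the variable names change; the node names inside HA_CLUSTER_NODES keep their dashes):

HA_CLUSTER_NODES="atlas-node1,atlas-node2"
VIP_atlas_node1="x.x.x.1"
VIP_atlas_node2="x.x.x.2"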
Re: [Gluster-users] Errors in quota-crawl.log
I have submitted a BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1229422 ___ ¯\_(ツ)_/¯ Ryan Clough Information Systems Decision Sciences International Corporation http://www.decisionsciencescorp.com/ http://www.decisionsciencescorp.com/ On Wed, Apr 8, 2015 at 1:49 AM, Sachin Pandit span...@redhat.com wrote: Please find the comments inline. - Original Message - From: Ryan Clough ryan.clo...@dsic.com To: gluster-users gluster-users@gluster.org Sent: Wednesday, April 8, 2015 9:59:55 AM Subject: Re: [Gluster-users] Errors in quota-crawl.log No takers? Seems like quota is working but when I see permission denied warnings it makes me wonder if the quota calculations are going to be accurate. Any help would be much appreciated. Ryan Clough Information Systems Decision Sciences International Corporation On Thu, Apr 2, 2015 at 12:43 PM, Ryan Clough ryan.clo...@dsic.com wrote: We are running the following operating system: Scientific Linux release 6.6 (Carbon) With the following kernel: 2.6.32-504.3.3.el6.x86_64 We are using the following version of Glusterfs: glusterfs-libs-3.6.2-1.el6.x86_64 glusterfs-3.6.2-1.el6.x86_64 glusterfs-cli-3.6.2-1.el6.x86_64 glusterfs-api-3.6.2-1.el6.x86_64 glusterfs-fuse-3.6.2-1.el6.x86_64 glusterfs-server-3.6.2-1.el6.x86_64 Here is the current configuration of our 2 node distribute only cluster: Volume Name: export_volume Type: Distribute Volume ID: c74cc970-31e2-4924-a244-4c70d958dadb Status: Started Number of Bricks: 2 Transport-type: tcp Bricks: Brick1: hgluster01:/gluster_data Brick2: hgluster02:/gluster_data Options Reconfigured: performance.cache-size: 1GB diagnostics.brick-log-level: ERROR performance.stat-prefetch: on performance.write-behind: on performance.flush-behind: on features.quota-deem-statfs: on performance.quick-read: off performance.client-io-threads: on performance.read-ahead: on performance.io-thread-count: 24 features.quota: on cluster.eager-lock: on nfs.disable: on auth.allow: 192.168.10.*,10.0.10.*,10.8.0.*,10.2.0.*,10.0.60.* server.allow-insecure: on performance.write-behind-window-size: 1MB network.ping-timeout: 60 features.quota-timeout: 0 performance.io-cache: off server.root-squash: on performance.readdir-ahead: on Here is the status of the nodes: Status of volume: export_volume Gluster process Port Online Pid -- Brick hgluster01:/gluster_data 49152 Y 7370 Brick hgluster02:/gluster_data 49152 Y 17868 Quota Daemon on localhost N/A Y 2051 Quota Daemon on hgluster02.red.dsic.com N/A Y 6691 Task Status of Volume export_volume -- There are no active volume tasks I have just turned quota on and was watching the quota-crawl.log and see a bunch of these type of messages: [2015-04-02 19:23:01.540692] W [fuse-bridge.c:483:fuse_entry_cbk] 0-glusterfs-fuse: 2338683: LOOKUP() /\ = -1 (Permission denied) [2015-04-02 19:23:01.543565] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-export_volume-client-1: remote operation failed: Permission denied. Path: /\ (----) [2015-04-02 17:58:14.090556] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-export_volume-client-0: remote operation failed: Permission denied. Path: /\ (----) Should I be worried about this and how do I go about fixing the permissions? Is this a bug and should it be reported? Hi Ryan, Apologies for the late reply. Looking at the description of the problem I don't think there will be any problem. I think its better if we track this problem using a bug. If you have already raised a bug then please do provide us a bug-id, or else we will raise a new bug. 
I have one question: Looking at the path /\ , do you have a directory with similar path, as we can see accessing that has failed? Thanks, Sachin. Thanks, in advance, for your time to help me. Ryan Clough Information Systems Decision Sciences International Corporation This email and its contents are confidential. If you are not the intended recipient, please do not disclose or use the information within this email or its attachments. If you have received this email in error, please report the error to the sender by return email and delete this communication from your records. ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users -- This email and its contents are confidential. If you are not the intended recipient, please do not disclose or use the information within this email or its attachments. If you have received this email in error,
Re: [Gluster-users] Questions on ganesha HA and shared storage size
On 06/08/2015 08:20 PM, Alessandro De Salvo wrote: Sorry, just another question: - in my installation of gluster 3.7.1 the command gluster features.ganesha enable does not work: # gluster features.ganesha enable unrecognized word: features.ganesha (position 0) Which version has full support for it? Sorry. This option has recently been changed. It is now $ gluster nfs-ganesha enable - in the documentation the ccs and cman packages are required, but they seems not to be available anymore on CentOS 7 and similar, I guess they are not really required anymore, as pcs should do the full job Thanks, Alessandro Looks like so from http://clusterlabs.org/quickstart-redhat.html. Let us know if it doesn't work. Thanks, Soumya Il giorno 08/giu/2015, alle ore 15:09, Alessandro De Salvo alessandro.desa...@roma1.infn.it ha scritto: Great, many thanks Soumya! Cheers, Alessandro Il giorno 08/giu/2015, alle ore 13:53, Soumya Koduri skod...@redhat.com ha scritto: Hi, Please find the slides of the demo video at [1] We recommend to have a distributed replica volume as a shared volume for better data-availability. Size of the volume depends on the workload you may have. Since it is used to maintain states of NLM/NFSv4 clients, you may calculate the size of the volume to be minimum of aggregate of (typical_size_of'/var/lib/nfs'_directory + ~4k*no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point) We shall document about this feature sooner in the gluster docs as well. Thanks, Soumya [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846 On 06/08/2015 04:34 PM, Alessandro De Salvo wrote: Hi, I have seen the demo video on ganesha HA, https://www.youtube.com/watch?v=Z4mvTQC-efM However there is no advice on the appropriate size of the shared volume. How is it really used, and what should be a reasonable size for it? Also, are the slides from the video available somewhere, as well as a documentation on all this? I did not manage to find them. Thanks, Alessandro ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
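As a purely illustrative application of Soumya's sizing formula (the numbers below are assumptions, not from the thread): with a typical /var/lib/nfs of about 10 MB and roughly 200 NFS clients per server, the minimum would be about 10 MB + 4 KB * 200 ≈ 10.8 MB per server, so even a shared volume of a few GB leaves a very comfortable margin.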
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Hi, indeed, it does not work :-) OK, this is what I did, with 2 machines, running CentOS 7.1, Glusterfs 3.7.1 and nfs-ganesha 2.2.0: 1) ensured that the machines are able to resolve their IPs (but this was already true since they were in the DNS); 2) disabled NetworkManager and enabled network on both machines; 3) created a gluster shared volume 'gluster_shared_storage' and mounted it on '/run/gluster/shared_storage' on all the cluster nodes using glusterfs native mount (on CentOS 7.1 there is a link by default /var/run - ../run) 4) created an empty /etc/ganesha/ganesha.conf; 5) installed pacemaker pcs resource-agents corosync on all cluster machines; 6) set the ‘hacluster’ user the same password on all machines; 7) pcs cluster auth hostname -u hacluster -p pass on all the nodes (on both nodes I issued the commands for both nodes) 8) IPv6 is configured by default on all nodes, although the infrastructure is not ready for IPv6 9) enabled pcsd and started it on all nodes 10) populated /etc/ganesha/ganesha-ha.conf with the following contents, one per machine: === atlas-node1 # Name of the HA cluster created. HA_NAME=ATLAS_GANESHA_01 # The server from which you intend to mount # the shared volume. HA_VOL_SERVER=“atlas-node1 # The subset of nodes of the Gluster Trusted Pool # that forms the ganesha HA cluster. IP/Hostname # is specified. HA_CLUSTER_NODES=“atlas-node1,atlas-node2 # Virtual IPs of each of the nodes specified above. VIP_atlas-node1=“x.x.x.1 VIP_atlas-node2=“x.x.x.2 === atlas-node2 # Name of the HA cluster created. HA_NAME=ATLAS_GANESHA_01 # The server from which you intend to mount # the shared volume. HA_VOL_SERVER=“atlas-node2 # The subset of nodes of the Gluster Trusted Pool # that forms the ganesha HA cluster. IP/Hostname # is specified. HA_CLUSTER_NODES=“atlas-node1,atlas-node2 # Virtual IPs of each of the nodes specified above. VIP_atlas-node1=“x.x.x.1 VIP_atlas-node2=“x.x.x.2” 11) issued gluster nfs-ganesha enable, but it fails with a cryptic message: # gluster nfs-ganesha enable Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha. Please check the log file for details Looking at the logs I found nothing really special but this: == /var/log/glusterfs/etc-glusterfs-glusterd.vol.log == [2015-06-08 17:57:15.672844] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped [2015-06-08 17:57:15.675395] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2 [2015-06-08 17:57:15.720692] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2 [2015-06-08 17:57:15.721161] I [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host found Hostname is atlas-node2 [2015-06-08 17:57:16.633048] E [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management: Initial NFS-Ganesha set up failed [2015-06-08 17:57:16.641563] E [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of operation 'Volume (null)' failed on localhost : Failed to set up HA config for NFS-Ganesha. Please check the log file for details == /var/log/glusterfs/cmd_history.log == [2015-06-08 17:57:16.643615] : nfs-ganesha enable : FAILED : Failed to set up HA config for NFS-Ganesha. 
Please check the log file for details == /var/log/glusterfs/cli.log == [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting with: -1 Also, pcs seems to be fine for the auth part, although it obviously tells me the cluster is not running. I, [2015-06-08T19:57:16.305323 #7223] INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name I, [2015-06-08T19:57:16.345457 #7223] INFO -- : Running: /usr/sbin/pcs cluster token-nodes :::141.108.38.46 - - [08/Jun/2015 19:57:16] GET /remote/check_auth HTTP/1.1 200 68 0.1919 :::141.108.38.46 - - [08/Jun/2015 19:57:16] GET /remote/check_auth HTTP/1.1 200 68 0.1920 atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] GET /remote/check_auth HTTP/1.1 200 68 - - /remote/check_auth What am I doing wrong? Thanks, Alessandro Il giorno 08/giu/2015, alle ore 19:30, Soumya Koduri skod...@redhat.com ha scritto: On 06/08/2015 08:20 PM, Alessandro De Salvo wrote: Sorry, just another question: - in my installation of gluster 3.7.1 the command gluster features.ganesha enable does not work: # gluster features.ganesha enable unrecognized word: features.ganesha (position 0) Which version has full support for it? Sorry. This option has recently been changed. It is now $ gluster nfs-ganesha enable - in the documentation the ccs and cman packages are required, but they seems not to be available anymore on CentOS 7 and similar, I guess they are not really required anymore, as pcs should do the full job Thanks, Alessandro Looks like so from
Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
Hi Ben Here the expected output: [root@node048 ~]# iperf3 -c 10.0.4.1 Connecting to host 10.0.4.1, port 5201 [ 4] local 10.0.5.48 port 44151 connected to 10.0.4.1 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 1.86 GBytes 15.9 Gbits/sec0 8.24 MBytes [ 4] 1.00-2.00 sec 1.94 GBytes 16.7 Gbits/sec0 8.24 MBytes [ 4] 2.00-3.00 sec 1.95 GBytes 16.8 Gbits/sec0 8.24 MBytes [ 4] 3.00-4.00 sec 1.86 GBytes 16.0 Gbits/sec0 8.24 MBytes [ 4] 4.00-5.00 sec 1.85 GBytes 15.8 Gbits/sec0 8.24 MBytes [ 4] 5.00-6.00 sec 1.89 GBytes 16.2 Gbits/sec0 8.24 MBytes [ 4] 6.00-7.00 sec 1.90 GBytes 16.3 Gbits/sec0 8.24 MBytes [ 4] 7.00-8.00 sec 1.88 GBytes 16.1 Gbits/sec0 8.24 MBytes [ 4] 8.00-9.00 sec 1.88 GBytes 16.2 Gbits/sec0 8.24 MBytes [ 4] 9.00-10.00 sec 1.87 GBytes 16.1 Gbits/sec0 8.24 MBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bandwidth Retr [ 4] 0.00-10.00 sec 18.9 GBytes 16.2 Gbits/sec0 sender [ 4] 0.00-10.00 sec 18.9 GBytes 16.2 Gbits/sec receiver iperf Done. Here are all shell commands i used for volume creation with RDMA transport-type: gluster volume create vol_home replica 2 transport rdma,tcp ib-storage1:/export/brick_home/brick1/ ib-storage2:/export/brick_home/brick1/ ib-storage3:/export/brick_home/brick1/ ib-storage4:/export/brick_home/brick1/ ib-storage1:/export/brick_home/brick2/ ib-storage2:/export/brick_home/brick2/ ib-storage3:/export/brick_home/brick2/ ib-storage4:/export/brick_home/brick2/ force and below the current volume information: [root@lucifer ~]# gluster volume info vol_home Volume Name: vol_home Type: Distributed-Replicate Volume ID: f6ebcfc1-b735-4a0e-b1d7-47ed2d2e7af6 Status: Started Number of Bricks: 4 x 2 = 8 Transport-type: tcp,rdma Bricks: Brick1: ib-storage1:/export/brick_home/brick1 Brick2: ib-storage2:/export/brick_home/brick1 Brick3: ib-storage3:/export/brick_home/brick1 Brick4: ib-storage4:/export/brick_home/brick1 Brick5: ib-storage1:/export/brick_home/brick2 Brick6: ib-storage2:/export/brick_home/brick2 Brick7: ib-storage3:/export/brick_home/brick2 Brick8: ib-storage4:/export/brick_home/brick2 Options Reconfigured: performance.stat-prefetch: on performance.flush-behind: on features.default-soft-limit: 90% features.quota: on diagnostics.brick-log-level: CRITICAL auth.allow: localhost,127.0.0.1,10.* nfs.disable: on performance.cache-size: 64MB performance.write-behind-window-size: 1MB performance.quick-read: on performance.io-cache: on performance.io-thread-count: 64 nfs.enable-ino32: on and below my mount command: mount -t glusterfs -o transport=rdma,direct-io-mode=disable,enable-ino32 ib-storage1:vol_home /home I dont obtain any error with RDMA option but transport type silently fall back to TCP. Did i make any mistake in my settings? Can you tell me more about block size and other tunings i should do on my rdma volumes? Thanks in advance, Geoffrey -- Geoffrey Letessier Responsable informatique ingénieur système UPR 9080 - CNRS - Laboratoire de Biochimie Théorique Institut de Biologie Physico-Chimique 13, rue Pierre et Marie Curie - 75005 Paris Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr Le 8 juin 2015 à 18:22, Ben Turner btur...@redhat.com a écrit : - Original Message - From: Geoffrey Letessier geoffrey.letess...@cnrs.fr To: Ben Turner btur...@redhat.com Cc: Pranith Kumar Karampuri pkara...@redhat.com, gluster-users@gluster.org Sent: Monday, June 8, 2015 8:37:08 AM Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances Hello, Do you know more about? 
In addition, do you know how to « activate » RDMA for my volume with Intel/QLogic QDR? Currently, I mount my volumes with the RDMA transport-type option (both on the server and client side) but I notice all streams are using the TCP stack - and my bandwidth never exceeds 2.0-2.5Gb/s (250-300MB/s).

That is a little slow for the HW you described. Can you check what you get with iperf just between the clients and servers? https://iperf.fr/ With replica 2 and a 10G NW you should see ~400 MB / sec sequential writes and ~600 MB / sec reads. Can you send me the output from gluster v info? You specify RDMA volumes at create time by running gluster v create blah transport rdma; did you specify RDMA when you created the volume? What block size are you using in your tests? 1024 KB writes perform best with glusterfs, and as the block size gets smaller perf will drop a little bit. I wouldn't write in anything under 4k blocks; the sweet spot is between 64k and 1024k. -b

Thanks in advance, Geoffrey -- Geoffrey Letessier
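For what it's worth, a hedged way to see the block-size effect Ben describes from a client mount (the target directory below is just the /home mount from this thread; file name and counts are arbitrary):

dd if=/dev/zero of=/home/ddtest bs=1024k count=1024 conv=fdatasync   # ~1 GiB written in 1 MiB blocks
dd if=/dev/zero of=/home/ddtest bs=64k count=16384 conv=fdatasync    # same amount written in 64 KiB blocks

Comparing the two reported rates gives a rough idea of how much the smaller block size costs on this particular volume.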
[Gluster-users] reading from local replica?
Am I misunderstanding cluster.read-subvolume/cluster.read-subvolume-index? I have two regions, A and B, with servers a and b in, respectively, each region. I have clients in both regions. Intra-region communication is fast, but the pipe between the regions is terrible. I'd like to minimize inter-region communication to as close to glusterfs write operations only, and have reads go to the server in the region the client is running in. I have created a replica volume as: gluster volume create gv0 replica 2 a:/data/brick1/gv0 b:/data/brick1/gv0 force As a baseline, if I use scp to copy from the brick directly, I get -- for a 100M file -- times of about 6s if the client scps from the server in the same region and anywhere from 3 to 5 minutes if the client scps from the server in the other region. I was under the impression (from something I read but can't now find) that glusterfs automatically picks the fastest replica, but that has not been my experience; glusterfs seems to generally prefer the server in the other region over the local one, with times usually in excess of 4 minutes. I've also tried having clients mount the volume using the xlator options cluster.read-subvolume and cluster.read-subvolume-index, but neither seems to have any impact. Here are sample mount commands to show what I'm attempting: mount -t glusterfs -o xlator-option=cluster.read-subvolume=gv0-client-0 or 1 a:/gv0 /mnt/glusterfs mount -t glusterfs -o xlator-option=cluster.read-subvolume-index=0 or 1 a:/gv0 /mnt/glusterfs Am I misunderstanding how glusterfs works, particularly when trying to read locally? Is it possible to configure glusterfs to use a local replica (or the fastest replica) for reads? ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
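One hedged thing to try (not a confirmed fix) is setting the read child as a volume option rather than a per-mount xlator option, using the names from the volume above; whether it is honoured depends on the client version, and note that a volume-level setting applies to all clients, so it can only favour one region:

gluster volume set gv0 cluster.read-subvolume gv0-client-0

On releases that support it, cluster.choose-local is the option intended to make each client prefer the brick it can reach most cheaply for reads.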
Re: [Gluster-users] nfs-ganesha/samba vfs and replica redundancy
On 6/3/2015 3:15 AM, Benjamin Kingston wrote: Can someone give me a hint on the best way to maintain data availability to a share on a third system using nfs-ganesha and samba? I currently have a round-robin DNS entry that nfs-ganesha/samba uses, however even with a short TTL there's brief downtime when a replica node fails. I can't see in the samba VFS or ganesha FSAL syntax where a secondary address can be provided. I've tried comma separated, space separated, with/without quotes for multiple IPs and only seen issues. Any reason you aren't using a floating IP address? This isn't the newest talk, but the concepts have not changed: http://events.linuxfoundation.org/sites/events/files/lcjpcojp13_nakai.pdf Ted Miller Elkhart, IN, USA ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
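If you do go the floating-IP route, a minimal pacemaker sketch along the lines of Ted's suggestion, assuming pcs is in use and 192.168.1.250/24 is a spare address (both placeholders):

pcs resource create share_vip ocf:heartbeat:IPaddr2 ip=192.168.1.250 cidr_netmask=24 op monitor interval=30s

Clients then mount the share via that one address, and pacemaker moves it to a surviving node when a replica fails; for Samba specifically, CTDB (covered in the linked talk) is the more usual way to get the same effect.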
Re: [Gluster-users] The strange behavior whose common denominator is gluster
Are you sure you have mounted the gluster volume, and are writing to the gluster volume, and NOT to the brick? What you describe can happen when you write to the brick instead of the gluster volume. You can see here: http://www.gluster.org/community/documentation/index.php/QuickStart in steps 6 and 7. If you do not understand the difference, include the output of the 'mount' command from one of your servers. Ted Miller Elkhart, IN, USA

On 6/5/2015 8:46 AM, Pablo Silva wrote: Dear Colleagues: We are using gluster in versions 3.3.1-15.el6.x86_64 and GlusterFS-3.6.2-1.el5, and we have two types of service: 1) Apache httpd-2.2.3-91.el5.centos + GlusterFS-3.6.2-1.el5 (two bricks) 2) AS2 Mendelson B45 + gluster 3.3.1-15.el6.x86_64 (two bricks) These are different services with a common problem, which I will explain.

Service N1 (Apache httpd-2.2.3-91.el5.centos + GlusterFS-3.6.2-1.el5 (two bricks)) --- We have a high-availability architecture in which two Apache servers see a directory that is hosted on gluster. Some time ago we had a problem where one Apache server could list the files and offer them for download, while the other Apache server, watching the same directory with the same files, indicated that there were no files to download. Files are fed into that directory on gluster by MULE, asynchronously. In summary, one Apache server could access the files and the other was not aware of their existence, even though the directory and the files are the same.

Service N2 (AS2 Mendelson B45 + gluster 3.3.1-15.el6.x86_64 (two bricks)) -- We have only one Mendelson AS2 Server B45 running with gluster (two bricks). The operation of Mendelson is quite simple: it watches for the presence of files in a directory every 5 seconds and sends them to the partner. The directory is hosted on gluster, and the issue is that every so often Mendelson AS2 does not become aware of the existence of files in the directory, even though entering the directory shows that they exist.

In both cases the services are different and the only common denominator is gluster. Is someone else experiencing this problem? Have we not set up the gluster service well and are we repeating the same mistake, or is it a bug? Thanks in advance Pablo ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users -- Ted Miller, Design Engineer SonSet Solutions (formerly HCJB Global Technology Center) my desk +1 574.970.4272 receptionist +1 574.972.4252 http://sonsetsolutions.org Technology for abundant life! ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
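A quick hedged way to check what Ted is asking about on each server (the application path below is a placeholder):

df -T /path/used/by/apache    # Type should be fuse.glusterfs, not xfs/ext4
mount | grep fuse.glusterfs   # shows where the gluster volume is actually mounted

If the application path resolves to the brick filesystem rather than to a fuse.glusterfs mount, the behaviour described above is expected, because writes to the brick bypass gluster entirely.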
Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart
Unfortunately, when I restart every node in the cluster sequentially...qemu image of the HA VM gets corrupted... Even client nodes? Make sure that your client can connect to all of the servers. Make sure, after you restart a server, that the self-heal finishes before you restart the next one. What I suspect is happening is that you restart server A, writes happen on server B. You restart server B before the heal has happened to copy the changes from server A to server B, thus causing the client to write changes to server B. When server A comes back, both server A and server B think they have changes for the other. This is a classic split-brain state. On 06/04/2015 07:08 AM, Roger Lehmann wrote: Hello, I'm having a serious problem with my GlusterFS cluster. I'm using Proxmox 3.4 for high available VM management which works with GlusterFS as storage. Unfortunately, when I restart every node in the cluster sequentially one by one (with online migration of the running HA VM first of course) the qemu image of the HA VM gets corrupted and the VM itself has problems accessing it. May 15 10:35:09 blog kernel: [339003.942602] end_request: I/O error, dev vda, sector 2048 May 15 10:35:09 blog kernel: [339003.942829] Buffer I/O error on device vda1, logical block 0 May 15 10:35:09 blog kernel: [339003.942929] lost page write due to I/O error on vda1 May 15 10:35:09 blog kernel: [339003.942952] end_request: I/O error, dev vda, sector 2072 May 15 10:35:09 blog kernel: [339003.943049] Buffer I/O error on device vda1, logical block 3 May 15 10:35:09 blog kernel: [339003.943146] lost page write due to I/O error on vda1 May 15 10:35:09 blog kernel: [339003.943153] end_request: I/O error, dev vda, sector 4196712 May 15 10:35:09 blog kernel: [339003.943251] Buffer I/O error on device vda1, logical block 524333 May 15 10:35:09 blog kernel: [339003.943350] lost page write due to I/O error on vda1 May 15 10:35:09 blog kernel: [339003.943363] end_request: I/O error, dev vda, sector 4197184 After the image is broken, it's impossible to migrate the VM or start it when it's down. root@pve2 ~ # gluster volume heal pve-vol info Gathering list of entries to be healed on volume pve-vol has been successful Brick pve1:/var/lib/glusterd/brick Number of entries: 1 /images//200/vm-200-disk-1.qcow2 Brick pve2:/var/lib/glusterd/brick Number of entries: 1 /images/200/vm-200-disk-1.qcow2 Brick pve3:/var/lib/glusterd/brick Number of entries: 1 /images//200/vm-200-disk-1.qcow2 I couldn't really reproduce this in my test environment with GlusterFS 3.6.2 but I had other problems while testing (may also be because of a virtualized test environment), so I don't want to upgrade to 3.6.2 until I definitely know the problems I encountered are fixed in 3.6.2. Anybody else experienced this problem? I'm not sure if issue 1161885 (Possible file corruption on dispersed volumes) is the issue I'm experiencing. I have a 3 node replicate cluster. Thanks for your help! Regards, Roger Lehmann ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
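A hedged way to apply that advice in practice, using the volume name from this thread: after each node comes back, wait until heal info reports zero entries on every brick before rebooting the next one, for example

watch -n 10 'gluster volume heal pve-vol info | grep "Number of entries"'

and only proceed when every brick shows Number of entries: 0.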
Re: [Gluster-users] Please advise for our file server cluster
On 15-06-05 04:30 PM, Gao wrote: Hi, We are a small business and now we are planning to build a new file server system. I did some research and I decide to use GlusterFS as the cluster system to build a 2-node system. Our goals are trying to minimize the downtime and to avoid single point of failure. Meanwhile, I need keep an eye on the budget. In our office we have 20+ computers running Ubuntu. Few(6) machines use Windows 8. We use a SAMBA server to take care file sharing. I did some research and here are some main components I selected for the system: M/B: Asus P9D-E/4L (It has 6 SATA ports so I can use softRAID5 for data storage. 4 NIC ports so I can do link aggregation) CPU: XEON E3-1220v3 3.1GHz (is this over kill? the MB also support i3 though.) Memory: 4x8GB ECC DDR3 SSD: 120 GB for OS Hard Drive: 4 (or 5) 3TB 7200RPM drive to form soft RAID5 10GBe card: Intel X540-T1 About the hardware I am not confident. One thing is the 10GBe card. Is it sufficient? I chose this because it's less expensive. But I don't want it drag the system down once I build them. Also, if I only need 2 nodes, can I just use CAT6 cable to link them together? or I have to use a 10GBe switch? Could someone give me some advice? Thanks. Gao Any help? Please. -- ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] hadoop gluster
Hi everyone. I want to use Hadoop 2.x with GlusterFS. For testing, I prepared this software.

Configuration
- CentOS 7.1 (on VMware)
- GlusterFS 3.7
- Hadoop 2.x
- glusterfs-hadoop plugin 2.3.13

The test process is this.

Test Process
1. Install CentOS 7.1. (3 VM machines)
 - 1 machine is the GlusterFS client and the hadoop NameNode
 - 2 machines are GlusterFS brick nodes and hadoop DataNodes
2. Install GlusterFS 3.7 and configure the bricks.
3. Mount the Gluster volume from the client machine.
4. Install Hadoop 2.x and configure HDFS. * Hadoop installed by root.
5. Start and test HDFS and MapReduce.
6. Configure Hadoop 2.0 with GlusterFS

I finished and succeeded with test 5, and I started test 6 with these notes.

Note
http://www.gluster.org/community/documentation/index.php/Hadoop
https://forge.gluster.org/hadoop/pages/Configuration

But I'm not entirely sure of the way.
- How do I use glusterfs-hadoop-2.3.13.jar?
- If I only put glusterfs-hadoop-2.3.13.jar in place and edit core-site.xml, can Hadoop use GlusterFS?
- Once the configuration is finished, can Hadoop access the GlusterFS volume with the hadoop command? e.g.
$ bin/hadoop dfs -cat GlusterVolume/data
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep GlusterVolume/data GlusterVolume/dataout '[a-z.]+'

Please tell me the way of configuring this. I'm a Hadoop beginner, and I'm not so good at English. ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] root squash and git clone
Anyone noticed this ? This is easily reproducible for me. It's a bit strange though since a) git clone isn't/shouldn't be doing anything as root and b) if it were, it would have failed similarly on a regular(not glusterfs) nfs mount with root squash for me, which it doesn't. On Wed, Jun 3, 2015 at 7:22 AM, Prasun Gera prasun.g...@gmail.com wrote: Version: RHS 3.0 I noticed that if server.root-squash is set on, clients get permissions errors on git commands like git clone. Is this a known issue ? I confirmed that the write permissions to the destination directories were correct, and normal writes were working fine. git clones would fail though with: error:unable to write sha1 filename fatal :cannot store pack file fatal :index-pack failed ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] root squash and git clone
We had seen this at some point back in the Gluster 3.5.x days but have not seen it since 3.6.x. If you are truly using fully licensed Red Hat Storage then I would leverage Red Hat support directly. ___ ¯\_(ツ)_/¯ Ryan Clough Information Systems Decision Sciences International Corporation http://www.decisionsciencescorp.com/ http://www.decisionsciencescorp.com/ On Mon, Jun 8, 2015 at 6:34 PM, Prasun Gera prasun.g...@gmail.com wrote: Anyone noticed this ? This is easily reproducible for me. It's a bit strange though since a) git clone isn't/shouldn't be doing anything as root and b) if it were, it would have failed similarly on a regular(not glusterfs) nfs mount with root squash for me, which it doesn't. On Wed, Jun 3, 2015 at 7:22 AM, Prasun Gera prasun.g...@gmail.com wrote: Version: RHS 3.0 I noticed that if server.root-squash is set on, clients get permissions errors on git commands like git clone. Is this a known issue ? I confirmed that the write permissions to the destination directories were correct, and normal writes were working fine. git clones would fail though with: error:unable to write sha1 filename fatal :cannot store pack file fatal :index-pack failed ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users -- This email and its contents are confidential. If you are not the intended recipient, please do not disclose or use the information within this email or its attachments. If you have received this email in error, please report the error to the sender by return email and delete this communication from your records. ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] hadoop gluster
You should submit jobs as a user other than yarn which is also a member of the hadoop group. We usually add a mapred user. Also check your Hadoop Home/etc/hadoop/container-executor.cfg: yarn.nodemanager.linux-container-executor.group=hadoop banned.users=yarn min.user.id=1000 allowed.system.users=mapred You'll want the mapred UID 1000 or else adjust the setting in the file. Regards, Shubhendu On 06/08/2015 07:44 AM, 中川智之 wrote: Hi EveryOne. I want to use Hadoop 2.x with GlusterFS. To testing, Prepare these softwares. Cnfiguration - CentOS 7.1(on VMWare) - Gluster FS 3.7 - Hadoop 2.x - Glusterfs-Hadoop plugin 2.3.13 Test Process is this. Test Process 1. Install CentOS 7.1. (3 VM Machine) - 1 Machine is GlusterFS Client, and hadoop Namenode - 2 Machine is GlusterFS BrickNode, and hadoop Datanode 2. Install GlusterFS 3.7, and cnfigure brick. 3. Mount Gluster volume from client Machine. 4. Install Hadoop 2.x, and HDFS configure. * Hadoop installed by root. 5. Starting and testing HDFS and Mapreduce. 6. Configure Hadoop 2.0 with GlusterFS Finish and Succcess Test 5, and I started Test 6 with this Note. Note http://www.gluster.org/community/documentation/index.php/Hadoop https://forge.gluster.org/hadoop/pages/Configuration But I'm not entirely the way. - How to use glusterfs-hadoop-2.3.13.jar ? - Only put glusterfs-hadoop-2.3.13.jar and edit core-site.xml, Hadoop can use on GlusterFS? - If Finished Configuration, Hadoop can access on GlusterFS Volume by hadoop command ? ex) $ bin/hadoop dfs -cat GlusterVolume/data $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep GlusteVolume/data GlusteVolume/dataout '[a-z.]+' Please tll me the way of Configuring. I'm a hadoop Beginner, and I'm not so good at English. ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
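For the original question about wiring in the jar, a hedged sketch only (the paths are assumptions, and the exact property names should be verified against the glusterfs-hadoop README linked above):

# make the plugin visible to Hadoop, assuming Hadoop lives in /opt/hadoop
cp glusterfs-hadoop-2.3.13.jar /opt/hadoop/share/hadoop/common/lib/
# core-site.xml then needs, roughly, a filesystem implementation entry such as
#   fs.glusterfs.impl = org.apache.hadoop.fs.glusterfs.GlusterFileSystem
# plus the plugin's properties pointing at the gluster volume and its FUSE mount,
# as listed in the plugin's own documentation.

Once that is in place, the example commands from the original mail would be run against glusterfs:/// paths rather than HDFS ones.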
Re: [Gluster-users] Errors in quota-crawl.log
Hi Ryan, Thank you for reporting this failure. We will make sure to fix this as soon as possible. Thanks, Sachin Pandit. - Original Message - From: Ryan Clough ryan.clo...@dsic.com To: Sachin Pandit span...@redhat.com Cc: gluster-users gluster-users@gluster.org, Vijaikumar M vmall...@redhat.com Sent: Monday, June 8, 2015 11:21:25 PM Subject: Re: [Gluster-users] Errors in quota-crawl.log I have submitted a BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1229422 ___ ¯\_(ツ)_/¯ Ryan Clough Information Systems Decision Sciences International Corporation http://www.decisionsciencescorp.com/ http://www.decisionsciencescorp.com/ On Wed, Apr 8, 2015 at 1:49 AM, Sachin Pandit span...@redhat.com wrote: Please find the comments inline. - Original Message - From: Ryan Clough ryan.clo...@dsic.com To: gluster-users gluster-users@gluster.org Sent: Wednesday, April 8, 2015 9:59:55 AM Subject: Re: [Gluster-users] Errors in quota-crawl.log No takers? Seems like quota is working but when I see permission denied warnings it makes me wonder if the quota calculations are going to be accurate. Any help would be much appreciated. Ryan Clough Information Systems Decision Sciences International Corporation On Thu, Apr 2, 2015 at 12:43 PM, Ryan Clough ryan.clo...@dsic.com wrote: We are running the following operating system: Scientific Linux release 6.6 (Carbon) With the following kernel: 2.6.32-504.3.3.el6.x86_64 We are using the following version of Glusterfs: glusterfs-libs-3.6.2-1.el6.x86_64 glusterfs-3.6.2-1.el6.x86_64 glusterfs-cli-3.6.2-1.el6.x86_64 glusterfs-api-3.6.2-1.el6.x86_64 glusterfs-fuse-3.6.2-1.el6.x86_64 glusterfs-server-3.6.2-1.el6.x86_64 Here is the current configuration of our 2 node distribute only cluster: Volume Name: export_volume Type: Distribute Volume ID: c74cc970-31e2-4924-a244-4c70d958dadb Status: Started Number of Bricks: 2 Transport-type: tcp Bricks: Brick1: hgluster01:/gluster_data Brick2: hgluster02:/gluster_data Options Reconfigured: performance.cache-size: 1GB diagnostics.brick-log-level: ERROR performance.stat-prefetch: on performance.write-behind: on performance.flush-behind: on features.quota-deem-statfs: on performance.quick-read: off performance.client-io-threads: on performance.read-ahead: on performance.io-thread-count: 24 features.quota: on cluster.eager-lock: on nfs.disable: on auth.allow: 192.168.10.*,10.0.10.*,10.8.0.*,10.2.0.*,10.0.60.* server.allow-insecure: on performance.write-behind-window-size: 1MB network.ping-timeout: 60 features.quota-timeout: 0 performance.io-cache: off server.root-squash: on performance.readdir-ahead: on Here is the status of the nodes: Status of volume: export_volume Gluster process Port Online Pid -- Brick hgluster01:/gluster_data 49152 Y 7370 Brick hgluster02:/gluster_data 49152 Y 17868 Quota Daemon on localhost N/A Y 2051 Quota Daemon on hgluster02.red.dsic.com N/A Y 6691 Task Status of Volume export_volume -- There are no active volume tasks I have just turned quota on and was watching the quota-crawl.log and see a bunch of these type of messages: [2015-04-02 19:23:01.540692] W [fuse-bridge.c:483:fuse_entry_cbk] 0-glusterfs-fuse: 2338683: LOOKUP() /\ = -1 (Permission denied) [2015-04-02 19:23:01.543565] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-export_volume-client-1: remote operation failed: Permission denied. Path: /\ (----) [2015-04-02 17:58:14.090556] W [client-rpc-fops.c:2766:client3_3_lookup_cbk] 0-export_volume-client-0: remote operation failed: Permission denied. 
Path: /\ (----) Should I be worried about this and how do I go about fixing the permissions? Is this a bug and should it be reported? Hi Ryan, Apologies for the late reply. Looking at the description of the problem I don't think there will be any problem. I think its better if we track this problem using a bug. If you have already raised a bug then please do provide us a bug-id, or else we will raise a new bug. I have one question: Looking at the path /\ , do you have a directory with similar path, as we can see accessing that has failed? Thanks, Sachin. Thanks, in advance, for your time to help me. Ryan Clough Information Systems Decision Sciences International Corporation This email and its contents are confidential. If you are not the intended recipient, please do not disclose or use the information within this email
Re: [Gluster-users] root squash and git clone
I am using it through my school's Satellite subscription, so I don't have a direct interface with RHN support. While it might be theoretically possible to escalate it to RHN, it would be much easier if I can figure this out on my own. On Mon, Jun 8, 2015 at 7:31 PM, Ryan Clough ryan.clo...@dsic.com wrote: We had seen this at some point back in the Gluster 3.5.x days but have not seen it since 3.6.x. If you are truly using fully licensed Red Hat Storage then I would leverage Red Hat support directly. ___ ¯\_(ツ)_/¯ Ryan Clough Information Systems Decision Sciences International Corporation http://www.decisionsciencescorp.com/ On Mon, Jun 8, 2015 at 6:34 PM, Prasun Gera prasun.g...@gmail.com wrote: Anyone noticed this ? This is easily reproducible for me. It's a bit strange though since a) git clone isn't/shouldn't be doing anything as root and b) if it were, it would have failed similarly on a regular (not glusterfs) NFS mount with root squash for me, which it doesn't. On Wed, Jun 3, 2015 at 7:22 AM, Prasun Gera prasun.g...@gmail.com wrote: Version: RHS 3.0 I noticed that if server.root-squash is set on, clients get permissions errors on git commands like git clone. Is this a known issue ? I confirmed that the write permissions to the destination directories were correct, and normal writes were working fine. git clones would fail though with: error: unable to write sha1 filename fatal: cannot store pack file fatal: index-pack failed ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users This email and its contents are confidential. If you are not the intended recipient, please do not disclose or use the information within this email or its attachments. If you have received this email in error, please report the error to the sender by return email and delete this communication from your records. ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
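One hedged way to narrow this down is to toggle the option and retry the clone (the volume name and repository URL below are placeholders), which at least confirms root-squash is the trigger:

gluster volume set myvol server.root-squash off
git clone <repo-url> /mnt/myvol/test-clone
gluster volume set myvol server.root-squash on

If the clone succeeds with root-squash off and fails with it on, capturing the brick logs during the failing run should show which uid/gid the rejected operations arrive with.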
Re: [Gluster-users] One host won't rebalance
The rebalance failures appear to be because the connection to subvolume bigdata2-client-8 was lost. Rebalance will stop if any dht subvolume goes down. From the logs: [2015-06-04 23:24:36.714719] I [client.c:2215:client_rpc_notify] 0-bigdata2-client-8: disconnected from bigdata2-client-8. Client process will keep trying to connect to glusterd until brick's port is available [2015-06-04 23:24:36.714734] W [dht-common.c:5953:dht_notify] 0-bigdata2-dht: Received CHILD_DOWN. Exiting [2015-06-04 23:24:36.714745] I [MSGID: 109029] [dht-rebalance.c:2136:gf_defrag_stop] 0-: Received stop command on rebalance Did anything happen to the brick process for 0-bigdata2-client-8 that would cause this? The brick logs might help here. I need to look into why the rebalance never proceeded on gluster-6. The logs show the following : [2015-06-03 15:18:17.905569] W [client-handshake.c:1109:client_setvolume_cbk] 0-bigdata2-client-1: failed to set the volume (Permission denied) [2015-06-03 15:18:17.905583] W [client-handshake.c:1135:client_setvolume_cbk] 0-bigdata2-client-1: failed to get 'process-uuid' from reply dict [2015-06-03 15:18:17.905592] E [client-handshake.c:1141:client_setvolume_cbk] 0-bigdata2-client-1: SETVOLUME on remote-host failed: Authentication for all subvols on gluster-6. Can you send us the brick logs for those as well? Thanks, Nithya - Original Message - From: Branden Timm bt...@wisc.edu To: Nithya Balachandran nbala...@redhat.com Cc: gluster-users@gluster.org Sent: Saturday, 6 June, 2015 12:20:53 AM Subject: Re: [Gluster-users] One host won't rebalance Update on this. After two out of three servers entered failed state during rebalance, and the third hadn't done anything yet, I cancelled the rebalance. I then stopped/started the volume, and ran rebalance fix-layout. As of this point, it is running on all three servers successfully. Once fix-layout is done I will attempt another data rebalance and update this list with the results. From: gluster-users-boun...@gluster.org gluster-users-boun...@gluster.org on behalf of Branden Timm bt...@wisc.edu Sent: Friday, June 5, 2015 10:38 AM To: Nithya Balachandran Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] One host won't rebalance Sure, here is gluster volume info: Volume Name: bigdata2 Type: Distribute Volume ID: 2cd214fa-6fa4-49d0-93f6-de2c510d4dd4 Status: Started Number of Bricks: 15 Transport-type: tcp Bricks: Brick1: gluster-6.redacted:/gluster/brick1/data Brick2: gluster-6.redacted:/gluster/brick2/data Brick3: gluster-6.redacted:/gluster/brick3/data Brick4: gluster-6.redacted:/gluster/brick4/data Brick5: gluster-7.redacted:/gluster/brick1/data Brick6: gluster-7.redacted:/gluster/brick2/data Brick7: gluster-7.redacted:/gluster/brick3/data Brick8: gluster-7.redacted:/gluster/brick4/data Brick9: gluster-8.redacted:/gluster/brick1/data Brick10: gluster-8.redacted:/gluster/brick2/data Brick11: gluster-8.redacted:/gluster/brick3/data Brick12: gluster-8.redacted:/gluster/brick4/data Brick13: gluster-7.redacted:/gluster-sata/brick1/data Brick14: gluster-8.redacted:/gluster-sata/brick1/data Brick15: gluster-6.redacted:/gluster-sata/brick1/data Options Reconfigured: cluster.readdir-optimize: on performance.enable-least-priority: off Attached is a tarball containing logs for gluster-6, 7 and 8. 
I should also note that as of this morning, the two hosts that were successfully running the rebalance show as failed, while the affected host still is sitting at 0 secs progress: Node Rebalanced-files size scanned failures skipped status run time in secs - --- --- --- --- --- -- localhost00Bytes 0 0 0 in progress 0.00 gluster-7.glbrc.org 302019.4TB 12730 4 0 failed 105165.00 gluster-8.glbrc.org00Bytes 0 0 0 failed 0.00 volume rebalance: bigdata2: success: Thanks! From: Nithya Balachandran nbala...@redhat.com Sent: Friday, June 5, 2015 4:46 AM To: Branden Timm Cc: Atin Mukherjee; gluster-users@gluster.org Subject: Re: [Gluster-users] One host won't rebalance Hi, Can you send us the gluster volume info for the volume and the rebalance log for the nodes? What is the pid of the process which does not proceed? Thanks, Nithya - Original Message - From: Atin Mukherjee amukh...@redhat.com To: Branden Timm bt...@wisc.edu, Atin Mukherjee atin.mukherje...@gmail.com Cc: gluster-users@gluster.org Sent: Friday, June 5, 2015 9:26:44 AM Subject: Re: [Gluster-users] One host won't rebalance On 06/05/2015 12:05
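Before re-running the rebalance it may be worth a hedged sanity check that every brick process is up and reachable, since the failure above was triggered by a CHILD_DOWN:

gluster volume status bigdata2        # every brick should show Online = Y
gluster volume rebalance bigdata2 start

If a brick shows N, its brick log (under /var/log/glusterfs/bricks/ on that server) is the place to look before restarting the rebalance.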
Re: [Gluster-users] Double counting of quota
Hi Alessandro, Please provide the test-case, so that we can try to re-create this problem in-house? Thanks, Vijay On Saturday 06 June 2015 05:59 AM, Alessandro De Salvo wrote: Hi, just to answer to myself, it really seems the temp files from rsync are the culprit, it seems that their size are summed up to the real contents of the directories I’m synchronizing, or in other terms their size is not removed from the used size after they are removed. I suppose this is someway connected to the error on removexattr I’m seeing. The temporary solution I’ve found is to use rsync with the option to write the temp files to /tmp, but it would be very interesting to understand why this is happening. Cheers, Alessandro Il giorno 06/giu/2015, alle ore 01:19, Alessandro De Salvo alessandro.desa...@roma1.infn.it ha scritto: Hi, I currently have two brick with replica 2 on the same machine, pointing to different disks of a connected SAN. The volume itself is fine: # gluster volume info atlas-home-01 Volume Name: atlas-home-01 Type: Replicate Volume ID: 660db960-31b8-4341-b917-e8b43070148b Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: host1:/bricks/atlas/home02/data Brick2: host2:/bricks/atlas/home01/data Options Reconfigured: performance.write-behind-window-size: 4MB performance.io-thread-count: 32 performance.readdir-ahead: on server.allow-insecure: on nfs.disable: true features.quota: on features.inode-quota: on However, when I set a quota on a dir of the volume the size show is twice the physical size of the actual dir: # gluster volume quota atlas-home-01 list /user1 Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded? --- /user14.0GB 80% 3.2GB 853.4MB No No # du -sh /storage/atlas/home/user1 1.6G/storage/atlas/home/user1 If I remove one of the bricks the quota shows the correct value. Is there any double counting in case the bricks are on the same machine? Also, I see a lot of errors in the logs like the following: [2015-06-05 21:59:27.450407] E [posix-handle.c:157:posix_make_ancestryfromgfid] 0-atlas-home-01-posix: could not read the link from the gfid handle /bricks/atlas/home01/data/.glusterfs/be/e5/bee5e2b8-c639-4539-a483-96c19cd889eb (No such file or directory) and also [2015-06-05 22:52:01.112070] E [marker-quota.c:2363:mq_mark_dirty] 0-atlas-home-01-marker: failed to get inode ctx for /user1/file1 When running rsync I also see the following errors: [2015-06-05 23:06:22.203968] E [marker-quota.c:2601:mq_remove_contri] 0-atlas-home-01-marker: removexattr trusted.glusterfs.quota.fddf31ba-7f1d-4ba8-a5ad-2ebd6e4030f3.contri failed for /user1/..bashrc.O4kekp: No data available Those files are the temp files of rsync, I’m not sure why the throw errors in glusterfs. Any help? Thanks, Alessandro ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
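For reference, the workaround described above can be written as a single rsync invocation (the source path is a placeholder; the destination is the mount from this thread):

rsync -a --temp-dir=/tmp /local/source/ /storage/atlas/home/user1/

With --temp-dir the partial files never live on the gluster mount, so they cannot be picked up by the quota accounting there.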
Re: [Gluster-users] using a preferred node ?
From this slide (maybe outdated) it says that reads are also balanced (in replication scenario slide 22): http://www.gluster.org/community/documentation/images/8/80/GlusterFS_Architecture_%26_Roadmap-Vijay_Bellur-LinuxCon_EU_2013.pdf Except for write, having an option to do only failover for reads lookup would be possible I guess ? Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-06-08 8:11 GMT+02:00 Ravishankar N ravishan...@redhat.com: On 06/08/2015 11:34 AM, Mathieu Chateau wrote: Hello Ravi, thanks for clearing things up. Anything on the roadmap that would help my case? I don't think it would be possible for clients to do I/O only on its local brick and yet expect the bricks' contents to be in sync in real-time.. Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-06-08 6:37 GMT+02:00 Ravishankar N ravishan...@redhat.com: On 06/06/2015 12:49 AM, Mathieu Chateau wrote: Hello, sorry to bother again but I am still facing this issue. client still looks on the other side and not using the node declared in fstab: prd-sta-sto01:/gluster-preprod /mnt/gluster-preprod glusterfs defaults,_netdev,backupvolfile-server=prd-sta-sto02 0 0 I expect client to use sto01 and not sto02 as it's available. Hi Mathieu, When you do lookups (`ls` etc), they are sent to both bricks of the replica. If you write to a file, the write is also sent to both bricks. This is how it works. Only reads are served from the local brick. -Ravi If I add a static route to break connectivity to sto02 and do a df, I have around 30s before it works. Then it works ok. Questions: - How to force node to stick as possible with one specific (local) node ? - How to know where a client is currently connected? Thanks for your help :) Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-05-11 7:26 GMT+02:00 Mathieu Chateau mathieu.chat...@lotp.fr: Hello, thanks for helping :) If gluster server is rebooted, any way to make client failback on node after reboot ? How to know which node is using a client ? I see TCP connection to both node Regards, Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-05-11 7:13 GMT+02:00 Ravishankar N ravishan...@redhat.com: On 05/10/2015 08:29 PM, Mathieu Chateau wrote: Hello, Short way: Is there any way to define a preferred Gluster server ? Long way: I have the following setup (version 3.6.3) : Gluster A == VPN == Gluster B Volume is replicated between A and B. They are in same datacenter, using a 1Gb/s connection, low latency (0.5ms) I have gluster clients in lan A B. When doing a ls on big folder (~60k files), both gluster node are used, and so it need 9mn instead on 1mn if only the local gluster is reachable. Lookups (and writes of course) from clients are sent to both bricks because AFR uses the result of the lookup to select which brick to read from if there is a pending heal etc. If the file is clean on both A and B, then reads are always served from the local brick. i.e. reads on clients mounted on A will be served from the brick in A (and likewise for B). Hope that helps, Ravi It's HA setup, application is present on both side. I would like a master/master setup, but using only local node as possible. Regards, Mathieu CHATEAU http://www.lotp.fr ___ Gluster-users mailing listGluster-users@gluster.orghttp://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Monitoring gluster 3.6.1
On 1 June 2015 at 12:28, Félix de Lelelis felix.deleli...@gmail.com wrote: Hi, I am monitoring gluster with scripts that launch other scripts. They all funnel into one script that checks whether any glusterd process is active and, if the response is false, launches the check. The checks are: - gluster volume volname info - gluster volume heal volname info - gluster volume heal volname split-brain - gluster volume volname status detail - gluster volume volname statistics Since I enabled this monitoring on our pre-production gluster, gluster has gone down twice. We suspect the monitoring is overloading it, though it should not. The question is: is there any way to check those states otherwise? Thanks. You can make use of https://github.com/keithseahus/fluent-plugin-glusterfs as well. http://docs.fluentd.org/articles/collect-glusterfs-logs HTH Best Regards, Vishwanath
[Gluster-users] Cannot start Gluster -- resolve brick failed in restore
Hi. I have a GlusterFS cluster running on a Debian Wheezy with GlusterFS 3.6.2, with one volume on all three bricks (web1, web2, web3). All was working good until I changed the IP addresses of bricks, because after then only the GlusterFS daemon on web1 is starting well, and the deamons on web2 and web3 are exiting with these errors: [2015-06-08 07:59:15.929330] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid) [2015-06-08 07:59:15.932417] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536 [2015-06-08 07:59:15.932482] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory [2015-06-08 07:59:15.933772] W [rdma.c:4221:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device) [2015-06-08 07:59:15.933815] E [rdma.c:4519:init] 0-rdma.management: Failed to initialize IB Device [2015-06-08 07:59:15.933838] E [rpc-transport.c:333:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed [2015-06-08 07:59:15.933887] W [rpcsvc.c:1524:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed [2015-06-08 07:59:17.354500] I [glusterd-store.c:2043:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30600 [2015-06-08 07:59:17.527377] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0 [2015-06-08 07:59:17.527446] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0 [2015-06-08 07:59:17.527499] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2015-06-08 07:59:17.528139] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2015-06-08 07:59:17.528861] E [glusterd-store.c:4244:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore [2015-06-08 07:59:17.528891] E [xlator.c:425:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again [2015-06-08 07:59:17.528906] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed [2015-06-08 07:59:17.528917] E [graph.c:525:glusterfs_graph_activate] 0-graph: init failed [2015-06-08 07:59:17.529257] W [glusterfsd.c:1194:cleanup_and_exit] (-- 0-: received signum (0), shutting down Please note that bricks name are setted in /etc/hosts and all of them are resolving well with the new IP addresses, so I cannot find out where the problem is. Could you help me please? Thank you very much! Bye ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] using a preferred node ?
Hello Ravi, thanks for clearing things up. Anything on the roadmap that would help my case? Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-06-08 6:37 GMT+02:00 Ravishankar N ravishan...@redhat.com: On 06/06/2015 12:49 AM, Mathieu Chateau wrote: Hello, sorry to bother again but I am still facing this issue. client still looks on the other side and not using the node declared in fstab: prd-sta-sto01:/gluster-preprod /mnt/gluster-preprod glusterfs defaults,_netdev,backupvolfile-server=prd-sta-sto02 0 0 I expect client to use sto01 and not sto02 as it's available. Hi Mathieu, When you do lookups (`ls` etc), they are sent to both bricks of the replica. If you write to a file, the write is also sent to both bricks. This is how it works. Only reads are served from the local brick. -Ravi If I add a static route to break connectivity to sto02 and do a df, I have around 30s before it works. Then it works ok. Questions: - How to force node to stick as possible with one specific (local) node ? - How to know where a client is currently connected? Thanks for your help :) Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-05-11 7:26 GMT+02:00 Mathieu Chateau mathieu.chat...@lotp.fr: Hello, thanks for helping :) If gluster server is rebooted, any way to make client failback on node after reboot ? How to know which node is using a client ? I see TCP connection to both node Regards, Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-05-11 7:13 GMT+02:00 Ravishankar N ravishan...@redhat.com: On 05/10/2015 08:29 PM, Mathieu Chateau wrote: Hello, Short way: Is there any way to define a preferred Gluster server ? Long way: I have the following setup (version 3.6.3) : Gluster A == VPN == Gluster B Volume is replicated between A and B. They are in same datacenter, using a 1Gb/s connection, low latency (0.5ms) I have gluster clients in lan A B. When doing a ls on big folder (~60k files), both gluster node are used, and so it need 9mn instead on 1mn if only the local gluster is reachable. Lookups (and writes of course) from clients are sent to both bricks because AFR uses the result of the lookup to select which brick to read from if there is a pending heal etc. If the file is clean on both A and B, then reads are always served from the local brick. i.e. reads on clients mounted on A will be served from the brick in A (and likewise for B). Hope that helps, Ravi It's HA setup, application is present on both side. I would like a master/master setup, but using only local node as possible. Regards, Mathieu CHATEAU http://www.lotp.fr ___ Gluster-users mailing listGluster-users@gluster.orghttp://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] HA storage based on two nodes with one point of failure
2015-06-08 8:32 GMT+03:00 Ravishankar N ravishan...@redhat.com: On 06/08/2015 02:38 AM, Юрий Полторацкий wrote: Hi, I have made a lab with a config listed below and have got unexpected result. Someone, tell me, please, where did I go wrong? I am testing oVirt. Data Center has two clusters: the first as a computing with three nodes (node1, node2, node3); the second as a storage (node5, node6) based on glusterfs (replica 2). I want the storage to be HA. I have read here https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/html/Administration_Guide/sect-Managing_Split-brain.html next: For a replicated volume with two nodes and one brick on each machine, if the server-side quorum is enabled and one of the nodes goes offline, the other node will also be taken offline because of the quorum configuration. As a result, the high availability provided by the replication is ineffective. To prevent this situation, a dummy node can be added to the trusted storage pool which does not contain any bricks. This ensures that even if one of the nodes which contains data goes offline, the other node will remain online. Note that if the dummy node and one of the data nodes goes offline, the brick on other node will be also be taken offline, and will result in data unavailability. So, I have added my Engine (not self-hosted) as a dummy node without a brick and have configured quorum as listed below: cluster.quorum-type: fixed cluster.quorum-count: 1 cluster.server-quorum-type: server cluster.server-quorum-ratio: 51% Then, I've run a VM and have dropped the network link from node6, after one a hour have switched back the link and after a while have got a split-brain. But why? No one could write to the brick on node6: the VM was running on node3 and node1 was SPM. It could have happened that after node6 came up, the client(s) saw a temporary disconnect of node 5 and a write happened at that time. When the node 5 is connected again, we have AFR xattrs on both nodes blaming each other, causing split-brain. For a replica 2 setup. it is best to set the client-quorum to auto instead of fixed. What this means is that the first node of the replica must always be up for writes to be permitted. If the first node goes down, the volume becomes read-only. Yes, at first I have tested with client-quorum auto, but my VMs has been paused when the first node goes down and this is not unacceptable Ok, I understood: there is now way to have fault tolerance storage with only two servers using GlusterFS. I have to get another one. Thanks. For better availability , it would be better to use a replica 3 volume with (again with client-quorum set to auto). If you are using glusterfs 3.7, you can also consider using the arbiter configuration [1] for replica 3. [1] https://github.com/gluster/glusterfs/blob/master/doc/features/afr-arbiter-volumes.md Thanks, Ravi Gluster's log from node6: Июн 07 15:35:06 node6.virt.local etc-glusterfs-glusterd.vol[28491]: [2015-06-07 12:35:06.106270] C [MSGID: 106002] [glusterd-server-quorum.c:356:glusterd_do_volume_quorum_action] 0-management: Server quorum lost for volume vol3. Stopping local bricks. Июн 07 16:30:06 node6.virt.local etc-glusterfs-glusterd.vol[28491]: [2015-06-07 13:30:06.261505] C [MSGID: 106003] [glusterd-server-quorum.c:351:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume vol3. Starting local bricks. 
gluster volume heal vol3 info Brick node5.virt.local:/storage/brick12/ /5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain Number of entries: 1 Brick node6.virt.local:/storage/brick13/ /5d0bb2f3-f903-4349-b6a5-25b549affe5f/dom_md/ids - Is in split-brain Number of entries: 1 gluster volume info vol3 Volume Name: vol3 Type: Replicate Volume ID: 69ba8c68-6593-41ca-b1d9-40b3be50ac80 Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: node5.virt.local:/storage/brick12 Brick2: node6.virt.local:/storage/brick13 Options Reconfigured: storage.owner-gid: 36 storage.owner-uid: 36 cluster.server-quorum-type: server cluster.quorum-type: fixed network.remote-dio: enable cluster.eager-lock: enable performance.stat-prefetch: off performance.io-cache: off performance.read-ahead: off performance.quick-read: off auth.allow: * user.cifs: disable nfs.disable: on performance.readdir-ahead: on cluster.quorum-count: 1 cluster.server-quorum-ratio: 51% 06.06.2015 12:09, Юрий Полторацкий пишет: Hi, I want to build a HA storage based on two servers. I want that if one goes down, my storage will be available in RW mode. If I will use replica 2, then split-brain can occur. To avoid this I would use a quorum. As I understand correctly, I can use quorum on a client side, on a server side, or on both. I want to add a dummy node without a brick and make such config: cluster.quorum-type: fixed
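As a sketch of the two alternatives Ravi describes above, using the node names from this thread (the third host, node7, and all brick paths below are hypothetical): for an existing replica 2 volume, setting client-side quorum to auto avoids split-brain at the cost of read-only access whenever the first brick is down:
# enable client-side quorum on the existing replica 2 volume
gluster volume set vol3 cluster.quorum-type auto
With glusterfs 3.7, a dedicated arbiter brick on a third node stores only file names and metadata, so full data still lives on just two servers while quorum is preserved:
# create a replica 3 volume whose third brick is an arbiter
gluster volume create vol3-arbiter replica 3 arbiter 1 \
    node5.virt.local:/storage/brick14 \
    node6.virt.local:/storage/brick15 \
    node7.virt.local:/storage/arbiter-brick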
Re: [Gluster-users] using a preferred node ?
On 06/08/2015 11:34 AM, Mathieu Chateau wrote: Hello Ravi, thanks for clearing things up. Anything on the roadmap that would help my case? I don't think it would be possible for clients to do I/O only on its local brick and yet expect the bricks' contents to be in sync in real-time.. Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-06-08 6:37 GMT+02:00 Ravishankar N ravishan...@redhat.com mailto:ravishan...@redhat.com: On 06/06/2015 12:49 AM, Mathieu Chateau wrote: Hello, sorry to bother again but I am still facing this issue. client still looks on the other side and not using the node declared in fstab: prd-sta-sto01:/gluster-preprod /mnt/gluster-preprod glusterfs defaults,_netdev,backupvolfile-server=prd-sta-sto02 0 0 I expect client to use sto01 and not sto02 as it's available. Hi Mathieu, When you do lookups (`ls` etc), they are sent to both bricks of the replica. If you write to a file, the write is also sent to both bricks. This is how it works. Only reads are served from the local brick. -Ravi If I add a static route to break connectivity to sto02 and do a df, I have around 30s before it works. Then it works ok. Questions: * How to force node to stick as possible with one specific (local) node ? * How to know where a client is currently connected? Thanks for your help :) Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-05-11 7:26 GMT+02:00 Mathieu Chateau mathieu.chat...@lotp.fr mailto:mathieu.chat...@lotp.fr: Hello, thanks for helping :) If gluster server is rebooted, any way to make client failback on node after reboot ? How to know which node is using a client ? I see TCP connection to both node Regards, Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-05-11 7:13 GMT+02:00 Ravishankar N ravishan...@redhat.com mailto:ravishan...@redhat.com: On 05/10/2015 08:29 PM, Mathieu Chateau wrote: Hello, Short way: Is there any way to define a preferred Gluster server ? Long way: I have the following setup (version 3.6.3) : Gluster A == VPN == Gluster B Volume is replicated between A and B. They are in same datacenter, using a 1Gb/s connection, low latency (0.5ms) I have gluster clients in lan A B. When doing a ls on big folder (~60k files), both gluster node are used, and so it need 9mn instead on 1mn if only the local gluster is reachable. Lookups (and writes of course) from clients are sent to both bricks because AFR uses the result of the lookup to select which brick to read from if there is a pending heal etc. If the file is clean on both A and B, then reads are always served from the local brick. i.e. reads on clients mounted on A will be served from the brick in A (and likewise for B). Hope that helps, Ravi It's HA setup, application is present on both side. I would like a master/master setup, but using only local node as possible. Regards, Mathieu CHATEAU http://www.lotp.fr ___ Gluster-users mailing list Gluster-users@gluster.org mailto:Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Monitoring gluster 3.6.1
You may use the gluster nagios plugin for monitoring purposes. You can get more details here: http://www.gluster.org/pipermail/gluster-users/2014-June/017819.html --Humble On Mon, Jun 8, 2015 at 12:42 PM, M S Vishwanath Bhat msvb...@gmail.com wrote: On 1 June 2015 at 12:28, Félix de Lelelis felix.deleli...@gmail.com wrote: Hi, I am monitoring gluster with scripts that launch other scripts. They all funnel into one script that checks whether any glusterd process is active and, if the response is false, launches the check. The checks are: - gluster volume volname info - gluster volume heal volname info - gluster volume heal volname split-brain - gluster volume volname status detail - gluster volume volname statistics Since I enabled this monitoring on our pre-production gluster, gluster has gone down twice. We suspect the monitoring is overloading it, though it should not. The question is: is there any way to check those states otherwise? Thanks. You can make use of https://github.com/keithseahus/fluent-plugin-glusterfs as well. http://docs.fluentd.org/articles/collect-glusterfs-logs HTH Best Regards, Vishwanath
Re: [Gluster-users] Double counting of quota
Hi Vijay, the use case is very simple. I'm using gluster 3.7.1 with a replicated volume (replica 2) and quota enabled. The two bricks are on the same machine, on two disks of the same size; one of them is slower, but I think that is irrelevant. Probably also not important for this use case, but I want to note that the volumes are xfs-formatted with default options and are thin logical volumes. The gluster server is running on CentOS 7.1. The problem occurs when using rsync to copy from an external source into the gluster volume: if I copy without specifying the temp dir, rsync uses the current dir for temporaries and their size is counted, at least on one of the two bricks. I can confirm it's only happening on one of the bricks by looking at the xattrs of the same dir on the two bricks, as the quota values are different. At the moment I have recreated the bricks and started the copy over, and it seems much better now that I'm explicitly asking rsync to use /tmp for temporaries. Anyway, I'm still seeing errors in the logs that I will report later. Many thanks for the help, Alessandro On 8 June 2015, at 08:38, Vijaikumar M vmall...@redhat.com wrote: Hi Alessandro, Please provide the test-case, so that we can try to re-create this problem in-house? Thanks, Vijay On Saturday 06 June 2015 05:59 AM, Alessandro De Salvo wrote: Hi, just to answer to myself, it really seems the temp files from rsync are the culprit: it seems that their size is summed up with the real contents of the directories I'm synchronizing, or in other terms their size is not removed from the used size after they are removed. I suppose this is somehow connected to the error on removexattr I'm seeing. The temporary solution I've found is to use rsync with the option to write the temp files to /tmp, but it would be very interesting to understand why this is happening. Cheers, Alessandro On 6 June 2015, at 01:19, Alessandro De Salvo alessandro.desa...@roma1.infn.it wrote: Hi, I currently have two bricks with replica 2 on the same machine, pointing to different disks of a connected SAN. The volume itself is fine: # gluster volume info atlas-home-01 Volume Name: atlas-home-01 Type: Replicate Volume ID: 660db960-31b8-4341-b917-e8b43070148b Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: host1:/bricks/atlas/home02/data Brick2: host2:/bricks/atlas/home01/data Options Reconfigured: performance.write-behind-window-size: 4MB performance.io-thread-count: 32 performance.readdir-ahead: on server.allow-insecure: on nfs.disable: true features.quota: on features.inode-quota: on However, when I set a quota on a dir of the volume, the size shown is twice the physical size of the actual dir: # gluster volume quota atlas-home-01 list /user1 Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded? --- /user1 4.0GB 80% 3.2GB 853.4MB No No # du -sh /storage/atlas/home/user1 1.6G /storage/atlas/home/user1 If I remove one of the bricks the quota shows the correct value. Is there any double counting in case the bricks are on the same machine?
Also, I see a lot of errors in the logs like the following: [2015-06-05 21:59:27.450407] E [posix-handle.c:157:posix_make_ancestryfromgfid] 0-atlas-home-01-posix: could not read the link from the gfid handle /bricks/atlas/home01/data/.glusterfs/be/e5/bee5e2b8-c639-4539-a483-96c19cd889eb (No such file or directory) and also [2015-06-05 22:52:01.112070] E [marker-quota.c:2363:mq_mark_dirty] 0-atlas-home-01-marker: failed to get inode ctx for /user1/file1 When running rsync I also see the following errors: [2015-06-05 23:06:22.203968] E [marker-quota.c:2601:mq_remove_contri] 0-atlas-home-01-marker: removexattr trusted.glusterfs.quota.fddf31ba-7f1d-4ba8-a5ad-2ebd6e4030f3.contri failed for /user1/..bashrc.O4kekp: No data available Those files are the temp files of rsync, I’m not sure why the throw errors in glusterfs. Any help? Thanks, Alessandro ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
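For reference, the workaround Alessandro describes (keeping rsync temporaries off the quota-enabled volume) corresponds to rsync's --temp-dir option; the paths below are illustrative:
# stage rsync's temporary ".<name>.XXXXXX" copies in /tmp instead of
# the destination directory on the quota-enabled gluster mount
rsync -av --temp-dir=/tmp /local/source/user1/ /storage/atlas/home/user1/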
Re: [Gluster-users] using a preferred node ?
On 06/08/2015 11:51 AM, Mathieu Chateau wrote: From this slide (maybe outdated) it says that reads are also balanced (in replication scenario slide 22): http://www.gluster.org/community/documentation/images/8/80/GlusterFS_Architecture_%26_Roadmap-Vijay_Bellur-LinuxCon_EU_2013.pdf Except for write, having an option to do only failover for reads lookup would be possible I guess ? Lookups have to be sent to both bricks because AFR uses the response to determine if there is a stale copy etc (and then serve from the good copy). For reads, if a client is also mounted on the same machine as the brick, reads will be served from that brick automatically. You can also use the cluster.read-subvolume option to explicitly force the client to read from a brick: /`gluster volume set help//` // snip// // Option: cluster.read-subvolume// //Default Value: (null)// //Description: inode-read fops happen only on one of the bricks in replicate. Afr will prefer the one specified using this option if it is not stale. Option value must be one of the xlator names of the children. Ex: volname-client-0 till volname-client-number-of-bricks - 1// // ///snip// // / Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-06-08 8:11 GMT+02:00 Ravishankar N ravishan...@redhat.com mailto:ravishan...@redhat.com: On 06/08/2015 11:34 AM, Mathieu Chateau wrote: Hello Ravi, thanks for clearing things up. Anything on the roadmap that would help my case? I don't think it would be possible for clients to do I/O only on its local brick and yet expect the bricks' contents to be in sync in real-time.. Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-06-08 6:37 GMT+02:00 Ravishankar N ravishan...@redhat.com mailto:ravishan...@redhat.com: On 06/06/2015 12:49 AM, Mathieu Chateau wrote: Hello, sorry to bother again but I am still facing this issue. client still looks on the other side and not using the node declared in fstab: prd-sta-sto01:/gluster-preprod /mnt/gluster-preprod glusterfs defaults,_netdev,backupvolfile-server=prd-sta-sto02 0 0 I expect client to use sto01 and not sto02 as it's available. Hi Mathieu, When you do lookups (`ls` etc), they are sent to both bricks of the replica. If you write to a file, the write is also sent to both bricks. This is how it works. Only reads are served from the local brick. -Ravi If I add a static route to break connectivity to sto02 and do a df, I have around 30s before it works. Then it works ok. Questions: * How to force node to stick as possible with one specific (local) node ? * How to know where a client is currently connected? Thanks for your help :) Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-05-11 7:26 GMT+02:00 Mathieu Chateau mathieu.chat...@lotp.fr mailto:mathieu.chat...@lotp.fr: Hello, thanks for helping :) If gluster server is rebooted, any way to make client failback on node after reboot ? How to know which node is using a client ? I see TCP connection to both node Regards, Cordialement, Mathieu CHATEAU http://www.lotp.fr 2015-05-11 7:13 GMT+02:00 Ravishankar N ravishan...@redhat.com mailto:ravishan...@redhat.com: On 05/10/2015 08:29 PM, Mathieu Chateau wrote: Hello, Short way: Is there any way to define a preferred Gluster server ? Long way: I have the following setup (version 3.6.3) : Gluster A == VPN == Gluster B Volume is replicated between A and B. They are in same datacenter, using a 1Gb/s connection, low latency (0.5ms) I have gluster clients in lan A B. 
When doing a ls on big folder (~60k files), both gluster node are used, and so it need 9mn instead on 1mn if only the local gluster is reachable. Lookups (and writes of course) from clients are sent to both bricks because AFR uses the result of the lookup to select which brick to read from if there is a pending heal etc. If the file is clean on both A and B, then reads are always served from the local brick. i.e. reads on clients mounted on A will be served from the brick in A (and likewise for B). Hope that helps, Ravi It's HA setup, application is present on both side. I would like a master/master setup, but
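A minimal sketch of the cluster.read-subvolume option quoted above, assuming the volume name from Mathieu's fstab line (gluster-preprod) and that sto01 corresponds to the first client xlator, i.e. index 0:
# pin inode-read operations to the first brick's client xlator,
# as long as that copy is not stale
gluster volume set gluster-preprod cluster.read-subvolume gluster-preprod-client-0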
Re: [Gluster-users] slave is rebalancing, master is not?
On 5 June 2015 at 20:46, Dr. Michael J. Chudobiak m...@avtechpulse.com wrote: I seem to have an issue with my replicated setup. The master says no rebalancing is happening, but the slave says there is (sort of). The master notes the issue: [2015-06-05 15:11:26.735361] E [glusterd-utils.c:9993:glusterd_volume_status_aggregate_tasks_status] 0-management: Local tasks count (0) and remote tasks count (1) do not match. Not aggregating tasks status. The slave shows some odd messages like this: [2015-06-05 14:44:56.525402] E [glusterfsd-mgmt.c:1494:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server I want the supposed rebalancing to stop, so I can add bricks. Any idea what is going on, and how to fix it? Both servers were recently upgraded from Fedora 21 to 22. Status output is below. - Mike Master: [root@karsh ~]# /usr/sbin/gluster volume status Status of volume: volume1 Gluster process PortOnline Pid -- Brick karsh:/gluster/brick1/data49152 Y 4023 Brick xena:/gluster/brick2/data 49152 Y 1719 Brick karsh:/gluster/brick3/data49153 Y 4015 Brick xena:/gluster/brick4/data 49153 Y 1725 NFS Server on localhost 2049Y 4022 Self-heal Daemon on localhost N/A Y 4034 NFS Server on xena 2049Y 24550 Self-heal Daemon on xenaN/A Y 24557 Task Status of Volume volume1 -- There are no active volume tasks [root@xena glusterfs]# /usr/sbin/gluster volume status Status of volume: volume1 Gluster process PortOnline Pid -- Brick karsh:/gluster/brick1/data49152 Y 4023 Brick xena:/gluster/brick2/data 49152 Y 1719 Brick karsh:/gluster/brick3/data49153 Y 4015 Brick xena:/gluster/brick4/data 49153 Y 1725 NFS Server on localhost 2049Y 24550 Self-heal Daemon on localhost N/A Y 24557 NFS Server on 192.168.0.240 2049Y 4022 Self-heal Daemon on 192.168.0.240 N/A Y 4034 Task Status of Volume volume1 -- Task : Rebalance ID : f550b485-26c4-49f8-b7dc-055c678afce8 Status : in progress [root@xena glusterfs]# gluster volume rebalance volume1 status volume rebalance: volume1: success: This is weird. Did you start rebalance yourself? What does gluster volume rebalance volume1 status say? Also check if both the nodes are properly connected using gluster peer status. If it says completed/stopped, you can go ahead and add the bricks. Also can you check if rebalance process is running in your second server (xena?) BTW, there is *no* master and slave in a single gluster volume :) Best Regards, Vishwanath ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
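In command form, the checks Vishwanath suggests (plus the stop that Mike is after) look roughly like this; the add-brick arguments are placeholders:
# confirm both peers are connected
gluster peer status
# ask both nodes explicitly for the rebalance state
gluster volume rebalance volume1 status
# if a stale rebalance task is still reported, stop it before growing the volume
gluster volume rebalance volume1 stop
gluster volume add-brick volume1 karsh:/gluster/brick5/data xena:/gluster/brick6/data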
Re: [Gluster-users] 2 node replica 2 cluster - volume on one node stopped responding
Some extra points: - 10.100.3.41 is one of the oVirt hosts. - I only needed to restart glusterfsd glusterd in one of the gluster nodes (also the one where I pulled the logs from) to get everything in working order. - it's a separate gluster volume, not managed from oVirt engine. On 8 June 2015 at 11:35, Tiemen Ruiten t.rui...@rdmedia.com wrote: Hello, We are running an oVirt cluster on top of a 2 node replica 2 Gluster volume. Yesterday we suddenly noticed VMs were not responding and quickly found out the Gluster volume had issues. These errors were filling up the etc-glusterfs-glusterd.log file: [2015-06-07 08:36:26.498012] W [rpcsvc.c:270:rpcsvc_program_actor] 0-rpc-service: RPC program not available (req 1298437 330) for 10.100.3.41:1022 [2015-06-07 08:36:26.498073] E [rpcsvc.c:565:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully A restart of glusterfsd and glusterd resolved the issue, but triggered a lot of self-heals. We are running glusterfs 3.7.0 on ZFS. I have attached etc-glusterfs-glusterd.log, the brick log file and the glustershd.log. I would be grateful if anyone could shed any light on what happened here and if there's anything we can do to prevent it. -- Tiemen Ruiten Systems Engineer RD Media -- Tiemen Ruiten Systems Engineer RD Media ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster does not seem to detect a split-brain situation
Ah, that's really weird. I'm pretty sure that nothing ever made write changes to /export on either machine, so I wonder how the hard links ended up being split. I'll indeed clean up the .glusterfs directory and keep close tabs on Gluster's repair. Glustershd.log and the client mount logs (data.log and gluster.log at least) on the client are empty and nothing appears when I read the mismatching studies.dat file. Thanks for your help! Sjors Op zo 7 jun. 2015 om 22:10 schreef Joe Julian j...@julianfamily.org: (oops... I hate when I reply off-list) That warning should, imho, be an error. That's saying that the handle, which should be a hardlink to the file, doesn't have a matching inode. It should if it's a hardlink. If it were me, I would: find /export/sdb1/data/.glusterfs -type f -links 1 -print0 | xargs /bin/rm This would clean up any handles that are not hardlinked where they should be and will allow gluster to repair them. Btw, the self-heal errors would be in glustershd.log and/or the client mount log(s), not (usually) the brick logs. On 06/07/2015 12:21 PM, Sjors Gielen wrote: Oops! Accidentally ran the command as non-root on Curacao, that's why there was no output. The actual output is: curacao# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat getfattr: Removing leading '/' from absolute path names # file: export/sdb1/data/Case/21000355/studies.dat trusted.afr.data-client-0=0x trusted.afr.data-client-1=0x trusted.afr.dirty=0x trusted.gfid=0xfb34574974cf4804b8b80789738c0f81 For reference, the output on bonaire: bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat getfattr: Removing leading '/' from absolute path names # file: export/sdb1/data/Case/21000355/studies.dat trusted.gfid=0xfb34574974cf4804b8b80789738c0f81 Op zo 7 jun. 2015 om 21:13 schreef Sjors Gielen sj...@sjorsgielen.nl: I'm reading about quorums, I haven't set up anything like that yet. (In reply to Joe Julian, who responded off-list) The output of getfattr on bonaire: bonaire# getfattr -m . -d -e hex /export/sdb1/data/Case/21000355/studies.dat getfattr: Removing leading '/' from absolute path names # file: export/sdb1/data/Case/21000355/studies.dat trusted.gfid=0xfb34574974cf4804b8b80789738c0f81 On curacao, the command gives no output. From `gluster volume status`, it seems that while the brick curacao:/export/sdb1/data is online, it has no associated port number. Curacao can connect to the port number provided by Bonaire just fine. There are no firewalls on/between the two machines, they are on the same subnet connected by Ethernet cables and two switches. By the way, warning messages just started appearing to /var/log/glusterfs/bricks/export-sdb1-data.log on Bonaire saying mismatching ino/dev between file X and handle Y, though, maybe only just now even though I started the full self-heal hours ago. [2015-06-07 19:10:39.624393] W [posix-handle.c:727:posix_handle_hard] 0-data-posix: mismatching ino/dev between file /export/sdb1/data/Archive/S21/21008971/studies.dat (9127104621/2065) and handle /export/sdb1/data/.glusterfs/97/c2/97c2a65d-36e0-4566-a5c1-5925f97af1fd (9190215976/2065) Thanks again! Sjors Op zo 7 jun. 2015 om 19:13 schreef Sjors Gielen sj...@sjorsgielen.nl: Hi all, I work at a small, 8-person company that uses Gluster for its primary data storage. We have a volume called data that is replicated over two servers (details below). 
This worked perfectly for over a year, but lately we've been noticing some mismatches between the two bricks, so it seems there has been some split-brain situation that is not being detected or resolved. I have two questions about this: 1) I expected Gluster to (eventually) detect a situation like this; why doesn't it? 2) How do I fix this situation? I've tried an explicit 'heal', but that didn't seem to change anything. Thanks a lot for your help! Sjors --8-- Volume peer info: http://pastebin.com/PN7tRXdU curacao# md5sum /export/sdb1/data/Case/21000355/studies.dat 7bc2daec6be953ffae920d81fe6fa25c /export/sdb1/data/Case/21000355/studies.dat bonaire# md5sum /export/sdb1/data/Case/21000355/studies.dat 28c950a1e2a5f33c53a725bf8cd72681 /export/sdb1/data/Case/21000355/studies.dat # mallorca is one of the clients mallorca# md5sum /data/Case/21000355/studies.dat 7bc2daec6be953ffae920d81fe6fa25c /data/Case/21000355/studies.dat I expected an input/output error after reading this file, because of the split-brain situation, but got none. There are no entries in the GlusterFS logs of either bonaire or curacao. bonaire# gluster volume heal data full Launching heal operation to perform full self heal on volume data has been successful Use heal info commands to check status bonaire# gluster volume heal data info Brick
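Restating Joe's cleanup suggestion as a runnable sequence (a sketch; run it on the brick that holds the stale copies, and note that xargs needs -0 to consume find's -print0 output):
# first list the .glusterfs handle files that have lost their second hard link
find /export/sdb1/data/.glusterfs -type f -links 1 -print
# then remove them and let self-heal rebuild them from the good copy
find /export/sdb1/data/.glusterfs -type f -links 1 -print0 | xargs -0 /bin/rm -f
gluster volume heal data full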
Re: [Gluster-users] Cannot start Gluster -- resolve brick failed in restore
On 06/08/2015 01:38 PM, shacky wrote: Hi. I have a GlusterFS cluster running on a Debian Wheezy with GlusterFS 3.6.2, with one volume on all three bricks (web1, web2, web3). All was working good until I changed the IP addresses of bricks, because after then only the GlusterFS daemon on web1 is starting well, and the deamons on web2 and web3 are exiting with these errors: [2015-06-08 07:59:15.929330] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid) [2015-06-08 07:59:15.932417] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536 [2015-06-08 07:59:15.932482] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory [2015-06-08 07:59:15.933772] W [rdma.c:4221:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device) [2015-06-08 07:59:15.933815] E [rdma.c:4519:init] 0-rdma.management: Failed to initialize IB Device [2015-06-08 07:59:15.933838] E [rpc-transport.c:333:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed [2015-06-08 07:59:15.933887] W [rpcsvc.c:1524:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed [2015-06-08 07:59:17.354500] I [glusterd-store.c:2043:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 30600 [2015-06-08 07:59:17.527377] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0 [2015-06-08 07:59:17.527446] I [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0 [2015-06-08 07:59:17.527499] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2015-06-08 07:59:17.528139] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2015-06-08 07:59:17.528861] E [glusterd-store.c:4244:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore [2015-06-08 07:59:17.528891] E [xlator.c:425:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again [2015-06-08 07:59:17.528906] E [graph.c:322:glusterfs_graph_init] 0-management: initializing translator failed [2015-06-08 07:59:17.528917] E [graph.c:525:glusterfs_graph_activate] 0-graph: init failed [2015-06-08 07:59:17.529257] W [glusterfsd.c:1194:cleanup_and_exit] (-- 0-: received signum (0), shutting down Please note that bricks name are setted in /etc/hosts and all of them are resolving well with the new IP addresses, so I cannot find out where the problem is. Could you help me please? Here is what you can do on the nodes where glusterD fails to start: 1. cd /var/lib/glusterd 2. 
grep -irns "<old ip>" . The output will be similar to this: vols/test-vol/info:20:brick-0=172.17.0.2:-tmp-b1 vols/test-vol/info:21:brick-1=172.17.0.2:-tmp-b2 vols/test-vol/test-vol.tcp-fuse.vol:6:option remote-host 172.17.0.2 vols/test-vol/test-vol.tcp-fuse.vol:15:option remote-host 172.17.0.2 vols/test-vol/trusted-test-vol.tcp-fuse.vol:8:option remote-host 172.17.0.2 vols/test-vol/trusted-test-vol.tcp-fuse.vol:19:option remote-host 172.17.0.2 vols/test-vol/test-vol-rebalance.vol:6:option remote-host 172.17.0.2 vols/test-vol/test-vol-rebalance.vol:15:option remote-host 172.17.0.2 vols/test-vol/bricks/172.17.0.1:-tmp-b1:1:hostname=172.17.0.2 vols/test-vol/bricks/172.17.0.1:-tmp-b2:1:hostname=172.17.0.2 nfs/nfs-server.vol:8:option remote-host 172.17.0.2 nfs/nfs-server.vol:19:option remote-host 172.17.0.2 3. find . -type f -exec sed -i "s/<old ip>/<new ip>/g" {} \; 4. You would also need to manually rename a few files (for example: mv vols/test-vol/bricks/172.17.0.1:-tmp-b1 vols/test-vol/bricks/172.17.0.2:-tmp-b1) Do this exercise on all the failed nodes, recheck, and let me know if it works. ~Atin Thank you very much! Bye
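Atin's steps can be wrapped into a small bash sketch; OLD_IP and NEW_IP are placeholders, and taking a copy of /var/lib/glusterd before running it is a sensible precaution:
#!/bin/bash
OLD_IP=172.17.0.2    # address the bricks were registered with (placeholder)
NEW_IP=192.168.10.2  # address they should use now (placeholder)
cp -a /var/lib/glusterd /root/glusterd-backup-$(date +%F)   # safety copy
cd /var/lib/glusterd || exit 1
# rewrite every reference to the old address inside the state files
grep -rl "$OLD_IP" . | xargs -r sed -i "s/$OLD_IP/$NEW_IP/g"
# rename the per-brick files that carry the address in their file name
for f in vols/*/bricks/"$OLD_IP":*; do
    [ -e "$f" ] && mv "$f" "${f/$OLD_IP/$NEW_IP}"
done
service glusterd restart   # or "service glusterfs-server restart" on Debian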
Re: [Gluster-users] Double counting of quota
We have open bug 1227724 for the similar problem Thanks, Rajesh On 06/08/2015 12:08 PM, Vijaikumar M wrote: Hi Alessandro, Please provide the test-case, so that we can try to re-create this problem in-house? Thanks, Vijay On Saturday 06 June 2015 05:59 AM, Alessandro De Salvo wrote: Hi, just to answer to myself, it really seems the temp files from rsync are the culprit, it seems that their size are summed up to the real contents of the directories I’m synchronizing, or in other terms their size is not removed from the used size after they are removed. I suppose this is someway connected to the error on removexattr I’m seeing. The temporary solution I’ve found is to use rsync with the option to write the temp files to /tmp, but it would be very interesting to understand why this is happening. Cheers, Alessandro Il giorno 06/giu/2015, alle ore 01:19, Alessandro De Salvoalessandro.desa...@roma1.infn.it ha scritto: Hi, I currently have two brick with replica 2 on the same machine, pointing to different disks of a connected SAN. The volume itself is fine: # gluster volume info atlas-home-01 Volume Name: atlas-home-01 Type: Replicate Volume ID: 660db960-31b8-4341-b917-e8b43070148b Status: Started Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: host1:/bricks/atlas/home02/data Brick2: host2:/bricks/atlas/home01/data Options Reconfigured: performance.write-behind-window-size: 4MB performance.io-thread-count: 32 performance.readdir-ahead: on server.allow-insecure: on nfs.disable: true features.quota: on features.inode-quota: on However, when I set a quota on a dir of the volume the size show is twice the physical size of the actual dir: # gluster volume quota atlas-home-01 list /user1 Path Hard-limit Soft-limit Used Available Soft-limit exceeded? Hard-limit exceeded? --- /user14.0GB 80% 3.2GB 853.4MB No No # du -sh /storage/atlas/home/user1 1.6G/storage/atlas/home/user1 If I remove one of the bricks the quota shows the correct value. Is there any double counting in case the bricks are on the same machine? Also, I see a lot of errors in the logs like the following: [2015-06-05 21:59:27.450407] E [posix-handle.c:157:posix_make_ancestryfromgfid] 0-atlas-home-01-posix: could not read the link from the gfid handle /bricks/atlas/home01/data/.glusterfs/be/e5/bee5e2b8-c639-4539-a483-96c19cd889eb (No such file or directory) and also [2015-06-05 22:52:01.112070] E [marker-quota.c:2363:mq_mark_dirty] 0-atlas-home-01-marker: failed to get inode ctx for /user1/file1 When running rsync I also see the following errors: [2015-06-05 23:06:22.203968] E [marker-quota.c:2601:mq_remove_contri] 0-atlas-home-01-marker: removexattr trusted.glusterfs.quota.fddf31ba-7f1d-4ba8-a5ad-2ebd6e4030f3.contri failed for /user1/..bashrc.O4kekp: No data available Those files are the temp files of rsync, I’m not sure why the throw errors in glusterfs. Any help? Thanks, Alessandro ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Monitoring gluster 3.6.1
On Monday, 8 June 2015 at 12:42:17, M S Vishwanath Bhat wrote: On 1 June 2015 at 12:28, Félix de Lelelis felix.deleli...@gmail.com wrote: Hi, I am monitoring gluster with scripts that launch other scripts. They all funnel into one script that checks whether any glusterd process is active and, if the response is false, launches the check. The checks are: - gluster volume volname info - gluster volume heal volname info - gluster volume heal volname split-brain - gluster volume volname status detail - gluster volume volname statistics Since I enabled this monitoring on our pre-production gluster, gluster has gone down twice. We suspect the monitoring is overloading it, though it should not. The question is: is there any way to check those states otherwise? You can make use of https://github.com/keithseahus/fluent-plugin-glusterfs as well. http://docs.fluentd.org/articles/collect-glusterfs-logs HTH Best Regards, Vishwanath Gluster lacks an SNMP agent implementation, which would let ALL monitoring systems monitor it. Home-grown scripts or implementations for one specific monitoring system cannot be the solution. Best regards, Michael Schwartzkopff -- [*] sys4 AG http://sys4.de, Franziskanerstraße 15, 81669 München
[Gluster-users] Questions on ganesha HA and shared storage size
Hi, I have seen the demo video on ganesha HA, https://www.youtube.com/watch?v=Z4mvTQC-efM However there is no advice on the appropriate size of the shared volume. How is it really used, and what should be a reasonable size for it? Also, are the slides from the video available somewhere, as well as a documentation on all this? I did not manage to find them. Thanks, Alessandro smime.p7s Description: S/MIME cryptographic signature ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
Hello, Do you know more about? In addition, do you know how to « activate » RDMA for my volume with Intel/QLogic QDR? Currently, i mount my volumes with RDMA transport-type option (both in server and client side) but I notice all streams are using TCP stack -and my bandwith never exceed 2.0-2.5Gbs (250-300MB/s). Thanks in advance, Geoffrey -- Geoffrey Letessier Responsable informatique ingénieur système UPR 9080 - CNRS - Laboratoire de Biochimie Théorique Institut de Biologie Physico-Chimique 13, rue Pierre et Marie Curie - 75005 Paris Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr Le 2 juin 2015 à 23:45, Geoffrey Letessier geoffrey.letess...@cnrs.fr a écrit : Hi Ben, I just check my messages log files, both on client and server, and I dont find any hung task you notice on yours.. As you can read below, i dont note the performance issue in a simple DD but I think my issue is concerning a set of small files (tens of thousands nay more)… [root@nisus test]# ddt -t 10g /mnt/test/ Writing to /mnt/test/ddt.8362 ... syncing ... done. sleeping 10 seconds ... done. Reading from /mnt/test/ddt.8362 ... done. 10240MiBKiB/s CPU% Write 114770 4 Read40675 4 for info: /mnt/test concerns the single v2 GlFS volume [root@nisus test]# ddt -t 10g /mnt/fhgfs/ Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done. sleeping 10 seconds ... done. Reading from /mnt/fhgfs/ddt.8380 ... done. 10240MiBKiB/s CPU% Write 102591 1 Read98079 2 Do you have a idea how to tune/optimize performance settings? and/or TCP settings (MTU, etc.)? --- | | UNTAR | DU | FIND | TAR | RM | --- | single | ~3m45s | ~43s |~47s | ~3m10s | ~3m15s | --- | replicated | ~5m10s | ~59s | ~1m6s | ~1m19s | ~1m49s | --- | distributed | ~4m18s | ~41s |~57s | ~2m24s | ~1m38s | --- | dist-repl | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s | --- | native FS |~11s |~4s | ~2s |~56s | ~10s | --- | BeeGFS | ~3m43s | ~15s | ~3s | ~1m33s | ~46s | --- | single (v2) | ~3m6s | ~14s |~32s | ~1m2s | ~44s | --- for info: -BeeGFS is a distributed FS (4 bricks, 2 bricks per server and 2 servers) - single (v2): simple gluster volume with default settings I also note I obtain the same tar/untar performance issue with FhGFS/BeeGFS but the rest (DU, FIND, RM) looks like to be OK. Thank you very much for your reply and help. Geoffrey --- Geoffrey Letessier Responsable informatique ingénieur système CNRS - UPR 9080 - Laboratoire de Biochimie Théorique Institut de Biologie Physico-Chimique 13, rue Pierre et Marie Curie - 75005 Paris Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@cnrs.fr mailto:geoffrey.letess...@cnrs.fr Le 2 juin 2015 à 21:53, Ben Turner btur...@redhat.com mailto:btur...@redhat.com a écrit : I am seeing problems on 3.7 as well. Can you check /var/log/messages on both the clients and servers for hung tasks like: Jun 2 15:23:14 gqac006 kernel: echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. Jun 2 15:23:14 gqac006 kernel: iozoneD 0001 0 21999 1 0x0080 Jun 2 15:23:14 gqac006 kernel: 880611321cc8 0082 880611321c18 a027236e Jun 2 15:23:14 gqac006 kernel: 880611321c48 a0272c10 88052bd1e040 880611321c78 Jun 2 15:23:14 gqac006 kernel: 88052bd1e0f0 88062080c7a0 880625addaf8 880611321fd8 Jun 2 15:23:14 gqac006 kernel: Call Trace: Jun 2 15:23:14 gqac006 kernel: [a027236e] ? rpc_make_runnable+0x7e/0x80 [sunrpc] Jun 2 15:23:14 gqac006 kernel: [a0272c10] ? rpc_execute+0x50/0xa0 [sunrpc] Jun 2 15:23:14 gqac006 kernel: [810aaa21] ? ktime_get_ts+0xb1/0xf0 Jun 2 15:23:14 gqac006 kernel: [811242d0] ? 
sync_page+0x0/0x50 Jun 2 15:23:14 gqac006 kernel: [8152a1b3] io_schedule+0x73/0xc0 Jun 2 15:23:14 gqac006 kernel: [8112430d] sync_page+0x3d/0x50 Jun 2 15:23:14 gqac006 kernel: [8152ac7f] __wait_on_bit+0x5f/0x90 Jun 2 15:23:14 gqac006 kernel: [81124543] wait_on_page_bit+0x73/0x80 Jun 2 15:23:14 gqac006 kernel: [8109eb80] ? wake_bit_function+0x0/0x50 Jun 2
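On Geoffrey's RDMA question: as far as I know, the upstream admin guide changes the transport of an existing volume roughly as follows (a sketch; the volume name is taken from Geoffrey's other posts, and the volume has to be unmounted and stopped first):
# allow both TCP and RDMA transports on the existing volume
gluster volume stop vol_home
gluster volume set vol_home config.transport tcp,rdma
gluster volume start vol_home
# mount explicitly over RDMA on the clients
mount -t glusterfs -o transport=rdma cl-storage1:/vol_home /mnt/vol_home
If traffic still goes over TCP after that, checking that the QLogic HCA is visible to libibverbs (for example with ibv_devinfo) would be a reasonable next step.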
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Great, many thanks Soumya! Cheers, Alessandro Il giorno 08/giu/2015, alle ore 13:53, Soumya Koduri skod...@redhat.com ha scritto: Hi, Please find the slides of the demo video at [1] We recommend to have a distributed replica volume as a shared volume for better data-availability. Size of the volume depends on the workload you may have. Since it is used to maintain states of NLM/NFSv4 clients, you may calculate the size of the volume to be minimum of aggregate of (typical_size_of'/var/lib/nfs'_directory + ~4k*no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point) We shall document about this feature sooner in the gluster docs as well. Thanks, Soumya [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846 On 06/08/2015 04:34 PM, Alessandro De Salvo wrote: Hi, I have seen the demo video on ganesha HA, https://www.youtube.com/watch?v=Z4mvTQC-efM However there is no advice on the appropriate size of the shared volume. How is it really used, and what should be a reasonable size for it? Also, are the slides from the video available somewhere, as well as a documentation on all this? I did not manage to find them. Thanks, Alessandro ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users smime.p7s Description: S/MIME cryptographic signature ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
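As a back-of-the-envelope example of Soumya's formula, under illustrative assumptions (4 NFS-Ganesha heads, about 250 clients per head, and roughly 1 MB of /var/lib/nfs state per head):
4 × (1 MB + 250 × 4 KB) = 4 × (1 MB + 1 MB) ≈ 8 MB
so capacity is rarely the constraint; as Soumya notes, making the shared volume a distributed replica for availability matters more than its raw size.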
Re: [Gluster-users] Quota issue
In addition, i notice a very big difference between the sum of DU on each brick and « quota list » display, as you can read below: [root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/amyloid_team cl-storage1: 1,6T /export/brick_home/brick1/amyloid_team cl-storage3: 1,6T /export/brick_home/brick1/amyloid_team cl-storage1: 1,6T /export/brick_home/brick2/amyloid_team cl-storage3: 1,6T /export/brick_home/brick2/amyloid_team [root@lucifer ~]# gluster volume quota vol_home list /amyloid_team Path Hard-limit Soft-limit Used Available /amyloid_team 9.0TB 90% 7.8TB 1.2TB As you can notice, the sum of all bricks gives me roughly 6.4TB and « quota list » around 7.8TB; so there is a difference of 1.4TB i’m not able to explain… Do you have any idea? Thanks, Geoffrey -- Geoffrey Letessier Responsable informatique ingénieur système UPR 9080 - CNRS - Laboratoire de Biochimie Théorique Institut de Biologie Physico-Chimique 13, rue Pierre et Marie Curie - 75005 Paris Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr Le 8 juin 2015 à 14:30, Geoffrey Letessier geoffrey.letess...@cnrs.fr a écrit : Hello, Concerning the 3.5.3 version of GlusterFS, I met this morning a strange issue writing file when quota is exceeded. One person of my lab, whose her quota is exceeded (but she didn’t know about) try to modify a file but, because of exceeded quota, she was unable to and decided to exit VI. Now, her file is empty/blank as you can read below: pdsh@lucifer: cl-storage3: ssh exited with exit code 2 cl-storage1: -T 2 tarus amyloid_team 0 19 févr. 12:34 /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh cl-storage1: -rwxrw-r-- 2 tarus amyloid_team 0 8 juin 12:38 /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh In addition, i dont understand why, my volume being a distributed volume inside replica (cl-storage[1,3] is replicated only on cl-storage[2,4]), i have 2 « same » files (complete path) in 2 different bricks (as you can read above). Thanks by advance for your help and clarification. Geoffrey -- Geoffrey Letessier Responsable informatique ingénieur système UPR 9080 - CNRS - Laboratoire de Biochimie Théorique Institut de Biologie Physico-Chimique 13, rue Pierre et Marie Curie - 75005 Paris Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr mailto:geoffrey.letess...@ibpc.fr Le 2 juin 2015 à 23:45, Geoffrey Letessier geoffrey.letess...@cnrs.fr mailto:geoffrey.letess...@cnrs.fr a écrit : Hi Ben, I just check my messages log files, both on client and server, and I dont find any hung task you notice on yours.. As you can read below, i dont note the performance issue in a simple DD but I think my issue is concerning a set of small files (tens of thousands nay more)… [root@nisus test]# ddt -t 10g /mnt/test/ Writing to /mnt/test/ddt.8362 ... syncing ... done. sleeping 10 seconds ... done. Reading from /mnt/test/ddt.8362 ... done. 10240MiBKiB/s CPU% Write 114770 4 Read40675 4 for info: /mnt/test concerns the single v2 GlFS volume [root@nisus test]# ddt -t 10g /mnt/fhgfs/ Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done. sleeping 10 seconds ... done. Reading from /mnt/fhgfs/ddt.8380 ... done. 10240MiBKiB/s CPU% Write 102591 1 Read98079 2 Do you have a idea how to tune/optimize performance settings? and/or TCP settings (MTU, etc.)? 
--- | | UNTAR | DU | FIND | TAR | RM | --- | single | ~3m45s | ~43s |~47s | ~3m10s | ~3m15s | --- | replicated | ~5m10s | ~59s | ~1m6s | ~1m19s | ~1m49s | --- | distributed | ~4m18s | ~41s |~57s | ~2m24s | ~1m38s | --- | dist-repl | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s | --- | native FS |~11s |~4s | ~2s |~56s | ~10s | --- | BeeGFS | ~3m43s | ~15s | ~3s | ~1m33s | ~46s | --- | single (v2) | ~3m6s | ~14s |~32s | ~1m2s | ~44s | --- for
[Gluster-users] Quota issue
Hello, Concerning the 3.5.3 version of GlusterFS, I met this morning a strange issue writing file when quota is exceeded. One person of my lab, whose her quota is exceeded (but she didn’t know about) try to modify a file but, because of exceeded quota, she was unable to and decided to exit VI. Now, her file is empty/blank as you can read below: pdsh@lucifer: cl-storage3: ssh exited with exit code 2 cl-storage1: -T 2 tarus amyloid_team 0 19 févr. 12:34 /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh cl-storage1: -rwxrw-r-- 2 tarus amyloid_team 0 8 juin 12:38 /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh In addition, i dont understand why, my volume being a distributed volume inside replica (cl-storage[1,3] is replicated only on cl-storage[2,4]), i have 2 « same » files (complete path) in 2 different bricks (as you can read above). Thanks by advance for your help and clarification. Geoffrey -- Geoffrey Letessier Responsable informatique ingénieur système UPR 9080 - CNRS - Laboratoire de Biochimie Théorique Institut de Biologie Physico-Chimique 13, rue Pierre et Marie Curie - 75005 Paris Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr Le 2 juin 2015 à 23:45, Geoffrey Letessier geoffrey.letess...@cnrs.fr a écrit : Hi Ben, I just check my messages log files, both on client and server, and I dont find any hung task you notice on yours.. As you can read below, i dont note the performance issue in a simple DD but I think my issue is concerning a set of small files (tens of thousands nay more)… [root@nisus test]# ddt -t 10g /mnt/test/ Writing to /mnt/test/ddt.8362 ... syncing ... done. sleeping 10 seconds ... done. Reading from /mnt/test/ddt.8362 ... done. 10240MiBKiB/s CPU% Write 114770 4 Read40675 4 for info: /mnt/test concerns the single v2 GlFS volume [root@nisus test]# ddt -t 10g /mnt/fhgfs/ Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done. sleeping 10 seconds ... done. Reading from /mnt/fhgfs/ddt.8380 ... done. 10240MiBKiB/s CPU% Write 102591 1 Read98079 2 Do you have a idea how to tune/optimize performance settings? and/or TCP settings (MTU, etc.)? --- | | UNTAR | DU | FIND | TAR | RM | --- | single | ~3m45s | ~43s |~47s | ~3m10s | ~3m15s | --- | replicated | ~5m10s | ~59s | ~1m6s | ~1m19s | ~1m49s | --- | distributed | ~4m18s | ~41s |~57s | ~2m24s | ~1m38s | --- | dist-repl | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s | --- | native FS |~11s |~4s | ~2s |~56s | ~10s | --- | BeeGFS | ~3m43s | ~15s | ~3s | ~1m33s | ~46s | --- | single (v2) | ~3m6s | ~14s |~32s | ~1m2s | ~44s | --- for info: -BeeGFS is a distributed FS (4 bricks, 2 bricks per server and 2 servers) - single (v2): simple gluster volume with default settings I also note I obtain the same tar/untar performance issue with FhGFS/BeeGFS but the rest (DU, FIND, RM) looks like to be OK. Thank you very much for your reply and help. Geoffrey --- Geoffrey Letessier Responsable informatique ingénieur système CNRS - UPR 9080 - Laboratoire de Biochimie Théorique Institut de Biologie Physico-Chimique 13, rue Pierre et Marie Curie - 75005 Paris Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@cnrs.fr mailto:geoffrey.letess...@cnrs.fr Le 2 juin 2015 à 21:53, Ben Turner btur...@redhat.com mailto:btur...@redhat.com a écrit : I am seeing problems on 3.7 as well. 
Can you check /var/log/messages on both the clients and servers for hung tasks like: Jun 2 15:23:14 gqac006 kernel: echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. Jun 2 15:23:14 gqac006 kernel: iozoneD 0001 0 21999 1 0x0080 Jun 2 15:23:14 gqac006 kernel: 880611321cc8 0082 880611321c18 a027236e Jun 2 15:23:14 gqac006 kernel: 880611321c48 a0272c10 88052bd1e040 880611321c78 Jun 2 15:23:14 gqac006 kernel: 88052bd1e0f0 88062080c7a0 880625addaf8 880611321fd8 Jun 2 15:23:14 gqac006 kernel: Call Trace:
Re: [Gluster-users] Double counting of quota
OK, many thanks Rajesh. I just wanted to add that I see a lot of warnings in the logs like the following:

[2015-06-08 13:13:10.365633] W [marker-quota.c:3162:mq_initiate_quota_task] 0-atlas-data-01-marker: inode ctx get failed, aborting quota txn

I'm not sure if this is a bug (related or not to the one you mention) or if it is normal and harmless.

Thanks,
Alessandro

On 8 June 2015, at 10:39, Rajesh kumar Reddy Mekala rmek...@redhat.com wrote:

We have opened bug 1227724 for a similar problem.
Thanks,
Rajesh

On 06/08/2015 12:08 PM, Vijaikumar M wrote:

Hi Alessandro,
Please provide the test case, so that we can try to re-create this problem in-house.
Thanks,
Vijay

On Saturday 06 June 2015 05:59 AM, Alessandro De Salvo wrote:

Hi,
just to answer myself: it really seems the temp files from rsync are the culprit. It seems that their size is summed up to the real contents of the directories I'm synchronizing, or in other terms their size is not removed from the used size after they are removed. I suppose this is somehow connected to the error on removexattr I'm seeing. The temporary solution I've found is to use rsync with the option to write the temp files to /tmp, but it would be very interesting to understand why this is happening.
Cheers,
Alessandro

On 6 June 2015, at 01:19, Alessandro De Salvo alessandro.desa...@roma1.infn.it wrote:

Hi,
I currently have two bricks with replica 2 on the same machine, pointing to different disks of a connected SAN. The volume itself is fine:

# gluster volume info atlas-home-01

Volume Name: atlas-home-01
Type: Replicate
Volume ID: 660db960-31b8-4341-b917-e8b43070148b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: host1:/bricks/atlas/home02/data
Brick2: host2:/bricks/atlas/home01/data
Options Reconfigured:
performance.write-behind-window-size: 4MB
performance.io-thread-count: 32
performance.readdir-ahead: on
server.allow-insecure: on
nfs.disable: true
features.quota: on
features.inode-quota: on

However, when I set a quota on a directory of the volume, the size shown is twice the physical size of the actual directory:

# gluster volume quota atlas-home-01 list /user1
Path      Hard-limit  Soft-limit  Used   Available  Soft-limit exceeded?  Hard-limit exceeded?
-----------------------------------------------------------------------------------------------
/user1    4.0GB       80%         3.2GB  853.4MB    No                    No

# du -sh /storage/atlas/home/user1
1.6G    /storage/atlas/home/user1

If I remove one of the bricks the quota shows the correct value. Is there any double counting in case the bricks are on the same machine?

Also, I see a lot of errors in the logs like the following:

[2015-06-05 21:59:27.450407] E [posix-handle.c:157:posix_make_ancestryfromgfid] 0-atlas-home-01-posix: could not read the link from the gfid handle /bricks/atlas/home01/data/.glusterfs/be/e5/bee5e2b8-c639-4539-a483-96c19cd889eb (No such file or directory)

and also

[2015-06-05 22:52:01.112070] E [marker-quota.c:2363:mq_mark_dirty] 0-atlas-home-01-marker: failed to get inode ctx for /user1/file1

When running rsync I also see the following errors:

[2015-06-05 23:06:22.203968] E [marker-quota.c:2601:mq_remove_contri] 0-atlas-home-01-marker: removexattr trusted.glusterfs.quota.fddf31ba-7f1d-4ba8-a5ad-2ebd6e4030f3.contri failed for /user1/..bashrc.O4kekp: No data available

Those files are the temp files of rsync; I'm not sure why they throw errors in glusterfs.

Any help?
Thanks,
Alessandro
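For reference, the workaround Alessandro describes above (keeping rsync's temporary dotfiles out of the quota-accounted tree) maps to rsync's --temp-dir option. A minimal sketch; the source path is a placeholder, and the destination is the directory from the example above:

# Write rsync's temporary ..file.XXXXXX dotfiles to local /tmp instead of the
# quota-limited destination directory, so they never enter quota accounting.
# Note: with --temp-dir on a different filesystem, rsync copies (rather than
# renames) the finished file into place, which costs some extra I/O.
rsync -a --temp-dir=/tmp /local/source/ /storage/atlas/home/user1/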
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Hi,

Please find the slides of the demo video at [1].

We recommend having a distributed replicated volume as the shared volume, for better data availability. The size of the volume depends on your workload. Since it is used to maintain the state of NLM/NFSv4 clients, you can estimate the minimum size as the aggregate, over all NFS servers, of:

  (typical size of the '/var/lib/nfs' directory) + (~4k * number of clients connected to that NFS server at any point in time)

We will document this feature in the gluster docs soon as well.

Thanks,
Soumya

[1] - http://www.slideshare.net/SoumyaKoduri/high-49117846

On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:

Hi,
I have seen the demo video on ganesha HA, https://www.youtube.com/watch?v=Z4mvTQC-efM
However there is no advice on the appropriate size of the shared volume. How is it really used, and what should be a reasonable size for it?
Also, are the slides from the video available somewhere, as well as documentation on all this? I did not manage to find them.
Thanks,
Alessandro
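As a purely illustrative calculation of that sizing rule (all numbers below are made up for the example and are not recommendations):

# Hypothetical: 2 NFS-Ganesha servers, /var/lib/nfs around 10 MB on each,
# and up to 500 clients connected to each server at any point in time.
per_server=$(( 10*1024*1024 + 4*1024*500 ))   # bytes: /var/lib/nfs + ~4k per client
total=$(( per_server * 2 ))                   # aggregate over both servers
echo "minimum shared volume size: ${total} bytes (~$(( total / 1024 / 1024 )) MiB)"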
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Sorry, just another question:

- in my installation of gluster 3.7.1 the command "gluster features.ganesha enable" does not work:

# gluster features.ganesha enable
unrecognized word: features.ganesha (position 0)

Which version has full support for it?

- in the documentation the ccs and cman packages are required, but they seem not to be available anymore on CentOS 7 and similar; I guess they are not really required anymore, as pcs should do the full job.

Thanks,
Alessandro

On 8 June 2015, at 15:09, Alessandro De Salvo alessandro.desa...@roma1.infn.it wrote:

Great, many thanks Soumya!
Cheers,
Alessandro
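On the CLI question: in the released 3.7.x builds the cluster-wide switch is expected to be spelled "gluster nfs-ganesha" rather than "gluster features.ganesha". Treat the lines below as an assumption to verify against your build ('gluster help' lists the form your version accepts), not confirmed syntax:

# Assumed GlusterFS 3.7.x syntax for turning the NFS-Ganesha HA setup on and
# off cluster-wide; verify the exact keyword with 'gluster help' on your version.
gluster nfs-ganesha enable
gluster nfs-ganesha disable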
Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart
I saw similar behaviour when the file permissions of the VM image were set to root:root instead of the hypervisor user. Running

chown -R libvirt-qemu:kvm /var/lib/libvirt/images

before starting the VM did the trick for me...

On 04.06.2015 at 16:08, Roger Lehmann wrote:

Hello,

I'm having a serious problem with my GlusterFS cluster. I'm using Proxmox 3.4 for highly available VM management, which works with GlusterFS as storage. Unfortunately, when I restart every node in the cluster sequentially, one by one (with online migration of the running HA VM first, of course), the qemu image of the HA VM gets corrupted and the VM itself has problems accessing it.

May 15 10:35:09 blog kernel: [339003.942602] end_request: I/O error, dev vda, sector 2048
May 15 10:35:09 blog kernel: [339003.942829] Buffer I/O error on device vda1, logical block 0
May 15 10:35:09 blog kernel: [339003.942929] lost page write due to I/O error on vda1
May 15 10:35:09 blog kernel: [339003.942952] end_request: I/O error, dev vda, sector 2072
May 15 10:35:09 blog kernel: [339003.943049] Buffer I/O error on device vda1, logical block 3
May 15 10:35:09 blog kernel: [339003.943146] lost page write due to I/O error on vda1
May 15 10:35:09 blog kernel: [339003.943153] end_request: I/O error, dev vda, sector 4196712
May 15 10:35:09 blog kernel: [339003.943251] Buffer I/O error on device vda1, logical block 524333
May 15 10:35:09 blog kernel: [339003.943350] lost page write due to I/O error on vda1
May 15 10:35:09 blog kernel: [339003.943363] end_request: I/O error, dev vda, sector 4197184

After the image is broken, it's impossible to migrate the VM or start it when it's down.

root@pve2 ~ # gluster volume heal pve-vol info
Gathering list of entries to be healed on volume pve-vol has been successful

Brick pve1:/var/lib/glusterd/brick
Number of entries: 1
/images//200/vm-200-disk-1.qcow2

Brick pve2:/var/lib/glusterd/brick
Number of entries: 1
/images/200/vm-200-disk-1.qcow2

Brick pve3:/var/lib/glusterd/brick
Number of entries: 1
/images//200/vm-200-disk-1.qcow2

I couldn't really reproduce this in my test environment with GlusterFS 3.6.2, but I had other problems while testing (maybe also because of the virtualized test environment), so I don't want to upgrade to 3.6.2 until I definitely know that the problems I encountered are fixed in 3.6.2. Has anybody else experienced this problem? I'm not sure whether issue 1161885 (possible file corruption on dispersed volumes) is the issue I'm experiencing. I have a 3-node replicate cluster.

Thanks for your help!

Regards,
Roger Lehmann

--
Kind regards

André Bauer

MAGIX Software GmbH
André Bauer
Administrator
August-Bebel-Straße 48
01219 Dresden
GERMANY

tel.: 0351 41884875
e-mail: aba...@magix.net
www.magix.com

Geschäftsführer | Managing Directors: Dr. Arnd Schröder, Michael Keith
Amtsgericht | Commercial Register: Berlin Charlottenburg, HRB 127205
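Two volume-level settings are commonly suggested for this kind of VM-image workload, and they relate both to the ownership issue André mentions and to running VM images on a replica volume in general. The sketch below reuses the pve-vol volume name and the libvirt-qemu:kvm user from the messages above; the numeric uid/gid and the benefit for this exact corruption report are assumptions to verify, not a confirmed fix:

# Make the bricks present the images with the hypervisor user's ownership,
# so a chown -R is not needed after restarts.
# (107:107 is a typical libvirt-qemu:kvm uid/gid on Debian-based hosts --
#  check the real values with: id libvirt-qemu)
gluster volume set pve-vol storage.owner-uid 107
gluster volume set pve-vol storage.owner-gid 107

# Apply the predefined "virt" option group (quorum, eager locking and other
# settings generally recommended before hosting VM images on a replica volume).
gluster volume set pve-vol group virt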
Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances
- Original Message -
From: Geoffrey Letessier geoffrey.letess...@cnrs.fr
To: Ben Turner btur...@redhat.com
Cc: Pranith Kumar Karampuri pkara...@redhat.com, gluster-users@gluster.org
Sent: Monday, June 8, 2015 8:37:08 AM
Subject: Re: [Gluster-users] GlusterFS 3.7 - slow/poor performances

Hello,

Do you know more about this? In addition, do you know how to "activate" RDMA for my volume with Intel/QLogic QDR? Currently, I mount my volumes with the RDMA transport-type option (both on the server and the client side), but I notice that all streams are using the TCP stack and my bandwidth never exceeds 2.0-2.5 Gb/s (250-300 MB/s).

That is a little slow for the HW you described. Can you check what you get with iperf just between the clients and servers? https://iperf.fr/ With replica 2 and a 10G NW you should see ~400 MB/sec sequential writes and ~600 MB/sec reads.

Can you send me the output from gluster v info? You specify RDMA volumes at create time by running gluster v create blah transport rdma; did you specify RDMA when you created the volume?

What block size are you using in your tests? 1024 KB writes perform best with glusterfs, and as the block size gets smaller, perf will drop a little bit. I wouldn't write in anything under 4k blocks; the sweet spot is between 64k and 1024k.

-b

Thanks in advance,
Geoffrey
--
Geoffrey Letessier
Responsable informatique ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letess...@ibpc.fr
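To act on Ben's two checks (raw network throughput with iperf, then a large-block sequential write), a minimal sketch; the server hostname and mount point are placeholders, and classic iperf2 client/server options are assumed:

# On one storage server: start an iperf server (iperf2 syntax assumed).
iperf -s

# On a client: measure raw TCP throughput to that server for 30 seconds.
iperf -c storage-server -t 30

# Then a sequential write on the Gluster mount using 1M blocks, inside the
# 64k-1024k range Ben recommends; conv=fdatasync makes dd wait for the flush.
dd if=/dev/zero of=/mnt/test/ddtest.bin bs=1M count=10240 conv=fdatasync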