Re: [Gluster-devel] Glusterfs the same file but data size is different between server and client
Hi Joe Julian, thank you very much. Following your advice, I ran du --apparent-size -sh and got these results:

Server:
[root@test-gluster004 vol-replica]# du --apparent-size -sh linux-kernel
2.9G    linux-kernel

Client:
[root@test-gluster003 share]# du --apparent-size -sh linux-kernel
2.9G    linux-kernel

thank you very much
justgluste...@gmail.com

From: Joe Julian
Date: 2014-09-04 10:12
To: gluster-devel
Subject: Re: [Gluster-devel] Glusterfs the same file but data size is different between server and client

I'm going to reiterate to make sure I understand correctly. You created a replica 2 volume, mounted the new volume on a client, copied a directory to the client mountpoint using cp -a (I assume), and then checked du -sh for that directory on the two bricks.

If all that is correct, then I'm not really sure. Try a du --apparent-size -sh. Perhaps your two filesystems have different block sizes, causing files that are smaller than a block to use more disk space on one than the other? Perhaps compare sha1sum between the two and see if there are differences. There shouldn't be.

On 09/03/2014 07:01 PM, justgluste...@gmail.com wrote:
> Hi, I created a GlusterFS replica volume (replica count 2) and mounted the volume on a client. I then copied a directory, linux-kernel, to the mount. When the copy finished, I checked the directory size with du and found that the size of linux-kernel differs between server and client:
>
> Server:
> [root@test-gluster004 vol-replica]# du -sh linux-kernel
> 4.3G    linux-kernel/
>
> Client:
> [root@test-gluster003 share]# du -sh linux-kernel
> 2.9G    linux-kernel
>
> The same data shows a different size on the server and on the client. Why?
>
> thanks!
> justgluste...@gmail.com
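A minimal sketch of the sha1sum comparison Joe suggests, assuming the directory is reachable both on a brick and on the client mount (the paths below only mirror the examples above and are illustrative):

    # on the brick (server side)
    cd /vol-replica && find linux-kernel -type f -print0 | sort -z | xargs -0 sha1sum > /tmp/brick.sha1

    # on the client mount
    cd /share && find linux-kernel -type f -print0 | sort -z | xargs -0 sha1sum > /tmp/client.sha1

    # copy one list to the other machine (e.g. with scp), then compare;
    # no output from diff means the file contents match on both sides
    diff /tmp/brick.sha1 /tmp/client.sha1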
[Gluster-devel] glusterfs replica volume self heal dir very slow!!why?
Hi all,

I ran the following test: I created a GlusterFS replica volume (replica count 2) with two server nodes (server A and server B) and mounted the volume on a client node. I then shut down the network on server A and, on the client, copied a directory containing a lot of small files (2.9 GB in total). When the copy finished, I brought the network on server A back up, and the self-heal daemon started healing the directory from server B to server A. In the end, healing the directory took 40 minutes. That is too slow. Why?

I found the following options related to self-heal:

cluster.self-heal-window-size
cluster.self-heal-readdir-size
cluster.background-self-heal-count

Can modifying the above options improve the performance of healing the directory? If so, please suggest reasonable values for them.

thanks!
justgluste...@gmail.com
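If tuning is the route taken, these options are set through the normal volume-set interface. The values below are illustrative placeholders only (this thread does not recommend specific numbers), and the volume name is assumed:

    # volume name "vol-replica" is assumed; values are examples, not recommendations
    gluster volume set vol-replica cluster.background-self-heal-count 20
    gluster volume set vol-replica cluster.self-heal-window-size 2
    gluster volume set vol-replica cluster.self-heal-readdir-size 2KB

    # anything changed shows up under "Options Reconfigured:"
    gluster volume info vol-replica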
Re: [Gluster-devel] [Gluster-users] Regarding the write performance in replica 1 volume in 1Gbps Ethernet, get about 50MB/s while writing single file.
Hi Jaden,

Sorry, from your subject I misunderstood your setup. In a pure distributed volume, each file goes to one brick and only that brick; the brick is chosen by the elastic hash algorithm. If you get near wire speed (1 Gbps, about 120 MB/s) when writing several files at once, but only roughly half of that when writing a single file, perhaps each brick limits the write speed: a green SATA disk spinning at 5400 rpm reaches at most about 75 MB/s writing big files sequentially, and an enterprise SATA disk at 7200 rpm reaches around 115 MB/s.

Can you please explain which type of bricks you have on each server node? I'll try to emulate your setup and test it. Thank you!

On 04/09/14 at 03:20, Jaden Liang wrote:
> Hi Ramon,
>
> I am running the gluster FUSE client. I may not have described my test environment clearly, so let me explain. The volume is configured across 2 servers. There is no replication at all, just a distributed volume, so I don't think this is a replicated-data issue. Actually, we can reach 100 MB/s when writing multiple files at the same time. Here is the volume info:
>
> # gluster volume info vs_vol_rep1
> Volume Name: vs_vol_rep1
> Type: Distribute
> Volume ID: cd137b57-e98a-4755-939a-7fc578f2a8c0
> Status: Started
> Number of Bricks: 10
> Transport-type: tcp
> Bricks:
> Brick1: host-001e67a3486c:/sf/data/vs/local/dfb0edaa-cfcb-4536-b5cb-a89aabaf8b4d/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick2: host-001e67a3486c:/sf/data/vs/local/ac752388-1c2d-43a2-9396-7bedaf9abce2/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick3: host-001e67a3486c:/sf/data/vs/local/6ef6c20e-ed59-4f3c-a354-a47caf11bbb0/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick4: host-001e67a3486c:/sf/data/vs/local/4fa375da-265f-4436-8385-6af949581e16/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick5: host-001e67a3486c:/sf/data/vs/local/184f174a-c5ee-45e8-8cbc-20ae518ad7b1/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick6: host-001e67a3486c:/sf/data/vs/local/0a20eb9a-bba4-4cfd-be8f-542eac7a1f98/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick7: host-001e67a3486c:/sf/data/vs/local/03648144-fec1-4471-9aa7-45fc2123867a/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick8: host-001e67a349d4:/sf/data/vs/local/e7de2d40-6ebd-4867-b2a6-c19c669ecc83/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick9: host-001e67a349d4:/sf/data/vs/local/896da577-cd03-42a0-8f5c-469759dd7f7b/49ea070f-1480-4838-8182-95d1a6f17d81
> Brick10: host-001e67a349d4:/sf/data/vs/local/6f274934-7e8b-4145-9f3a-bab549e2a95d/49ea070f-1480-4838-8182-95d1a6f17d81
> Options Reconfigured:
> diagnostics.latency-measurement: on
> nfs.disable: on
>
> On Wednesday, September 3, 2014, Ramon Selga ramon.se...@gmail.com wrote:
>
> Hi Jaden,
>
> May I ask some more info about your setup? Are you using the NFS client or the gluster FUSE client?
>
> If you are using the NFS client, write data goes to one node of the replica pair and that node sends the replica write data to the other node. If you are using one switch for client and server connections and one 1GbE port on each device, data received by the first node is re-sent to the other node simultaneously and, in theory, you may reach speeds closer to 100 MB/s.
>
> In the case of the gluster FUSE client, write data goes simultaneously to both server nodes, using half the bandwidth of the client's 1GbE port for each, because replication is done on the client side. That results in a writing speed of around 50-60 MB/s.
>
> I hope this helps.
>
> On 03/09/14 at 07:02, Jaden Liang wrote:
>
> Hi all,
>
> We did some more tests and analysis yesterday. It looks like 50 MB/s is the top theoretical speed for a replica 1 volume over a 1 Gbps network. GlusterFS writes data one 128 KB block at a time and then waits for the reply. The 128 KB of data takes about 1 ms on a 1 Gbps network, and on the server side it takes about 800-1000 us to write 128 KB to the HDD and return, plus some other 100-200 us of elapsed time. So GlusterFS takes about 2-2.2 ms to finish writing one 128 KB block, which works out to about 50 MB/s. The question is: why doesn't GlusterFS use pipelined writes or reads to speed up this chatty process?
>
> On Tuesday, September 2, 2014, Jaden Liang jaden1...@gmail.com wrote:
>
> Hello gluster-devel and gluster-users teams,
>
> We are running a performance test on a replica 1 volume and find that single-file sequential write performance only reaches about 50 MB/s on 1 Gbps Ethernet. However, if we test sequential writes to multiple files, write performance goes up to 120 MB/s, which is the top speed of the network. We also tried to use the stat xlator to find out where the bottleneck of single-file write performance is. Here is the stat data:
>
> Client-side:
> ..
> vs_vol_rep1-client-8.latency.WRITE=total:21834371.00us, mean:2665.328491us, count:8192, max:4063475, min:1849
> ..
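A back-of-envelope check of the arithmetic above, tied to the measured mean WRITE latency in the stat data (a sketch only):

    # one 128 KiB block per synchronous round trip:
    #   wire time at 1 Gbps:         131072 * 8 / 1e9  ~= 1.05 ms
    #   server-side write + overhead: ~0.9 - 1.2 ms
    # measured mean WRITE latency above: 2665 us
    echo "scale=1; 131072 / 0.002665 / 1000000" | bc   # ~49 MB/s, i.e. the observed ~50 MB/s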
[Gluster-devel] Proposal for GlusterD-2.0
GlusterD performs the following functions as the management daemon for GlusterFS:

- Peer membership management
- Maintains consistency of configuration data across nodes (distributed configuration store)
- Distributed command execution (orchestration)
- Service management (manage GlusterFS daemons)
- Portmap service for GlusterFS daemons

This proposal aims to delegate the above functions to technologies that solve these problems well. We aim to do this in a phased manner. The technology alternatives we would be looking for should have the following properties:

- Open source
- Vibrant community
- Good documentation
- Easy to deploy/manage

This would allow GlusterD's architecture to be more modular. We also aim to make GlusterD's architecture as transparent and observable as possible; separating out these functions would allow us to do that.

The bulk of the current GlusterD code deals with keeping the configuration of the cluster and the volumes in it consistent and available across the nodes. The current algorithm is not scalable (N^2 in the number of nodes) and doesn't prevent split-brain of configuration. This is the problem area we are targeting for the first phase.

As part of the first phase, we aim to delegate the distributed configuration store. We are exploring consul [1] as a replacement for the existing distributed configuration store (the sum total of /var/lib/glusterd/* across all nodes). Consul provides a distributed configuration store which is consistent and partition tolerant. By moving all Gluster-related configuration information into consul we could avoid split-brain situations.

All development efforts towards this proposal would happen in parallel to the existing GlusterD code base. The existing code base would be actively maintained until GlusterD-2.0 is production-ready.

This is in alignment with the GlusterFS Quattro proposals on making GlusterFS scalable and easy to deploy; it is the first-phase groundwork towards that goal.

Questions and suggestions are welcome.

~kaushal

[1] : http://www.consul.io/
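To make the consul suggestion concrete, here is a minimal sketch of storing and reading back a piece of volume configuration through consul's HTTP key/value API; the gluster/volumes/... key layout is purely hypothetical and not part of the proposal:

    # write one volume option under a hypothetical key hierarchy
    curl -s -X PUT -d 'on' \
        http://localhost:8500/v1/kv/gluster/volumes/vs_vol_rep1/options/nfs.disable

    # read it back; ?raw returns just the value instead of the JSON envelope
    curl -s 'http://localhost:8500/v1/kv/gluster/volumes/vs_vol_rep1/options/nfs.disable?raw'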
Re: [Gluster-devel] [Gluster-users] Proposal for GlusterD-2.0
On 05/09/2014, at 11:21 AM, Kaushal M wrote:
> <snip>
> As part of the first phase, we aim to delegate the distributed configuration store. We are exploring consul [1]

Does this mean we'll need to learn Go as well as C and Python? If so, that doesn't sound completely optimal. :/

That being said, a lot of distributed/networked computing projects seem to be written in it these days. Is Go specifically a good language for our kind of challenges, or is it more a case of the new shiny?

+ Justin

--
GlusterFS - http://www.gluster.org
An open source, distributed file system scaling to several petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift
Re: [Gluster-devel] [Gluster-users] Proposal for GlusterD-2.0
On 5 Sep 2014, at 12:21, Kaushal M kshlms...@gmail.com wrote:
> - Peer membership management
> - Maintains consistency of configuration data across nodes (distributed configuration store)
> - Distributed command execution (orchestration)
> - Service management (manage GlusterFS daemons)
> - Portmap service for GlusterFS daemons

Isn't some of this covered by crm/corosync/pacemaker/heartbeat?

Marcus

--
Marcus Bointon
Technical Director, Synchromedia Limited
Creators of http://www.smartmessages.net/
UK 1CRM solutions http://www.syniah.com/
mar...@synchromedia.co.uk | http://www.synchromedia.co.uk/
Re: [Gluster-devel] [Gluster-users] Proposal for GlusterD-2.0
----- Original Message -----
> On 05/09/2014, at 11:21 AM, Kaushal M wrote:
> <snip>
> As part of the first phase, we aim to delegate the distributed configuration store. We are exploring consul [1]
>
> Does this mean we'll need to learn Go as well as C and Python? If so, that doesn't sound completely optimal. :/

consul is written in Go. I don't think that forces us to write our code in Go.

> That being said, a lot of distributed/networked computing projects seem to be written in it these days. Is Go specifically a good language for our kind of challenges, or is it more a case of the new shiny?

I would prefer to write new code in a language other than C. But that's just me. I haven't written enough Go to comment. :-(

~KP
Re: [Gluster-devel] [Gluster-users] Proposal for GlusterD-2.0
----- Original Message -----
> On 5 Sep 2014, at 12:21, Kaushal M kshlms...@gmail.com wrote:
> > - Peer membership management
> > - Maintains consistency of configuration data across nodes (distributed configuration store)
> > - Distributed command execution (orchestration)
> > - Service management (manage GlusterFS daemons)
> > - Portmap service for GlusterFS daemons
>
> Isn't some of this covered by crm/corosync/pacemaker/heartbeat?

Maybe; I haven't explored the above technologies. consul can be used for all the different functions that glusterd performs today: it has service discovery and leadership algorithms for executing commands in the cluster in its 'toolbox'. The portmap service could be implemented as a dictionary with glusterfs server processes as the keys and their ports as the values. We can build in further sophistication as we find uses for it.

~KP
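A minimal sketch of that portmap dictionary idea on top of consul's key/value store; the key layout, host name and port below are made up for illustration:

    # a brick process publishes the port it listens on (hypothetical key layout)
    curl -s -X PUT -d '49152' \
        http://localhost:8500/v1/kv/gluster/portmap/host-001e67a3486c/data-brick1

    # a client resolves the brick's port before connecting
    curl -s 'http://localhost:8500/v1/kv/gluster/portmap/host-001e67a3486c/data-brick1?raw'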
Re: [Gluster-devel] [Gluster-users] Proposal for GlusterD-2.0
> Isn't some of this covered by crm/corosync/pacemaker/heartbeat?

Sorta, kinda, mostly no. Those implement virtual synchrony, which is closely related to consensus but not quite the same even in a formal CS sense. In practice, using them is *very* different.

Two jobs ago, I inherited a design based on the idea that if everyone starts at the same state and handles the same messages in the same order (in that case they were using Spread) then they'd all stay consistent. Sounds great in theory, right? Unfortunately, in practice it meant that returning a node which had missed messages to a consistent state was our problem, and it was an unreasonably complex one. Debugging failure-during-recovery problems in that code was some of the least fun I ever had at that job.

A consensus protocol, with its focus on consistency of data rather than consistency of communication, seems like a better fit for what we're trying to achieve.
Re: [Gluster-devel] Proposal for GlusterD-2.0
> As part of the first phase, we aim to delegate the distributed configuration store. We are exploring consul [1] as a replacement for the existing distributed configuration store (the sum total of /var/lib/glusterd/* across all nodes). Consul provides a distributed configuration store which is consistent and partition tolerant. By moving all Gluster-related configuration information into consul we could avoid split-brain situations.

Overall, I like the idea. But I think you knew that. ;)

Is the idea to run consul on all nodes as we do with glusterd, or to run it only on a few nodes (similar to Ceph's mon cluster) and then use them to coordinate membership etc. for the rest?
Re: [Gluster-devel] A suggestion on naming test files.
And while we're at it, may we please rename the two or three test files that don't start with bug- to be (foolishly) consistent with the rest of our tests?

Feeling more ambitious? How about renaming all the bug-$sixdigit.t files to bug-0$sixdigit.t for some even more foolish consistency. (That's a zero, in case your font makes it look ambiguous.)

--
Kaleb
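A sketch of what that mass rename could look like, assuming the tests sit in a git checkout and all of the six-digit names really follow the bug-NNNNNN.t pattern (illustrative only, not a reviewed change):

    cd tests/bugs    # assumed location of the .t files
    for f in bug-??????.t; do
        git mv "$f" "bug-0${f#bug-}"    # bug-123456.t -> bug-0123456.t
    done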
Re: [Gluster-devel] Feedback and future plans for glusterfsiostat
Hi Justin,

Regarding the server.py script: yes, that graph is live and it is refreshed periodically. In the current model, the server script in the backend polls the meta directory of the mounted volumes repeatedly, calculates the speed from the difference between the values encountered, and stores it. Another thread on the server flushes this data to the webpage as and when the server is accessed through a browser. The current interval for requests, both from the server to the mounted volume and from the browser to the server, is 1 second.

On the other note, I'm not sure that io-stats stores stats for a specific directory or inode as of now, IIRC. As Krishnan said, we'll have to look into other methods if we intend to do that.

Regards
Vipul Nayyar

On Thursday, 4 September 2014 3:17 AM, Justin Clift jus...@gluster.org wrote:

> On 03/09/2014, at 3:48 PM, Vipul Nayyar wrote:
> > <snip>
> > 3) server.py, which is a web server written with very basic Python libraries. I've attached some screenshots of the server in action while visualizing live data from gluster mounts in the form of web-based graphs. Justin, since this web server implementation is a bit similar to your tool glusterflow, your feedback regarding this is really important and valuable. Thanks.
>
> Server.py is the bit that caught my attention too. :) I like the screenshots. Does server.py update/refresh in real time?
>
> Also, one thing that's missing from the screenshots is the time info on the X axis, e.g. how far apart are the tick marks on the X axis? Minor oversight, but important. ;)
>
> > Some key points for those who want to test this thing:
> > - Apply this patch http://review.gluster.org/#/c/8244/ , build and install.
> > - Please grab the latest source code from https://forge.gluster.org/glusterfsiostat
> > - Profiling needs to be turned on for the volumes for which you want I/O data.
> > - Run the command-line tool with `python stat.py`. Giving the --help flag lists the other options.
> > - Start the server.py script in the same way and point your browser at 'localhost:8080'.
> > - The gluster mounts will be visible as separate tabs on the left in the browser. Click on a tab to see the related graphs.
> > - Since these graphs show live I/O activity, you need to run a read/write operation to see the graphs in action.
> >
> > Please do contact me regarding any suggestions or thoughts about this.
>
> This is very cool. As a thought, since I don't know the code at all: could it do stuff for parts of a volume? For example, in the server.py GUI a person could give a directory path inside a volume, and it would show the I/O operation stats for just that path?
>
> + Justin
>
> --
> GlusterFS - http://www.gluster.org
> An open source, distributed file system scaling to several petabytes, and handling thousands of clients.
>
> My personal twitter: twitter.com/realjustinclift
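For the "turn on profiling and run the tool" steps in the list above, a minimal sketch might look like this; the volume name is a placeholder and only the flags mentioned in the thread are shown:

    # enable the profiling the tool reads from ("myvol" is a placeholder volume name)
    gluster volume profile myvol start

    # command-line view
    python stat.py --help
    python stat.py

    # web view: start the server, then browse to http://localhost:8080
    python server.py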
[Gluster-devel] NetBSD port passes POSIX test suite
Hi,

netbsd0.cloud.gluster.org now passes the POSIX test suite:

(...)
All tests successful.
Files=191, Tests=2069, 135 wallclock secs ( 1.03 usr 0.45 sys + 26.81 cusr 37.51 csys = 65.80 CPU)
Result: PASS

I have patches to pull up to the netbsd-7 stable branch so that this will be available in the next release, but everything is already installed on netbsd0.cloud.gluster.org.

On the GlusterFS side there are also a few fixes to merge before I can run the POSIX test suite in master autobuilds. See here:
http://review.gluster.org/#/q/owner:%22Emmanuel+Dreyfus%22+status:open,n,z

Of course I would like to pull up the GlusterFS changes to release-3.6 once everything is in master. When was release-3.6 branched? Knowing that would help to spot the changes to pull up.

--
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] NetBSD port passes POSIX test suite
On Fri, Sep 5, 2014 at 9:34 AM, Emmanuel Dreyfus m...@netbsd.org wrote:
> Hi
>
> netbsd0.cloud.gluster.org now passes the POSIX test suite:
>
> (...)
> All tests successful.
> Files=191, Tests=2069, 135 wallclock secs ( 1.03 usr 0.45 sys + 26.81 cusr 37.51 csys = 65.80 CPU)
> Result: PASS
>
> I have patches to pull up to the netbsd-7 stable branch so that this will be available in the next release, but everything is already installed on netbsd0.cloud.gluster.org.
>
> On the GlusterFS side there are also a few fixes to merge before I can run the POSIX test suite in master autobuilds. See here:
> http://review.gluster.org/#/q/owner:%22Emmanuel+Dreyfus%22+status:open,n,z

This is great stuff, thank you for all the hard work. release-3.6 was already branched some time ago - you have to post all the patches to the release-3.6 branch.

--
Religious confuse piety with mere ritual, the virtuous confuse regulation with outcomes
Re: [Gluster-devel] NetBSD port passes POSIX test suite
Harshavardhana har...@harshavardhana.net wrote:
> release-3.6 was already branched some time ago - you have to post all the patches to the release-3.6 branch.

How long ago? It would help to know whether I have to consider, for example, patches from 18th August.

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: [Gluster-devel] NetBSD port passes POSIX test suite
On 09/05/2014 11:08 PM, Emmanuel Dreyfus wrote:
> Harshavardhana har...@harshavardhana.net wrote:
> > release-3.6 was already branched some time ago - you have to post all the patches to the release-3.6 branch.
>
> How long ago? It would help to know whether I have to consider, for example, patches from 18th August.

release-3.6 was branched on 17th July.

-Vijay
Re: [Gluster-devel] [Gluster-users] glusterfs replica volume self heal dir very slow!!why?
Hi,

Consider looking at this thread:
http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018577.html

This guy made lots of tests. It seems like 50 MB/s (or around 400 Mbps) is the theoretical maximum for replica 1. I am waiting for comments on this too.

2014-09-05 11:15 GMT+03:00 justgluste...@gmail.com:
> Hi all:
>
> I did the following test: I created a GlusterFS replica volume (replica count 2) with two server nodes (server A and server B), then mounted the volume on a client node. I shut down the network on server A and, on the client, copied a directory containing a lot of small files (2.9 GB in total). When the copy finished, I brought the network on server A back up, and the self-heal daemon started healing the directory from server B to server A. In the end, healing the directory took 40 minutes. It's too slow! Why?
>
> I found the following options related to self-heal:
> cluster.self-heal-window-size
> cluster.self-heal-readdir-size
> cluster.background-self-heal-count
>
> I want to ask: can modifying the above options improve the performance of healing the directory? If possible, please give reasonable values for the above options.
>
> thanks!
>
> --
> justgluste...@gmail.com

--
Best regards,
Roman.
Re: [Gluster-devel] NetBSD port passes POSIX test suite
Vijay Bellur vbel...@redhat.com wrote:
> release-3.6 was branched on 17th July.

That is a lot of patches to pull up. What is the planned release date for 3.6?

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
[Gluster-devel] basic/afr/gfid-self-heal.t on release-3.6/NetBSD
Hi,

NetBSD passes basic/afr/gfid-self-heal.t on master but fails it on release-3.6. I do not recall fixing anything related to it, hence I wonder if someone else knows of a fix that went into master that could be relevant.

It happens in tests 33-34. Here are the relevant tests, with comments added by me:

#Check gfid self-heal doesn't happen from one brick to other when type mismatch
#is present for a name, without any xattrs
TEST kill_brick $V0 $H0 $B0/${V0}0
TEST touch $M0/b
TEST mkdir $B0/${V0}0/b
TEST setfattr -x trusted.afr.$V0-client-0 $B0/${V0}1
$CLI volume start $V0 force
EXPECT_WITHIN $PROCESS_UP_TIMEOUT 1 afr_child_up_status $V0 0

# 33: master/NetBSD: gets EIO; release-3.6/NetBSD: b is here and fine
TEST ! stat $M0/b

# 34: master/NetBSD: nothing; release-3.6/NetBSD: a gfid is returned
gfid_0=$(gf_get_gfid_xattr $B0/${V0}0/b)
TEST [[ -z "$gfid_0" ]]

All the other basic/afr tests pass on both master/NetBSD and release-3.6/NetBSD.

Any hint?

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
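For anyone reproducing this by hand: the gf_get_gfid_xattr check in test 34 boils down to reading the trusted.gfid extended attribute directly off the brick. A rough equivalent, using the test framework's usual default paths purely as an illustration:

    # inspect the gfid xattr on the backend brick
    # (path assumes the common defaults $B0=/d/backends, $V0=patchy; adjust to your setup)
    getfattr -n trusted.gfid -e hex /d/backends/patchy0/b
    # a healed entry prints something like: trusted.gfid=0x<32 hex digits>
    # for test 34 to pass, the attribute should be absent and getfattr should report an error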