The more important point from the previous email is that pvfs2-cp to individual servers runs at ~160 MB/s, whereas writing data directly to the underlying ext4 file system shows ~350 MB/s.

- Kshitij
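For anyone reproducing the comparison, the two numbers come from runs like the following on one of the I/O servers. This is only a sketch: the ~350 MB/s ext4 figure in this thread came from IOR, so the dd command is just a rough O_DIRECT stand-in for that baseline, and /mnt/ssd-ext4 is a placeholder for the server's local ext4 mount.

# Rough local ext4 baseline on the I/O server (stand-in for the IOR-on-ext4 run):
$> dd if=/dev/zero of=/mnt/ssd-ext4/ddtest bs=4M count=256 oflag=direct conv=fsync

# Same-sized (1 GiB) copy through PVFS2; 1 GiB in ~6.2 s is the ~160 MB/s case:
$> time /opt/pvfs-2.8.2/bin/pvfs2-cp /tmp/ior.out /pvfs2-ssd/ss_4mb/ior.out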
From: Kshitij Mehta [mailto:[email protected]]
Sent: Monday, December 19, 2011 3:48 PM
To: 'Michael Moore'; 'Kyle Schochenmaier'
Cc: '[email protected]'
Subject: RE: [Pvfs2-users] Tracing pvfs2 internals

Apologies for my late reply.

Regarding Kyle's suggestion: 42 MB/s certainly seems to be the local hard drive speed, and I have 2G of RAM on my machine. I performed a pvfs2-cp of a 1G file from the local hard drive to /pvfs2 (I did this on an I/O server), and it took 6.2 seconds (165 MB/s).

time /opt/pvfs-2.8.2/bin/pvfs2-cp /tmp/ior.out /pvfs2-ssd/ss_4mb/ior.out
real    0m6.230s
user    0m0.162s
sys     0m3.968s

Also, I did what Michael suggested: I created a directory that uses a single datafile and ran pvfs2-cp to copy files into this directory. Again, I see similar performance (6.x seconds) when copying 1G files. So whether I pvfs2-cp files to the file system as a whole or to individual servers, I see similar performance. I believe that when I copy files to pvfs2, the performance should be nearly double what I see when copying to individual servers. Correct me if I am wrong.

(In the snippet below, I copy two files, ior.out and ior2.out, into the directory that uses a single datafile, and then verify that they are using separate I/O servers.)

$> time /opt/pvfs-2.8.2/bin/pvfs2-cp /tmp/ior.out /pvfs2-ssd/kmehta/1_dir/ior.out
real    0m6.433s
user    0m0.101s
sys     0m3.341s

$> time /opt/pvfs-2.8.2/bin/pvfs2-cp /tmp/ior.out /pvfs2-ssd/kmehta/1_dir/ior2.out
real    0m6.349s
user    0m0.105s
sys     0m2.618s

$> /opt/pvfs-2.8.2/bin/pvfs2-viewdist -f /pvfs2-ssd/kmehta/1_dir/ior.out
dist_name = simple_stripe
dist_params: strip_size:65536
Metadataserver: tcp://192.168.2.95:3334
Number of datafiles/servers = 1
Datafile 0 - tcp://192.168.2.95:3334, handle: 9223372036854774228 (7ffffffffffff9d4.bstream)

$> /opt/pvfs-2.8.2/bin/pvfs2-viewdist -f /pvfs2-ssd/kmehta/1_dir/ior2.out
dist_name = simple_stripe
dist_params: strip_size:65536
Metadataserver: tcp://192.168.2.94:3334
Number of datafiles/servers = 1
Datafile 0 - tcp://192.168.2.94:3334, handle: 6917529027641068981 (5fffffffffffcdb5.bstream)

- Kshitij

From: [email protected] [mailto:[email protected]] On Behalf Of Michael Moore
Sent: Wednesday, December 14, 2011 2:16 PM
To: Kyle Schochenmaier
Cc: Kshitij Mehta; [email protected]
Subject: Re: [Pvfs2-users] Tracing pvfs2 internals

Another diagnostic step would be to create a directory that uses a single datafile (e.g. setfattr -n user.pvfs2.num_dfiles -v "1" /mnt/1_dir), touch two files in that directory, and confirm that each one uses a different server (e.g. pvfs2-viewdist -f /mnt/1_dir/1.out). Then perform the same pvfs2-cp test to each file and see if there is a difference in performance.

Michael

On Wed, Dec 14, 2011 at 3:03 PM, Kyle Schochenmaier <[email protected]> wrote:

Hi Kshitij -

That looks extremely low. Do you actually have 27GB of RAM? Because that looks like the speed of a local hard drive. Can you try it with a 1GB file instead?

~Kyle
Kyle Schochenmaier
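Pulled together, Michael's single-datafile diagnostic is roughly the following sequence. This is a sketch: /mnt/1_dir stands for a directory on the kernel-mounted PVFS2 volume, and the extended attribute is assumed to take effect for files created in the directory afterwards.

$> mkdir /mnt/1_dir
$> setfattr -n user.pvfs2.num_dfiles -v "1" /mnt/1_dir    # new files in this directory get one datafile each
$> touch /mnt/1_dir/1.out /mnt/1_dir/2.out
$> pvfs2-viewdist -f /mnt/1_dir/1.out                     # expect: Number of datafiles/servers = 1, on server A
$> pvfs2-viewdist -f /mnt/1_dir/2.out                     # expect: a single datafile on a different server B
$> time pvfs2-cp /tmp/ior.out /mnt/1_dir/1.out            # then time the same copy to each file and compare
$> time pvfs2-cp /tmp/ior.out /mnt/1_dir/2.out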
On Wed, Dec 14, 2011 at 1:56 PM, Kshitij Mehta <[email protected]> wrote:

Ok, these are the results of performing a pvfs2-cp on a 27G file from /tmp to a directory on /pvfs2-ssd with a stripe size of 4MB. I see a bandwidth of ~42 MB/s. Is this expected?

$> time /opt/pvfs-2.8.2/bin/pvfs2-cp /tmp/ior.out.00000000 /pvfs2-ssd/ss_4mb/ior.out
real    10m55.393s
user    0m3.075s
sys     2m6.047s

$> ls -lh /pvfs2-ssd/ss_4mb/ior.out
-rw-r--r-- 1 root root 27G 2011-12-14 13:37 /pvfs2-ssd/ss_4mb/ior.out

- Kshitij

From: Kyle Schochenmaier [mailto:[email protected]]
Sent: Wednesday, December 14, 2011 1:19 PM
To: Kshitij Mehta
Cc: Michael Moore; [email protected]
Subject: Re: [Pvfs2-users] Tracing pvfs2 internals

Hi Kshitij -

What kind of performance do you get with pvfs2-cp? If you pvfs2-cp some large file (1GB+) from /tmp/ on your client to the pvfs2 file system with a block size of 1MB or more, do you get decent performance? We should be testing the performance of in-memory pvfs2 at this point.

Kyle Schochenmaier

On Wed, Dec 14, 2011 at 1:09 PM, Kshitij Mehta <[email protected]> wrote:

1) What interface are you using with IOR, MPIIO or POSIX?
MPIIO

2) What protocol are you using (tcp, ib), and what is the link speed?
IB SDR, with a theoretical peak of 1 GB/s.

3) Is the PVFS2 file system you're comparing to ext4 just the single host, or is it both hosts attached to the SSD?
Both hosts.

4) With the 32MB transfer size (from IOR, right?), does that match the stripe size you're using in the PVFS2 file system?
Yes, we ran the test from IOR. The stripe size on PVFS2 was set to 1 MB. I see similar results with transfer sizes varying from 1MB through 1GB, doubling the transfer size in every run.

5) Are you using directio or alt-aio?
Alt-aio.

Thanks,
Kshitij

From: Michael Moore [mailto:[email protected]]
Sent: Wednesday, December 14, 2011 5:21 AM
To: Kshitij Mehta
Cc: Kyle Schochenmaier; [email protected]
Subject: Re: [Pvfs2-users] Tracing pvfs2 internals

Hi Kshitij,

A couple of other questions and things to look at:
1) What interface are you using with IOR, MPIIO or POSIX?
2) What protocol are you using (tcp, ib), and what is the link speed?
3) Is the PVFS2 file system you're comparing to ext4 just the single host, or is it both hosts attached to the SSD?
4) With the 32MB transfer size (from IOR, right?), does that match the stripe size you're using in the PVFS2 file system?
5) Are you using directio or alt-aio?

Beyond that, if you could watch top for something CPU-bound or swapping during testing, that may show what's going on. Also, watching iostat would show what's happening with the disks while running the test on PVFS2.

Michael

On Wed, Dec 14, 2011 at 2:43 AM, Kshitij Mehta <[email protected]> wrote:

I am using a transfer size of 32 MB, which should have shown much better performance (my apologies for not mentioning this before). The total file size being written is 8GB.

- Kshitij

On Dec 14, 2011, at 1:34 AM, Kyle Schochenmaier <[email protected]> wrote:

Hi Kshitij -

This is the expected behaviour: PVFS2 is not highly optimized for small writes and reads, which is what IOR typically performs, so you will always see degraded performance here compared to the underlying file system's base performance. There are ways to tune for this type of access. If you set your IOR block accesses to something larger, such as 64K instead of the default (4K?), I think you would see performance that is much closer. This used to be well documented in the PVFS FAQ; I'm not sure where the links are now.

Cheers,
Kyle Schochenmaier
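For reference, the IOR run being discussed would look roughly like the following. This is a hypothetical invocation: the process count, block size, and output path are placeholders, and the flag spellings should be checked against ior -h for the IOR build in use.

$> mpirun -np 2 ./IOR -a MPIIO -w -t 32m -b 4g -o /pvfs2-ssd/ss_4mb/ior.out
#   -a MPIIO : use the MPI-IO interface
#   -t 32m   : 32 MB transfer size per write call
#   -b 4g    : 4 GB written per process (2 processes -> 8 GB total)
#   -o ...   : output file on the PVFS2 mount (placeholder path)

# In a second terminal on each I/O server, watch the disks and CPU while the
# test runs, as Michael suggests above:
$> iostat -x 2
$> top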
On Wed, Dec 14, 2011 at 1:09 AM, Kshitij Mehta <[email protected]> wrote:

Well, here's why I wanted to trace in the first place. I have a test configuration where we have PVFS2 set up over SSD storage. There are two I/O servers that talk to the SSD storage through Infiniband (there are two IB channels going into the SSD, and each storage server can 'see' one half of the SSD).

I used the IOR benchmark to test the write bandwidth. First I spawn a process on an I/O server such that it writes data to the underlying ext4 file system on the SSD instead of PVFS2; I see a bandwidth of ~350 MB/s. Then I spawn a process on the same I/O server and write data to the PVFS2 file system configured over the SSD, and I see a write bandwidth of ~180 MB/s. This seems to represent some kind of overhead in PVFS2, but it seems too large. Has anybody else seen similar results? Is the overhead of pvfs2 documented?

Do let me know if something is not clear or if you have additional questions about the above setup. Here are some other details:
I/O servers: dual core with 2G main memory each.
PVFS 2.8.2

Thanks,
Kshitij

-----Original Message-----
From: Julian Kunkel [mailto:[email protected]]
Sent: Tuesday, December 13, 2011 3:10 AM
To: Kshitij Mehta
Cc: [email protected]
Subject: Re: [Pvfs2-users] Tracing pvfs2 internals

Dear Kshitij,
we have a version of OrangeFS which is instrumented with HDTrace; with it you can record detailed information about the activity of state machines and I/O. For a description, see this thesis:
http://wr.informatik.uni-hamburg.de/_media/research:theses:Tien%20Duc%20Tien_Tracing%20Internal%20Behavior%20in%20PVFS.pdf
The code is available in our redmine (here is a link to the wiki):
http://redmine.wr.informatik.uni-hamburg.de/projects/piosimhd/wiki
I consider the tracing implemented in PVFS rather robust, since it is our second implementation with PVFS_hints. However, you might encounter some issues with the build system. If you want to try it and need help, just ask.

Regards,
Julian Kunkel

2011/12/13 Kshitij Mehta <[email protected]>:
> Hello,
>
> Is there a way I can trace/measure the internal behavior of pvfs2?
> Suppose I have a simple I/O code that writes to pvfs2, I would like to
> find out exactly how much time the various internal operations of pvfs2
> take (metadata lookup, creating iovecs, etc.) before data is finally pushed to disk.
>
> Is there a configure option (what does `enabletracing` do in the config
> file)? Or is there any other way to determine this?
>
> Thanks,
> Kshitij
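On the directio vs. alt-aio point raised earlier in the thread: as far as I recall, the Trove method is chosen per file system in the pvfs2 server config file, along these lines. This is a sketch from memory, not verified against 2.8.2; check pvfs2-genconfig and the release documentation for the exact syntax and the available values before relying on it.

<FileSystem>
    Name pvfs2-fs
    ...
    <StorageHints>
        # alt-aio is the thread-based AIO method reported in the tests above;
        # directio (O_DIRECT) bypasses the page cache and can behave quite
        # differently on SSD-backed servers, so comparing both may be informative.
        TroveMethod alt-aio
    </StorageHints>
</FileSystem>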
_______________________________________________ Pvfs2-users mailing list [email protected] http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
