On Fri, May 13, 2011 at 09:01:19PM +0800, Wantao wrote:
> Hi guys,
> 
> I am a PVFS2 newbie and ran some performance tests using IOZone, but the 
> results puzzle me. I have 16 machines. One is the metadata server, and the 
> other 15 machines are both PVFS2 I/O servers and clients. Each client machine runs one 
> IOZone process, so the aggregate performance is measured. Those machines are 
> configured as follows: one Intel i7-860 processor, 16GB DDR3 memory and 1TB 
> SATA hard disk. They are connected through a gigabit Ethernet switch. The OS 
> is Debian Lenny (2.6.26 kernel). The PVFS2 is 2.8.2 with default 
> configuration. 
> 
> The IOZone command used is: ./iozone -i 0 -i 1 -i 2 -r 4m -s 32g -t 15 -+m 
> pvfs_client_list. Since the memory capacity of each machine is 16GB, I set 
> the test file size to 32GB to exercise PVFS2 heavily. The result is 
> listed below:

Did you have a chance to try thread counts other than 15? Specifically, 
I think 4 (or 8 if hyper-threading is enabled) per client would be 
interesting just as a point of reference. Higher aggregate numbers are 
typically seen when running with multiple threads per node.
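
If I remember the -+m client-list format correctly (one line per process: 
client name, working directory, path to the iozone binary on that client), 
you can get 4 processes per node by listing each host 4 times and bumping 
-t to 60. A rough, untested sketch with made-up host names and paths, where 
pvfs_client_list repeats each host four times:

node01 /mnt/pvfs2 /usr/local/bin/iozone
node01 /mnt/pvfs2 /usr/local/bin/iozone
node01 /mnt/pvfs2 /usr/local/bin/iozone
node01 /mnt/pvfs2 /usr/local/bin/iozone
(and so on through node15), then:

./iozone -i 0 -i 1 -i 2 -r 4m -s 8g -t 60 -+m pvfs_client_list

I shrank -s to 8g in that example so each node still writes 32GB in 
aggregate, i.e. twice its RAM.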

> 
> Record Size 4096 KB
>     File size set to 33554432 KB
>     Network distribution mode enabled.
>     Command line used: ./iozone -i 0 -i 1 -i 2 -r 4m -s 32g -t 15 -+m 
> pvfs_client_list
>     Output is in Kbytes/sec
>     Time Resolution = 0.000001 seconds.
>     Processor cache size set to 1024 Kbytes.
>     Processor cache line size set to 32 bytes.
>     File stride size set to 17 * record size.
>     Throughput test with 15 processes
>     Each process writes a 33554432 Kbyte file in 4096 Kbyte records
> 
>     Test running:
>     Children see throughput for 15 initial writers     =  785775.56 KB/sec
>     Min throughput per process             =   50273.01 KB/sec 
>     Max throughput per process             =   53785.79 KB/sec
>     Avg throughput per process             =   52385.04 KB/sec
>     Min xfer                     = 31375360.00 KB
> 
>     Test running:
>     Children see throughput for 15 rewriters     =  612876.38 KB/sec
>     Min throughput per process             =   39466.78 KB/sec 
>     Max throughput per process             =   41843.63 KB/sec
>     Avg throughput per process             =   40858.43 KB/sec
>     Min xfer                     = 31649792.00 KB
> 
>     Test running:
>     Children see throughput for 15 readers         =  366397.27 KB/sec
>     Min throughput per process             =    9371.45 KB/sec 
>     Max throughput per process             =   29229.74 KB/sec
>     Avg throughput per process             =   24426.48 KB/sec
>     Min xfer                     = 10760192.00 KB
> 
>     Test running:
>     Children see throughput for 15 re-readers     =  370985.14 KB/sec
>     Min throughput per process             =    9850.98 KB/sec 
>     Max throughput per process             =   29660.86 KB/sec
>     Avg throughput per process             =   24732.34 KB/sec
>     Min xfer                     = 11145216.00 KB
> 
>     Test running:
>     Children see throughput for 15 random readers     =  257970.32 KB/sec
>     Min throughput per process             =    8147.65 KB/sec 
>     Max throughput per process             =   20084.32 KB/sec
>     Avg throughput per process             =   17198.02 KB/sec
>     Min xfer                     = 13615104.00 KB
> 
>     Test running:
>     Children see throughput for 15 random writers     =  376059.73 KB/sec
>     Min throughput per process             =   24060.38 KB/sec 
>     Max throughput per process             =   26446.96 KB/sec
>     Avg throughput per process             =   25070.65 KB/sec
>     Min xfer                     = 30527488.00 KB
> 
> I have three questions:
>  1. Why does write outperform rewrite significantly? According to IOZone's 
> documentation, rewrite is supposed to perform better, since it writes to a 
> file that already exists and the metadata is already there.

I don't have a concrete answer. One thing to try is to run the tests 
separately and drop the caches on the servers between the runs, something 
like:

iozone -i 0 ....
for server in <your I/O servers>; do      # or use ssh/psh across them
    ssh $server 'echo 3 > /proc/sys/vm/drop_caches'   # needs root
done
iozone -i 4

One other thing to keep in mind is that OrangeFS/PVFS doesn't have a 
client-side cache, so a re-write doesn't see the benefit of the data 
already being in cache on the client side. That's not really an answer for 
why re-write is slower, but I would expect the two to be roughly 
equivalent, within some delta. If the first write gets roughly half the 
file's data into cache on each server, it may see a performance benefit 
that the re-write test doesn't get, since the second half of the write 
blew the earlier data out of the cache on each server.
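
For a rough sense of the numbers, assuming the default striping spreads 
each file across all 15 I/O servers: 15 clients x 32GB = 480GB written in 
total, or 480GB / 15 servers = 32GB of file data per server, against 16GB 
of RAM per server. So at best roughly half of each server's share of the 
data can still be sitting in its page cache when the re-write pass starts.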

>  2. Why is write/random-write so much faster than read/random-read? This 
> result is really unexpected. I feel that read is supposed to be faster. Is 
> there anything wrong with my result numbers?

Strip size and stride may be causing some funny access patterns. Try to 
get the stride of the reads/writes to match up with the file system 
striping. By default the file system strip size is 64k; you'll likely want 
that to match your record size. For these tests with a 4m record size, 
adding a stanza inside the FileSystem context will set the default strip 
size to 4M:

        <Distribution>
                Name simple_stripe
                Param strip_size
                Value 4194304
        </Distribution>

That should also align writes so that each record lands on a single 
server. For 15 I/O servers, try setting the IOZone stride to 15 (via -j). 
I don't know what effect it will have, or which test(s) are considered the 
'stride read' test, but it would be a good option to check.
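
For reference, and untested, that would just mean adding -j to the 
original command line:

./iozone -i 0 -i 1 -i 2 -r 4m -s 32g -t 15 -j 15 -+m pvfs_client_list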

>  3. Observing the max and min throughput per process in each test, you can 
> see that in write/re-write/random-write the difference between max and min 
> is acceptable, while in read/re-read/random-read the max throughput is 
> about two or three times the min. How can I explain this result? Is it 
> normal?

I wouldn't consider it normal; let's see whether the changes I've 
mentioned reduce the deviation.

> 
> These results are not what I expected. Is it possible that they are caused 
> by faulty hardware (network or disk) or by the configuration?

I don't think it's an issue of faulty hardware. I'm guessing it's a 
matter of tuning the file system configuration to be optimal for the 
tests being run.

Michael

> 
> Any advice is appreciated.
> 
> Sincerely,
> Wantao


_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users