Apologies for reviving an old thread,

By default, PVFS uses eight 256 KB buffers to transfer data to a server. Once the connection is made, PVFS transmits data to the server using these 256 KB buffers as fast as it can. You can think of the eight buffers as the PVFS window size (if you are familiar with TCP terminology). With 20 I/O servers, you have 20 of these windows pushing data out over the network as fast as possible.

How does pvfs2 write non-contiguous data chunks to a single server? Does it use a list I/O interface like writev, or does it issue a separate write call for every 64 KB chunk of data destined for that server?
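To make the question concrete, here is the distinction I mean, written with ordinary POSIX calls rather than the actual PVFS client internals (the function names below are just placeholders for illustration):

#include <sys/uio.h>
#include <unistd.h>

/* One request describing a whole list of chunks at once... */
static void one_listio_request(int fd, char *a, char *b, char *c, size_t n)
{
    struct iovec iov[3] = {
        { .iov_base = a, .iov_len = n },   /* three 64 KB chunks destined   */
        { .iov_base = b, .iov_len = n },   /* for the same server, gathered */
        { .iov_base = c, .iov_len = n },   /* into a single call            */
    };
    (void)writev(fd, iov, 3);
}

/* ...versus a separate call for each 64 KB chunk. */
static void one_call_per_chunk(int fd, char *a, char *b, char *c, size_t n)
{
    (void)write(fd, a, n);
    (void)write(fd, b, n);
    (void)write(fd, c, n);
}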

Also, is this documented somewhere, or do you generally look at the source code to figure such things out?

Thanks,
Kshitij


On 10/09/2011 03:03 PM, Becky Ligon wrote:
The dd block size determines how much data is given to PVFS2 in any one write request. Thus, if the write request is given 2 MB of data, that data is divided up and sent to the 20 I/O servers all at the same time (see note below). If the write request is given only 64 KB of data, then a request is sent to the one server where the next 64 KB is to be written. So throughput for larger requests is generally better than for small requests, depending on your network delay, how busy your servers are, and the number of I/O servers in your filesystem. There is also some overhead associated with moving the data from user space to kernel space, so you incur more OS overhead using 64 KB blocks than you would with 2 MB blocks.
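As a back-of-the-envelope illustration: a 2 MB request covers 32 stripe units of 64 KB, which under plain round-robin striping lands on all 20 servers at once, while a 64 KB request covers exactly one stripe unit on one server. The little program below is only a model of round-robin placement (it is not the actual PVFS distribution code, and the server numbering is made up):

#include <stdio.h>

#define STRIPE   (64 * 1024)
#define SERVERS  20

static void show(const char *label, long long offset, long long len)
{
    int touched[SERVERS] = {0}, n = 0;
    /* Walk the request one stripe unit at a time and mark which
     * server each unit maps to under round-robin placement. */
    for (long long o = offset; o < offset + len; o += STRIPE)
        touched[(o / STRIPE) % SERVERS] = 1;
    for (int s = 0; s < SERVERS; s++)
        n += touched[s];
    printf("%-12s touches %2d of %d servers\n", label, n, SERVERS);
}

int main(void)
{
    show("2 MB write",  0, 2 * 1024 * 1024);   /* 32 stripe units -> all 20 servers */
    show("64 KB write", 0, 64 * 1024);         /* 1 stripe unit   -> just 1 server  */
    return 0;
}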

For example, if you use the Linux command "cp" and compare its performance with "pvfs2-cp" when copying a large amount of data from a Unix filesystem into a PVFS filesystem, you will immediately notice that pvfs2-cp is faster. pvfs2-cp performs better because it uses a default buffer size of 10 MB, while cp uses the stripe size, in your case 64 KB. So cp takes longer than pvfs2-cp to transfer the same amount of data.
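The arithmetic alone shows the difference: copying 1 GB with 10 MB buffers takes about 103 write requests, while 64 KB buffers take 16,384. A rough user-space sketch of that effect (the file names are placeholders, and this is not how cp or pvfs2-cp is actually implemented):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Copy src to dst using a fixed buffer size; return how many write
 * requests were handed to the filesystem. */
static long copy_with_bufsize(const char *src, const char *dst, size_t bufsz)
{
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    char *buf = malloc(bufsz);
    long requests = 0;
    ssize_t n;

    if (in < 0 || out < 0 || buf == NULL) { perror("copy_with_bufsize"); exit(1); }
    while ((n = read(in, buf, bufsz)) > 0) {
        if (write(out, buf, (size_t)n) != n) { perror("write"); exit(1); }
        requests++;                           /* one request per buffer */
    }
    free(buf);
    close(in);
    close(out);
    return requests;
}

int main(void)
{
    /* "matrix.dat" stands in for the 1 GB test file. */
    printf("64 KB buffers (cp-like):       %ld write requests\n",
           copy_with_bufsize("matrix.dat", "copy1.dat", 64 * 1024));
    printf("10 MB buffers (pvfs2-cp-like): %ld write requests\n",
           copy_with_bufsize("matrix.dat", "copy2.dat", 10 * 1024 * 1024));
    return 0;
}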

NOTE: By default, PVFS uses eight 256 KB buffers to transfer data to a server. Once the connection is made, PVFS transmits data to the server using these 256 KB buffers as fast as it can. You can think of the eight buffers as the PVFS window size (if you are familiar with TCP terminology). With 20 I/O servers, you have 20 of these windows pushing data out over the network as fast as possible.
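Purely to illustrate the windowing idea, here is a self-contained sketch (this is not PVFS code; it uses POSIX AIO against a local file as a stand-in for the network transfer, and the 64 MB payload and "sink.dat" name are made up): keep at most eight 256 KB buffers outstanding, and refill each slot as soon as its transfer completes.

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NBUF  8                  /* the "window": buffers in flight at once */
#define BUFSZ (256 * 1024)       /* 256 KB per buffer */

int main(void)
{
    static char slab[NBUF][BUFSZ];
    struct aiocb cb[NBUF];
    const struct aiocb *list[NBUF];
    long long total = 64LL * 1024 * 1024;    /* pretend payload: 64 MB */
    long long queued = 0;
    int fd = open("sink.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) { perror("open"); return 1; }
    memset(cb, 0, sizeof cb);
    memset(slab, 'x', sizeof slab);          /* stand-in for real data */

    /* Prime the window: launch the first NBUF transfers. */
    for (int i = 0; i < NBUF; i++) {
        cb[i].aio_fildes = fd;
        cb[i].aio_buf    = slab[i];
        cb[i].aio_nbytes = BUFSZ;
        cb[i].aio_offset = queued;
        if (aio_write(&cb[i]) != 0) { perror("aio_write"); return 1; }
        list[i] = &cb[i];
        queued += BUFSZ;
    }

    /* As each buffer completes, reuse its slot for the next 256 KB chunk,
     * so no more than NBUF transfers are ever outstanding. */
    while (queued < total) {
        if (aio_suspend(list, NBUF, NULL) != 0) { perror("aio_suspend"); return 1; }
        for (int i = 0; i < NBUF && queued < total; i++) {
            if (aio_error(&cb[i]) == EINPROGRESS)
                continue;                            /* still in flight */
            if (aio_return(&cb[i]) < 0) { perror("aio_return"); return 1; }
            cb[i].aio_offset = queued;               /* refill this slot */
            if (aio_write(&cb[i]) != 0) { perror("aio_write"); return 1; }
            queued += BUFSZ;
        }
    }

    /* Drain whatever is still outstanding. */
    for (int i = 0; i < NBUF; i++)
        while (aio_error(&cb[i]) == EINPROGRESS)
            aio_suspend(list, NBUF, NULL);

    close(fd);
    return 0;
}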

Hope this helps!
Becky

On Sun, Oct 9, 2011 at 5:34 AM, belcampo <[email protected]> wrote:

    On 10/06/2011 10:36 PM, Kshitij Mehta wrote:

        Hello,
        I have a pvfs2 file system configured over 20 IO servers with
        a default stripe size of 64 KB.
        I am running a simple test program where I write a matrix to file.

        This is what I see:
        If the 1 GB matrix is written in block sizes of 2 MB, the
        performance is much better than writing the matrix in blocks
        of 64 KB. I am not sure I understand why. Since the stripe
        size is 64 KB, every 2 MB block eventually gets broken into
        64 KB blocks that are written to the IO servers, so the
        performance should be nearly equal. I would understand why
        a block size smaller than the stripe size should perform
        badly, but once the block size exceeds the stripe size, I
        expect the performance to level off.

        Can someone explain what happens here? Your help is appreciated.

    I can't explain it, only confirm it. I also did some tests, with
    the following results.

    with pvfs2-cp:  18.18 MB/s

    over pvfs2fuse, dd:
    block size      MB/s
    4k              4.4
    8k              6.3
    16k             7.3
    32k             8.8
    64k             9.9
    128k            18.7
    256k            18.7
    512k            18.8
    1024k           18.8
    2048k           18.8

    over pvfs2fuse:
    cp              8.2 MB/s
    rsync           14.8 MB/s

    over nfs:
    cp              10.6 MB/s
    rsync           11.0 MB/s

    Further, it was mentioned earlier that ongoing effort is being put
    into optimizing pvfs2/orangefs for small files. So AFAIK this
    behavior is by design, but I don't know the reasoning behind it.


        Best,
        Kshitij Mehta

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina



