Hi Tony,

This is most likely due to a change in how PVFS 2.8.x tracks file size during writes beyond EOF. It now stores the file size explicitly in Berkeley DB for each datafile. This is required for the new directio storage method, but we applied it to the other storage methods as well to keep them compatible.

One test you could run to confirm this is to set the following in the StorageHints section of your PVFS server configuration file:

TroveSyncMeta no

With that set to "no", PVFS will still synchronize metadata (including the explicit size field), but it may delay synchronization until after an acknowledgement has been sent to the client. This will probably hide the size update cost for a serial application.
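
In case it helps, the StorageHints block in the server config file would
end up looking something like this (any other directives already in that
block stay as they are; only the TroveSyncMeta line changes):

<StorageHints>
    TroveSyncMeta no
</StorageHints>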

The size update overhead will only show up for serial applications that issue small writes beyond EOF (like iozone in the "initial write" phase). For a parallel application, PVFS would coalesce the size updates to reduce overhead. For a serial application that used larger writes, the size update cost would be amortized over a longer period of time.
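
If you want to double check the amortization effect, one quick experiment
is to rerun just the write test with a small and then a much larger record
size and compare. Something along these lines (the sizes and the file path
are only examples; adjust them to your setup):

iozone -i 0 -r 64k -s 1g -f /mnt/pvfs2/iozone.tmp
iozone -i 0 -r 16m -s 1g -f /mnt/pvfs2/iozone.tmp

If the small-record run shows the slowdown and the large-record run mostly
does not, that points at the per-write size update rather than the data
path itself.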

Regarding the warnings in your log files, those are normal. In 2.8.x the servers communicate with each other at startup to precreate datafile objects. A server issues those warnings occasionally if one or more of its peers is not up and running yet when it tries to do that, but they stop as soon as all of the servers are available.

thanks,
-Phil


Tony Kew wrote:
Dear Phil,

Irrespective of the --enable-mmap-racache option, there does seem to be
a marked performance drop between PVFS version 2.7.1 (with the 20 or
so patches along the way, though as far as I am aware none of them were
for performance) and version 2.8.x.

Version 2.8.1 was built with the following configure options:
./configure --prefix=/usr \
--libdir=/usr/lib64 \
--enable-perf-counters \
--enable-fast \
--with-kernel=%{_kernelsrcdir} \
--enable-shared

Version 2.7.1 (fully patched) was configured as above, with the addition
of the --enable-mmap-racache option.

I ran three iozone tests for each of the tested distributions, using
a PBS batch job that creates a (new) filesystem across all the nodes
in the job.  The batch job then runs a parallel iozone job, with one
data stream on each node.  The test directory is configured as
a stripe across all the nodes, so each node is writing to all of the
other nodes during the test.
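
To give a rough idea of the kind of command the batch script ends up
running (the sizes, paths, and variable names here are placeholders
rather than the exact script), it uses iozone's distributed throughput
mode:

iozone -+m $PBS_O_WORKDIR/clients.txt -t $NUM_NODES -i 0 -i 1 -s 4g -r 64k

where clients.txt lists one line per node: the hostname, the PVFS-mounted
test directory, and the path to the iozone binary.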

The average I/O numbers over three iozone runs,
writing to a directory configured with the "Simple Stripe" distribution:

v2.7.1:

 Initial write: 219,306.19 KB/sec
       Rewrite: 130,799.13 KB/sec
          Read: 183,249.66 KB/sec
       Re-read: 191,565.02 KB/sec


v2.8.1:

 Initial write:  40,381.42 KB/sec
       Rewrite: 132,908.15 KB/sec
          Read: 203,758.06 KB/sec
       Re-read: 276,100.11 KB/sec

For a TwoD Stripe distribution:

v2.7.1:

 Initial write: 343,876.68 KB/sec
       Rewrite: 229,740.04 KB/sec
          Read: 167,045.91 KB/sec
       Re-read: 166,417.03 KB/sec

v2.8.1:

 Initial write: 140,253.67 KB/sec
       Rewrite: 201,923.75 KB/sec
          Read: 182,109.70 KB/sec
       Re-read: 205,073.70 KB/sec




In the server log files for the v2.8.1 runs there are many of these:

[E 03/06 16:26] Warning: msgpair failed to tcp://c14n24:3334, will retry: Connection refused

...but only at the time when the filesystem is created, so I don't
believe they have any bearing on the test results.

Let me know if I can provide any more info, or if further tests
would be of use....


Many Thanks,
Tony

Tony Kew
SAN Administrator
The Center for Computational Research
New York State Center of Excellence
in Bioinformatics & Life Sciences
701 Ellicott Street, Buffalo, NY 14203

CoE Office: (716) 881-8930           Fax: (716) 849-6656
CSE Office: (716) 645-3797 x2174
     Cell: (716) 560-0910

"I love deadlines, I love the whooshing noise they make as they go by."
                                                         Douglas Adams



Tony Kew wrote:
Dear Phil,

PVFS 2.8.1 works (as far as I can tell) with the --enable-mmap-racache
configure option once the following patch is applied:

--- src/apps/kernel/linux/pvfs2-client-core.c.orig	2009-02-27 15:53:50.000000000 -0500
+++ src/apps/kernel/linux/pvfs2-client-core.c	2009-02-27 15:54:22.000000000 -0500
@@ -1609,7 +1609,7 @@ static PVFS_error post_io_readahead_requ
        &vfs_request->in_upcall.credentials,
        &vfs_request->response.io,
        vfs_request->in_upcall.req.io.io_type,
-        &vfs_request->op_id, (void *)vfs_request);
+        &vfs_request->op_id, vfs_request->hints, (void *)vfs_request);

    if (ret < 0)
    {


I am, though, getting poor iozone performance numbers for the initial write...
I'm going to run some more iozone tests, try without
--enable-mmap-racache, and let you know what I find...


Thanks,
Tony


Tony Kew
SAN Administrator
The Center for Computational Research
New York State Center of Excellence
in Bioinformatics & Life Sciences
701 Ellicott Street, Buffalo, NY 14203

CoE Office: (716) 881-8930          Fax: (716) 849-6656
CSE Office: (716) 645-3797 x2174
     Cell: (716) 560-0910

"I love deadlines, I love the whooshing noise they make as they go by."
                                                         Douglas Adams



Tony Kew wrote:
Dear Phil,

Thanks for the info; the option worked for me with the 2.7.1 codebase, for
what it's worth. The 2.8.0 code with my patch works so far (with the very
limited tests I have done). I'll be testing over several nodes this afternoon
or tomorrow.

[...]


Phil Carns wrote:
Hi Tony,

I just wanted to mention that the second compile problem that you pointed out is from a code path that gets enabled with the --enable-mmap-racache option. That particular option is experimental and (as you have found) not well tested. I would not advise using it in a production setting.

-Phil


_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
