Thanks for the extra information, Tony. That's too bad that the metadata sync option (TroveSyncMeta) wasn't helping for your configuration.

I don't think any relevant configuration defaults have changed since 2.7.1. I think it is just that size update issue I mentioned earlier, since it only really shows up in the initial write phase. We simply have a performance regression there for serial applications.

I don't have an answer for you yet, but we are looking into it.

-Phil

Tony Kew wrote:
Dear Phil,

I ran the iozone job manually four times.  Only one run produced any output
in any of the server logs (after the filesystem was initially built, the servers
started, and the filesystem mounted).

The one iozone run that gave errors failed after the writes had completed,
while it was running the initial read pass:

################################################################################
Errors from node c14n29 PVFSv2 server logfile:
################################################################################
[E 03/23 12:38] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 3647671.
[E 03/23 12:38] fp_multiqueue_cancel: flow proto cancel called on 0x2a9586f540
[E 03/23 12:38] fp_multiqueue_cancel: I/O error occurred
[E 03/23 12:38] handle_io_error: flow proto error cleanup started on 0x2a9586f540: Operation cancelled (possibly due to timeout)
[E 03/23 12:38] PVFS2 server: signal 11, faulty address is (nil), from (nil)
[E 03/23 12:38] [bt] [(nil)]


Other than this (which I consider an anomaly for now...), the average performance
numbers for the three iozone runs that completed are as follows:

     Initial write:  37,625.48 KB/sec
           Rewrite: 149,830.93 KB/sec
              Read: 170,758.41 KB/sec
           Re-read: 206,256.47 KB/sec

I would say there is a definite performance issue with initial writes.

Are there any filesystem configuration defaults that may have changed,
perhaps?...

Thanks Much,
Tony

Tony Kew
SAN Administrator
The Center for Computational Research
New York State Center of Excellence
in Bioinformatics & Life Sciences
701 Ellicott Street, Buffalo, NY 14203

CoE Office: (716) 881-8930          Fax: (716) 849-6656
CSE Office: (716) 645-3797 x2174
     Cell: (716) 560-0910

"I love deadlines, I love the whooshing noise they make as they go by."
                                                         Douglas Adams



Tony Kew wrote:
Dear Phil,

The filesystem configuration in my tests is built as follows:

pvfs2-genconfig --quiet --protocol tcp --tcpport --notrovesync --trove-method "alt-aio" \
    --server-job-timeout 60 --fsid=_a_job_specific_id --fsname=_a_job_specific_name_ \
    --storage _a_job_specific_storage_space_ --logfile _a_job_specific_logfile_ \
    --ioservers [list of nodes in the job] --metaservers [list of nodes in the job]

I believe the "--notrovesync" option already sets "TroveSyncMeta no" in the config
file.
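
A quick way to double-check is to grep the generated config for the Trove
sync settings; the path below is just a placeholder for the job-specific
config file that pvfs2-genconfig writes out:

grep -i trovesync _a_job_specific_config_file_

If --notrovesync does what I expect, TroveSyncMeta (and possibly
TroveSyncData) should come back as "no" in the StorageHints section.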

I'm running an interactive PBS job to make sure the "msgpair failed" error messages are generated during the filesystem build, and not subsequently. That certainly appears to be the case, but I'm running an iozone job manually to be sure...
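
Another simple check is to pull the warning timestamps out of the server
logs and compare them against the time the filesystem was built; the
logfile name below is the job-specific placeholder from the genconfig
command above:

grep "msgpair failed" _a_job_specific_logfile_

If all of the timestamps fall inside the startup window, the warnings
should have no bearing on the iozone results.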

Tony


Tony Kew
SAN Administrator
The Center for Computational Research
New York State Center of Excellence
in Bioinformatics & Life Sciences
701 Ellicott Street, Buffalo, NY 14203

CoE Office: (716) 881-8930          Fax: (716) 849-6656
CSE Office: (716) 645-3797 x2174
     Cell: (716) 560-0910

"I love deadlines, I love the whooshing noise they make as they go by."
                                                         Douglas Adams



Phil Carns wrote:
Hi Tony,

This is most likely due to a change in how PVFS 2.8.x tracks file size during writes beyond EOF. It now stores the file size explicitly in Berkeley DB for each datafile. This is required for the new directio storage method, but we applied it to the other methods as well to simplify compatibility.

A test that you could run to confirm this would be to set the following in the StorageHints section of your PVFS server configuration file:

TroveSyncMeta no

With that set to "no", PVFS will still synchronize metadata (including the explicit size field), but it may delay synchronization until after an acknowledgement has been sent to the client. This will probably hide the size update cost for a serial application.
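
For reference, that setting goes inside the StorageHints block of the file
system config; the surrounding lines below are just typical entries from a
generated config and may differ in your setup:

<StorageHints>
    TroveSyncMeta no
    TroveSyncData yes
</StorageHints>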

The size update overhead will only show up for serialized applications that issue small writes beyond EOF (like iozone in the "initial write" phase). If it were a parallel application, PVFS would coalesce the size updates to reduce overhead. If it were a serial application that used larger writes, the size update cost would be amortized over a longer period of time.

Regarding your log file warnings, those are normal. In 2.8.x the servers communicate with each other on startup to precreate datafile objects. A server occasionally issues those warnings if one or more of the other servers is not up and running yet when it tries to do that, but they stop as soon as all servers are available.

thanks,
-Phil


Tony Kew wrote:
Dear Phil,

Irrespective of the --enable-mmap-racache option, there does seem to be
a marked performance drop between PVFS version 2.7.1 (with the 20 or
so patches applied along the way, none of which were for performance insofar
as I am aware) and version 2.8.x.

Version 2.8.1 was built with the following configure options:
./configure --prefix=/usr \
--libdir=/usr/lib64 \
--enable-perf-counters \
--enable-fast \
--with-kernel=%{_kernelsrcdir} \
--enable-shared

Version 2.7.1 (fully patched) was configured as above, with the addition
of the --enable-mmap-racache option.

I ran three iozone tests for each of the tested distributions, using
a PBS batch job that creates a (new) filesystem across all the nodes
in the job.  The batch job then runs a parallel iozone test, with one
data stream on each node.  The test directory is configured as
a stripe across all the nodes, so each node is writing to all the other
nodes during the test.
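
Roughly speaking, each run is an iozone throughput test of this general
form; the record size, file size, stream count, and client list file below
are placeholders rather than the exact values used:

iozone -+m _client_list_file_ -t _number_of_streams_ -i 0 -i 1 -r 256k -s 4g

Here -i 0 selects the write/rewrite tests and -i 1 the read/re-read tests,
with one stream per node listed in the client file.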

The average I/O numbers from the three iozone runs,
writing to a directory configured using the "Simple Stripe" distribution:

v2.7.1:

 Initial write: 219,306.19 KB/sec
       Rewrite: 130,799.13 KB/sec
          Read: 183,249.66 KB/sec
       Re-read: 191,565.02 KB/sec


v2.8.1:

 Initial write:  40,381.42 KB/sec
       Rewrite: 132,908.15 KB/sec
          Read: 203,758.06 KB/sec
       Re-read: 276,100.11 KB/sec

For a TwoD Stripe distribution:

v2.7.1:

 Initial write: 343,876.68 KB/sec
       Rewrite: 229,740.04 KB/sec
          Read: 167,045.91 KB/sec
       Re-read: 166,417.03 KB/sec

v2.8.1:

 Initial write: 140,253.67 KB/sec
       Rewrite: 201,923.75 KB/sec
          Read: 182,109.70 KB/sec
       Re-read: 205,073.70 KB/sec


In the server log files for the v2.8.1 runs there are many of these:

[E 03/06 16:26] Warning: msgpair failed to tcp://c14n24:3334, will retry: Connection refused

...but only at the time when the filesystem is created, so I don't
believe they have any bearing on the test results.

Let me know if I can provide any more info, or if further tests
would be of use....


Many Thanks,
Tony

Tony Kew
SAN Administrator
The Center for Computational Research
New York State Center of Excellence
in Bioinformatics & Life Sciences
701 Ellicott Street, Buffalo, NY 14203

CoE Office: (716) 881-8930           Fax: (716) 849-6656
CSE Office: (716) 645-3797 x2174
     Cell: (716) 560-0910

"I love deadlines, I love the whooshing noise they make as they go by."
                                                         Douglas Adams
[...]

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
