Sam Lang wrote:
On Aug 10, 2006, at 4:04 PM, Phil Carns wrote:
flow-proto-tuning.patch:
-----------
This patch adds "FlowBufferSizeBytes" and "FlowBuffersPerFlow"
options to the configuration file format. They allow you to specify
the buffer size that the default flow protocol will use as well as
the maximum number of buffers to use per flow. Note that if you
change either of these parameters, then you need to remount any
active clients so that they pick up the configuration change before
performing any I/O.
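For concreteness, a hedged sketch of how these options might appear in the server config file — the option names come from the patch description, but their placement in the <Defaults> section and the values shown are my assumptions, not from the patch:

```
<Defaults>
    # hypothetical values: 256 KB buffers, 8 buffers per flow
    FlowBufferSizeBytes 262144
    FlowBuffersPerFlow 8
</Defaults>
```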
max-aio.patch:
----------
This patch adds "TroveMaxConcurrentIO" to the configuration file
format. It allows you to specify the maximum number of I/O
operations that trove will allow to proceed concurrently (currently
16). Note from the previous email regarding AIO that depending on
your access pattern, AIO may queue all of your operations anyway
regardless of this setting. It probably doesn't have much effect
unless you are accessing more than one file at a time, or if you are
using an alternative to the stock AIO implementation.
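Again as a hedged sketch — the option name is from the patch, but the section placement and value are assumptions:

```
<Defaults>
    # hypothetical value; the previous hardcoded limit was 16
    TroveMaxConcurrentIO 16
</Defaults>
```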
I had made the same change in Julian's branch,
Oh, ok. We will switch over when that stuff hits trunk.
there are still a couple of
things that aren't clear to me about this max value, though. First, it's
a global value for all outstanding lio_listio calls the pvfs server
makes, but based on your previous email comments about glibc's
one-thread-per-fd oddity, it seems like we only want that value to max
out per datafile. Also, after we hit the max we just queue the
operations and post them once current ops have completed. If librt
just queues ops and does them in FIFO order though, it's pretty much the
same thing. Why not just let librt handle the queuing? If we were to
do ordering of the operations based on offsets, then it would make
sense for us to queue, but we don't. Are we better at queueing than
librt?
I agree that if you are using librt for aio, then this max value isn't
doing much of anything :) librt's queueing isn't exactly the same thing
though. librt allows N operations in flight at a time (where N can be
tuned using aio_init) by way of limiting the maximum number of threads
that it will spawn. However, since it serializes on each fd, that limit
never kicks in unless you are accessing N different files. Otherwise it
is really only going to do one thing at a time. The librt source that I
looked at happened to default to N=16, just like Trove did.
I think the point of the aio limit in trove was to try to throttle I/O
on the servers, but it turns out that librt was already throttling above
and beyond it, so the trove limit wasn't actually decreasing the number of
I/O operations posted to the kernel at all.
Maybe the throttling makes more sense when you bypass librt somehow (as
in the previous patch) because then there is nothing to queue/throttle
the operations besides trove?
At any rate, we decided to make this configurable before understanding
the issues involved; it was just a hardcoded value we saw that looked
like it should have been tunable.
I know Julian was looking at performance of aio and found results
somewhere (I don't have a reference, sorry) that showed lio_listio did
better in cases where multiple fds were passed to one lio_listio
operation (right now we just do one fd with multiple segments to one
lio_listio). I wonder if that difference is based on the glibc queuing
behavior that you describe.
I would guess that the queueing behavior is the reason. I can't imagine
that using separate files would make much difference once you get to the
system call level.
Just a curiosity, but I wonder if the aio
performance would change if we were to post multiple trove operations
in the same lio_listio call, or possibly even break up the bstream into
multiple files based on strip size... sounds crazy, right? :-)
On the former question, I guess it depends on who is better at
coalescing: the kernel disk scheduler or the trove queue? Hopefully we
can avoid splitting files up :)
-Phil
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers