Do you still use posix-aio with O_DIRECT?
I'd be very interested in a Linux-native AIO trove module that uses
O_DIRECT, which would avoid the buffer cache entirely. I get the best
I/O performance out of our storage servers when using Linux-native AIO
with O_DIRECT.
Sam Lang wrote:
Hi Phil,
It's good to get this functionality into the code base -- we've had a
number of attempts at this sort of thing, but none of them ever got
committed to HEAD, and having it there in whatever state is better
than not having it at all. I do have a concern (and an overall design
gripe) about using the AIO interfaces for this sort of thing when we
already have callback structures in dbpf.
We now have two levels of indirection, with threads being created and
managed in both. That means more code, in different locations, doing
more or less the same thing, which makes it harder for others to
understand and augment.
In general I don't think the aio callback structures are needed at
all, but it's admittedly much easier to implement against those
interfaces than the dbpf ones, if only because of the disorderly op
management code in dbpf bstream.
I don't know what our long-term plans are for the trove code, but I
would vote for moving toward a single, centralized location for
managing the I/O threads and queues, with different callbacks for each
I/O implementation. I've done a prototype of this queue/thread
management along with O_DIRECT, and I think going that route would
clean things up quite a bit.
-sam
On Apr 17, 2008, at 4:10 PM, Phil Carns wrote:
There is a new trove method available in trunk now called "null-aio".
It can be selected by putting "TroveMethod null-aio" in the
<StorageHints> section of the file system configuration file.
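For reference, a minimal <StorageHints> fragment selecting the new method would look like the following (surrounding config context elided; this assumes the usual server config file layout):

```
<StorageHints>
    TroveMethod null-aio
</StorageHints>
```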
This is only useful for debugging purposes, because it deliberately
skips doing any file I/O on the server side. Please use with
caution! It does all metadata operations the same as any other
method, but file reads will return garbage and file writes are thrown
away. Writing beyond EOF triggers a truncate to mimic the
appropriate resulting bstream size.
This might be useful once in a while for narrowing down performance
problems between network and storage. It takes the storage out of
the loop and shows approximately what the network is capable of by
itself. Of course it will only work for benchmarks that don't verify
data correctness (or otherwise rely on data read off of PVFS).
We used to have a compile-time option (--disable-disk-io) for this
same purpose, but it actually hasn't worked in a while. Nowadays
it's easier to do this as a trove method that can be selected at
runtime without recompiling.
-Phil
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers