Hi John,

I think the best way (others can correct me) to modify the pvfs2 code so that the trove layer operates on logical offsets and sizes is in the flow code (flowproto_multiqueue.c). My reasoning is that the flow layer converts the PVFS_Request structure into physical offsets and sizes and passes them on to the trove layer. The trove layer doesn't care whether the offsets and sizes passed to it (via trove_bstream_{read|write}_list) are logical or physical; in the normal case it just uses those values to operate directly on the bstream file. In your case the offsets and sizes passed to trove could be logical, and trove wouldn't know the difference. In other words, you shouldn't have to modify any of the distribution code or manipulate the offsets and sizes in the trove code; just use what the flow layer gives you.
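
To make the logical/physical distinction concrete, here is a minimal sketch of the mapping for a plain round-robin stripe. It is not PVFS2 code: stripe_loc, logical_to_physical and physical_to_logical are hypothetical names, and the real distribution code supports more than this one layout.

```c
#include <stdint.h>

/* Hypothetical illustration only; these are not PVFS2 types or functions.
 * Assumes a simple round-robin stripe with a fixed strip size. */
typedef struct {
    uint32_t server;   /* index of the data server holding the byte */
    uint64_t physical; /* offset inside that server's bstream */
} stripe_loc;

/* Logical file offset -> (server, physical bstream offset). */
static stripe_loc logical_to_physical(uint64_t logical, uint64_t strip_size,
                                      uint32_t nservers)
{
    uint64_t strip = logical / strip_size; /* global strip index */
    stripe_loc loc;
    loc.server = (uint32_t)(strip % nservers);
    loc.physical = (strip / nservers) * strip_size + logical % strip_size;
    return loc;
}

/* (server, physical bstream offset) -> logical file offset. */
static uint64_t physical_to_logical(uint32_t server, uint64_t physical,
                                    uint64_t strip_size, uint32_t nservers)
{
    uint64_t local_strip = physical / strip_size; /* strip index on this server */
    return (local_strip * nservers + server) * strip_size
           + physical % strip_size;
}
```

With 4 servers and 64 KiB strips, logical offset 5 * 65536 + 100 falls in global strip 5, which lands on server 1 at physical offset 65636. Handing trove the logical value instead simply means the bstream is written at the file's own offsets.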

The changes to the flow layer require that PINT_process_request return logical offsets instead of physical offsets (for both reads and writes). It will do this if you pass PINT_CLIENT instead of PINT_SERVER as the mode (5th argument). You will need to do this in each instance of PINT_process_request where PINT_SERVER is used.

PINT_process_request is a bit hard to use in some cases, but these changes are simple enough, and the offsets and sizes are treated opaquely everywhere outside of the function (except in AIO of course), which turns out to be a nice design of the framework in my view.

One caveat: You will probably want to either turn off small IO, which doesn't use the flow layer, or make the same modification to the PINT_process_request calls in small-io.sm. You can just turn it off by compiling with CFLAGS=-DPVFS2_SMALL_IO_OFF.

-sam

On June 27, 2006, at 11:28AM, John Bent wrote:


Ok, I've removed the footnote. Now I'm doing everything within the new trove layer and no longer doing it in PINT_distribute, although I did change some things slightly. The problem was that PINT_ADD_SEGMENT was combining the segments, assuming they were in their own individual stripe. However, since they now must be interspersed with segments from other servers, they can no longer be combined. (Obviously, this will adversely affect performance of the old trove layer, so it calls for some layer violation to turn off merging only for certain trove layers. I guess later, if I care about this, I can add a function to the trove function table to this effect.)
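
A minimal sketch of why merging breaks, assuming a plain round-robin stripe. This is not the actual PINT_ADD_SEGMENT macro: segment and add_segment are hypothetical names, and the contiguity check is only an illustration of the condition under which combining two segments is safe.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only; not the PVFS2 segment representation. */
typedef struct {
    uint64_t offset;
    uint64_t size;
} segment;

/* Appends a segment to segs and returns the new count. The new segment is
 * merged into the previous one only when it starts exactly where the
 * previous one ends. The caller must ensure segs has room for one more. */
static size_t add_segment(segment *segs, size_t count,
                          uint64_t offset, uint64_t size)
{
    if (count > 0 && segs[count - 1].offset + segs[count - 1].size == offset) {
        segs[count - 1].size += size; /* contiguous: extend in place */
        return count;
    }
    segs[count].offset = offset;      /* gap: start a new segment */
    segs[count].size = size;
    return count + 1;
}
```

With 4 servers and 64 KiB strips, the first two strips on server 0 sit at physical offsets 0 and 65536, so they merge into one 128 KiB segment. In logical terms the same two strips sit at offsets 0 and 262144, with the other servers' data in between, so they have to stay separate.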

I'm still however unable to read the files back correctly using the pvfs2
servers.

John

On Tue, 27 Jun 2006, John Bent wrote:


Hello,

I'm working on a pet research project in which I'm (somewhat abashedly) actually _removing_ functionality from PVFS2. What I'm trying to do is create a new trove interface in which requests to disk are no longer logically striped across multiple PVFS2 servers each with its own physical storage but are rather passed transparently from client through PVFS2 onto a second and underlying shared file system on which each PVFS2 server is mounted.

In order to do this, I have extended IO requests to pass the logical filenames along with the handles and I have further modified PINT_distribute (footnote 1) to use the file distribution info to translate its physical offset into the actual logical file offset and then pass this logical offset to the PINT_ADD_SEGMENT macro.

This works in that files written to pvfs2 servers are transparently
created in the pvfs2 storage space. These files can then be correctly
read directly from the other underlying shared file system.

However, they can no longer be read correctly through the PVFS2 servers. Perhaps when I write to the actual logical offsets instead of to the striped offsets, I am fooling the pvfs2 servers into thinking those logical offsets are actually the striped ones? When I try to read the file back, I get a file that is N times the correct size, where N is the number of data servers. What happens is that each server gives me each segment of the file, thinking that segment is unique to it. (At least this is what I think is happening.)

Does anyone have any suggestions where else I should look in the code to
modify this?

Thanks,

John

footnote 1: It is not very clean to do this in the PINT_distribute function. I did try to keep my changes isolated within the new trove layer by passing the distribution info to the trove_bstream_[read|write]_list functions, but this had the same problem when I did the readback through the PVFS2 servers, as well as the additional problem that the readback directly through the other shared file system was _almost_ correct but somehow off by a little bit (seemingly at the end of the file).

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
