Thanks Sam,

Sadly, your suggestion did not work out of the box.  The servers are
not at the correct logical offsets and are overwriting each other.
Additionally, your approach suffers from the same problem that mine
did: each datafile (which is now actually the same shared datafile) is
read in its entirety by the client from each server.  Therefore, when
reading a file, the client actually produces a new file composed of N
copies of the original, where N is the number of datafiles.

Perhaps the cleanest solution would be to create a new distribution and
pass it to the client.  This distribution would simply instruct the
client to use the physical offsets as the logical ones.  It should then
work equally well for reads and writes.  I see in the io/description
directory that there are already distributions for simple-stripe,
basic, and varstrip.  Is this a workable approach?

John


On Tue, 27 Jun 2006, Sam Lang wrote:

>
> Hi John,
>
> I think the best way (others can correct me) to get the trove layer
> to operate on logical offsets and sizes is to modify the flow code
> (flowproto_multiqueue.c).  My reasoning is that the flow layer
> converts the PVFS_Request structure into physical offsets and sizes
> and passes them on to the trove layer.  The trove layer doesn't care
> whether the offsets and sizes passed to it (via
> trove_bstream_{read|write}_list) are logical or physical; in the
> normal case it just uses those values to operate directly on the
> bstream file.  In your case the offsets and sizes passed to trove
> could be logical, and trove wouldn't know the difference.  In other
> words, you shouldn't have to modify any of the distribution code or
> manipulate the offsets and sizes in the trove code; just use what the
> flow layer gives you.
>
> The changes to the flow layer require that PINT_process_request
> return logical offsets instead of physical offsets (for both reads
> and writes).  It will do this if you pass PINT_CLIENT instead of
> PINT_SERVER as the mode (5th argument).  You will need to do this in
> each instance of PINT_process_request where PINT_SERVER is used.
>
> PINT_process_request is a bit hard to use in some cases, but these
> changes are simple enough, and the offsets and sizes are treated
> opaquely everywhere outside of that function (except in AIO, of
> course), which turns out to be a nice property of the framework's
> design, in my view.
>
> One caveat:  You will probably want to either turn off small I/O,
> which doesn't use the flow layer, or make the same modification to
> the PINT_process_request calls in small-io.sm.  You can turn it off
> by compiling with CFLAGS=-DPVFS2_SMALL_IO_OFF.
>
> -sam
>
> On June 27, 2006, at 11:28AM, John Bent wrote:
>
> >
> > Ok, I've removed the footnote.  I'm now doing everything within the
> > new trove layer and no longer in PINT_distribute, although I did
> > change some things slightly.  The problem was that PINT_ADD_SEGMENT
> > was combining segments on the assumption that each was in its own
> > individual stripe.  However, since they now must be interspersed
> > with segments from other servers, they can no longer be combined.
> > (Obviously, this will adversely affect performance of the old trove
> > layer, so it calls for a bit of a layering violation to turn off
> > merging only when this trove layer is selected.  Later, if I care
> > about this, I can add a function to the trove function table to
> > that effect.)
> >
> > I'm still, however, unable to read the files back correctly using
> > the pvfs2 servers.
> >
> > John
> >
> > On Tue, 27 Jun 2006, John Bent wrote:
> >
> >>
> >> Hello,
> >>
> >> I'm working on a pet research project in which I'm (somewhat
> >> abashedly) actually _removing_ functionality from PVFS2.  What I'm
> >> trying to do is create a new trove interface in which requests to
> >> disk are no longer logically striped across multiple PVFS2
> >> servers, each with its own physical storage, but are instead
> >> passed transparently from the client through PVFS2 onto a second,
> >> underlying shared file system on which each PVFS2 server is
> >> mounted.
> >>
> >> In order to do this, I have extended IO requests to pass the
> >> logical filenames along with the handles, and I have further
> >> modified PINT_distribute (footnote 1) to use the file distribution
> >> info to translate its physical offset into the actual logical file
> >> offset, and then pass this logical offset to the PINT_ADD_SEGMENT
> >> macro.
> >>
> >> This works, in that files written to pvfs2 servers are
> >> transparently created in the pvfs2 storage space.  These files can
> >> then be correctly read directly from the other underlying shared
> >> file system.
> >>
> >> However, they can no longer be read correctly through the PVFS2
> >> servers.  Perhaps when I write to the actual logical offsets
> >> instead of to the striped offsets, I am fooling the pvfs2 servers
> >> into thinking those logical offsets are actually the striped ones?
> >> When I try to read the file back, I get a file that is N times the
> >> correct size, where N is the number of data servers.  What seems
> >> to happen is that each server gives me every segment of the file,
> >> thinking each segment is unique to it.  (At least, this is what I
> >> think is happening.)
> >>
> >> Does anyone have suggestions on where else in the code I should
> >> look?
> >>
> >> Thanks,
> >>
> >> John
> >>
> >> footnote 1:  It is not very clean to do this in the
> >> PINT_distribute function.  I did try to keep my changes isolated
> >> to the new trove layer by passing the distribution info to the
> >> trove_bstream_[read|write]_list functions, but this had the same
> >> problem on readback through the PVFS2 servers, plus the additional
> >> problem that readback directly through the other shared file
> >> system was _almost_ correct but somehow off by a little bit
> >> (seemingly at the end of the file).
> >>
> >>
> >
>
>
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers