I'm warming to the idea of doing everything related to distributions through the hints mechanism, even though it sort of undoes Pete's attempt to simplify the discussion.

The cost of string processing is minimal compared to everything else that we do, so that isn't a big deal IMO. We could still transport distribution info the same way that we do now, and in fact this extension to allow for enumerating servers (via aliases) wouldn't impact that either (or doesn't have to right now at least).

I think everyone is ok with:
- storing handles for datafiles, not aliases
- not creating three new distributions, but instead just changing how we get the list of servers to them (Sam believes that we can do this nicely)
- keeping the other existing functionality of simple_stripe/var_strip

I personally don't think the environment variable approach is all that useful for PVFS. There are only a *very* limited number of cases where it would apply cleanly (e.g. pvfs2-cp) and where there isn't already another mechanism for doing the same thing (e.g. MPI-IO hints). I think that MPI-IO hints should have some mechanism along those lines, and I wouldn't mind discussing that separately, but that's a different topic.

So I think we're mostly trying to work out what our API should really be, whether we should extend the distro functionality vs. going totally to hints, and if we go to hints what that API should look like, right?

Julian and others have done examples of passing in distro parameters to MPI-IO, and BradS I think has done this too. What did they look like (since they would be other examples of string passing of values)? What did we like/dislike about them?

Regards,

Rob
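
To make the string-passing question above concrete, here is a minimal sketch of what parsing a comma-separated list of server aliases from a single hint value might look like. Everything in it is an assumption for illustration only: `struct alias_list`, `parse_alias_list`, the size limits, and the comma-separated format are hypothetical and are not part of the PVFS or MPI-IO APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ALIASES   16   /* hypothetical limit */
#define MAX_ALIAS_LEN 64   /* hypothetical limit, including the NUL */

/* Hypothetical container for parsed aliases; not a PVFS type. */
struct alias_list {
    size_t count;
    char names[MAX_ALIASES][MAX_ALIAS_LEN];
};

/* Split a value such as "io1,io2,io3" into individual aliases.
 * Returns 0 on success, or -1 if an entry is empty, an alias is
 * too long, or there are more than MAX_ALIASES entries. */
static int parse_alias_list(const char *value, struct alias_list *out)
{
    assert(value != NULL && out != NULL);
    out->count = 0;

    const char *p = value;
    for (;;) {
        size_t len = strcspn(p, ",");   /* length up to next comma or end */
        if (len == 0 || len >= MAX_ALIAS_LEN || out->count == MAX_ALIASES)
            return -1;
        memcpy(out->names[out->count], p, len);
        out->names[out->count][len] = '\0';
        out->count++;
        if (p[len] == '\0')
            return 0;                   /* consumed the whole string */
        p += len + 1;                   /* skip past the comma */
    }
}
```

The parse is a single pass over the string, which is in line with the point that string handling is cheap next to the rest of file creation. Whether errors such as an empty entry should fail the create or just be ignored is a design choice this sketch does not settle.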

Walter B. Ligon III wrote:
I tend to agree with Pete and Sam ... Given the existing distro interface this is the right place to set a list of specific servers - though I tend to think it should be independent of the distro type (just as the number of servers is currently independent of which distro type you use). The list of servers shouldn't be saved as part of the metadata - I think Sam's example of setting such a list on a directory is a distinct case. In this case you set an extended attribute which is used as the default during file creation - but that list still isn't saved with the files (but might be saved as an extended attrib on a subdir ... but I digress).

On the other hand, if we had a good generic hint mechanism we COULD replace the entire distro interface and just use hints to do all of that. I'm not sure this is the best approach because it is awkward. Especially if everything is specified using strings.

Maybe we should do it all with XML!   ;-)

Walt

Sam Lang wrote:

On Oct 7, 2006, at 2:09 PM, Rob Ross wrote:

I agree completely with Pete. I think we might consider just adjusting the input parameters on the client side to allow for inclusion of the list of aliases. There is no reason for these to be stored as part of the distribution information on the server, as once the objects are created we already know where they are.


I agree storing them in a file's distribution is redundant. The distribution stored on a directory in the extended attribute is the only reason I can think of for storing the list in the distro.

In terms of interfaces, the distribution parameter we pass into create is really a sort of hint: We don't require it (defaulting to simple stripe if it's not specified), it's opaque to the caller (specified by a key string: "varstrip" and arbitrary parameters), and it functions as a 'hint' in the way we seem to have a need for them, changing the distribution of that file.

It might be possible to design a hints interface that allows us to express a distribution as one of possibly many generic hints being passed into create, without losing the convenience of just passing in a distribution, or a list of servers.

-sam
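
The idea of treating the distribution as one optional hint among several can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct hint`, `hint_get`, `create_dist_name`, and the key names "distribution" and "servers" are hypothetical and do not correspond to any existing PVFS interface. The only behavior taken from the discussion is that the distribution is optional and falls back to simple stripe when it is not given.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value hint; not an existing PVFS structure. */
struct hint {
    const char *key;
    const char *value;
};

/* Return the value stored under key, or fallback when the caller
 * did not supply that hint. */
static const char *hint_get(const struct hint *hints, size_t n,
                            const char *key, const char *fallback)
{
    assert(key != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(hints[i].key, key) == 0)
            return hints[i].value;
    }
    return fallback;
}

/* Distribution name a create call would use: the "distribution"
 * hint if present, otherwise the simple stripe default. */
static const char *create_dist_name(const struct hint *hints, size_t n)
{
    return hint_get(hints, n, "distribution", "simple_stripe");
}
```

With a layout like this, a caller who only wants a different distribution passes one hint, a caller who only wants specific servers passes a "servers" hint, and a caller who passes nothing still gets the default.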


We have what, 3 distributions that will need to be adjusted to this new scheme? That shouldn't be too bad, and they overall cover a very wide range of options.

Regards,

Rob

Pete Wyckoff wrote:

[EMAIL PROTECTED] wrote on Fri, 06 Oct 2006 16:33 -0500:

On Oct 6, 2006, at 1:48 PM, Julian Martin Kunkel wrote:

Also it will not
allow to set the servers for all distributions...

Yeah I can't imagine wanting to ever do that. It would mean passing in a distribution different from the default simple-stripe, as well as a hint saying you want a specific set of servers in the same call. Seems sort of yucky to me. I'd rather have all the information about the distribution in the distribution. You're even able to use the distribution field in the directory hints structure to specify per-directory IO server lists. Not that you would ever want to do that either...

I agree with Sam that this is yucky.  I'm hijacking this thread.
Let's forget about hints for a moment and decide how we want to
extend the concept of distributions, as seen by users, in such a way
that they can specify particular IO servers by name.  If this is an
interface people want, we should design it properly, not just
implement it with hints because we (might) have them.
Some issues, please suggest approaches and other issues.  (I'm using
"name" here to mean host alias.)
1.  What kind of control do users want?
    - all data on one server by name?
    - arbitrary control of stripe sizes and host names?
2.  New distribution name, or extension to existing ones?
    - dist-varstrip has a lot of flexibility, but no hostnames
    - maybe a new "dist-single-host-by-name" is all that is desired
3.  Store hostnames in on-disk distribution?
    - guessing no for the single-stripe distro, but perhaps somebody
    can really think of a use case for this?
4.  User API
    - through PVFS_dist_create
    - (please not through both PVFS_dist_create + some hint)
    - via environment variable too?
If our design happens to end up as something that would be
implemented well by hints, then we can think about using them.  For
now, let's just get the design correct.
We can come back and argue the merits of a generic hint interface in
a different thread.
        -- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
