[EMAIL PROTECTED] wrote on Fri, 23 Feb 2007 10:03 -0600:
> On Feb 22, 2007, at 5:54 PM, Sam Lang wrote:
> > We could hand out new handles by choosing one randomly, and then
> > checking if it's in the DB, getting rid of the need for a ledger
> > entirely, but I assume this idea was already scratched to avoid the
> > potential costs at creation time, especially as the filesystem grows.
>
> Actually as I think about this some more, maybe it's worth
> considering. Right now genconfig only uses the first 2^32 handles,
> dividing them up equally amongst the number of servers. That's
> obviously not anywhere near the possible limit. If genconfig
> allocated even half of the 2^64 handles to the servers, that would
> really decrease the likelihood of selecting an already used handle at
> random, even for a filesystem with millions of files.
>
> Also, the ledger could still be used to keep track of the handles
> that are created during the lifetime of that particular server
> process, as well as the ones that already exist if a randomly chosen
> handle gets a hit. If genconfig allocates over the 1 - 2^63 range,
> with 64 servers the chance of randomly picking an already used handle
> is 1 in 2^56. With 16 million files it's still 1 in 4 billion.
>
> The interfaces do allow the client to specify the specific handle or
> a range of handles when doing the create, but we always just get the
> range directly from the config file. Are there use cases out there
> where more limited ranges (or specific handles) are requested by the
> client?
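
To sanity-check the odds quoted above, here is a rough sketch in Python.
It assumes the range is split evenly across servers and that a new
handle is drawn uniformly from one server's slice. collision_chance is
a made-up helper, not anything in the tree.

```python
from fractions import Fraction

def collision_chance(range_bits, servers, used_in_slice):
    """Chance that one uniformly random pick from a single server's
    slice hits a handle already in use (assumes an even split)."""
    slice_size = (1 << range_bits) // servers
    return Fraction(used_in_slice, slice_size)

# 2^63 handles over 64 servers, one existing handle in the slice.
single = collision_chance(63, 64, 1)
# Worst case: all 16 million (2^24) files sit on the same server.
crowded = collision_chance(63, 64, 1 << 24)

print(single.denominator.bit_length() - 1)  # log2 of the "1 in N" odds
print(crowded.denominator)                  # N in "1 in N"
```

By this arithmetic each slice holds 2^57 handles, so the odds come out
within a factor of two of the figures quoted above.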

I like the idea of ditching the ledger. What's the reason to keep
track of handles that are created during the lifetime of a
particular server process?

Some old design notes say the servers will track recently freed
handles to avoid the reuse problem. But I'm not sure if we do this
or if it is really a good idea.

Now for some crazy comments.

For create scalability, you may want the client to pick handle IDs
and offer those to the server, so that you can optimistically create
a metafile assuming there are no collisions on the server. These
guessed handle IDs can be random though. We did not implement this
as it would be quite expensive if implemented in terms of the
existing extent/extentlist/ledger data structures.

In the OSD work, we have to do painful things to return a handle ID
in a particular range. I would much rather have the server pick a
random ID and give it to the client. Or for the client to try to
pick a particular ID and hope there is no collision at the server.
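
A rough sketch of both variants, in Python for brevity. The set stands
in for the real handle DB, and HandleStore, HandleCollision, and
max_tries are names invented here, not existing interfaces.

```python
import random

class HandleCollision(Exception):
    """Raised when a client-proposed handle is already in use."""

class HandleStore:
    def __init__(self, low, high, rng=None):
        self.low, self.high = low, high
        self.used = set()              # stand-in for the on-disk DB
        self.rng = rng or random.Random()

    def create(self, proposed=None, max_tries=8):
        if proposed is not None:
            # Client guessed an ID: accept it or report the collision.
            if proposed in self.used:
                raise HandleCollision(proposed)
            self.used.add(proposed)
            return proposed
        # Server picks: draw at random, retry on the rare hit.
        for _ in range(max_tries):
            handle = self.rng.randint(self.low, self.high)
            if handle not in self.used:
                self.used.add(handle)
                return handle
        raise RuntimeError("no free handle after %d tries" % max_tries)
```

On a HandleCollision the client would just guess again. No ledger is
consulted in either path, only a membership check.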

So I'd like to discard the idea of pre-assigned per-server handle
ranges and augment our notion of PVFS_handle to include some sort of
"server identifier" as well as the 64-bit ID that is private to the
particular device on which the object sits.
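
Something like the following, sketched in Python. The 64-bit width for
the server identifier, the big-endian packing, and the WideHandle name
are all assumptions for illustration. Nothing here is decided.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WideHandle:
    server_id: int   # assumed 64-bit identifier of the owning server
    object_id: int   # 64-bit ID private to the device holding the object

    def pack(self) -> bytes:
        # 16 bytes on the wire: server_id first, then object_id.
        # to_bytes raises OverflowError if a field does not fit.
        return (self.server_id.to_bytes(8, "big")
                + self.object_id.to_bytes(8, "big"))

    @classmethod
    def unpack(cls, raw: bytes) -> "WideHandle":
        if len(raw) != 16:
            raise ValueError("expected 16 bytes, got %d" % len(raw))
        return cls(int.from_bytes(raw[:8], "big"),
                   int.from_bytes(raw[8:], "big"))
```

With this layout the device is free to choose object_id however it
likes, since the server part already disambiguates.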

Various distributed FS implementations for wide-area use seem to be
happy with 128-bit handles and assume collisions will never happen.
This always struck me as wildly reckless, but maybe it is time to
accept the fact that these number spaces are really big.
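
To get a feel for how big, here is a quick union-bound estimate in
Python. It assumes handles are drawn uniformly at random, and
birthday_bound is a helper made up for this note.

```python
from fractions import Fraction

def birthday_bound(num_objects, handle_bits):
    """Upper bound (union bound over all pairs) on the chance that any
    two of num_objects uniformly random handles collide."""
    pairs = num_objects * (num_objects - 1) // 2
    return min(Fraction(1), Fraction(pairs, 1 << handle_bits))

trillion = 10**12
print(float(birthday_bound(trillion, 128)))
print(float(birthday_bound(trillion, 64)))
```

For a trillion objects the 128-bit bound is around 10^-15, while with
64-bit handles the bound saturates at 1 and promises nothing.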
-- Pete
_______________________________________________
Pvfs2-developers mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers