On Feb 23, 2007, at 10:45 AM, Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Fri, 23 Feb 2007 10:03 -0600:
On Feb 22, 2007, at 5:54 PM, Sam Lang wrote:
We could hand out new handles by choosing one randomly, and then
checking if it's in the DB, getting rid of the need for a ledger
entirely, but I assume this idea was already rejected to avoid the
potential costs at creation time, especially as the filesystem
grows.
Actually, as I think about this some more, maybe it's worth
considering. Right now genconfig only uses the first 2^32 handles,
dividing them up equally amongst the servers. That's obviously
nowhere near the possible limit. If genconfig allocated even half
of the 2^64 handles to the servers, that would really decrease the
likelihood of selecting an already-used handle at random, even for
a filesystem with millions of files.
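A minimal sketch of that scheme (all names here are illustrative;
a real server would probe its handle DB, not a Python set):

```python
import random

# Hypothetical per-server slice of the handle space, e.g. a 2^57-handle
# chunk carved out of [1, 2^63) by genconfig.
RANGE_START = 1
RANGE_END = 1 << 57  # exclusive

def alloc_handle(used_handles):
    """Pick a random handle in this server's range, retrying on collision.

    `used_handles` stands in for the server's handle DB; with a sparse
    range the expected number of retries is close to zero.
    """
    while True:
        h = random.randrange(RANGE_START, RANGE_END)
        if h not in used_handles:
            used_handles.add(h)
            return h

used = set()
a = alloc_handle(used)
b = alloc_handle(used)
```

The DB lookup replaces the ledger entirely: the only per-create cost
is one existence check in the common (no-collision) case.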
Also, the ledger could still be used to keep track of the handles
that are created during the lifetime of that particular server
process, as well as the ones that already exist if a randomly chosen
handle gets a hit. If genconfig allocates over the 1 - 2^63 range,
with 64 servers each server's slice holds 2^57 handles, so the
chance of randomly picking an already used handle is 1 in 2^57.
With 16 million files it's still only about 1 in 9 billion.
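To make those odds concrete (assuming 2^63 handles split evenly
across 64 servers, so 2^57 per server):

```python
# Collision odds for one random pick against N existing handles,
# given a per-server slice of 2^63 / 64 = 2^57 handles.
slice_size = (1 << 63) // 64           # 2^57 handles per server
one_file = 1 / slice_size              # 1 in 2^57 with a single used handle
sixteen_million = 16_000_000 / slice_size

print(slice_size == 1 << 57)           # True
print(1 / sixteen_million)             # ~9e9: about 1 in 9 billion
```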
The interfaces do allow the client to specify the specific handle or
a range of handles when doing the create, but we always just get the
range directly from the config file. Are there use cases out there
where more limited ranges (or specific handles) are requested by the
client?
I like the idea of ditching the ledger. What's the reason to keep
track of handles that are created during the lifetime of a
particular server process?
Some old design notes say the servers will track recently freed
handles to avoid the reuse problem. But I'm not sure if we do this
or if it is really a good idea.
Now for some crazy comments.
For create scalability, you may want the client to pick handle IDs
and offer those to the server, so that you can optimistically create
a metafile assuming there are no collisions on the server. These
guessed handle IDs can be random though. We did not implement this
as it would be quite expensive if implemented in terms of the
existing extent/extentlist/ledger data structures.
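The optimistic create described above might look roughly like this
(an illustrative sketch, not the PVFS2 API; `server_create` stands
in for the server-side insert path):

```python
import random

HANDLE_BITS = 64

def server_create(db, proposed):
    """Server side: accept the client's proposed handle unless it
    already exists (`db` stands in for the server's object store)."""
    if proposed in db:
        return False        # collision: client must retry with a new guess
    db[proposed] = {}       # create the metafile under this handle
    return True

def client_create(db, max_tries=4):
    """Client side: guess random handle IDs and offer them to the
    server, so the common case is one round trip and no ledger."""
    for _ in range(max_tries):
        guess = random.getrandbits(HANDLE_BITS)
        if server_create(db, guess):
            return guess
    raise RuntimeError("too many handle collisions")
```

With 64-bit random guesses, the retry path is essentially never
taken, which is what makes the optimistic approach attractive.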
In the OSD work, we have to do painful things to return a handle ID
in a particular range. I would much rather have the server pick a
random ID and give it to the client. Or for the client to try to
pick a particular ID and hope there is no collision at the server.
Rob and I have talked about this a little bit. At the least, an IO
server's handle range could be partitioned up amongst the metadata
servers in the config file. Then it's up to the metadata server to
allocate datafile handles for servers. This still seems reasonable
with randomly chosen handles if the range is big enough.
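Carving one IO server's range into equal per-metadata-server slices
could be as simple as (illustrative only):

```python
def partition_range(start, end, n_meta):
    """Split the half-open handle range [start, end) into n_meta
    contiguous slices, one per metadata server."""
    size = (end - start) // n_meta
    return [(start + i * size,
             start + (i + 1) * size if i < n_meta - 1 else end)
            for i in range(n_meta)]

# e.g. one IO server's 2^32-handle range shared by 4 metadata servers
slices = partition_range(1, 1 << 32, 4)
```

Each metadata server then allocates datafile handles randomly from
its own slice, with no coordination between metadata servers.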
So I'd like to discard the idea of pre-assigned per-server handle
ranges and augment our notion of PVFS_handle to include some sort of
"server identifier" as well as the 64-bit ID that is private to the
particular device on which the object sits.
Are you talking about increasing the size of PVFS_handle (object,
whatever) to 128 bits, and using the upper half for the server/
object namespace? I hate to sound like Bill Gates, but surely no
one will ever need 2^64 servers. We actually sort of already have a
server identifier built into the handle, although I agree it's
thinking about the problem differently than ranges. It seems like
including the actual server id in the handle/object thingy reduces
the transparency of the object. What would it do to migration, for
example?
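Just to make the 128-bit layout concrete (an illustration, not a
proposed wire format):

```python
MASK64 = (1 << 64) - 1

def pack_handle(server_id, object_id):
    """Pack a 64-bit server identifier into the upper half of a
    128-bit handle, and the device-local object id into the lower."""
    assert 0 <= server_id <= MASK64 and 0 <= object_id <= MASK64
    return (server_id << 64) | object_id

def unpack_handle(h):
    """Recover (server_id, object_id) from a packed 128-bit handle."""
    return h >> 64, h & MASK64

h = pack_handle(42, 0xDEADBEEF)
```

The migration concern is visible here: moving the object to another
server changes the packed handle, unless the "server id" half is a
stable virtual id resolved through a lookup table.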
Various distributed FS implementations for wide-area use seem to be
happy with 128-bit handles and assume collisions will never happen.
This always struck me as wildly reckless, but maybe it is time to
accept the fact that these number spaces are really big.
I thought they used something like uuids and embedded host and
timestamp info into the actual handle to guarantee uniqueness. It
does seem odd that they would just assume that collisions of random
numbers wouldn't occur.
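Python's uuid module shows the distinction: uuid1 embeds the host's
node identifier (usually the MAC) and a timestamp, so it is unique
by construction, while uuid4 is purely random and unique only by
probability:

```python
import uuid

u1 = uuid.uuid1()   # time + node based: collision-free by construction
u4 = uuid.uuid4()   # 122 random bits: collisions astronomically unlikely

print(u1.version, u4.version)
```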
-sam
-- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers