Wow, I didn't think that this stuff would come up again so soon :).
The current implementation (a) tracks what is free and (b) what is
recently used, (c) lets the server choose the handle to return, and (d)
keeps a global handle space (a handle is unique across all servers).
the point of (a) was to avoid having to hit storage to find a free
handle. i agree that this is perhaps not that big a deal now that we
have a better handle on how to efficiently use berkeley db.
(a) and (c) together mean that clients never have to retry to get a
handle. i agree that this in itself isn't all that valuable.
(b) ensures that clients caching metadata on some file don't end up
accessing some newly created file's data, or deleting some new file's
object, or some similar thing. this is an important part of allowing
clients to cache file metadata (specifically datafile handles) without
coordination.
(c) also allows us to precreate objects if we wanted to, although we
don't do that right now. this would be less important if/when
server-to-server communication is in place and we move file creation
over to the server side.
(d) eventually allows us to move objects around without updating the
file's metadata, assuming that we come up with a different mechanism for
determining where a file resides. A bloom filter sort of approach might
work, as an example. Or if server-to-server were working the servers
could just figure out where things are with some aggregate comm.
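the bloom filter approach mentioned above might look something like the
sketch below: each server maintains a filter over the handles it
currently stores, and a client tests each server's filter to decide whom
to ask. everything here (the struct, sizes, hash mixing) is hypothetical
and illustrative only, not anything in the tree.

```c
#include <stdint.h>
#include <string.h>

/* hypothetical sketch: per-server bloom filter over the object
 * handles that server currently stores.  size and hash count are
 * arbitrary choices for illustration. */
#define BLOOM_BITS   (1u << 20)   /* 1 Mbit per server */
#define BLOOM_HASHES 4

struct bloom {
    unsigned char bits[BLOOM_BITS / 8];
};

void bloom_init(struct bloom *b)
{
    memset(b, 0, sizeof *b);
}

/* one 64-bit mix (splitmix64-style finalizer), reused with two seeds
 * so we can derive BLOOM_HASHES indices by double hashing */
static uint64_t mix(uint64_t h, uint64_t seed)
{
    h ^= seed;
    h ^= h >> 33; h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33; h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

void bloom_add(struct bloom *b, uint64_t handle)
{
    uint64_t h1 = mix(handle, 0x9e3779b97f4a7c15ULL);
    uint64_t h2 = mix(handle, 0xbf58476d1ce4e5b9ULL);
    for (int i = 0; i < BLOOM_HASHES; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % BLOOM_BITS;
        b->bits[bit / 8] |= (unsigned char)(1u << (bit % 8));
    }
}

/* 0 means "definitely not on this server"; 1 means "ask this server"
 * (false positives possible, so a miss there restarts the search) */
int bloom_maybe_has(const struct bloom *b, uint64_t handle)
{
    uint64_t h1 = mix(handle, 0x9e3779b97f4a7c15ULL);
    uint64_t h2 = mix(handle, 0xbf58476d1ce4e5b9ULL);
    for (int i = 0; i < BLOOM_HASHES; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % BLOOM_BITS;
        if (!(b->bits[bit / 8] & (1u << (bit % 8))))
            return 0;
    }
    return 1;
}
```

the nice property for migration is that filters only ever say
"definitely not here" or "maybe here", so a stale filter costs an extra
round trip rather than a wrong answer.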
walt's idea seems to allow us to map a collection of objects (a
"segment") to a given server, then a client could pick values in that
segment. my feeling is that this hamstrings our ability to move
objects around, because we would then need to move entire segments at a
time; at the very least it could take a very long time to reach a
consistent state again (think of many large objects needing to be
moved; how do clients know whom to contact in the meantime?). this idea
is a generalization of pete's
idea to have a server id be part of the object handle; pete's approach
makes it impossible to migrate without changing file metadata. more on
this below.
pete's idea of speeding up creates by guessing at free handles is ok,
but the right way to speed up creates is to precreate. then the latency
can be hidden in the mix of other operations. lustre already does this,
and i believe it is very effective for them.
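to make the precreate idea concrete, the fast path could be as simple
as the sketch below: the metadata side keeps a small pool of objects
already created on the i/o servers, and a create just pops one, with a
background path refilling the pool. the names and the pool size are
made up for illustration; this is not lustre's implementation or ours.

```c
#include <stdint.h>

/* hypothetical precreate pool: POOL_MAX and all names are arbitrary */
#define POOL_MAX 64

struct precreate_pool {
    uint64_t handles[POOL_MAX];
    unsigned count;
};

/* background refill path pushes freshly created object handles;
 * returns -1 if the pool is already full */
int pool_push(struct precreate_pool *p, uint64_t handle)
{
    if (p->count == POOL_MAX)
        return -1;
    p->handles[p->count++] = handle;
    return 0;
}

/* fast path for create: returns 0 if the pool is empty and the
 * caller must fall back to a synchronous create */
uint64_t pool_pop(struct precreate_pool *p)
{
    if (p->count == 0)
        return 0;
    return p->handles[--p->count];
}
```

the latency win is that pool_pop is a memory operation, while the real
object create happens off the client's critical path during refill.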
he is correct that randomly picking values would lead towards nasty data
structures in the ledger. i'd be happy to see the "free" list part of
the ledger disappear if that helps. i do think that the "recently freed"
list has to stay for the reason listed above, although it could be
implemented differently perhaps -- maybe just leave an entry in the DB
noting when the object was freed, and if it is referenced again after
the appropriate time we consider it up for grabs? this has a nice
side-effect of keeping the object "off limits" even if a server is
restarted.
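the "note when it was freed" variant could be as small as the sketch
below: the DB keeps a tombstone record with the free time, and a handle
only becomes reusable once a grace period has passed. the grace period
and the struct are made-up illustrations, and the record stands in for
whatever the real DB entry would hold.

```c
#include <stdint.h>
#include <time.h>

/* hypothetical tombstone record replacing the recently-freed list;
 * the grace period is an arbitrary choice for illustration */
#define HANDLE_REUSE_GRACE_SECS (24 * 60 * 60)

struct freed_record {
    uint64_t handle;
    time_t   freed_at;   /* as stored in the DB tombstone */
};

/* nonzero if the handle may be handed out again */
int handle_reusable(const struct freed_record *rec, time_t now)
{
    return (now - rec->freed_at) >= HANDLE_REUSE_GRACE_SECS;
}
```

because the tombstone lives in the DB rather than in memory, the
off-limits window survives a server restart, which is the side-effect
mentioned above.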
i don't understand why it is difficult to get a value in a particular
range in the OSD work. can you clarify this, pete? can't you just "guess"
a value in the range until you get one?
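the guess-in-range loop i have in mind is roughly the sketch below,
where is_free() stands in for whatever lookup the server does and the
retry cap is arbitrary; this assumes the range is much smaller than the
full 64-bit space, so lo..hi doesn't wrap.

```c
#include <stdint.h>
#include <stdlib.h>

/* hypothetical sketch of "guess a value in the range until one is
 * free"; is_free() stands in for the server's DB lookup */
uint64_t guess_handle(uint64_t lo, uint64_t hi,
                      int (*is_free)(uint64_t), unsigned max_tries)
{
    uint64_t span = hi - lo + 1;   /* assumes hi - lo < UINT64_MAX */
    for (unsigned i = 0; i < max_tries; i++) {
        uint64_t guess = lo + ((uint64_t)rand() % span);
        if (is_free(guess))
            return guess;
    }
    return 0;   /* caller falls back to a slow path */
}

/* example predicate for illustration only: even handles are free */
int demo_is_free(uint64_t h)
{
    return (h % 2) == 0;
}
```

with a mostly-empty range this almost always succeeds on the first
try, so the expected cost is one DB lookup.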
one thing that we could discuss is the relative merit of migration using
this sort of approach. maybe in fact this idea that i have that we want
to keep a FS-wide object handle space is flawed, that changing file
metadata can be addressed in a reasonable way that simplifies the
overall system, allows for migration, and doesn't have a negative impact
on our caching of metadata.
overall i think that changing how we reference objects, with the
exception of perhaps redoing how we keep up with free/recently-freed
objects, is something that should perhaps wait until we have
server-to-server working. we're likely to want to make some changes at
that point anyway, once the system has more control over the
construction of files and directories. maybe we can discuss how we'd
like things to work in that context and concentrate on getting there,
rather than torquing things now and then perhaps messing with things again?
thanks everyone! it's fun to get to sit and think about this stuff,
especially after many days of travel and meetings :).
regards,
rob
Walter B. Ligon III wrote:
I don't understand this. Is there a scheme whereby there is no mapping
of the handle ID to a server? If not, then what we are talking about, I
think, is whether the server mapping is fixed or not. The idea behind
the current scheme was to make the mapping of servers to handles
flexible. That said, the specific implementation could perhaps be better.
For example, using 128 bits we could have a 64 bit segment tag and a 64
bit handle ID. The segment tag would map the handle to a server via the
tables, and the ID would be unique within that segment. This might
simplify some things without losing the flexibility we have.
As it is, the server can still randomly pick an ID, or a client could
randomly pick one; they just have to do it within a range, which isn't
particularly hard. With this suggested modification we could
"eliminate" the range by giving all "handle ranges" a built-in extent of
64 bits, which I think is the same as what you were suggesting.
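A minimal encoding of that suggestion might look like the sketch below;
the struct and accessor names are made up, not anything in the tree. The
point is that migrating a segment only changes the segment-to-server
table entry, while the handles themselves stay valid.

```c
#include <stdint.h>

/* hypothetical 128-bit handle: a 64-bit segment tag mapped to a
 * server via the tables, plus a 64-bit ID unique within the segment */
struct pvfs_handle128 {
    uint64_t segment;   /* which server owns this, via the tables */
    uint64_t id;        /* unique within the segment */
};

static inline struct pvfs_handle128 handle_make(uint64_t segment,
                                                uint64_t id)
{
    struct pvfs_handle128 h = { segment, id };
    return h;
}

static inline uint64_t handle_segment(struct pvfs_handle128 h)
{
    return h.segment;
}

static inline uint64_t handle_id(struct pvfs_handle128 h)
{
    return h.id;
}
```

A server or client picking a random ID then only has to randomize the
low 64 bits, since the segment tag fixes the "range" by construction.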
If I'm not being clear, let me know and I'll try again. Or, if I don't
understand the problem, let me know that.
Walt
Pete Wyckoff wrote:
For create scalability, you may want the client to pick handle IDs
and offer those to the server, so that you can optimistically create
a metafile assuming there are no collisions on the server. These
guessed handle IDs can be random though. We did not implement this
as it would be quite expensive if implemented in terms of the
existing extent/extentlist/ledger data structures.
In the OSD work, we have to do painful things to return a handle ID
in a particular range. I would much rather have the server pick a
random ID and give it to the client. Or for the client to try to
pick a particular ID and hope there is no collision at the server.
So I'd like to discard the idea of pre-assigned per-server handle
ranges and augment our notion of PVFS_handle to include some sort of
"server identifier" as well as the 64-bit ID that is private to the
particular device on which the object sits.
Various distributed FS implementations for wide-area use seem to be
happy with 128-bit handles and assume collisions will never happen.
This always struck me as wildly reckless, but maybe it is time to
accept the fact that these number spaces are really big.
-- Pete
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers