On Jun 12, 2006, at 10:07 PM, Rob Ross wrote:
hey,
i know we're trying to keep the # of DBs down, but would it really
hurt that much to just use a separate DB for this data rather than
having to play funny games with the key strings?
I don't have much preference either way. I don't find the null
string to be that much of a hack, but I can see the advantages of
having a separate db for stuff like this. One disadvantage of
separate dbs is that we can't just do one sync at the end of a
crdirent or rmdirent.
also, it seems a little wacky that we have to pass a flag to tell
trove when to count and when not to count. is there a clean way to
avoid that?
The problem is that dbpf doesn't know anything about the common
keys. We could copy the common keys into the dbpf layer, but that's
kind of an ugly hack. Also, the crdirent and rmdirent calls just give
a handle and the component name, so we can really only tell the
difference between common keys and everything else
(!is_this_a_common_key(key)). In this case "everything else" will be
either a component name or an xattr, so the best we could do is count
xattrs and directory entries together.
We talked about just adding the count to every handle in the keyval
db. That adds a bunch of unnecessary keyval entries (one for each file
and directory). I was trying to avoid that, but maybe the savings
aren't worth the hassle.
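The point about only being able to separate common keys from everything else can be sketched as follows. This is a minimal illustration, not dbpf code: the key names and the sketch_* functions are hypothetical, since the real common keys are not listed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reserved names; the real common keys are not listed here. */
static const char *const sketch_common_keys[] = { "common_a", "common_b" };

/* Return 1 if key is one of the reserved common keys, 0 otherwise. */
int sketch_is_common_key(const char *key)
{
    size_t n = sizeof(sketch_common_keys) / sizeof(sketch_common_keys[0]);

    assert(key != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(key, sketch_common_keys[i]) == 0)
            return 1;
    return 0;
}

/* Count every key that is not a common key. Dirents and xattrs are
 * indistinguishable at this level, so both end up in the total. */
size_t sketch_count_non_common(const char *const *keys, size_t nkeys)
{
    size_t count = 0;

    for (size_t i = 0; i < nkeys; i++)
        if (!sketch_is_common_key(keys[i]))
            count++;
    return count;
}
```

With one common key, one component name, and one xattr-style key on a handle, the count comes out as 2 rather than 1, which is the imprecision described above.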
how do you read the count?
There's an additional trove_keyval_get_handle_info function.
otherwise i think it's great that we're moving the count increment/
decrement into trove, that this will allow for concurrent
modification, and that we can simplify the state machines.
thanks!
rob
Sam Lang wrote:
Hi all,
The new keyval code currently stores the size of a directory as a
separate common keyval. The server state machines update this
value with get/set state actions as needed (in crdirent, rmdirent,
etc.). This get and set pair actually prevents creates and deletes
of different files in the same directory from running concurrently,
since the crdirent and rmdirent ops (on the parent dirdata handle)
get serialized.
I'd like to fix all this by providing a keyval per handle that
contains a null string as part of the key (I call it
keyval-handle-info). The advantage of making it the null string is
that it will appear first in the lexical ordering of directory
entries, so I can skip over it in readdir easily. This null keyval
would only be created on handles as necessary (right now only for
counting dirents). The TROVE_KEYVAL_HANDLE_COUNT ds flag can be
passed to trove operations. For example, in the case of crdirent,
the TROVE_KEYVAL_HANDLE_COUNT and TROVE_NOOVERWRITE flags would be
passed to the trove_keyval_write call to specify that the count
should be incremented (or created and set to 0 if it doesn't
exist). rmdirent would do something similar in trove_keyval_remove.
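To make the null-key idea concrete, here is a minimal in-memory sketch. It is not the attached patch or Trove code: every sketch_* name, the flag values, and the fixed-size sorted table are assumptions for illustration. It also assumes one reading of "created and set to 0", namely that the count record is created at 0 and then incremented for the entry being added.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values, for illustration only (not Trove's). */
enum { SKETCH_NOOVERWRITE = 1, SKETCH_HANDLE_COUNT = 2 };

#define SKETCH_MAX_KEYS 16
#define SKETCH_KEY_LEN  32

/* One handle's keyvals, kept sorted by key in strcmp order. */
struct sketch_handle {
    char     keys[SKETCH_MAX_KEYS][SKETCH_KEY_LEN];
    uint32_t vals[SKETCH_MAX_KEYS];
    int      nkeys;
};

/* Return the index of key, or -1 if it is absent. */
static int sketch_find(const struct sketch_handle *h, const char *key)
{
    for (int i = 0; i < h->nkeys; i++)
        if (strcmp(h->keys[i], key) == 0)
            return i;
    return -1;
}

/* Insert key at its sorted position; -1 if the table is full or key too long. */
static int sketch_insert(struct sketch_handle *h, const char *key, uint32_t val)
{
    if (h->nkeys == SKETCH_MAX_KEYS || strlen(key) >= SKETCH_KEY_LEN)
        return -1;
    int pos = 0;
    while (pos < h->nkeys && strcmp(h->keys[pos], key) < 0)
        pos++;
    for (int i = h->nkeys; i > pos; i--) {
        strcpy(h->keys[i], h->keys[i - 1]);
        h->vals[i] = h->vals[i - 1];
    }
    strcpy(h->keys[pos], key);
    h->vals[pos] = val;
    h->nkeys++;
    return pos;
}

/* Write key; with SKETCH_HANDLE_COUNT, also bump the count stored under "". */
int sketch_keyval_write(struct sketch_handle *h, const char *key,
                        uint32_t val, int flags)
{
    if (key[0] == '\0')
        return -1;              /* "" is reserved for the count record */
    int idx = sketch_find(h, key);
    if (idx >= 0) {
        if (flags & SKETCH_NOOVERWRITE)
            return -1;          /* entry exists; count is left alone */
        h->vals[idx] = val;     /* overwrite adds no entry, so no count change */
        return 0;
    }
    /* Create the count record at 0 on first use (an assumption of this sketch). */
    if ((flags & SKETCH_HANDLE_COUNT) && sketch_find(h, "") < 0
        && sketch_insert(h, "", 0) < 0)
        return -1;
    if (sketch_insert(h, key, val) < 0)
        return -1;
    if (flags & SKETCH_HANDLE_COUNT) {
        assert(h->keys[0][0] == '\0');  /* "" sorts before any non-empty key */
        h->vals[0]++;
    }
    return 0;
}

/* Remove key; with SKETCH_HANDLE_COUNT, decrement the count. */
int sketch_keyval_remove(struct sketch_handle *h, const char *key, int flags)
{
    int idx = sketch_find(h, key);
    if (idx < 0 || key[0] == '\0')
        return -1;
    for (int i = idx; i < h->nkeys - 1; i++) {
        strcpy(h->keys[i], h->keys[i + 1]);
        h->vals[i] = h->vals[i + 1];
    }
    h->nkeys--;
    if ((flags & SKETCH_HANDLE_COUNT) && sketch_find(h, "") == 0 && h->vals[0] > 0)
        h->vals[0]--;
    return 0;
}

/* Read the per-handle count; 0 if no count record exists yet. */
uint32_t sketch_handle_count(const struct sketch_handle *h)
{
    int c = sketch_find(h, "");
    return c < 0 ? 0 : h->vals[c];
}

/* Index of the first real directory entry: skip the "" record if present. */
int sketch_first_dirent(const struct sketch_handle *h)
{
    return (h->nkeys > 0 && h->keys[0][0] == '\0') ? 1 : 0;
}
```

Because the empty string compares lower than any non-empty key, the count record always sits at index 0, so a readdir-style scan only needs to start one slot later. In this sketch the count update happens inside the same call as the insert or remove, which is what would let callers drop the separate get/set step.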
Also, at present the crdirent and rmdirent state machines first do
a read of the keyval to check for existence. This seems
unnecessary. Instead, the crdirent sm can just pass
TROVE_NOOVERWRITE to the keyval_write call, and fail if that call
fails. rmdirent already fails if the keyval_remove fails so the
extra keyval_read to check for existence seems redundant. Are
there any good reasons for those extra state actions that I'm
missing?
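The difference between the two crdirent flows can be sketched with a toy store that counts how often it is touched. This is only an illustration under assumed names: sketch_dir and the sketch_* functions are hypothetical stand-ins, not the state machine or Trove code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 8
#define SKETCH_NAME_LEN    32

/* A toy directory: unordered names plus a counter of store operations. */
struct sketch_dir {
    char names[SKETCH_MAX_ENTRIES][SKETCH_NAME_LEN];
    int  nentries;
    int  store_ops;   /* how many times the "keyval store" was touched */
};

/* Stand-in for a keyval read: returns 1 if name exists, 0 otherwise. */
static int sketch_read(struct sketch_dir *d, const char *name)
{
    d->store_ops++;
    for (int i = 0; i < d->nentries; i++)
        if (strcmp(d->names[i], name) == 0)
            return 1;
    return 0;
}

/* Stand-in for a keyval write with a no-overwrite flag: the existence
 * check and the insert happen inside one store operation. */
static int sketch_write_nooverwrite(struct sketch_dir *d, const char *name)
{
    assert(d != NULL && name != NULL);
    d->store_ops++;
    for (int i = 0; i < d->nentries; i++)
        if (strcmp(d->names[i], name) == 0)
            return -EEXIST;
    if (strlen(name) >= SKETCH_NAME_LEN)
        return -ENAMETOOLONG;
    if (d->nentries == SKETCH_MAX_ENTRIES)
        return -ENOSPC;
    strcpy(d->names[d->nentries++], name);
    return 0;
}

/* Two-step flow: read to check existence, then write. */
int sketch_crdirent_two_step(struct sketch_dir *d, const char *name)
{
    if (sketch_read(d, name))
        return -EEXIST;
    return sketch_write_nooverwrite(d, name);
}

/* One-step flow: let the no-overwrite write report the conflict. */
int sketch_crdirent_one_step(struct sketch_dir *d, const char *name)
{
    return sketch_write_nooverwrite(d, name);
}
```

Both flows return the same results for a fresh name and for a duplicate, but the one-step flow touches the store once per create instead of twice on the success path.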
I've attached a patch of the changes I've described. I would like
to have this go into the trunk before the upcoming release, since
it requires (yet another) storage format change. Let me know if
there are any questions or concerns.