On Jun 12, 2006, at 10:07 PM, Rob Ross wrote:

hey,

i know we're trying to keep the # of DBs down, but would it really hurt that much to just use a separate DB for this data rather than having to play funny games with the key strings?

I don't have much preference either way. I don't find the null string to be that much of a hack, but I can see the advantages of having a separate db for stuff like this. One disadvantage of separate dbs is that we can't just do one sync at the end of a crdirent or rmdirent.


also, it seems a little wacky that we have to pass a flag to tell trove when to count and when not to count. is there a clean way to avoid that?

The problem is that dbpf doesn't know anything about the common keys. We could copy the common keys into the dbpf layer, but that's kind of an ugly hack. Also, the crdirent and rmdirent calls just give a handle and the component name, so we really can only tell the difference between common keys and everything else (! is_this_a_common_key(key)). In this case "everything else" will be either a component name or an xattr. So the best we could do is count xattrs and directory entries together.
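To make the limitation concrete, here is a rough sketch (not actual dbpf code) of what counting by "not a common key" would look like. The key names in the table and the count_non_common helper are made up for illustration; only the is_this_a_common_key name comes from the expression above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical common-key names, assumed for this sketch only. */
static const char *const common_keys[] = {
    "dir_ent", "datafile_handles", "metafile_dist"
};

/* Returns 1 if key matches one of the (assumed) common keys, else 0. */
int is_this_a_common_key(const char *key)
{
    size_t i;

    assert(key != NULL);
    for (i = 0; i < sizeof common_keys / sizeof common_keys[0]; i++) {
        if (strcmp(common_keys[i], key) == 0)
            return 1;
    }
    return 0;
}

/* Counts every key that is not a common key. Component names and
 * xattrs are indistinguishable here, so both are counted. */
size_t count_non_common(const char *const *keys, size_t nkeys)
{
    size_t i, n = 0;

    for (i = 0; i < nkeys; i++) {
        if (!is_this_a_common_key(keys[i]))
            n++;
    }
    return n;
}
```

Given the keys "dir_ent", "file.txt" and "user.comment", this counts two entries even though only one of them is a directory entry, which is why the flag gets passed in from above instead.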

We talked about just adding the count to every handle in the keyval db. That adds a bunch of unnecessary keyval entries (for each file and directory). I was trying to avoid that, but maybe the cost isn't worth the hassle.


how do you read the count?


There's an additional trove_keyval_get_handle_info function.

otherwise i think it's great that we're moving the count increment/decrement into trove, that this will allow for concurrent modification, and that we can simplify the state machines.

thanks!

rob

Sam Lang wrote:
Hi all,
The new keyval code currently stores the size of a directory as a separate common keyval. The server state machines update this value with get/set state actions as needed (in crdirent, rmdirent, etc.). This get and set actually prevents us from allowing the create and delete operations of different files in the same directory to take place concurrently, since the crdirent and rmdirent ops (on the parent dirdata handle) get serialized.

I'd like to fix all this by providing a keyval per handle that contains a null string as part of the key (I call it keyval-handle-info). The advantage of making it the null string is that it will appear first in the lexical ordering of directory entries, so I can skip over it in readdir easily. This null keyval would only be created on handles as necessary (right now only for counting dirents).

The TROVE_KEYVAL_HANDLE_COUNT ds flag can be passed to trove operations. For example, in the case of crdirent, the TROVE_KEYVAL_HANDLE_COUNT and TROVE_NOOVERWRITE flags would be passed to the trove_keyval_write call and specify that the count should be incremented (or created and set to 0 if it doesn't exist). rmdirent would do something similar in trove_keyval_remove.

Also, at present the crdirent and rmdirent state machines first do a read of the keyval to check for existence. This seems unnecessary. Instead, the crdirent sm can just pass TROVE_NOOVERWRITE to the keyval_write call, and fail if that call fails. rmdirent already fails if the keyval_remove fails, so the extra keyval_read to check for existence seems redundant. Are there any good reasons for those extra state actions that I'm missing?

I've attached a patch of the changes I've described. I would like to have this go into the trunk before the upcoming release, since it requires (yet another) storage format change. Let me know if there are any questions or concerns.
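For anyone following along without the patch, the idea can be sketched with a toy in-memory directory. This is not the Trove API: every toy_* name and TOY_* flag below is invented for illustration, and creating the count at 0 and then incrementing it is just one reading of the description above. The sketch shows the empty key sorting first, readdir skipping it, and a no-overwrite write replacing the separate existence check.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16
#define TOY_MAX_NAME    32

/* Hypothetical flags standing in for the no-overwrite and count flags. */
enum { TOY_NOOVERWRITE = 1, TOY_HANDLE_COUNT = 2 };

struct toy_dir {
    char keys[TOY_MAX_ENTRIES][TOY_MAX_NAME]; /* kept sorted by strcmp */
    int nkeys;
    int count; /* value stored under the "" (handle-info) key */
};

/* Index of the first key that is >= key in strcmp order. */
static int toy_find(const struct toy_dir *d, const char *key)
{
    int i = 0;
    while (i < d->nkeys && strcmp(d->keys[i], key) < 0)
        i++;
    return i;
}

/* Insert key at its sorted position; -1 if full or name too long. */
static int toy_insert(struct toy_dir *d, const char *key)
{
    int pos = toy_find(d, key);
    if (d->nkeys == TOY_MAX_ENTRIES || strlen(key) >= TOY_MAX_NAME)
        return -1;
    memmove(&d->keys[pos + 1], &d->keys[pos],
            (size_t)(d->nkeys - pos) * sizeof d->keys[0]);
    strcpy(d->keys[pos], key);
    d->nkeys++;
    return 0;
}

/* crdirent-like write: fails on an existing name when TOY_NOOVERWRITE
 * is set, so no separate read is needed to check for existence. */
int toy_write(struct toy_dir *d, const char *name, int flags)
{
    int pos, exists;
    if (name[0] == '\0')
        return -1; /* "" is reserved for the handle-info entry */
    pos = toy_find(d, name);
    exists = pos < d->nkeys && strcmp(d->keys[pos], name) == 0;
    if (exists)
        return (flags & TOY_NOOVERWRITE) ? -1 : 0;
    if ((flags & TOY_HANDLE_COUNT) &&
        (d->nkeys == 0 || d->keys[0][0] != '\0')) {
        if (toy_insert(d, "") != 0) /* create the info entry on demand */
            return -1;
        d->count = 0;
    }
    if (toy_insert(d, name) != 0)
        return -1;
    if (flags & TOY_HANDLE_COUNT)
        d->count++;
    return 0;
}

/* rmdirent-like remove: fails if the name is absent. */
int toy_remove(struct toy_dir *d, const char *name, int flags)
{
    int pos;
    if (name[0] == '\0')
        return -1;
    pos = toy_find(d, name);
    if (pos == d->nkeys || strcmp(d->keys[pos], name) != 0)
        return -1;
    memmove(&d->keys[pos], &d->keys[pos + 1],
            (size_t)(d->nkeys - pos - 1) * sizeof d->keys[0]);
    d->nkeys--;
    if ((flags & TOY_HANDLE_COUNT) && d->count > 0)
        d->count--;
    return 0;
}

/* readdir-like scan: "" sorts before any other key, so it can only be
 * at index 0 and is skipped with a single check. */
int toy_readdir(const struct toy_dir *d, const char **names, int max)
{
    int i, n = 0;
    assert(d != NULL && names != NULL);
    i = (d->nkeys > 0 && d->keys[0][0] == '\0') ? 1 : 0;
    for (; i < d->nkeys && n < max; i++)
        names[n++] = d->keys[i];
    return n;
}
```

Starting from a zeroed struct toy_dir, two successful writes with both flags leave the count at 2, a third write of an existing name fails without touching the count, and readdir returns only the two names.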


_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
