hey,

i know we're trying to keep the # of DBs down, but would it really hurt that much to just use a separate DB for this data rather than having to play funny games with the key strings?

also, it seems a little wacky that we have to pass a flag to tell trove when to count and when not to count. is there a clean way to avoid that?

how do you read the count?

otherwise i think it's great that we're moving the count increment/decrement into trove, that this will allow for concurrent modification, and that we can simplify the state machines.

thanks!

rob

Sam Lang wrote:

Hi all,

The new keyval code currently stores the size of a directory as a separate common keyval. The server state machines update this value with get/set state actions as needed (in crdirent, rmdirent, etc.). This get and set actually prevents us from allowing the create and delete operations of different files in the same directory to take place concurrently, since the crdirent and rmdirent ops (on the parent dirdata handle) get serialized.

I'd like to fix all this by providing a keyval per handle that contains a null string as part of the key (I call it keyval-handle-info). The advantage of making it the null string is that it will appear first in the lexical ordering of directory entries, so I can skip over it in readdir easily. This null keyval would only be created on handles as necessary (right now only for counting dirents). The TROVE_KEYVAL_HANDLE_COUNT ds flag can be passed to trove operations. For example, in the case of crdirent, the TROVE_KEYVAL_HANDLE_COUNT and TROVE_NOOVERWRITE flags would be passed to the trove_keyval_write call to specify that the count should be incremented (or created and set to 0 if it doesn't exist). rmdirent would do something similar in trove_keyval_remove.
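To make the idea concrete, here is a minimal in-memory sketch in C. It is not the Trove API or the attached patch: the names kv_write, kv_remove, kv_readdir, kv_count, KV_NOOVERWRITE and KV_HANDLE_COUNT are all made up for illustration, and the store is just a small sorted array. It assumes keys are ordered with strcmp, so the empty-string key always lands in slot 0, and it assumes a missing count starts at 0 before being adjusted.

```c
#include <string.h>

#define MAX_ENTRIES 16
#define KEY_LEN 32

/* Hypothetical flags for this sketch only; not the real Trove flags. */
enum { KV_NOOVERWRITE = 1 << 0, KV_HANDLE_COUNT = 1 << 1 };

struct entry { char key[KEY_LEN]; long val; };
struct handle_kv { struct entry e[MAX_ENTRIES]; int n; };  /* sorted by key */

/* Return the index of key, or -(insert_pos + 1) if it is absent. */
static int kv_find(const struct handle_kv *h, const char *key)
{
    int i;
    for (i = 0; i < h->n; i++) {
        int c = strcmp(h->e[i].key, key);
        if (c == 0) return i;
        if (c > 0) break;
    }
    return -(i + 1);
}

/* Insert or overwrite, keeping the array sorted. */
static int kv_put(struct handle_kv *h, const char *key, long val)
{
    int i = kv_find(h, key);
    if (i >= 0) { h->e[i].val = val; return 0; }
    if (h->n == MAX_ENTRIES || strlen(key) >= KEY_LEN) return -1;
    i = -(i + 1);
    memmove(&h->e[i + 1], &h->e[i], (size_t)(h->n - i) * sizeof h->e[0]);
    strcpy(h->e[i].key, key);
    h->e[i].val = val;
    h->n++;
    return 0;
}

/* The per-handle info record lives under the empty key "". */
static void kv_adjust_count(struct handle_kv *h, long delta)
{
    int i = kv_find(h, "");
    long cur = (i >= 0) ? h->e[i].val : 0;   /* assumed: created at 0 */
    kv_put(h, "", cur + delta);
}

long kv_count(const struct handle_kv *h)
{
    int i = kv_find(h, "");
    return (i >= 0) ? h->e[i].val : 0;
}

int kv_write(struct handle_kv *h, const char *key, long val, int flags)
{
    int existed = kv_find(h, key) >= 0;
    if (existed && (flags & KV_NOOVERWRITE)) return -1;
    /* Leave room for the new entry plus the info record. */
    if (!existed && h->n + 2 > MAX_ENTRIES) return -1;
    if (kv_put(h, key, val) != 0) return -1;
    if (!existed && (flags & KV_HANDLE_COUNT)) kv_adjust_count(h, 1);
    return 0;
}

int kv_remove(struct handle_kv *h, const char *key, int flags)
{
    int i = kv_find(h, key);
    if (i < 0) return -1;                    /* nothing to remove */
    memmove(&h->e[i], &h->e[i + 1],
            (size_t)(h->n - i - 1) * sizeof h->e[0]);
    h->n--;
    if (flags & KV_HANDLE_COUNT) kv_adjust_count(h, -1);
    return 0;
}

/* List directory entries, skipping the info record if it is present. */
int kv_readdir(const struct handle_kv *h, const char **names, int max)
{
    int start = (h->n > 0 && h->e[0].key[0] == '\0') ? 1 : 0;
    int out = 0;
    for (int i = start; i < h->n && out < max; i++)
        names[out++] = h->e[i].key;
    return out;
}
```

In this sketch the count is only ever touched inside kv_write and kv_remove, so a caller never does its own get followed by a set, and kv_readdir only has to look at the first slot to know whether to skip the info record.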

Also, at present the crdirent and rmdirent state machines first do a read of the keyval to check for existence. This seems unnecessary. Instead, the crdirent sm can just pass TROVE_NOOVERWRITE to the keyval_write call, and fail if that call fails. rmdirent already fails if the keyval_remove fails, so the extra keyval_read to check for existence seems redundant. Are there any good reasons for those extra state actions that I'm missing?
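As an analogy only (this is plain POSIX, not Trove), the same "let the operation itself report existence" pattern is what open() with O_CREAT | O_EXCL and unlink() give you for files. The helper names create_exclusive and remove_existing below are made up for this sketch.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create path only if it does not already exist. No separate existence
 * check is done first; the create call itself fails if the name is taken.
 * Returns 0 on success, -1 with errno == EEXIST if it was already there. */
int create_exclusive(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    return close(fd);
}

/* Remove path; the failure itself reports a missing entry.
 * Returns 0 on success, -1 with errno == ENOENT if it was not there. */
int remove_existing(const char *path)
{
    return unlink(path);
}
```

Besides saving a step, folding the check into the write closes the window between a separate read and the write in which another operation could create or remove the same entry.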

I've attached a patch of the changes I've described. I would like to have this go into the trunk before the upcoming release, since it requires (yet another) storage format change. Let me know if there are any questions or concerns.
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
