On Jan 29, 2008, at 1:42 PM, Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Tue, 29 Jan 2008 13:32 -0600:
On Jan 28, 2008, at 6:43 PM, Pete Wyckoff wrote:
[EMAIL PROTECTED] wrote on Mon, 28 Jan 2008 16:38 -0600:
Attached patch disables the handle ledger. For those not familiar, the
handle ledger is an in-memory structure that maintains allocated handles
for a given server. I'm disabling it because reading the entire database
each time the server loads is extremely expensive for large filesystems.
Instead of choosing a handle from the ledger, the patch picks one
randomly. This means we have to deal with collisions now, but because of
our large handle space, they only occur every 100 billion times or so.
I didn't blow away the handle allocation code entirely...I just disabled
the calls that we had been using to invoke the handle ledger, and added
some functionality that picks a random handle from a given range. In the
dspace code, I modified the create function to continue up to 32 times
if a collision with an already existing handle occurs.
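A minimal sketch of the pick-and-retry scheme described above, not the
patch itself. The names handle_table, handle_in_use, pick_random_handle
and allocate_handle are hypothetical, the in-memory table stands in for
the database lookup, and standard rand() is swapped in for the patch's
random_r() purely to keep the example portable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_HANDLE_ALLOC_ATTEMPTS 32

/* Hypothetical stand-in for the database: a flat list of used handles. */
typedef struct {
    const uint64_t *used;
    size_t count;
} handle_table;

static bool handle_in_use(const handle_table *t, uint64_t h)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->used[i] == h)
            return true;
    return false;
}

/* Pick a handle in [first, last], inclusive. rand() guarantees only 15
 * bits per call, so several calls are combined. Modulo bias is ignored. */
static uint64_t pick_random_handle(uint64_t first, uint64_t last)
{
    uint64_t span = last - first + 1;   /* wraps to 0 for the full 64-bit range */
    uint64_t r = 0;
    for (int i = 0; i < 5; i++)
        r = (r << 15) ^ (uint64_t)rand();
    return span == 0 ? r : first + r % span;
}

/* Returns 0 and stores the handle in *out, or -1 after
 * MAX_HANDLE_ALLOC_ATTEMPTS consecutive collisions. */
static int allocate_handle(const handle_table *t, uint64_t first,
                           uint64_t last, uint64_t *out)
{
    for (int attempts = 0; attempts < MAX_HANDLE_ALLOC_ATTEMPTS; attempts++) {
        uint64_t h = pick_random_handle(first, last);
        if (!handle_in_use(t, h)) {
            *out = h;
            return 0;
        }
    }
    return -1;
}
```

With a sparsely populated handle range the first attempt almost always
succeeds, so the retry bound only matters when the range is nearly full
or the random source is poor.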
Great change. Never liked that myself either. Some comments.
diff -u -a -p -r1.152 dbpf-dspace.c
--- src/io/trove/trove-dbpf/dbpf-dspace.c 8 Nov 2007 21:48:22 -0000 1.152
+++ src/io/trove/trove-dbpf/dbpf-dspace.c 28 Jan 2008 21:55:49 -0000
[..]
+ } while(ret != DB_NOTFOUND && ++attempts > MAX_HANDLE_ALLOC_ATTEMPTS);
Uh, maybe <.
Are you arguing for increasing the max number of attempts, or just
retrying forever?
Maybe I misunderstand the termination condition for the loop. You
want it to keep trying until attempts gets up to a certain value.
Just the < is backwards. If I'm spacing and you're sure this is
right, ignore me.
Oh, no you're right. Heh, I was the one spacing, and I thought you
were doing some kind of weird winking smiley. <-% Nice catch!
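To make the effect of the comparison direction concrete, here is a small
sketch that counts loop iterations under both conditions. FOUND and
NOTFOUND are placeholders for a successful lookup and DB_NOTFOUND, and
lookup_always_collides is a hypothetical lookup that reports a collision
every time.

```c
#define MAX_HANDLE_ALLOC_ATTEMPTS 32

/* Placeholders for "key exists" and DB_NOTFOUND; not the real values. */
enum { FOUND = 0, NOTFOUND = 1 };

/* Simulated lookup: every candidate handle collides. */
static int lookup_always_collides(void)
{
    return FOUND;
}

/* Condition as posted: ++attempts > MAX is false on the first pass,
 * so the loop gives up after a single attempt. */
static int iterations_as_posted(void)
{
    int ret, attempts = 0, iterations = 0;
    do {
        ret = lookup_always_collides();
        iterations++;
    } while (ret != NOTFOUND && ++attempts > MAX_HANDLE_ALLOC_ATTEMPTS);
    return iterations;
}

/* Condition with < : keeps retrying until the attempt limit is reached. */
static int iterations_with_fix(void)
{
    int ret, attempts = 0, iterations = 0;
    do {
        ret = lookup_always_collides();
        iterations++;
    } while (ret != NOTFOUND && ++attempts < MAX_HANDLE_ALLOC_ATTEMPTS);
    return iterations;
}
```

Under constant collisions the posted condition tries once and the
corrected one tries MAX_HANDLE_ALLOC_ATTEMPTS times.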
+ rfd = open("/dev/urandom", O_RDONLY, 0);
+ if(rfd < 0)
+ {
+ return -PVFS_EINVAL;
+ }
Painted ourselves into a linux-specific corner here. Maybe have the
usual time() etc. srand option here too if open fails.
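A sketch of what such a fallback might look like, assuming a seed of
unsigned int width. The function name get_seed and its path parameter
are hypothetical (the path is a parameter only so the fallback can be
exercised), and mixing in the process id is an assumption on top of the
time() suggestion.

```c
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Fill *seed from the device at `path` if possible.
 * Returns 1 if the device supplied the seed, 0 if the fallback was used. */
static int get_seed(const char *path, unsigned int *seed)
{
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        ssize_t n = read(fd, seed, sizeof(*seed));
        close(fd);
        if (n == (ssize_t)sizeof(*seed))
            return 1;
    }
    /* Fallback when the device is missing or unreadable: wall-clock
     * time mixed with the process id (the pid part is an assumption). */
    *seed = (unsigned int)time(NULL) ^ ((unsigned int)getpid() << 16);
    return 0;
}
```

A caller would pass "/dev/urandom" and hand the resulting seed to
whichever seeding function is in use, rather than failing with
-PVFS_EINVAL when the open fails.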
+ random_r(&trove_handle_random_data, &r1);
+ i = r1 % extent_array->extent_count;
May want a feature test for this. Not sure if POSIX has gotten
itself into all the OSes on which people may run servers.
Right, I was concerned with making sure I got a good seed here. It needs
to generate both a very large random sequence from the seed, as well as
not pick the same seed over and over on server startup. Using
initstate_r with an array size of 256 makes the values returned by
random_r much more random, and passing the current time ensures that the
seed will be different on each server startup.
If we use the more primitive forms of getting a random number, it's just
more likely to get repeated values for handles. Is that acceptable? Does
it become the user's problem that his random handle values aren't so
random?
Yeah. It will just run through the same set of allocated handles,
taking a long time that first time for people with lousy RNGs. Then
it will fall into an unallocated space and continue normally. As
long as there is a configure test for random_r, we can fall back to
lrand48() and friends or even ancient srand/rand. /dev/urandom
test must be at runtime, with graceful fallback to a seed made up of
hostname[0:255] | time() << 29 | coll_id << 63 | ... any other
random stuff you can get your hands on in that routine easily.
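The compile-time part of that fallback chain could be sketched as
follows. HAVE_RANDOM_R and HAVE_LRAND48 are assumed to be the macros a
configure check would define, and rng_seed, rng_next and pick_extent are
hypothetical wrapper names, not functions from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(HAVE_RANDOM_R)
/* Static storage is zero-initialized, which glibc expects of the
 * random_data buffer before the first initstate_r() call. */
static struct random_data rng_data;
static char rng_state[256];

static void rng_seed(unsigned int seed)
{
    initstate_r(seed, rng_state, sizeof(rng_state), &rng_data);
}

static long rng_next(void)
{
    int32_t r = 0;
    random_r(&rng_data, &r);
    return r;
}
#elif defined(HAVE_LRAND48)
static void rng_seed(unsigned int seed) { srand48((long)seed); }
static long rng_next(void) { return lrand48(); }
#else
/* Last resort: the srand/rand pair from standard C. */
static void rng_seed(unsigned int seed) { srand(seed); }
static long rng_next(void) { return rand(); }
#endif

/* Mirrors "i = r1 % extent_array->extent_count" from the patch.
 * extent_count must be nonzero. */
static size_t pick_extent(size_t extent_count)
{
    return (size_t)rng_next() % extent_count;
}
```

Hiding the generator behind two small wrappers keeps the handle
allocation code identical no matter which branch the build selects.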
Not sure hostname is actually useful in this case, since the handles
are allocated (and only need to be unique) per-server. Same with the
fs_id. I could use process id I suppose...
Can I check for /dev/urandom with a runtime check in configure, or are
those verboten for cross-compiles?
-sam
I thought about proposing just doing linear allocation. Find the
highest handle, allocate +1 on that. That's what we do with the
OSDs, using a 1-element cache to remember the last handle allocated.
This works nicely until you first fill up your handle space and have
to wrap, then can go bad if you hit a run of undeleted old handles.
I've no idea what the cost is to run the RNG. Presumably it is very
fast. In which case just doing it all the time like you have it is
perfect.
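For comparison, the linear scheme with a 1-element cache could be
sketched like this. The struct, the in_use callback and the bitmask
helper are all hypothetical stand-ins (the bitmask replaces a real
database lookup), not the OSD code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Allocator state: handle range plus the 1-element cache. */
typedef struct {
    uint64_t first;
    uint64_t last;
    uint64_t last_allocated;
} linear_allocator;

typedef bool (*in_use_fn)(uint64_t handle, void *ctx);

/* Hypothetical lookup for the example: bit h of a 64-bit mask marks
 * handle h as allocated. Only valid for handles below 64. */
static bool mask_in_use(uint64_t handle, void *ctx)
{
    return ((*(const uint64_t *)ctx >> handle) & 1u) != 0;
}

/* Try last_allocated + 1, wrapping to `first` past the end of the range.
 * Scans past handles still in use, which is the slow case after a wrap.
 * Assumes the range is smaller than the full 64-bit space.
 * Returns 0 and stores the handle in *out, or -1 if the range is full. */
static int linear_alloc(linear_allocator *a, in_use_fn in_use, void *ctx,
                        uint64_t *out)
{
    uint64_t span = a->last - a->first + 1;
    uint64_t h = a->last_allocated;
    for (uint64_t tried = 0; tried < span; tried++) {
        h = (h >= a->last) ? a->first : h + 1;
        if (!in_use(h, ctx)) {
            a->last_allocated = h;
            *out = h;
            return 0;
        }
    }
    return -1;
}
```

Before the first wrap every call succeeds on its first probe. After a
wrap, each undeleted old handle in the way costs one extra lookup.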
-- Pete
_______________________________________________
Pvfs2-developers mailing list
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers