Doug may have had a key bounce in his input
> Let's do the math:
> 258687736 (Record Size)
> 192283300 (Key Size)
The key size is actually 19283300 in Chris' figures
Regarding 68,063 being less than the current modulus of 82,850: I think the
answer may lie in the splitting process.
As I understand it, the first time a split occurs, group 1 is split and its
contents are divided between new group 1 and new group 2. All the other groups
effectively get 1 added to their number. The next split divides group 3 (which
was group 2) into groups 3 and 4, and so forth. A pointer records where the
next split will take place, and also helps adjust the algorithm that
identifies which group matches a given key.
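That before/after-the-split-pointer logic can be sketched roughly as below. This is the textbook linear-hashing address calculation, not the actual U2 internals, and all names here are illustrative assumptions:

```python
def group_for_key(key_hash, base_modulus, split_pointer):
    """Map a hashed key to a group under (textbook) linear hashing.

    base_modulus:  the modulus at the start of the current doubling cycle
    split_pointer: how many groups have already been split this cycle

    Illustrative only -- the real U2 implementation may differ in detail
    (e.g. in how groups are numbered after a split).
    """
    g = key_hash % base_modulus
    # Groups below the split pointer have already been split, so their
    # keys are spread across twice the range.
    if g < split_pointer:
        g = key_hash % (2 * base_modulus)
    return g
```

The point is that a single pointer comparison decides whether a key is hashed with the old modulus or with twice the old modulus.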
Based on this, if you started with 1000 groups, then by the 500th split you
will have 1500 groups. The first 1000 will be relatively empty, and the last
500 will probably be overflowed, but not terribly badly. By the 1000th split
you will have 2000 groups, and they will, one hopes, be quite reasonably
spread with very little overflow.
So I expect the average access times would drift up and down in a cycle. The
cycle time would get longer as the file gets bigger, but the worst time would
be roughly the same in each cycle.
Given the power of two that the before/after-the-split-pointer test introduces
into the algorithm, I wonder whether there is really a need to start off with
a prime?
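One way to see the question: after each complete doubling cycle the modulus has been multiplied by 2, so only the starting modulus can be prime; every later one is that value times a power of two. A one-line illustration (the starting value 1009 is just an example prime, nothing from U2):

```python
def modulus_after_cycles(initial_modulus, k):
    """Modulus after k complete doubling cycles of linear hashing.
    Only the k == 0 value can possibly be prime."""
    return initial_modulus * 2 ** k
```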
PS I'm getting a bit Tony^H^H^H^Hverbose nowadays.
U2-Users mailing list