Is it KEYONLY or KEYDATA?  Originally only KEYONLY splitting was
available.

If it's KEYONLY, that means when the group is 20% full of KEY data, it
is eligible for splitting.  The rest of the group can store the RECORD
part of the data.

KEYDATA means the group is eligible for splitting when the total of key
data and record data exceeds the splitting threshold.  But I think the
actual splitting only occurs when it's that group's turn.

The reason why you would want a low split percentage is to avoid record
data in overflow blocks.  A 20% split increases the probability that the
record data is in the same block as the key data.  Since your keys are
sequential numeric, a low threshold makes sense if the split threshold
is KEYONLY.  An even lower threshold might be indicated.
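To make the arithmetic concrete, here's a rough Python sketch of how the two threshold styles differ.  This is my own illustrative model, not Unidata's actual code; the `split_eligible` function, the 1024-byte block size, and the exact comparison are all assumptions:

```python
BLOCK_SIZE = 1024  # bytes per group/block (assumed, matches the file below)

def split_eligible(key_bytes, record_bytes, split_load_pct, key_only=True):
    """Illustrative model of the SPLIT.LOAD test (not Unidata internals).

    KEYONLY: only key bytes count toward the threshold.
    KEYDATA: key bytes plus record bytes count.
    """
    used = key_bytes if key_only else key_bytes + record_bytes
    return 100.0 * used / BLOCK_SIZE >= split_load_pct

# With a 20% KEYONLY split load and 1024-byte blocks, a group becomes
# eligible once it holds roughly 205 bytes of key data, regardless of
# how much record data sits alongside it:
print(split_eligible(204, 400, 20))                  # False (19.9% of key data)
print(split_eligible(205, 400, 20))                  # True  (20.0% of key data)
# Under KEYDATA, the same group trips the threshold much sooner because
# record bytes count too:
print(split_eligible(100, 200, 20, key_only=False))  # True  (29.3% combined)
```

The point being: under KEYONLY, a low percentage leaves ~80% of each block free for record data, which is what keeps records out of overflow.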

We use type 1 KEYONLY splitting, and try to figure out good
SPLIT.LOAD/MERGE.LOAD thresholds.  Our keys are mostly not strictly
sequential integers though.

Try creating the new file with type 1 hashing, and see if it does the
same thing.  The resulting file might be less efficient for random
access than a type 0, but it probably won't blow up on you.

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff Butera
Sent: Thursday, July 05, 2007 9:44 AM
To: [email protected]
Subject: [U2] Unidata Dynamic files

hpux, unidata 7.1.8

More in my quest to understand dynamic files in Unidata.  I understand
that split and merge load are not universal constants and many file
factors affect the optimal values for them.  What I don't understand is
why you'd want a split load of something like 20 in any circumstance. 
Perhaps it's my misunderstanding of split load, but I take it to mean
that when a group reaches 20% full, it will split it into two different
groups.

Of course, I'm dealing with a headache of a dynamic file that just isn't
happy.  In short, it's got about 33,000,000 records, each under 512
bytes in size, with purely integer keys.  I can create a dynamic file
with hash type 0, modulo around 3,000,000 and block size 1024 which
looks good at the start:  it's got 3 dat segments and 1 small ovr
segment.

When I copy records from the original to the new dynamic file and get
somewhere around 20,000,000 records, it starts creating ovr segments
like crazy.  I end up with 4 dat segments and 5 ovr segments.

When I run guide on the resulting file, it's taking up about 9Gig on
disk, but only contains about 1.1Gig of data.  I've tried split/merge
loads of 20/10 and 60/40 with similar results.

Hence, I'm trying to determine why it's splitting like mad even though
it's mostly empty.  With the purely integer (sequential) keys, it's got
a nice even distribution in keys.  I can avoid the overflow by creating
with a ridiculously large modulo (e.g. 10,000,000) but again, I get a
file that's 80% empty in the end.
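For what it's worth, a back-of-envelope check (my own rough heuristic, not a Unidata sizing formula; the 80% target load is an assumption) shows the data alone should fit in far fewer groups than 10,000,000:

```python
# Rough group-count estimate: how many 1024-byte groups would the actual
# data occupy at a target load?  (Heuristic of mine, not a Unidata formula.)
data_bytes = 1.1 * 1024**3   # ~1.1 Gig of real data, per guide
block_size = 1024            # bytes per group
target_load = 0.8            # aim for groups ~80% full (assumed target)

groups_needed = int(data_bytes / (block_size * target_load))
print(f"{groups_needed:,}")  # roughly 1.4 million groups
```

So data-wise a modulo well under 2,000,000 ought to suffice, which is why the 9Gig-on-disk result looks so strange to me.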

Any advice appreciated.

Jeff Butera, Ph.D.
Administrative Systems
Hampshire College
[EMAIL PROTECTED]
413-559-5556

"Daddy - did you lose your mind?"
                     Catherine Butera
-------
u2-users mailing list
[email protected]
To unsubscribe please visit http://listserver.u2ug.org/
