This is a pretty ugly file!  Here's what I see:

1)  Modulo is way too big!  3 million groups for .9 million records;
1.6 GB physical space for 150 MB of data.  Note the large number of
empty groups in the 25% column at the bottom of the FILE.STAT report.
Probably the modulo was pushed to TRY to make up for the really lousy
hashing!  More about this below in 2).

2)  Lousy hashing distribution.  Note 2.8 million empty and sparse
groups in the 25% column; but at the same time 25,000 groups 200% +
full.  This isn't due to record size as the largest record is 2644
bytes.  Note that the largest group has 7417 records - if all these were
"average size" (Murphy says they aren't, though) that group would have
1.25 MB of data.  Murphy also says that the most popular records live at
the end of the largest group so there is your performance problem, quite
likely -- tons of I/O required to get to the end of the large groups.

What to do?

Step 1 - See if another type will do a better job.  Forget about
HASH.HELP and forget about the key patterns documented for the various
types -- yes, I know that type 18 "should" work best, but life isn't
that simple.  [AD] If you have FAST, use it. [/AD]  If not, use HASH.AID
to simulate the various types.  In using HASH.AID I'd suggest picking a
reasonable modulo, say around 200,001 or so.  ** BIG NOTE ** This modulo
choice is based on a separation of 4 which I'd recommend for a 2K data
buffer -- if you want to stay with separation 1 use a modulo of 800,001
or so ** END BIG NOTE **  Before running HASH.AID clear the
HASH.AID.FILE (CLEAR.FILE HASH.AID.FILE).  Then use HASH.AID with your
modulo and separation of choice and interate through all the available
types -- syntax is HASH.AID  SALES-HIST-BR1 and let it prompt you for
the Type, Modulo and Separation; for Type enter "2,18,1" which is like
FOR 1 TO 18, STEP 1.  Don't bother reading the output, just enter "N"
and let it scroll by.  When it's all done use LIST HASH.AID.FILE to
examine the results.  Look for the type that yields the smallest
"Largest Group" the fewest "Oversize Groups" and the closest together
"Smallest Group" and "Largest Group".  If one of the types does a lot
better than type 18 give it a try and see if it does better.  Note that
one flaw with HASH.AID is that it doesn't report empty groups (alas!).

If you find a better type it may solve or help your problem.  If not,

Step 2 - Read the very helpful post by Scott Ballinger in which he notes
that large, complex record keys sometimes don't hash well and could
cause the sort of problem you are seeing.  If none of the other file
types do better than type 18 I'm afraid this is what you are facing.
Were the file isolated the fix would be to move any important
information carried by the record key into one or more fields and
replace the compound record keys with sequential numeric, which as Scott
notes, often hash more reliably.  However, if the file is heavily
embedded in the application software this might not be a trivial change
to make!

Hope this helps!  Let us know how it turns out or if other questions
arise...

Jeff Fitzgerald
Fitzgerald & Long, Inc.
www.fitzlong.com



-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of roy
Sent: Tuesday, October 16, 2007 2:14 PM
To: [email protected]
Subject: RE: [U2] Size of Key Question

File name                               = SALES-HIST-BR1
File type                               = 18
Number of groups in file (modulo)       = 3000017
Separation                              = 1
Number of records                       = 883026
Number of physical bytes                = 1667799040
Number of data bytes                    = 150663032

Average number of records per group     = 0.2943
Average number of bytes per group       = 50.2207
Minimum number of records in a group    = 0
Maximum number of records in a group    = 7417

Average number of bytes per record      = 170.6213
Minimum number of bytes in a record     = 64
Maximum number of bytes in a record     = 2644

Average number of fields per record     = 25.6579
Minimum number of fields per record     = 11
Maximum number of fields per record     = 41

Groups   25%     50%     75%    100%    125%    150%    175%    200%
full
     2855826   50132   31541   14753   12862    5383    4611   24909
Press any key to continue...
-------
u2-users mailing list
[email protected]
To unsubscribe please visit http://listserver.u2ug.org/

Reply via email to