Yes (agreeing with Jeff on file sizing isn't a very reckless thing to do), except I'd stress this: don't use a separation of 1. Go to 4, at least. If it turns out that a high percentage of the records are over 2K, then try a sep of 8. In certain cases, you may want to go to 16, but this isn't one of them. Never go above 16.
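For anyone who wants that rule of thumb in runnable form, here is a minimal Python sketch. It assumes a separation unit is 512 bytes (which matches the "separation of 4 ... for a 2K data buffer" remark in the quoted message), and the `big_share` cutoff for "a high percentage" is a made-up placeholder, not a documented value. The function names are hypothetical.

```python
# Sketch only: the 512-byte unit and the 25% cutoff are assumptions.
BLOCK_BYTES = 512       # assumed size of one separation unit
MAX_SEPARATION = 16     # "never go above 16"


def buffer_bytes(separation: int) -> int:
    """Group buffer size implied by a separation value."""
    return separation * BLOCK_BYTES


def suggest_separation(record_sizes, big_share=0.25):
    """Return 4 by default, or 8 when many records exceed 2K.

    big_share is a hypothetical cutoff for "a high percentage".
    """
    if not record_sizes:
        return 4
    over_2k = sum(1 for size in record_sizes if size > 2048)
    separation = 8 if over_2k / len(record_sizes) >= big_share else 4
    return min(separation, MAX_SEPARATION)


if __name__ == "__main__":
    for sep in (1, 4, 8, 16):
        print(sep, buffer_bytes(sep))
```

The case for 16 is left out on purpose, since the post only says it applies "in certain cases" without naming them.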
Here are links to Mark Baldridge's series on the subject of file sizing.

http://www.ibm.com/developerworks/edu/dm-dw-dm-0512baldridge-i.html
http://www.ibm.com/developerworks/edu/dm-dw-dm-0603baldridge-i.html
http://www.ibm.com/developerworks/edu/dm-dw-dm-0606baldridge-i.html
http://www.ibm.com/developerworks/edu/dm-dw-dm-0611baldridge-i.html

Registration is required, but free.

> Subject: RE: [U2] Size of Key Question
> Date: Tue, 16 Oct 2007 19:07:18 -0400
> From: [EMAIL PROTECTED]
> To: [email protected]
>
> This is a pretty ugly file! Here's what I see:
>
> 1) Modulo is way too big! 3 million groups for .9 million records;
> 1.6 GB physical space for 150 MB of data. Note the large number of
> empty groups in the 25% column at the bottom of the FILE.STAT report.
> Probably the modulo was pushed to TRY to make up for the really lousy
> hashing! More about this below in 2).
>
> 2) Lousy hashing distribution. Note 2.8 million empty and sparse
> groups in the 25% column; but at the same time 25,000 groups 200% +
> full. This isn't due to record size as the largest record is 2644
> bytes. Note that the largest group has 7417 records - if all these were
> "average size" (Murphy says they aren't, though) that group would have
> 1.25 MB of data. Murphy also says that the most popular records live at
> the end of the largest group so there is your performance problem, quite
> likely -- tons of I/O required to get to the end of the large groups.
>
> What to do?
>
> Step 1 - See if another type will do a better job. Forget about
> HASH.HELP and forget about the key patterns documented for the various
> types -- yes, I know that type 18 "should" work best, but life isn't
> that simple. [AD] If you have FAST, use it. [/AD] If not, use HASH.AID
> to simulate the various types. In using HASH.AID I'd suggest picking a
> reasonable modulo, say around 200,001 or so.
> ** BIG NOTE ** This modulo
> choice is based on a separation of 4 which I'd recommend for a 2K data
> buffer -- if you want to stay with separation 1 use a modulo of 800,001
> or so ** END BIG NOTE ** Before running HASH.AID clear the
> HASH.AID.FILE (CLEAR.FILE HASH.AID.FILE). Then use HASH.AID with your
> modulo and separation of choice and iterate through all the available
> types -- syntax is HASH.AID SALES-HIST-BR1 and let it prompt you for
> the Type, Modulo and Separation; for Type enter "2,18,1" which is like
> FOR 2 TO 18 STEP 1. Don't bother reading the output, just enter "N"
> and let it scroll by. When it's all done use LIST HASH.AID.FILE to
> examine the results. Look for the type that yields the smallest
> "Largest Group", the fewest "Oversize Groups", and the closest together
> "Smallest Group" and "Largest Group". If one of the types does a lot
> better than type 18 give it a try and see if it does better. Note that
> one flaw with HASH.AID is that it doesn't report empty groups (alas!).
>
> If you find a better type it may solve or help your problem. If not,
>
> Step 2 - Read the very helpful post by Scott Ballinger in which he notes
> that large, complex record keys sometimes don't hash well and could
> cause the sort of problem you are seeing. If none of the other file
> types do better than type 18 I'm afraid this is what you are facing.
> Were the file isolated the fix would be to move any important
> information carried by the record key into one or more fields and
> replace the compound record keys with sequential numeric keys, which,
> as Scott notes, often hash more reliably. However, if the file is
> heavily embedded in the application software this might not be a
> trivial change to make!
>
> Hope this helps!
> Let us know how it turns out or if other questions
> arise...
>
> Jeff Fitzgerald
> Fitzgerald & Long, Inc.
> www.fitzlong.com
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of roy
> Sent: Tuesday, October 16, 2007 2:14 PM
> To: [email protected]
> Subject: RE: [U2] Size of Key Question
>
> File name                               = SALES-HIST-BR1
> File type                               = 18
> Number of groups in file (modulo)       = 3000017
> Separation                              = 1
> Number of records                       = 883026
> Number of physical bytes                = 1667799040
> Number of data bytes                    = 150663032
>
> Average number of records per group     = 0.2943
> Average number of bytes per group       = 50.2207
> Minimum number of records in a group    = 0
> Maximum number of records in a group    = 7417
>
> Average number of bytes per record      = 170.6213
> Minimum number of bytes in a record     = 64
> Maximum number of bytes in a record     = 2644
>
> Average number of fields per record     = 25.6579
> Minimum number of fields per record     = 11
> Maximum number of fields per record     = 41
>
> Groups      25%     50%     75%    100%    125%    150%    175%    200% full
>         2855826   50132   31541   14753   12862    5383    4611   24909
> Press any key to continue...
> -------
> u2-users mailing list
> [email protected]
> To unsubscribe please visit http://listserver.u2ug.org/
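As a sanity check on the numbers in this thread, here is a small Python sketch that recomputes the FILE.STAT averages from the raw counts and compares how full the primary groups would be at the current sizing versus the two suggested sizings (modulo 200,001 at separation 4, or 800,001 at separation 1). It assumes a separation unit is 512 bytes and ignores record and file header overhead, so the fill figures are rough. The function names are hypothetical.

```python
BLOCK_BYTES = 512  # assumption: one separation unit is 512 bytes

# Figures copied from the FILE.STAT report for SALES-HIST-BR1.
MODULO = 3000017
RECORDS = 883026
DATA_BYTES = 150663032
# Group counts from the fullness row of the report, columns 25% to 200%.
FULLNESS = [2855826, 50132, 31541, 14753, 12862, 5383, 4611, 24909]


def averages(modulo, records, data_bytes):
    """Return (records per group, bytes per group, bytes per record)."""
    return (records / modulo, data_bytes / modulo, data_bytes / records)


def fill_ratio(modulo, separation, data_bytes=DATA_BYTES):
    """Data bytes divided by primary group space (overhead ignored)."""
    return data_bytes / (modulo * separation * BLOCK_BYTES)


if __name__ == "__main__":
    print("averages:", averages(MODULO, RECORDS, DATA_BYTES))
    print("groups accounted for:", sum(FULLNESS), "of", MODULO)
    print("share in 25% column:", FULLNESS[0] / MODULO)
    for modulo, sep in ((MODULO, 1), (200001, 4), (800001, 1)):
        print(modulo, sep, fill_ratio(modulo, sep))
```

Under the 512-byte assumption, the two suggested sizings give almost identical primary space, which is why the modulo goes up by a factor of four when the separation drops from 4 to 1.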

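The selection rule Jeff gives for the HASH.AID results (smallest "Largest Group", fewest "Oversize Groups", narrowest gap between "Smallest Group" and "Largest Group") can be sketched as a sort key. The rows below are invented purely for illustration and are not real HASH.AID output; the dictionary keys are hypothetical names, not the actual HASH.AID.FILE field names, and applying the three criteria in strict priority order is my assumption.

```python
# Rows are INVENTED for illustration; they are not real HASH.AID output.
SAMPLE_ROWS = [
    {"type": 2, "smallest": 0, "largest": 40, "oversize": 120},
    {"type": 7, "smallest": 2, "largest": 9, "oversize": 3},
    {"type": 18, "smallest": 0, "largest": 95, "oversize": 400},
]


def rank_key(row):
    """Lower is better on each of the three criteria, in priority order."""
    spread = row["largest"] - row["smallest"]
    return (row["largest"], row["oversize"], spread)


def best_type(rows):
    """Return the file type whose simulated distribution ranks best."""
    return min(rows, key=rank_key)["type"]


if __name__ == "__main__":
    for row in sorted(SAMPLE_ROWS, key=rank_key):
        print(row)
    print("candidate type:", best_type(SAMPLE_ROWS))
```

Since HASH.AID does not report empty groups, a ranking like this only narrows the candidates; the winner still needs a real trial on the file.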