Re: [U2] Learning about file sizing
Marc,

I need to understand more clearly about the split/merge formula. Below it states:

  SPLIT = INT(RECORDS PER BLOCK * IDSIZE * 100 / BLOCKSIZE)
  SPLIT = INT(9 * 17 * 100 / 1024) = 14

How did they come up with 9 as the RECORDS PER BLOCK from the file statistics outlined?

Sincerely,
David Laansma
IT Manager
Hubbard Supply Co.
Direct: 810-342-7143
Office: 810-234-8681
Fax: 810-234-6142
www.hubbardsupply.com
"Delivering Products, Services and Innovative Solutions"

-----Original Message-----
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Rutherford, Marc
Sent: Tuesday, June 05, 2012 3:50 PM
To: U2 Users List
Subject: Re: [U2] Learning about file sizing

[quoted reply and technote snipped; see Marc's and Rod's messages below]
Re: [U2] Learning about file sizing
Unidata. I have gotten what I believe to be very helpful insight through this thread. Thanks!

Sincerely,
David Laansma
IT Manager
Hubbard Supply Co.
Direct: 810-342-7143
Office: 810-234-8681
Fax: 810-234-6142
www.hubbardsupply.com
"Delivering Products, Services and Innovative Solutions"

-----Original Message-----
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wols Lists
Sent: Wednesday, June 06, 2012 2:57 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] Learning about file sizing

[quoted message snipped; see Wol's message below]

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] Learning about file sizing
On 05/06/12 18:33, Dave Laansma wrote:
> Can anyone point me to a good document that will give me guidelines
> for 'proper' file sizing of dynamic files in particular?

Which database? Please note that as regards their underlying implementation UniVerse and UniData are *very* different. At user level they're very similar, but when you're doing admin stuff like file sizing I gather they are very different. I'm guessing you're talking UniData? (About which I know nothing. :-)

> And when to use KEYONLY vs KEYDATA?

I thought you were talking about UniData :-) Afaik these keywords will produce an error on UniVerse. So be careful that you're getting advice for the right product, and let others know what product you're talking about.

Cheers,
Wol
Re: [U2] Learning about file sizing
> From: Dave Laansma
> Can anyone point me to a good document that will give me guidelines
> for 'proper' file sizing of dynamic files in particular?

There was an excellent article series on file sizing in Spectrum Magazine. I found part 6 in the Jan/Feb-2005 issue by searching http://intl-spectrum.com - it specifically discusses dynamic files. Part 1 was published in the Mar/Apr-2004 issue.

UNIVERSE DYNAMICALLY HASHED FILES: TUNING PARAMETERS, PART 6
The concept that dynamic files are maintenance free is a misconception held by many database administrators and prompts the question, "If dynamic files don't need any maintenance, why are there so many tuning parameters?" This article discusses the tuning parameters for dynamically hashed UniVerse files.
BY JEFF FITZGERALD AND PEGGY LONG

HTH T
Re: [U2] Learning about file sizing
Rod, excellent post! I have a file I have been wanting to convert to dynamic. Since it's not something I do every day, I have been stalling for a while now...

Thanks,
Marc Rutherford
Principal Programmer Analyst
Advanced Bionics LLC
(661) 362-1754

-----Original Message-----
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Baakkonen, Rodney A (Rod) 46K
Sent: Tuesday, June 05, 2012 10:53 AM
To: 'U2 Users List'
Subject: Re: [U2] Learning about file sizing

[quoted technote snipped; see Rod's original message below]
Re: [U2] Learning about file sizing
Can't remember if this came from Wally or not a long time ago, but I use it to figure out Split/Merge. I have a development box that has a copy of production that I can play with, so I do a lot of playing with mod and sep, and depend on GROUP.STAT to give me some idea of how groups are being populated.

Sizing Dynamic Files
Technote (FAQ)

Problem
Sometimes administrators would like some ideas and insights on how to configure dynamic files to maximize file access speed and minimize the physical size. This article describes one process for making this determination.

Solution
To improve dynamic file performance an administrator can choose a new modulo and/or block size. Other important factors, however, are the percent standard deviation of the record size, the correct hash type, and the split load percent.

The first step is to generate file statistics using the ECL command FILE.STAT (in ECLTYPE U mode). The percent standard deviation can be obtained by the following formula: "Standard deviation from average" divided by "Average number of bytes in a record". Ideally, this percent would be zero - all records are exactly the same size. Having all records the same size makes calculations more accurate for our file sizing purposes. A standard deviation percent under 15-20% means the variation of record sizes is less than perfect, but we can still predict well enough to be confident that the problem has a satisfactory solution.

However, it is very common in the U2 world for a file design to have been left in service beyond what is reasonable for today's situations - i.e. what worked well in 1980 may not be a good solution for much larger files than were originally anticipated. So, if in the old days you had, say, 10 multivalues in most records and today you have between 20 and 3000, then it is easy to see how the percent standard deviation for record size can creep up over the years without being noticed.
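As a quick check of the formula just described, the percent standard deviation can be computed from the two FILE.STAT figures. This is a minimal sketch; the numbers are the example FILE.STAT values that appear further down in this technote (15.3 and 100.7):

```python
# Percent standard deviation of record size, exactly as the technote
# describes: "Standard deviation from average" divided by
# "Average number of bytes in a record", expressed as a percent.
std_dev_from_average = 15.3   # from the example FILE.STAT report below
avg_bytes_per_record = 100.7  # from the example FILE.STAT report below

pct_std_dev = std_dev_from_average / avg_bytes_per_record * 100
print(f"{pct_std_dev:.1f}%")  # -> 15.2%, just inside the 15-20% workable band
```

A value in this range means record sizes vary, but not so much that the sizing arithmetic below becomes unreliable.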
Anyway, we'll slog forward on the assumption that the standard deviation percent is "good". Final point: a high standard deviation percent for record size usually leads to wasted space, either in the form of sparsely populated primary groups and/or excessive overflow. A high standard deviation percent can create a situation where there is no "good" answer.

An important factor in correct file sizing is to determine the better hashing algorithm - either type 0 or 1. It is useful to keep an open mind on this, because hash type is another thing that can be set correctly and then, over time, the format of the ids changes and the other hash type becomes better. First, you should always do ANALYZE.FILE filename and look at the "Keys" column. If you see consistency in the numbers as you look down the column, then the hash type currently in place is likely correct; if the numbers vary widely, that warrants further study. How you do this kind of analysis is to select a sample of 10,000 record ids from the file. Then create two dynamic files (one a type 0, the other a type 1) with a block size of 1024 and a modulo of 3. Then use CONFIGURE.FILE to set the MERGE.LOAD to 5 and the SPLIT.LOAD to 10. This configuration helps exaggerate the results of the testing to make the decision a little easier. Then populate each of the files using the sample list of ids and the empty string for a record. Whichever file is the smaller is usually the better hash type.

Determine Id Size and Record Size
Get two numbers from the FILE.STAT report: "Average number of bytes in a record" (avg rec size), rounded up to the next whole number; and "Average number of bytes in record ID" (id size), rounded up to the next whole number.
Follow these steps:

1. IDSIZE = id size from report above + 8
2. DATASIZE = avg rec size from report above - id size from report above
3. TOTAL = IDSIZE + DATASIZE

Example:

  File name (Dynamic File)             = DYN1
  Number of groups in file (modulo)    = 115
  Dynamic hashing, hash type           = 1
  Split/Merge type                     = KEYONLY
  Block size                           = 1024
  File has 5 groups in level one overflow.
  Number of records                    = 575
  Total number of bytes                = 25708
  . . .
  Average number of bytes in a record  = 100.7
  Average number of bytes in record ID = 8.2
  Standard deviation from average      = 15.3

  Average number of bytes in a record  = 100.7 -> 101
  Average number of bytes in record ID = 8.2   -> 9

  IDSIZE   = 9 + 8   = 17
  DATASIZE = 101 - 9 = 92
  TOTAL    = 17 + 92 = 109

Determine Blocksize and Modulo
The first block in each group has 32 bytes of header information. So, for a 1024-byte block, 992 bytes are usable for keys and data. Of this, a minimum of roughly 10 percent (124 bytes in this case) is reserved for key information. Each key will use up 8 bytes of overhead plus the length of the key itself. This is represented by IDSIZE above. The data portion of the record(s) begins after the key area.
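The archive cuts the technote off here, before the split calculation, but the SPLIT formula quoted at the top of the thread can be reproduced from the figures above. Note the RECORDS PER BLOCK step (usable bytes divided by TOTAL) is a reconstruction from the numbers given, not text from the technote itself:

```python
import math

# Figures from the example FILE.STAT report above
avg_rec_size = math.ceil(100.7)    # -> 101
id_size      = math.ceil(8.2)      # -> 9

IDSIZE   = id_size + 8             # 8 bytes of per-key overhead -> 17
DATASIZE = avg_rec_size - id_size  # -> 92
TOTAL    = IDSIZE + DATASIZE       # -> 109

BLOCKSIZE = 1024
usable = BLOCKSIZE - 32            # 32-byte group header -> 992 usable bytes

# Reconstructed step: how many whole records fit in one block
records_per_block = usable // TOTAL                       # 992 // 109 -> 9

# The SPLIT formula as quoted in the thread
SPLIT = (records_per_block * IDSIZE * 100) // BLOCKSIZE   # INT(9*17*100/1024) -> 14

print(records_per_block, SPLIT)  # -> 9 14
```

This matches the thread's numbers and would answer the opening question: the 9 comes from fitting 109-byte records (key plus data plus overhead) into the 992 usable bytes of a 1024-byte block.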