Re: [U2] RESIZE - dynamic files
You forgot the need to defragment, since someone suggested that my idea of using the intrinsic look-ahead ability is hampered by hard fragmentation.
Re: [U2] RESIZE - dynamic files
Chris,

10 years ago, when I was administering a UniVerse system, the answer would have been "minimize both to the best of your ability". But I don't know how UniVerse has changed in the interim, during which time I have been working on UniData systems, which are enormously different in their handling of records in groups from any other Pick-type system I have ever worked on (all of which were much more similar to UniVerse at that time). And when last I administered a UniVerse system, there were no dynamic files. With that caveat, here are the factors:

1) A record in a UniVerse file that is stored in overflow is going to take 2 or more disk reads to retrieve if you are retrieving it by id. However, in a Basic select (structured as in Will's example, with no quotes, no "WITH" criteria), the system will walk through the file group by group, and will read each record, so yes, it will take 2 (or more, depending on how deeply that group is in overflow) reads to get the data, but it will have done the first read anyway to read those records - so for the Basic SELECT, you probably want to minimize the number of groups read to the extent that you can do so without putting many of the groups into overflow.

2) To add records to the file, you have to access the file by the record id, which means hashing the id to the group, then walking through the group to see if the id is already in use, and if not, adding the record to the end of the data area in use. So for that, you absolutely want to minimize the amount of overflow, because overflow slows you down on the 'adds'.

3) Any sort/select or query read of the database will be slowed down significantly by overflow, but you said you don't do much of that anyway.

Susan M. Lynch
F. W. Davison & Company, Inc.
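Susan's points 1) and 2) come down to counting buffer reads per group. Here is a toy Python model of that cost (my own sketch, not UniVerse internals: the modulo hash, `per_buffer`, and the chained-buffer layout are all simplifying assumptions):

```python
import math

# Toy model of a hashed file (illustrative, NOT UniVerse internals):
# records hash to groups by id; each group's primary buffer holds
# `per_buffer` records, and the rest spill into chained overflow buffers.

def group_of(record_id, modulus):
    # UniVerse hashes the key string; a simple modulo stands in here.
    return record_id % modulus

def reads_for_id(position_in_group, per_buffer):
    # Retrieving the record at `position_in_group` costs one read per
    # buffer traversed: the primary buffer, then each overflow buffer.
    return position_in_group // per_buffer + 1

def full_scan_reads(group_sizes, per_buffer):
    # A Basic SELECT walks every group and touches every buffer,
    # primary and overflow alike; an empty group still costs one read.
    return sum(max(1, math.ceil(s / per_buffer)) for s in group_sizes)
```

For example, with 10 records per buffer, a record sitting 12 deep in its group costs `reads_for_id(12, 10) == 2` reads, which is the "2 or more disk reads" Susan describes; keeping groups out of overflow keeps that at 1.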
Re: [U2] RESIZE - dynamic files
Logically, the graphed solution to varying the split.load value, with an x-axis=modulus, y-axis=time_to_select_&_read_the_whole_file, is going to be parabolic, having very slow performance at modulus=1 and modulus = # of records. If you actually want to find the precise low point, ignore all this bs, create a bunch of files with copies of the same data, but different moduli, restart your system (including all disk drives & raid devices) in order to purge all buffers, and then run the same program against each file. I think that we would all be curious about the results.

Easier yet, just ignore the bs and use the defaults. :)

-Rick
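Rick's parabola can be seen even in a crude simulation. Everything here is invented for illustration (the multiplicative hash, `per_buffer=10`, and the `overflow_penalty` weight standing in for the extra seek an overflow buffer costs versus a read-ahead-friendly primary buffer):

```python
import math

def scan_cost(n_records, modulus, per_buffer=10, overflow_penalty=5.0):
    # Toy cost of a full-file SELECT: primary buffers are cheap (cost 1,
    # helped by sequential read-ahead); each chained overflow buffer
    # forces an extra seek (cost `overflow_penalty`, an invented weight).
    sizes = [0] * modulus
    for r in range(n_records):
        sizes[(r * 2654435761) % modulus] += 1   # deterministic toy hash
    cost = 0.0
    for s in sizes:
        buffers = max(1, math.ceil(s / per_buffer))
        cost += 1 + overflow_penalty * (buffers - 1)
    return cost

# modulus=1 pays for one huge overflow chain; modulus = # of records pays
# for reading a mostly-empty group per record; the low point sits between.
for m in (1, 100, 10_000, 100_000):
    print(m, scan_cost(100_000, m))
```

The model agrees with Rick's caveat: the exact low point depends on weights you can only get by benchmarking the real system with purged buffers.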
Re: [U2] RESIZE - dynamic files
What do you mean a BASIC SELECT WITH... If you mean you are doing EXECUTE "SELECT CUSTOMER WITH..." that is not a BASIC SELECT, whose syntax is only:

OPEN "CUSTOMER" TO F.CUSTOMER
SELECT F.CUSTOMER

no WITH
Re: [U2] RESIZE - dynamic files
So is there a performance increase in BASIC SELECTS by reducing overflow? Some people are saying to reduce disk space to speed up the BASIC SELECT while others say to reduce overflow.. I'm a bit confused. All of our programs that read that table use a BASIC SELECT WITH.. for a BASIC select do you gain anything by reducing overflow?

Chris
Re: [U2] RESIZE - dynamic files
On 05/07/12 23:58, Rick Nuckolls wrote:
> Oops, I would have thought that if a file had, say 100,000 bytes, @ 70 percent full, there would be 30,000 bytes "empty" or dead. Are you suggesting that there would be 70,000 bytes of data and 42,000 bytes of dead space?

Do you mean 100,000 bytes of disk space, or 100,000 bytes of data? I guess you are thinking that the file occupies, on disk, 100K. In which case it will have 70K of data and 30K of empty space. But if you are thinking that the file contains 100K of data, it will actually occupy 142K of disk space. So this particular option "wastes" 30% of the disk space it uses, but uses 42% extra space compared to what it optimally needed assuming perfect packing.

I know, it's fun to try to get your head round it :-)

Cheers,
Wol

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
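Wol's two readings of "70% full", as plain arithmetic:

```python
# The two readings of "a 100,000-byte file at 70% full" (just arithmetic):
load = 0.70

# Reading 1: the file occupies 100K on disk.
disk_size = 100_000
data_bytes = disk_size * load        # 70,000 bytes of data
slack = disk_size - data_bytes       # 30,000 bytes empty

# Reading 2: the file holds 100K of data.
data_bytes2 = 100_000
disk_size2 = data_bytes2 / load      # ~142,857 bytes on disk
overhead = disk_size2 - data_bytes2  # ~42,857 extra bytes, i.e. ~42.9%

print(data_bytes, slack, round(disk_size2), round(overhead))
```

Both readings describe the same 70% load factor; the 30% versus 42% figures differ only in which quantity (disk space or data) you take as the base.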
Re: [U2] RESIZE - dynamic files
The best thing I can say about the MINIMUM.MODULUS is that if you set it close to what you expect the file to need (at least for a while), when you start populating it from scratch, you will not lose the performance as the file keeps splitting. This can make an amazing difference in how long it takes to initially populate the file.

John

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Charles Stevenson
Sent: Thursday, July 05, 2012 5:41 PM
To: U2 Users List
Subject: Re: [U2] RESIZE - dynamic files

Chris,

I can appreciate what you are doing as an academic exercise. You seem happy with how it looks at this moment, where, because you set "MINIMUM.MODULUS 118681", you ended up with a current load of 63%. But think about it: as you add records, the load will reach 70%, per "SPLIT.LOAD 70", then splits will keep occurring and the current modulus will grow past 118681. MINIMUM.MODULUS will never matter again. (This was described as an ever-growing file.)

If the current config is what you want, why not just set "SPLIT.LOAD 63 MINIMUM.MODULUS 1"? That way the ratio that you like today will stay like this forever.

MINIMUM.MODULUS will not matter unless data is deleted. It says to not shrink the file structure below that minimally allocated disk space, even if there is no data to occupy it. That's really all MINIMUM.MODULUS is for. Play with it all you want, because it puts you in a good place when some crisis happens.

At the end of the day, with this file, you'll find your tuning won't matter much. Not a lot of help, but not much harm if you tweak it wrong, either.

On 7/5/2012 1:20 PM, Chris Austin wrote:
> Rick,
>
> You are correct, I should be using the smaller size (I just haven't changed it yet). Based on the reading I have done you should only use the larger group size when the average record size is greater than 1000 bytes.
>
> As far as being better off with the defaults that's basically what I'm trying to test (as well as learn how linear hashing works). I was able to reduce my overflow by 18% and I only increased my empty groups by a very small amount as well as only increased my file size by 8%. This in theory should be better for reads/writes than what I had before.
>
> To test the performance I need to write a ton of records and then capture the output and compare the output using timestamps.
>
> Chris
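John's point about pre-sizing MINIMUM.MODULUS can be sketched with a toy linear-hashing loader (my own simplification, not UniVerse's real split algorithm; `per_group` and the one-split-per-trigger rule are invented assumptions):

```python
# Toy linear-hashing loader (illustrative, not UniVerse's actual code):
# a group splits whenever the load factor records/(modulus * per_group)
# exceeds split_load. Pre-sizing the starting modulus near the final
# modulus means the file never pays for those splits while loading.
def populate(n_records, minimum_modulus, per_group=10, split_load=0.70):
    modulus, splits = minimum_modulus, 0
    for written in range(1, n_records + 1):
        if written / (modulus * per_group) > split_load:
            modulus += 1      # split one group, as linear hashing does
            splits += 1
    return modulus, splits

print(populate(100_000, 1))        # grows from scratch: thousands of splits
print(populate(100_000, 15_000))   # pre-sized: no splits at all
```

Both runs end with the same load factor; the difference is entirely the split work done (and, on a real system, the I/O it causes) during the initial population.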
Re: [U2] RESIZE - dynamic files
That's what we use, 'BASIC SELECT' statements for this table, looping through records to build reports. It's an accounting table that sees about 200-300 record WRITES a day, with an average of about ~250 bytes per record. We obviously have more READ operations since we are always building up these reports, so I was hoping my #'s looked right.

1) I reduced overflow by 18%.
2) I only increased file size ~8%.

So we do a combination of BASIC SELECTS and WRITES. Everything is done in the latest version of Rocket's Universe, PICK, using BASIC for our programs that contain the SELECTS.

Chris
Re: [U2] RESIZE - dynamic files
This will be mostly true if the full extent of the file was allocated at one time as a contiguous block, which could be a big plus. As a file grows, sectors will be allocated piecemeal and when the hardware reads ahead, it will not necessarily be reading sectors in the same file. Curiously, an old Pr1me CAM file had a trick around this, though it was late coming onto the scene. Unix also has a few tricks, but they are only partial solutions to file fragmentation. And Windows Rick -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wjhonson Sent: Thursday, July 05, 2012 5:12 PM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files A BASIC SELECT cannot use criteria at all. It is going to walk through every record in the file, in order. And that's the sticky wicket. That whole "in order" business. The disk drive controller has no clue on linked frames, but it *will* do optimistic look aheads for you. So you are much better off, for BASIC SELECTs having nothing in overflow, at all. :) That way, when you go to ask for the *next* frame, it will always be contiguous, and already sitting in memory. -Original Message- From: Rick Nuckolls To: 'U2 Users List' Sent: Thu, Jul 5, 2012 4:43 pm Subject: Re: [U2] RESIZE - dynamic files Most disks and disk systems cache huge amounts of information these days, and, epending on 20 factors or so, one solution will be better than another for a iven file. For the wholesale, SELECT F WITH, The fewest disk records will almost always in. For files that have ~10 records/group and have ~10% of the groups verflowed, then perhaps 1% of record reads will do a second read for the verflow buffer because the target key was not in the primary group. Writing a ew record would possibly hit the 10% mark for reading overflow buffers. But owering the split.load will increase the number of splits slightly, and ncrease the total number of groups considerably. 
What you have shown is that you need to increase the modulus (and select time) of a large file more than 10% in order to decrease the read and update times for your records 0.5% of the time (assuming that you have only reduced the number of overflow groups by ~50%).

As Charles suggests, this is an interesting exercise, but your actual results will rapidly change if you actually add/remove records from your file, change the load or number of files on your system, put in a new drive, cpu, memory board, or install a new release of UniVerse, move to RAID, etc.

-Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wjhonson
Sent: Thursday, July 05, 2012 2:38 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files

The hardware "look ahead" of the disk drive reader will grab consecutive "frames" into memory, since it assumes you'll want the "next" frame next. So the less overflow you have, the faster a full file scan will become. At least that's my theory ;)

-Original Message-
From: Rick Nuckolls
To: 'U2 Users List'
Sent: Thu, Jul 5, 2012 2:29 pm
Subject: Re: [U2] RESIZE - dynamic files

Chris,

For the type of use that you described earlier (BASIC selects and reads), reducing overflow will have negligible performance benefit, especially compared to changing the GROUP.SIZE back to 1 (2048 bytes). If you purge the file in relatively small percentages, then it will never merge anyway (because you will need to delete 20-30% of the file for that to happen with the mergeload at 50%), so your optimum minimum modulus solution will probably be "however large it grows". The overhead for a group split is not as bad as it sounds unless your updates/sec count is extremely high, such as during a copy.

If you do regular SELECTs and SCANs of the entire file, then your goal should be to reduce the total disk size of the file, and not worry much about common overflow. The important thing is that the file is dynamic, so you will never encounter the issues that undersized statically hashed files develop. We have thousands of dynamically hashed files on our (Solaris) systems, with an extremely low problem rate.

Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Thursday, July 05, 2012 11:21 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files

Rick,

You are correct, I should be using the smaller size (I just haven't changed it yet). Based on the reading I have done you should only use the larger group size when the average record size is greater than 1000 bytes. As far as being better off with the defaults, that's basically what I'm trying to test (as well as learn how linear hashing works). I was able to reduce my overflow by 18%
Re: [U2] RESIZE - dynamic files
A BASIC SELECT cannot use criteria at all. It is going to walk through every record in the file, in order. And that's the sticky wicket. That whole "in order" business. The disk drive controller has no clue on linked frames, but it *will* do optimistic look-aheads for you. So you are much better off, for BASIC SELECTs, having nothing in overflow at all. :) That way, when you go to ask for the *next* frame, it will always be contiguous, and already sitting in memory.

-Original Message-
From: Rick Nuckolls
To: 'U2 Users List'
Sent: Thu, Jul 5, 2012 4:43 pm
Subject: Re: [U2] RESIZE - dynamic files

Most disks and disk systems cache huge amounts of information these days, and, depending on 20 factors or so, one solution will be better than another for a given file. For the wholesale SELECT F WITH, the fewest disk reads will almost always win. For files that have ~10 records/group and have ~10% of the groups overflowed, then perhaps 1% of record reads will do a second read for the overflow buffer because the target key was not in the primary group. Writing a new record would possibly hit the 10% mark for reading overflow buffers. But lowering the split.load will increase the number of splits slightly, and increase the total number of groups considerably.

What you have shown is that you need to increase the modulus (and select time) of a large file more than 10% in order to decrease the read and update times for your records 0.5% of the time (assuming that you have only reduced the number of overflow groups by ~50%).

As Charles suggests, this is an interesting exercise, but your actual results will rapidly change if you actually add/remove records from your file, change the load or number of files on your system, put in a new drive, cpu, memory board, or install a new release of UniVerse, move to RAID, etc.

-Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wjhonson
Sent: Thursday, July 05, 2012 2:38 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files

The hardware "look ahead" of the disk drive reader will grab consecutive "frames" into memory, since it assumes you'll want the "next" frame next. So the less overflow you have, the faster a full file scan will become. At least that's my theory ;)

-Original Message-
From: Rick Nuckolls
To: 'U2 Users List'
Sent: Thu, Jul 5, 2012 2:29 pm
Subject: Re: [U2] RESIZE - dynamic files

Chris,

For the type of use that you described earlier (BASIC selects and reads), reducing overflow will have negligible performance benefit, especially compared to changing the GROUP.SIZE back to 1 (2048 bytes). If you purge the file in relatively small percentages, then it will never merge anyway (because you will need to delete 20-30% of the file for that to happen with the mergeload at 50%), so your optimum minimum modulus solution will probably be "however large it grows". The overhead for a group split is not as bad as it sounds unless your updates/sec count is extremely high, such as during a copy.

If you do regular SELECTs and SCANs of the entire file, then your goal should be to reduce the total disk size of the file, and not worry much about common overflow. The important thing is that the file is dynamic, so you will never encounter the issues that undersized statically hashed files develop. We have thousands of dynamically hashed files on our (Solaris) systems, with an extremely low problem rate.

Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Thursday, July 05, 2012 11:21 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files

Rick,

You are correct, I should be using the smaller size (I just haven't changed it yet). Based on the reading I have done you should only use the larger group size when the average record size is greater than 1000 bytes. As far as being better off with the defaults, that's basically what I'm trying to test (as well as learn how linear hashing works). I was able to reduce my overflow by 18% and I only increased my empty groups by a very small amount, as well as only increased my file size by 8%. This in theory should be better for reads/writes than what I had before. To test the performance I need to write a ton of records and then capture the output and compare the output using timestamps.

Chris

> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Thu, 5 Jul 2012 09:22:02 -0700
> Subject: Re: [U2] RESIZE - dynamic files
>
> Chris,
>
> I still am wondering what is prompting you to continue using the larger group size.
>
> I think that Martin and the UV documentation are correct in this case; you would be as well or better off with the defaults.
>
> -Rick
>
> On Jul 5, 2012, at 9:13 AM, "Martin Phillips" wrote:
> > Hi,
Re: [U2] RESIZE - dynamic files
Most disks and disk systems cache huge amounts of information these days, and, depending on 20 factors or so, one solution will be better than another for a given file. For the wholesale SELECT F WITH, the fewest disk reads will almost always win. For files that have ~10 records/group and have ~10% of the groups overflowed, then perhaps 1% of record reads will do a second read for the overflow buffer because the target key was not in the primary group. Writing a new record would possibly hit the 10% mark for reading overflow buffers. But lowering the split.load will increase the number of splits slightly, and increase the total number of groups considerably.

What you have shown is that you need to increase the modulus (and select time) of a large file more than 10% in order to decrease the read and update times for your records 0.5% of the time (assuming that you have only reduced the number of overflow groups by ~50%).

As Charles suggests, this is an interesting exercise, but your actual results will rapidly change if you actually add/remove records from your file, change the load or number of files on your system, put in a new drive, cpu, memory board, or install a new release of UniVerse, move to RAID, etc.

-Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wjhonson
Sent: Thursday, July 05, 2012 2:38 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files

The hardware "look ahead" of the disk drive reader will grab consecutive "frames" into memory, since it assumes you'll want the "next" frame next. So the less overflow you have, the faster a full file scan will become. At least that's my theory ;)

-Original Message-
From: Rick Nuckolls
To: 'U2 Users List'
Sent: Thu, Jul 5, 2012 2:29 pm
Subject: Re: [U2] RESIZE - dynamic files

Chris,

For the type of use that you described earlier (BASIC selects and reads), reducing overflow will have negligible performance benefit, especially compared to changing the GROUP.SIZE back to 1 (2048 bytes). If you purge the file in relatively small percentages, then it will never merge anyway (because you will need to delete 20-30% of the file for that to happen with the mergeload at 50%), so your optimum minimum modulus solution will probably be "however large it grows". The overhead for a group split is not as bad as it sounds unless your updates/sec count is extremely high, such as during a copy.

If you do regular SELECTs and SCANs of the entire file, then your goal should be to reduce the total disk size of the file, and not worry much about common overflow. The important thing is that the file is dynamic, so you will never encounter the issues that undersized statically hashed files develop. We have thousands of dynamically hashed files on our (Solaris) systems, with an extremely low problem rate.

Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Thursday, July 05, 2012 11:21 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files

Rick,

You are correct, I should be using the smaller size (I just haven't changed it yet). Based on the reading I have done you should only use the larger group size when the average record size is greater than 1000 bytes. As far as being better off with the defaults, that's basically what I'm trying to test (as well as learn how linear hashing works). I was able to reduce my overflow by 18% and I only increased my empty groups by a very small amount, as well as only increased my file size by 8%. This in theory should be better for reads/writes than what I had before. To test the performance I need to write a ton of records and then capture the output and compare the output using timestamps.

Chris

> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Thu, 5 Jul 2012 09:22:02 -0700
> Subject: Re: [U2] RESIZE - dynamic files
>
> Chris,
>
> I still am wondering what is prompting you to continue using the larger group size.
>
> I think that Martin and the UV documentation are correct in this case; you would be as well or better off with the defaults.
>
> -Rick
>
> On Jul 5, 2012, at 9:13 AM, "Martin Phillips" wrote:
> > Hi,
> >
> > The various suggestions about setting the minimum modulus to reduce overflow are all very well but effectively you are turning a dynamic file into a static one, complete with all the continual maintenance work needed to keep the parameters in step with the data.
> >
> > In most cases, the only parameter that is worth tuning is the group size to try to pack things nicely. Even this is often fine left alone, though getting it to match the underlying o/s page size is helpful.
> >
> > I missed the start of this thread but
Re: [U2] RESIZE - dynamic files
Oops, I would have thought that if a file had, say, 100,000 bytes, @ 70 percent full, there would be 30,000 bytes "empty" or dead. Are you suggesting that there would be 70,000 bytes of data and 42,000 bytes of dead space?

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wols Lists
Sent: Thursday, July 05, 2012 3:24 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files

On 05/07/12 16:12, Martin Phillips wrote:
> A file without overflow is not necessarily the best solution. Winding the split load down to 70% means that at least 30% of the file is dead space. The implication of this is that the file is larger and will take more disk reads to process sequentially from one end to the other.

Whoops Martin, I think you've made the classic percentages mistake here ... The file is 30/70, or 42% dead space at least. A file with the default 80% split is at least 25% dead space.

Cheers,
Wol

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
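[Editor's note] Wol's percentages point is easy to check numerically: if a dynamic file splits at load L, then at least (1 - L) of the whole file is empty, which as a fraction of the *data* is (1 - L)/L. A small illustrative sketch (Python, not U2 code; the function name is invented for the example):

```python
def dead_space(split_load):
    """Given a split load (fraction of each group holding data), return
    (empty fraction of the whole file, dead space as a fraction of the data)."""
    empty = 1.0 - split_load        # share of the total file that is empty
    overhead = empty / split_load   # empty space relative to the data it holds
    return empty, overhead

# SPLIT.LOAD 70: 30% of the file is empty, i.e. ~42.9% dead space vs. the data
print(dead_space(0.70))
# Default SPLIT.LOAD 80: 20% empty, i.e. 25% dead space vs. the data
print(dead_space(0.80))
```

So both posters are right about different ratios: 30% of the *file* is empty, which is the same thing as ~42% extra space relative to the *data*.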
Re: [U2] RESIZE - dynamic files
On 05/07/12 14:49, Chris Austin wrote:
>
> Disk space is not a factor, as we are a smaller shop and disk space comes cheap. However, one thing I did notice is when I increased the modulus to a very large number, which then increased my disk space to about 3-4x of my record data, my SELECT queries were slower.
>
> Are the 2 factors when choosing HOW the file is used based on whether you're using?
>
> 1) a lot of SELECTS (then looping through the records)

Is that a BASIC select, or a RETRIEVE select?

> 2) grabbing individual records (not using a SELECT)
>
> With this file we really do a lot of SELECTS (option 1), then loop through the records. With that being said, and based on the reading I've done here, it would appear it's better to have a little overflow and not use up so much disk space for modulus (groups) for this application since we do use a lot of SELECT queries. Is this correct?

If your selects are BASIC selects, then you won't notice too much difference. If they are RETRIEVE selects, then reducing SPLIT will increase the cost of the SELECT. In both cases, if the RETRIEVE select is not BY, then the cost of processing the list should not be seriously impacted. (On a SELECT WITH index, however, reducing overflow will speed things up a bit, probably not an awful lot.)

> Most of my records are ~ 250 bytes, there's a handful that are 'up to 512 bytes'.
>
> It would seem to me that I would want to REDUCE my split to ~70% to reduce overflow, and maybe increase my MINIMUM.MODULUS to a # a little bit bigger than my current modulus (~10% bigger) since this will be a growing file and will never merge. In my case using the formula might not make sense since this file will never merge. Does this make sense?

If the file will only ever grow, then MINIMUM.MODULUS is probably a waste of time.
You are best using that in one of two circumstances, either (a) you are populating a file with a large number of initial records and you are forcing the modulus to what it's likely to end up anyway, or (b) your file grows and shrinks violently in size, and you are forcing it to its typical state. The first scenario simply avoids a bunch of inevitable splits, the second avoids a yoyo split/merge/split scenario. I'd just leave the settings at 80/20, and only use MINIMUM.MODULUS if I was creating a copy of the file (setting the new minimum at the current modulo of the existing file).

Cheers,
Wol
Re: [U2] RESIZE - dynamic files
On 05/07/12 16:12, Martin Phillips wrote:
> A file without overflow is not necessarily the best solution. Winding the split load down to 70% means that at least 30% of the file is dead space. The implication of this is that the file is larger and will take more disk reads to process sequentially from one end to the other.

Whoops Martin, I think you've made the classic percentages mistake here ... The file is 30/70, or 42% dead space at least. A file with the default 80% split is at least 25% dead space.

Cheers,
Wol
Re: [U2] RESIZE - dynamic files
Chris,

I can appreciate what you are doing as an academic exercise. You seem happy with how it looks at this moment, where, because you set "MINIMUM.MODULUS 118681", you ended up with a current load of 63%. But think about it: as you add records, the load will reach 70%, per "SPLIT.LOAD 70", then splits will keep occurring and the current modulus will grow past 118681. MINIMUM.MODULUS will never matter again. (This was described as an ever-growing file.)

If the current config is what you want, why not just set "SPLIT.LOAD 63 MINIMUM.MODULUS 1"? That way the ratio that you like today will stay like this forever.

MINIMUM.MODULUS will not matter unless data is deleted. It says to not shrink the file structure below that minimally allocated disk space, even if there is no data to occupy it. That's really all MINIMUM.MODULUS is for. Play with it all you want, because it puts you in a good place when some crisis happens.

At the end of the day, with this file, you'll find your tuning won't matter much. Not a lot of help, but not much harm if you tweak it wrong, either.

On 7/5/2012 1:20 PM, Chris Austin wrote:
> Rick,
>
> You are correct, I should be using the smaller size (I just haven't changed it yet). Based on the reading I have done you should only use the larger group size when the average record size is greater than 1000 bytes. As far as being better off with the defaults, that's basically what I'm trying to test (as well as learn how linear hashing works). I was able to reduce my overflow by 18% and I only increased my empty groups by a very small amount, as well as only increased my file size by 8%. This in theory should be better for reads/writes than what I had before. To test the performance I need to write a ton of records and then capture the output and compare the output using timestamps.
>
> Chris
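[Editor's note] The 63% "actual" load mentioned above is just data-plus-key bytes divided by total group space. A quick sketch (Python, not U2 code; the constants are the GENACCTRN_POSTED FILE.STAT figures Chris posted elsewhere in the thread, and the variable names are mine):

```python
# Figures from the FILE.STAT listing quoted in this thread
GROUP_SIZE = 4096        # bytes per group
MODULUS    = 118681      # current number of groups
DATA_BYTES = 287789178   # total size of record data
ID_BYTES   = 21539538    # total size of record IDs

load = (DATA_BYTES + ID_BYTES) / (MODULUS * GROUP_SIZE)
print(f"actual load = {load:.1%}")   # prints "actual load = 63.6%"
```

This matches the "63% (actual)" line in the FILE.STAT output (which truncates rather than rounds).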
Re: [U2] RESIZE - dynamic files
The hardware "look ahead" of the disk drive reader will grab consecutive "frames" into memory, since it assumes you'll want the "next" frame next. So the less overflow you have, the faster a full file scan will become. At least that's my theory ;)

-Original Message-
From: Rick Nuckolls
To: 'U2 Users List'
Sent: Thu, Jul 5, 2012 2:29 pm
Subject: Re: [U2] RESIZE - dynamic files

Chris,

For the type of use that you described earlier (BASIC selects and reads), reducing overflow will have negligible performance benefit, especially compared to changing the GROUP.SIZE back to 1 (2048 bytes). If you purge the file in relatively small percentages, then it will never merge anyway (because you will need to delete 20-30% of the file for that to happen with the mergeload at 50%), so your optimum minimum modulus solution will probably be "however large it grows". The overhead for a group split is not as bad as it sounds unless your updates/sec count is extremely high, such as during a copy.

If you do regular SELECTs and SCANs of the entire file, then your goal should be to reduce the total disk size of the file, and not worry much about common overflow. The important thing is that the file is dynamic, so you will never encounter the issues that undersized statically hashed files develop. We have thousands of dynamically hashed files on our (Solaris) systems, with an extremely low problem rate.

Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Thursday, July 05, 2012 11:21 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files

Rick,

You are correct, I should be using the smaller size (I just haven't changed it yet). Based on the reading I have done you should only use the larger group size when the average record size is greater than 1000 bytes. As far as being better off with the defaults, that's basically what I'm trying to test (as well as learn how linear hashing works). I was able to reduce my overflow by 18% and I only increased my empty groups by a very small amount, as well as only increased my file size by 8%. This in theory should be better for reads/writes than what I had before. To test the performance I need to write a ton of records and then capture the output and compare the output using timestamps.

Chris

> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Thu, 5 Jul 2012 09:22:02 -0700
> Subject: Re: [U2] RESIZE - dynamic files
>
> Chris,
>
> I still am wondering what is prompting you to continue using the larger group size.
>
> I think that Martin and the UV documentation are correct in this case; you would be as well or better off with the defaults.
>
> -Rick
>
> On Jul 5, 2012, at 9:13 AM, "Martin Phillips" wrote:
> > Hi,
> >
> > The various suggestions about setting the minimum modulus to reduce overflow are all very well but effectively you are turning a dynamic file into a static one, complete with all the continual maintenance work needed to keep the parameters in step with the data.
> >
> > In most cases, the only parameter that is worth tuning is the group size to try to pack things nicely. Even this is often fine left alone, though getting it to match the underlying o/s page size is helpful.
> >
> > I missed the start of this thread but, unless you have a performance problem or are seriously short of space, my recommendation would be to leave the dynamic files to look after themselves.
> >
> > A file without overflow is not necessarily the best solution. Winding the split load down to 70% means that at least 30% of the file is dead space. The implication of this is that the file is larger and will take more disk reads to process sequentially from one end to the other.
>
> Martin Phillips
> Ladybridge Systems Ltd
> 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
> +44 (0)1604-709200
>
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: 05 July 2012 15:19
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
>
> I was able to drop from 30% overflow to 12% by making 2 changes:
>
> 1) changed the split from 80% to 70% (that alone reduced overflow 10%)
> 2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record data + id) * 1.1 * 1.42857 (70% split load) ] / 4096 )
>
> My disk size only went up 8%..
>
> My file looks like this now:
>
> File name .. GENACCTRN_POSTED
> Pathname ... GENACCTRN_POSTED
> File type .. DYNAMIC
> File style and revision 32BIT Revision 12
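[Editor's note] Chris's MINIMUM.MODULUS arithmetic can be reproduced: pad the data-plus-key bytes by 10%, scale up for the 70% target load (1.42857 ≈ 1/0.7), and divide by the group size. A sketch (Python, not U2 code) using the FILE.STAT byte counts quoted in this thread; the result lands within a few groups of his 118,681:

```python
import math

# Byte counts from the GENACCTRN_POSTED FILE.STAT listing in this thread
DATA_BYTES = 287789178   # total size of record data
ID_BYTES   = 21539538    # total size of record IDs
GROUP_SIZE = 4096        # bytes per group

# (record data + id) * 1.1 * 1.42857 / group size, per Chris's formula
min_mod = math.ceil((DATA_BYTES + ID_BYTES) * 1.1 * 1.42857 / GROUP_SIZE)
print(min_mod)   # → 118674, within a handful of groups of Chris's 118,681
```

The small discrepancy comes from 1.42857 being a rounded 1/0.7 and from whatever rounding Chris applied along the way; the method is the same.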
Re: [U2] RESIZE - dynamic files
Chris,

For the type of use that you described earlier (BASIC selects and reads), reducing overflow will have negligible performance benefit, especially compared to changing the GROUP.SIZE back to 1 (2048 bytes). If you purge the file in relatively small percentages, then it will never merge anyway (because you will need to delete 20-30% of the file for that to happen with the mergeload at 50%), so your optimum minimum modulus solution will probably be "however large it grows". The overhead for a group split is not as bad as it sounds unless your updates/sec count is extremely high, such as during a copy.

If you do regular SELECTs and SCANs of the entire file, then your goal should be to reduce the total disk size of the file, and not worry much about common overflow. The important thing is that the file is dynamic, so you will never encounter the issues that undersized statically hashed files develop. We have thousands of dynamically hashed files on our (Solaris) systems, with an extremely low problem rate.

Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Thursday, July 05, 2012 11:21 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files

Rick,

You are correct, I should be using the smaller size (I just haven't changed it yet). Based on the reading I have done you should only use the larger group size when the average record size is greater than 1000 bytes. As far as being better off with the defaults, that's basically what I'm trying to test (as well as learn how linear hashing works). I was able to reduce my overflow by 18% and I only increased my empty groups by a very small amount, as well as only increased my file size by 8%. This in theory should be better for reads/writes than what I had before. To test the performance I need to write a ton of records and then capture the output and compare the output using timestamps.
Chris

> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Thu, 5 Jul 2012 09:22:02 -0700
> Subject: Re: [U2] RESIZE - dynamic files
>
> Chris,
>
> I still am wondering what is prompting you to continue using the larger group size.
>
> I think that Martin and the UV documentation are correct in this case; you would be as well or better off with the defaults.
>
> -Rick
>
> On Jul 5, 2012, at 9:13 AM, "Martin Phillips" wrote:
> > Hi,
> >
> > The various suggestions about setting the minimum modulus to reduce overflow are all very well but effectively you are turning a dynamic file into a static one, complete with all the continual maintenance work needed to keep the parameters in step with the data.
> >
> > In most cases, the only parameter that is worth tuning is the group size to try to pack things nicely. Even this is often fine left alone, though getting it to match the underlying o/s page size is helpful.
> >
> > I missed the start of this thread but, unless you have a performance problem or are seriously short of space, my recommendation would be to leave the dynamic files to look after themselves.
> >
> > A file without overflow is not necessarily the best solution. Winding the split load down to 70% means that at least 30% of the file is dead space. The implication of this is that the file is larger and will take more disk reads to process sequentially from one end to the other.
> > > > > > Martin Phillips > > Ladybridge Systems Ltd > > 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England > > +44 (0)1604-709200 > > > > > > > > -Original Message- > > From: u2-users-boun...@listserver.u2ug.org > > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > > Sent: 05 July 2012 15:19 > > To: u2-users@listserver.u2ug.org > > Subject: Re: [U2] RESIZE - dynamic files > > > > > > I was able to drop from 30% overflow to 12% by making 2 changes: > > > > 1) changed the split from 80% to 70% (that alone reduce 10% overflow) > > 2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record > > data + id) * 1.1 * 1.42857 (70% split load)] / 4096 ) > > > > My disk size only went up 8%.. > > > > My file looks like this now: > > > > File name .. GENACCTRN_POSTED > > Pathname ... GENACCTRN_POSTED > > File type .. DYNAMIC > > File style and revision 32BIT Revision 12 > > Hashing Algorithm .. GENERAL > > N
Re: [U2] RESIZE - dynamic files
Rick,

You are correct, I should be using the smaller size (I just haven't changed it yet). Based on the reading I have done you should only use the larger group size when the average record size is greater than 1000 bytes. As far as being better off with the defaults, that's basically what I'm trying to test (as well as learn how linear hashing works). I was able to reduce my overflow by 18% and I only increased my empty groups by a very small amount, as well as only increased my file size by 8%. This in theory should be better for reads/writes than what I had before. To test the performance I need to write a ton of records and then capture the output and compare the output using timestamps.

Chris

> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Thu, 5 Jul 2012 09:22:02 -0700
> Subject: Re: [U2] RESIZE - dynamic files
>
> Chris,
>
> I still am wondering what is prompting you to continue using the larger group size.
>
> I think that Martin and the UV documentation are correct in this case; you would be as well or better off with the defaults.
>
> -Rick
>
> On Jul 5, 2012, at 9:13 AM, "Martin Phillips" wrote:
> > Hi,
> >
> > The various suggestions about setting the minimum modulus to reduce overflow are all very well but effectively you are turning a dynamic file into a static one, complete with all the continual maintenance work needed to keep the parameters in step with the data.
> >
> > In most cases, the only parameter that is worth tuning is the group size to try to pack things nicely. Even this is often fine left alone, though getting it to match the underlying o/s page size is helpful.
> >
> > I missed the start of this thread but, unless you have a performance problem or are seriously short of space, my recommendation would be to leave the dynamic files to look after themselves.
> >
> > A file without overflow is not necessarily the best solution.
Winding the > > split load down to 70% means that at least 30% of the file > > is dead space. The implication of this is that the file is larger and will > > take more disk reads to process sequentially from one end > > to the other. > > > > > > Martin Phillips > > Ladybridge Systems Ltd > > 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England > > +44 (0)1604-709200 > > > > > > > > -Original Message- > > From: u2-users-boun...@listserver.u2ug.org > > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > > Sent: 05 July 2012 15:19 > > To: u2-users@listserver.u2ug.org > > Subject: Re: [U2] RESIZE - dynamic files > > > > > > I was able to drop from 30% overflow to 12% by making 2 changes: > > > > 1) changed the split from 80% to 70% (that alone reduce 10% overflow) > > 2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record > > data + id) * 1.1 * 1.42857 (70% split load)] / 4096 ) > > > > My disk size only went up 8%.. > > > > My file looks like this now: > > > > File name .. GENACCTRN_POSTED > > Pathname ... GENACCTRN_POSTED > > File type .. DYNAMIC > > File style and revision 32BIT Revision 12 > > Hashing Algorithm .. GENERAL > > No. of groups (modulus) 118681 current ( minimum 118681, 140 empty, > >14431 overflowed, 778 badly ) > > Number of records .. 1292377 > > Large record size .. 3267 bytes > > Number of large records 180 > > Group size . 4096 bytes > > Load factors ... 70% (split), 50% (merge) and 63% (actual) > > Total size . 546869248 bytes > > Total size of record data .. 287789178 bytes > > Total size of record IDs ... 21539538 bytes > > Unused space ... 
237532340 bytes > > Total space for records 546861056 bytes > > > > Chris > > > > > > > >> From: keith.john...@datacom.co.nz > >> To: u2-users@listserver.u2ug.org > >> Date: Wed, 4 Jul 2012 14:05:02 +1200 > >> Subject: Re: [U2] RESIZE - dynamic files > >> > >> Doug may have had a key bounce in his input > >> > >>> Let's do the math: > >>> > >>> 258687736 (Record Size) > >>> 192283300 (Key Size) > >>> > >> > >> The key size is actually 19283300 in Chris'
Re: [U2] RESIZE - dynamic files
Chris,

I still am wondering what is prompting you to continue using the larger group size.

I think that Martin and the UV documentation are correct in this case; you would be as well or better off with the defaults.

-Rick

On Jul 5, 2012, at 9:13 AM, "Martin Phillips" wrote:
> Hi,
>
> The various suggestions about setting the minimum modulus to reduce overflow are all very well but effectively you are turning a dynamic file into a static one, complete with all the continual maintenance work needed to keep the parameters in step with the data.
>
> In most cases, the only parameter that is worth tuning is the group size to try to pack things nicely. Even this is often fine left alone, though getting it to match the underlying o/s page size is helpful.
>
> I missed the start of this thread but, unless you have a performance problem or are seriously short of space, my recommendation would be to leave the dynamic files to look after themselves.
>
> A file without overflow is not necessarily the best solution. Winding the split load down to 70% means that at least 30% of the file is dead space. The implication of this is that the file is larger and will take more disk reads to process sequentially from one end to the other.
>
> Martin Phillips
> Ladybridge Systems Ltd
> 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
> +44 (0)1604-709200
>
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: 05 July 2012 15:19
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
>
> I was able to drop from 30% overflow to 12% by making 2 changes:
>
> 1) changed the split from 80% to 70% (that alone reduced overflow 10%)
> 2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record data + id) * 1.1 * 1.42857 (70% split load) ] / 4096 )
>
> My disk size only went up 8%..
> > My file looks like this now: > > File name .. GENACCTRN_POSTED > Pathname ... GENACCTRN_POSTED > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 118681 current ( minimum 118681, 140 empty, >14431 overflowed, 778 badly ) > Number of records .. 1292377 > Large record size .. 3267 bytes > Number of large records 180 > Group size . 4096 bytes > Load factors ... 70% (split), 50% (merge) and 63% (actual) > Total size . 546869248 bytes > Total size of record data .. 287789178 bytes > Total size of record IDs ... 21539538 bytes > Unused space ... 237532340 bytes > Total space for records 546861056 bytes > > Chris > > > >> From: keith.john...@datacom.co.nz >> To: u2-users@listserver.u2ug.org >> Date: Wed, 4 Jul 2012 14:05:02 +1200 >> Subject: Re: [U2] RESIZE - dynamic files >> >> Doug may have had a key bounce in his input >> >>> Let's do the math: >>> >>> 258687736 (Record Size) >>> 192283300 (Key Size) >>> >> >> The key size is actually 19283300 in Chris' figures >> >> Regarding 68,063 being less than the current modulus of 82,850. I think the >> answer may lie in the splitting process. >> >> As I understand it, the first time a split occurs group 1 is split and its >> contents are split between new group 1 and new group 2. > All the other groups effectively get 1 added to their number. The next split > is group 3 (which was 2) into 3 and 4 and so forth. A > pointer is kept to say where the next split will take place and also to help > sort out how to adjust the algorithm to identify which > group matches a given key. >> >> Based on this, if you started with 1000 groups, by the time you have split >> the 500th time you will have 1500 groups. The first > 1000 will be relatively empty, the last 500 will probably be overflowed, but > not terribly badly. 
By the time you get to the 1000th > split, you will have 2000 groups and they will, one hopes, be quite > reasonably spread with very little overflow. >> >> So I expect the average access times would drift up and down in a cycle. >> The cycle time would get longer as the file gets bigger > but the worst time would be roughly the same each cycle. >> >> Given the power of two introduced into the algorithm by the before/af
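Chris's MINIMUM.MODULUS arithmetic can be reproduced in a few lines (a sketch using the pre-resize FILE.STAT totals quoted elsewhere in the thread; it lands within about 0.1% of his 118,681, so he presumably worked from slightly different figures or rounded up):

```python
import math

# FILE.STAT figures quoted elsewhere in the thread (pre-resize file)
record_data = 287426366   # Total size of record data
record_ids  = 21539682    # Total size of record IDs
group_size  = 4096

# (data + ids) * 1.1 headroom, divided by the 70% split load
# (Chris's 1.42857 multiplier is 1/0.70)
min_modulus = math.ceil((record_data + record_ids) * 1.1 / 0.70 / group_size)
print(min_modulus)   # ~118,500 - within about 0.1% of the 118,681 Chris used
```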
Re: [U2] RESIZE - dynamic files
Hi, The various suggestions about setting the minimum modulus to reduce overflow are all very well but effectively you are turning a dynamic file into a static one, complete with all the continual maintenance work needed to keep the parameters in step with the data. In most cases, the only parameter that is worth tuning is the group size to try to pack things nicely. Even this is often fine left alone though getting it to match the underlying o/s page size is helpful. I missed the start of this thread but, unless you have a performance problem or are seriously short of space, my recommendation would be to leave the dynamic files to look after themselves. A file without overflow is not necessarily the best solution. Winding the split load down to 70% means that at least 30% of the file is dead space. The implication of this is that the file is larger and will take more disk reads to process sequentially from one end to the other. Martin Phillips Ladybridge Systems Ltd 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England +44 (0)1604-709200 -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin Sent: 05 July 2012 15:19 To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files I was able to drop from 30% overflow to 12% by making 2 changes: 1) changed the split from 80% to 70% (that alone reduce 10% overflow) 2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record data + id) * 1.1 * 1.42857 (70% split load)] / 4096 ) My disk size only went up 8%.. My file looks like this now: File name .. GENACCTRN_POSTED Pathname ... GENACCTRN_POSTED File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 118681 current ( minimum 118681, 140 empty, 14431 overflowed, 778 badly ) Number of records .. 1292377 Large record size .. 3267 bytes Number of large records 180 Group size . 
4096 bytes Load factors ... 70% (split), 50% (merge) and 63% (actual) Total size . 546869248 bytes Total size of record data .. 287789178 bytes Total size of record IDs ... 21539538 bytes Unused space ... 237532340 bytes Total space for records 546861056 bytes Chris > From: keith.john...@datacom.co.nz > To: u2-users@listserver.u2ug.org > Date: Wed, 4 Jul 2012 14:05:02 +1200 > Subject: Re: [U2] RESIZE - dynamic files > > Doug may have had a key bounce in his input > > > Let's do the math: > > > > 258687736 (Record Size) > > 192283300 (Key Size) > > > > The key size is actually 19283300 in Chris' figures > > Regarding 68,063 being less than the current modulus of 82,850. I think the > answer may lie in the splitting process. > > As I understand it, the first time a split occurs group 1 is split and its > contents are split between new group 1 and new group 2. All the other groups effectively get 1 added to their number. The next split is group 3 (which was 2) into 3 and 4 and so forth. A pointer is kept to say where the next split will take place and also to help sort out how to adjust the algorithm to identify which group matches a given key. > > Based on this, if you started with 1000 groups, by the time you have split > the 500th time you will have 1500 groups. The first 1000 will be relatively empty, the last 500 will probably be overflowed, but not terribly badly. By the time you get to the 1000th split, you will have 2000 groups and they will, one hopes, be quite reasonably spread with very little overflow. > > So I expect the average access times would drift up and down in a cycle. The > cycle time would get longer as the file gets bigger but the worst time would be roughly the the same each cycle. > > Given the power of two introduced into the algorithm by the before/after the > split thing, I wonder if there is such a need to start off with a prime? > > Regards, Keith > > PS I'm getting a bit Tony^H^H^H^Hverbose nowadays. 
> > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
I was able to drop from 30% overflow to 12% by making 2 changes: 1) changed the split from 80% to 70% (that alone reduced overflow by 10%) 2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record data + id) * 1.1 * 1.42857 (70% split load)] / 4096 ) My disk size only went up 8%.. My file looks like this now:

File name .. GENACCTRN_POSTED
Pathname ... GENACCTRN_POSTED
File type .. DYNAMIC
File style and revision 32BIT Revision 12
Hashing Algorithm .. GENERAL
No. of groups (modulus) 118681 current ( minimum 118681, 140 empty, 14431 overflowed, 778 badly )
Number of records .. 1292377
Large record size .. 3267 bytes
Number of large records 180
Group size . 4096 bytes
Load factors ... 70% (split), 50% (merge) and 63% (actual)
Total size . 546869248 bytes
Total size of record data .. 287789178 bytes
Total size of record IDs ... 21539538 bytes
Unused space ... 237532340 bytes
Total space for records 546861056 bytes

Chris > From: keith.john...@datacom.co.nz > To: u2-users@listserver.u2ug.org > Date: Wed, 4 Jul 2012 14:05:02 +1200 > Subject: Re: [U2] RESIZE - dynamic files > > Doug may have had a key bounce in his input > > > Let's do the math: > > > > 258687736 (Record Size) > > 192283300 (Key Size) > > > > The key size is actually 19283300 in Chris' figures > > Regarding 68,063 being less than the current modulus of 82,850. I think the > answer may lie in the splitting process. > > As I understand it, the first time a split occurs group 1 is split and its > contents are split between new group 1 and new group 2. All the other groups > effectively get 1 added to their number. The next split is group 3 (which was > 2) into 3 and 4 and so forth. A pointer is kept to say where the next split > will take place and also to help sort out how to adjust the algorithm to > identify which group matches a given key. > > Based on this, if you started with 1000 groups, by the time you have split > the 500th time you will have 1500 groups. 
The first 1000 will be relatively > empty, the last 500 will probably be overflowed, but not terribly badly. By > the time you get to the 1000th split, you will have 2000 groups and they > will, one hopes, be quite reasonably spread with very little overflow. > > So I expect the average access times would drift up and down in a cycle. The > cycle time would get longer as the file gets bigger but the worst time would > be roughly the same each cycle. > > Given the power of two introduced into the algorithm by the before/after the > split thing, I wonder if there is such a need to start off with a prime? > > Regards, Keith > > PS I'm getting a bit Tony^H^H^H^Hverbose nowadays. > > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
Disk space is not a factor, as we are a smaller shop and disk space comes cheap. However, one thing I did notice is that when I increased the modulus to a very large number, which then increased my disk space to about 3-4x of my record data, my SELECT queries were slower. Are these the 2 factors to weigh when choosing HOW the file is used: whether you're 1) doing a lot of SELECTs (then looping through the records), or 2) grabbing individual records (not using a SELECT)? With this file we really do a lot of SELECTs (option 1), then loop through the records. That being said, and based on the reading I've done here, it would appear it's better to have a little overflow and not use up so much disk space for modulus (groups) for this application, since we do use a lot of SELECT queries. Is this correct? Most of my records are ~ 250 bytes; there's a handful that are 'up to 512 bytes'. It would seem to me that I would want to REDUCE my split to ~70% to reduce overflow, and maybe increase my MINIMUM.MODULUS to a # a little bit bigger than my current modulus (~10% bigger) since this will be a growing file and will never merge. In my case using the formula might not make sense since this file will never merge. Does this make sense?

File name .. GENACCTRN_POSTED
Pathname ... GENACCTRN_POSTED
File type .. DYNAMIC
File style and revision 32BIT Revision 12
Hashing Algorithm .. GENERAL
No. of groups (modulus) 92903 current ( minimum 31, 87 empty, 28248 overflowed, 2510 badly )
Number of records .. 1292377
Large record size .. 3267 bytes
Number of large records 180
Group size . 4096 bytes
Load factors ... 80% (split), 50% (merge) and 80% (actual)
Total size . 501219328 bytes
Total size of record data .. 287426366 bytes
Total size of record IDs ... 21539682 bytes
Unused space ...
192245088 bytes
Total space for records 501211136 bytes

With all that being said, if I change the following: 1) SPLIT.LOAD to 70% 2) MINIMUM.MODULUS > 130,000 That's all I should really need to do to 'tweak' the performance of this file.. If this doesn't sound right I would be interested to hear how it should be tweaked instead. Thanks for all the help so far, I think this is all starting to make sense. Chris > From: ro...@stamina.com.au > To: u2-users@listserver.u2ug.org > Date: Wed, 4 Jul 2012 01:36:26 + > Subject: Re: [U2] RESIZE - dynamic files > > I would suggest that the actual goal is to achieve maximum performance for > your system, so knowing HOW the file is used on a daily basis can also > influence decisions. Disk is a cheap commodity, so having some "wastage" in > file utilization shouldn't factor. > > > Ross Ferris > Stamina Software > Visage > Better by Design!
Re: [U2] RESIZE - dynamic files
Hi all, > I wouldn't actually be surprised if QM is like PI. Drifting away from U2 but the question was asked The initial implementation of dynamic files in QM was fairly close to that of PI/open but it was totally reworked long before the product went onto general release, resulting in some useful performance gains. Like UV, a QM dynamic file is represented by a directory but the DATA.30 and OVER.30 items become %0 and %1. Further items may exist to hold alternate key indices. The underlying mechanism of dynamic files is common to PI, PI/open, UV and QM but UniData goes its own way. Although a couple of the numbers have to be changed for UV, I think that the technical note at http://www.openqm.org/downloads/dynamic_files.pdf is largely applicable to UV, at least in principle. There are some substantial differences in how the two products perform split/merge operations, especially with regard to management of locking tables, but this is not the right forum to discuss this further. Interestingly, the UV Internals course used to state that the dynamic file hashing algorithm was the same one as static file type 18. Experiments suggest that this is not true and it looks as though UV uses the same public domain hashing algorithm that we chose for QM. As a useful tip for users running UV (or QM) on Windows systems, getting the Windows memory caching parameters set correctly can make a massive difference to performance. Martin Phillips Ladybridge Systems Ltd 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England +44 (0)1604-709200 ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
On 04/07/12 17:44, Charles Stevenson wrote: >>SMAT -d (or ANALYZE.SHM -d) see uv/bin/smat[.exe] > uv/bin/analyze.shm[.exe] > > Dynamic Files: > [SMAT -d table snipped - Charles's message appears in full below] > > Because of the concurrency difficulties that Brian mentioned . . . > > On 7/4/2012 5:26 AM, Brian Leach wrote: >> What makes the implementation difficult is that Litwin et al were all >> assuming a single threaded access to an in-memory list. Concurrent >> access whilst maintaining the load factor, split pointer and splitting >> all add a lot more complexity, unless you lock the whole file for the >> duration of an IO operation and kill the performance. > . . . is why UV reserves a table in shared memory for dynamic files, per > SMAT -d. > The 1st user that opens the file causes the control info in the file > header to be loaded to shared memory, where it remains until Ref Count > drops to 0. > (It also gets written to the file whenever there is a change. At least > on modern versions.) Actually, thinking about it, why do you need to lock the entire file when splitting or merging? A merge actually could be done very quickly: to merge groups 10 and 2 you just chain 10 onto the end of 2 and don't bother actually consolidating them. But to split 2 into 2 and 10, you just need an exclusive lock on both of them. 
Any attempt to access 1 or 3 or 9 can just sail right on by - only if a process wants to access the group being split do you need to stall it until you've finished. Although that is a problem if you're sequentially scanning the file - which does block split/merge while you're doing it. I remember coming across a very badly sized dynamic file where that had obviously happened - I guess someone had left a program half way through a BASIC SELECT for a week or so and the file had grown somewhat horrendously. It slowly corrected itself though. (I found it because our client's system was horribly slow and I was looking for the cause. This wasn't it though - it turned out to be some nasty code somewhere else, can't remember exactly what.) Cheers, Wol ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
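Wol's locking argument can be sketched in a few lines (illustrative Python with invented names; real UV locking lives in shared memory, and a real split rehashes by key rather than taking a predicate):

```python
import threading

class Groups:
    """Per-group locks: a split takes exclusive locks on only the source
    group and the new group, so access to any other group sails right by."""
    def __init__(self, n):
        self.data = [[] for _ in range(n)]
        self.locks = [threading.Lock() for _ in range(n)]

    def read(self, g):
        with self.locks[g]:               # stalls only if g itself is splitting
            return list(self.data[g])

    def split(self, src, keep):
        self.data.append([])              # new group at the end of the file
        self.locks.append(threading.Lock())
        new = len(self.data) - 1
        with self.locks[src], self.locks[new]:
            self.data[new] = [k for k in self.data[src] if not keep(k)]
            self.data[src] = [k for k in self.data[src] if keep(k)]

g = Groups(2)
g.data[0] = [1, 2, 3, 4]
g.split(0, keep=lambda k: k % 2 == 0)     # "rehash": evens stay, odds move
print(g.data)   # [[2, 4], [], [1, 3]]
```

While `split(0, ...)` holds its two locks, a concurrent `read(1)` never blocks, which is the point Wol is making.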
Re: [U2] RESIZE - dynamic files
On 04/07/12 19:59, Rick Nuckolls wrote: > I believe PiOpen used a directory with two files in it ‘&$0’ and ‘&$1’ > corresponding to DATA.30 and OVER.30. If the numbers went up from there, I > think that they corresponded to alternate keys, ie ‘&$2’ and ‘&$3’ > represented DATA.30 and OVER.30 for the first alternate key. > And &$2, and &$3, and the rest, iirc ... > I do not think that PiOpen supported statically hashed files. (Pr1me > Information did) I'd be very surprised if it didn't. I might look up the manuals in my garage and check ... Or try to boot my EXL7330 and actually see what it does :-) > > All of that is a few years ago Agreed. But I dug into that at the time, and I'm pretty certain there were a lot more than just two files in most dynamic file directories... I might even have a CD somewhere with a tape-dump on it ... > > Unidata uses dat001 and over001 with the number increasing to allow for very > large files (I think). > > -Rick > Cheers, Wol
Re: [U2] RESIZE - dynamic files
I believe PiOpen used a directory with two files in it ‘&$0’ and ‘&$1’ corresponding to DATA.30 and OVER.30. If the numbers went up from there, I think that they corresponded to alternate keys, ie ‘&$2’ and ‘&$3’ represented DATA.30 and OVER.30 for the first alternate key. I do not think that PiOpen supported statically hashed files. (Pr1me Information did) All of that is a few years ago Unidata uses dat001 and over001 with the number increasing to allow for very large files (I think). -Rick On Jul 4, 2012, at 10:51 AM, Wols Lists wrote: > On 04/07/12 11:26, Brian Leach wrote: >>> All the other groups effectively get 1 added to their number >> Not exactly. >> >> Sorry to those who already know this, but maybe it's time to go over linear >> hashing in theory .. >> >> Linear hashing was a system devised by Litwin and originally only for >> in-memory lists. In fact there's some good implementations in C# that >> provide better handling of Dictionary types. Applying it to a file system >> adds some complexity but it's basically the same theory. >> >> Let's start with a file that has 100 groups initially defined (that's 0 >> through 99). That is your minimum starting point and should ensure that it >> never shrinks below that, so it doesn't begin it's life with loads of splits >> right from the start as you populate the file. You would size this similarly >> to the way you size a regular hashed file for your initial content: no point >> making work for yourself (or the database). >> >> As data gets added, because the content is allocated unevenly, some of that >> load will be in primary and some in overflow: that's just the way of the >> world. No hashing is perfect. Unlike a static file, the overflow can't be >> added to the end of the file as a linked list (* why nobody has done managed >> overflow is beyond me), it has to sit in a separate file. 
> > I don't know what the definition of "badly overflowed" is, but assuming > that a badly overflowed group has two blocks of overflow, then those > file stats seem perfectly okay. As Brian has explained, the distribution > of records is "lumpy" and as a percentage of the file, there aren't many > badly overflowed groups. > > You've got roughly 1/3 of groups overflowed - with an 80% split that > doesn't seem at all out of order - on average each group is 80% full so > 1/3rd more than 100% full is fine. > > You've got (in thousands) one and a half groups badly overflowed out of > eighty-three. That's less than two percent. That's nothing. > > As for why no-one has done managed overflow, I think there are various > reasons. The first successful implementation (Prime INFORMATION) didn't > need it. It used a peculiar type of file called a "Segmented Directory" > and while I don't know for certain what PI did, I strongly suspect each > group had its own normal file so if a group overflowed, it just created > a new block at the end of the file. Same with large records, it > allocated a bunch of overflow blocks. This file structure was far more > evident with PI-Open - at the OS level a dynamic file was a OS directory > with lots of numbered files in it. > > The UV implementation of "one file for data, one file for overflow" may > be unique to UV. I don't know. What little I know of UD tells me it's > different, and others like QM could well be different again. I wouldn't > actually be surprised if QM is like PI. > > Cheers, > Wol > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
On 04/07/12 11:26, Brian Leach wrote: >> All the other groups effectively get 1 added to their number > Not exactly. > > Sorry to those who already know this, but maybe it's time to go over linear > hashing in theory .. > > Linear hashing was a system devised by Litwin and originally only for > in-memory lists. In fact there's some good implementations in C# that > provide better handling of Dictionary types. Applying it to a file system > adds some complexity but it's basically the same theory. > > Let's start with a file that has 100 groups initially defined (that's 0 > through 99). That is your minimum starting point and should ensure that it > never shrinks below that, so it doesn't begin it's life with loads of splits > right from the start as you populate the file. You would size this similarly > to the way you size a regular hashed file for your initial content: no point > making work for yourself (or the database). > > As data gets added, because the content is allocated unevenly, some of that > load will be in primary and some in overflow: that's just the way of the > world. No hashing is perfect. Unlike a static file, the overflow can't be > added to the end of the file as a linked list (* why nobody has done managed > overflow is beyond me), it has to sit in a separate file. I don't know what the definition of "badly overflowed" is, but assuming that a badly overflowed group has two blocks of overflow, then those file stats seem perfectly okay. As Brian has explained, the distribution of records is "lumpy" and as a percentage of the file, there aren't many badly overflowed groups. You've got roughly 1/3 of groups overflowed - with an 80% split that doesn't seem at all out of order - on average each group is 80% full so 1/3rd more than 100% full is fine. You've got (in thousands) one and a half groups badly overflowed out of eighty-three. That's less than two percent. That's nothing. 
As for why no-one has done managed overflow, I think there are various reasons. The first successful implementation (Prime INFORMATION) didn't need it. It used a peculiar type of file called a "Segmented Directory" and while I don't know for certain what PI did, I strongly suspect each group had its own normal file so if a group overflowed, it just created a new block at the end of the file. Same with large records, it allocated a bunch of overflow blocks. This file structure was far more evident with PI-Open - at the OS level a dynamic file was an OS directory with lots of numbered files in it. The UV implementation of "one file for data, one file for overflow" may be unique to UV. I don't know. What little I know of UD tells me it's different, and others like QM could well be different again. I wouldn't actually be surprised if QM is like PI. Cheers, Wol
Re: [U2] RESIZE - dynamic files
>SMAT -d (or ANALYZE.SHM -d) see uv/bin/smat[.exe] uv/bin/analyze.shm[.exe]

Dynamic Files:
Slot #  Inode  Device  Ref Count  Htype  Split  Merge  Curmod  Basemod  Largerec  Filesp  Selects  Nextsplit
0  1285128087  209307792516208050  4001  2048  3267  2782736  0  1954
1  153221440  151542860060208050  397040  262144  1628  58641084  0134897
2  1155376080  317006236  6208050  81  64  1628  133616  018
3  924071961  976405761  2208050  957  512  1628  1249180  0  446
4  619894993  1297457141  1208050  1157  1024  1628  3837400  0  134
5  1401440370  656655020  6218050  213429  131072  1628  54052576  0  82358
6  1053905064  1350670129  2208050  365  256  1628  529956  0  110
7  963519080  1084306943  2208050  2564  2048  1628  4019040  0  517
8  1909033200  47372346598208050  3851  2048  3267  12775756  0  1804
etc.

Because of the concurrency difficulties that Brian mentioned . . . On 7/4/2012 5:26 AM, Brian Leach wrote: What makes the implementation difficult is that Litwin et al were all assuming a single threaded access to an in-memory list. Concurrent access whilst maintaining the load factor, split pointer and splitting all add a lot more complexity, unless you lock the whole file for the duration of an IO operation and kill the performance. . . . is why UV reserves a table in shared memory for dynamic files, per SMAT -d. The 1st user that opens the file causes the control info in the file header to be loaded to shared memory, where it remains until Ref Count drops to 0. (It also gets written to the file whenever there is a change. At least on modern versions.) Rick's post makes good sense if you work the numbers in the SMAT table. Notice that (Curmod - Basemod) + 1 = Nextsplit (off by 1 because groups start at 0.) As Rick pointed out, Basemod is always a power of 2. It is used by the hashing algorithms. E.g., that 64 will eventually change to 128 or 32, once enough splits or merges happen. Notice also that the future "Nextsplit" group number is set, i.e., predictable. Remember Brian & Rick (others?) 
saying that split/merge decisions are determined by the entire file load, not whichever individual group might happen to be in heavy overflow? They were right: it is methodical. Chris, Notice that every number in the Split, Merge, & Largerec columns is the default value. Although I do have exceptions, any random grab of 9 files like this would likely show straight default values. Generally, fine-tuning isn't worth the bother. It's more bang for the IT buck to buy more memory and disk than to pay Brian or Rick to squeeze performance out of type-30 files. cds
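Charles's two observations about the SMAT table - Basemod is always a power of two, and Nextsplit is fixed by the arithmetic rather than by which group is fullest - check out against every row he posted (a quick sketch; the Curmod/Basemod/Nextsplit triples are transcribed from his output):

```python
# Curmod, Basemod, Nextsplit triples transcribed from the SMAT -d output
rows = [
    (4001, 2048, 1954), (397040, 262144, 134897), (81, 64, 18),
    (957, 512, 446), (1157, 1024, 134), (213429, 131072, 82358),
    (365, 256, 110), (2564, 2048, 517), (3851, 2048, 1804),
]

for curmod, basemod, nextsplit in rows:
    # Basemod is always a power of two ...
    assert basemod & (basemod - 1) == 0
    # ... and the next group to split is fully determined by the file size
    assert curmod - basemod + 1 == nextsplit

print("all", len(rows), "rows check out")   # all 9 rows check out
```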
Re: [U2] RESIZE - dynamic files
This makes it sound as if you might need to search two groups for a record, which is not correct. If the initial hash is based on the larger modulo, and the group exists, then the key will be in the higher number group. If the result of the first hash is beyond the last group of the table, then you rehash with the smaller modulus. And the modulo used for hashing is always a power of two. So if the initial hash function on a key is f(x), then the key will either be in f(x) mod 2**n or, if that group has not been created, then in f(x) mod 2**(n-1). n increases each time that the modulus grows to equal 2**n+1. So for a modulus of 3 or 4, n = 2; for 5,6,7,8, n = 3. For instance: If your groups are numbered 0-5 (6 groups), and your hash value is 5, then you are in the last (6th) group because 5 mod 8 is 5. Likewise 6 mod 8 is 6, but this would be beyond the highest group we have (5). 6 mod 4 is 2, and that is the group where 6 should fall. Likewise 7 should fall into group 3. After two more splits (of groups 2 & 3) the modulus will be 8, and no rehashing is necessary until we next split group 0 and add a 9th group, at which point we start with a mod 16, and use 8 if the first result is over 8 (8 would go into the 9th group, 9 would hash into the second group, #1 -- 9 mod 8 -> 1). Admittedly, this is probably at least as confusing as every other explanation of the process ;-) -Rick On Jul 4, 2012, at 3:26 AM, Brian Leach wrote: >> All the other groups effectively get 1 added to their number > > Not exactly. > > Sorry to those who already know this, but maybe it's time to go over linear > hashing in theory .. > > Linear hashing was a system devised by Litwin and originally only for > in-memory lists. In fact there's some good implementations in C# that > provide better handling of Dictionary types. Applying it to a file system > adds some complexity but it's basically the same theory. 
> > Let's start with a file that has 100 groups initially defined (that's 0 > through 99). That is your minimum starting point and should ensure that it > never shrinks below that, so it doesn't begin it's life with loads of splits > right from the start as you populate the file. You would size this similarly > to the way you size a regular hashed file for your initial content: no point > making work for yourself (or the database). > > As data gets added, because the content is allocated unevenly, some of that > load will be in primary and some in overflow: that's just the way of the > world. No hashing is perfect. Unlike a static file, the overflow can't be > added to the end of the file as a linked list (* why nobody has done managed > overflow is beyond me), it has to sit in a separate file. > > At some point the amount of data held in respect of the available space > reaches a critical level and the file needs to reorganize. Rather than split > the most heavily populated group - which would be the obvious thing - linear > hashing works on the basis of a split pointer that moves incrementally > through the file. So the first split breaks group 0 and adds group 100 to > the end of the file, hopefully moving around half the content of group 0 to > this new group. Of course, there is no guarantee that it will (depending on > key structure) and also no guarantee that this will help anything, if group > 0 isn't overflowed or populated anyway. So the next write may also cause a > split, except now to split group 1 into a new group 101, and so forth. > > Eventually the pointer will reach the end and all the initial 100 groups > will have been split, and the whole process restarts with the split pointer > moving back to zero. You now have 200 groups and by this time everything > should in theory have levelled out, but in the meantime there is still > overloading and stuff will still be in overflow. 
The next split will create > group 200 and split half of group 0 into it, and the whole process repeats > for ever. > > Oversized records (> buffer size) also get moved out because they stuff up > the block allocation. > > So why this crazy system, rather than hitting the filled groups as they get > overstuffed? Because it makes finding a record easy. Because linear hashing > is based on a power of 2, the maths is simple - if the group is after the > split point, the record MUST be in that group (or its overflow). If it is > before the split point, it could be in the original group or the split > group: so you can just rehash with double the modulus to check which one > without even having to scan the groups. > > What makes the implementation difficult is that Litwin et al were all > assuming a single threaded access to an in-memory list. Concurrent access > whilst maintaining the load factor, split pointer and splitting all add a > lot more complexity, unless you lock the whole file for the duration of an > IO operation and kill the performance. > > And coming back to the manual, storing la
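Rick's two-step rule can be written down directly (a sketch; `group_for` and its loop to find n are inventions here - a real implementation would work from the stored Basemod rather than recomputing it):

```python
def group_for(hashval, modulus):
    """Rick's rule: hash against the power-of-two ceiling of the current
    modulus; if that names a group that doesn't exist yet, rehash with
    the next power of two down."""
    n = 1
    while (1 << n) < modulus:      # smallest n such that 2**n >= modulus
        n += 1
    group = hashval % (1 << n)
    if group >= modulus:           # beyond the last group: not created yet
        group = hashval % (1 << (n - 1))
    return group

# Rick's 6-group example: groups 0-5
print([group_for(h, 6) for h in (5, 6, 7)])   # [5, 2, 3]
# After growing to 9 groups: 8 exists, 9 rehashes downward
print([group_for(h, 9) for h in (8, 9)])      # [8, 1]
```

All of Rick's worked examples fall out: with 6 groups, hash values 5, 6 and 7 land in groups 5, 2 and 3; with 9 groups, 8 stays in group 8 while 9 rehashes to group 1.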
Re: [U2] RESIZE - dynamic files
Good explanation, Brian! To anyone who skipped it because it looked long: read it anyway.

cds

On 7/4/2012 5:26 AM, Brian Leach wrote: [Brian's full explanation, quoted here in the original; it appears in full in his own message below]

___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
> All the other groups effectively get 1 added to their number

Not exactly. Sorry to those who already know this, but maybe it's time to go over linear hashing in theory.

Linear hashing was a system devised by Litwin, originally only for in-memory lists. In fact there are some good implementations in C# that provide better handling of Dictionary types. Applying it to a file system adds some complexity, but it's basically the same theory.

Let's start with a file that has 100 groups initially defined (that's 0 through 99). That is your minimum starting point and should ensure that the file never shrinks below that, so it doesn't begin its life with loads of splits right from the start as you populate it. You would size this similarly to the way you size a regular hashed file for your initial content: no point making work for yourself (or the database).

As data gets added, because the content is allocated unevenly, some of that load will be in primary and some in overflow: that's just the way of the world. No hashing is perfect. Unlike a static file, the overflow can't be added to the end of the file as a linked list (why nobody has done managed overflow is beyond me); it has to sit in a separate file.

At some point the amount of data held, relative to the available space, reaches a critical level and the file needs to reorganize. Rather than split the most heavily populated group - which would be the obvious thing - linear hashing works on the basis of a split pointer that moves incrementally through the file. So the first split breaks group 0 and adds group 100 to the end of the file, hopefully moving around half the content of group 0 to this new group. Of course, there is no guarantee that it will (depending on key structure), and also no guarantee that this will help anything if group 0 isn't overflowed or heavily populated anyway. So the next write may also cause a split, except now splitting group 1 into a new group 101, and so forth.

Eventually the pointer will reach the end, all the initial 100 groups will have been split, and the whole process restarts with the split pointer moving back to zero. You now have 200 groups, and by this time everything should in theory have levelled out, but in the meantime there is still overloading and stuff will still be in overflow. The next split will create group 200 and split half of group 0 into it, and the whole process repeats forever.

Oversized records (> buffer size) also get moved out, because they stuff up the block allocation.

So why this crazy system, rather than hitting the filled groups as they get overstuffed? Because it makes finding a record easy. Because linear hashing is based on a power of 2, the maths is simple: if the group is at or after the split pointer, the record MUST be in that group (or its overflow). If it is before the split pointer, it could be in the original group or the split group, so you can just rehash with double the modulus to check which one, without even having to scan the groups.

What makes the implementation difficult is that Litwin et al. were all assuming single-threaded access to an in-memory list. Concurrent access, whilst maintaining the load factor, the split pointer and the splitting, all adds a lot more complexity - unless you lock the whole file for the duration of an IO operation and kill the performance.

And coming back to the manual: storing large numbers of data items - even large ones - in a type 19 file is a bad idea. Traversing directories is slow, especially in Windows, and locking is done against the whole directory.

Brian

___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
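[Editor's illustration] Brian's walkthrough maps almost line-for-line onto code. Below is a minimal single-threaded sketch of the scheme he describes; it is purely illustrative - the class name, the record-count notion of capacity, and the load-triggered split are simplifying assumptions, not the actual U2 implementation.

```python
# Minimal linear-hashing sketch: a split pointer walks the groups in order,
# and a lookup below the pointer rehashes with double the modulus.

class LinearHashFile:
    def __init__(self, min_modulus=100, capacity=4, split_load=0.8):
        self.n0 = min_modulus        # modulus at the start of this doubling cycle
        self.split = 0               # next group to split (the "split pointer")
        self.capacity = capacity     # records per group before overflow (simplified)
        self.split_load = split_load
        self.groups = [[] for _ in range(min_modulus)]

    def _group_for(self, key):
        g = hash(key) % self.n0
        if g < self.split:                   # group already split this cycle,
            g = hash(key) % (2 * self.n0)    # so rehash with double the modulus
        return g

    def insert(self, key, value):
        self.groups[self._group_for(key)].append((key, value))
        # Split when the *overall* load passes split.load - not when a
        # particular group overflows. The pointer decides which group splits.
        load = sum(len(g) for g in self.groups) / (len(self.groups) * self.capacity)
        if load > self.split_load:
            self._split_next()

    def _split_next(self):
        victim = self.groups[self.split]
        self.groups.append([])               # new group n0+split at end of file
        stay, move = [], []
        for k, v in victim:
            (stay if hash(k) % (2 * self.n0) == self.split else move).append((k, v))
        self.groups[self.split] = stay
        self.groups[self.n0 + self.split] = move
        self.split += 1
        if self.split == self.n0:            # cycle complete: e.g. 100 -> 200 groups
            self.n0 *= 2
            self.split = 0

    def lookup(self, key):
        for k, v in self.groups[self._group_for(key)]:
            if k == key:
                return v
        return None
```

The lookup rule is the payoff Brian describes: one rehash with double the modulus decides between the original group and its split partner, with no scanning of groups.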
Re: [U2] RESIZE - dynamic files
Doug may have had a key bounce in his input:

> Let's do the math:
>
> 258687736 (Record Size)
> 192283300 (Key Size)

The key size is actually 19283300 in Chris' figures.

Regarding 68,063 being less than the current modulus of 82,850: I think the answer may lie in the splitting process. As I understand it, the first time a split occurs, group 1 is split and its contents are divided between new group 1 and new group 2. All the other groups effectively get 1 added to their number. The next split is group 3 (which was 2) into 3 and 4, and so forth. A pointer is kept to say where the next split will take place, and also to help sort out how to adjust the algorithm to identify which group matches a given key.

Based on this, if you started with 1000 groups, by the time you have split the 500th time you will have 1500 groups. The first 1000 will be relatively empty; the last 500 will probably be overflowed, but not terribly badly. By the time you get to the 1000th split, you will have 2000 groups and they will, one hopes, be quite reasonably spread with very little overflow. So I expect the average access times would drift up and down in a cycle. The cycle time would get longer as the file gets bigger, but the worst time would be roughly the same each cycle.

Given the power of two introduced into the algorithm by the before/after-the-split thing, I wonder if there is such a need to start off with a prime?

Regards, Keith

PS I'm getting a bit Tony^H^H^H^Hverbose nowadays.

___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
I would suggest that the actual goal is to achieve maximum performance for your system, so knowing HOW the file is used on a daily basis can also influence decisions. Disk is a cheap commodity, so having some "wastage" in file utilization shouldn't be a factor. Ross Ferris Stamina Software Visage > Better by Design! -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin Sent: Wednesday, 4 July 2012 7:38 AM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files This is why I'm confused.. Is the goal here to reduce 'overflow' or to keep the 'Total size' of the disk down? If the goal is to keep the total disk size down then it would appear you would want your actual load % a lot higher than 37%.. and then ignore 'some' of the overflow.. Chris > But the total size of your file is up 60%. Reading in 60% more records in a > full select of the file is going to be much slower than a few more overflows. > > > -Original Message- > From: u2-users-boun...@listserver.u2ug.org > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris > Austin > Sent: Tuesday, July 03, 2012 2:15 PM > To: u2-users@listserver.u2ug.org > Subject: Re: [U2] RESIZE - dynamic files > > > Dan, > > I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my > Actual Load has really gone down (as well as overflow). See below for the > results: > > File name .. GENACCTRN_POSTED > Pathname ... GENACCTRN_POSTED > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 23 current ( minimum 23, 5263 empty, > 3957 overflowed, 207 badly ) > Number of records .. 1290469 > Large record size .. 3267 bytes > Number of large records 180 > Group size . 4096 bytes > Load factors ... 90% (split), 50% (merge) and 37% (actual) > Total size . 836235264 bytes > Total size of record data .. 287394719 bytes > Total size of record IDs ... 
21508521 bytes > Unused space ... 527323832 bytes > Total space for records 836227072 bytes > > My overflow is now @ 2% > My Load is @ 37% (actual) > > granted my empty groups are now up to almost 3% but I hope that won't be a > big factor. How does this look? > > Chris ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
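[Editor's illustration] The ANALYZE.FILE figures above let you verify the reported load directly: the 37% "actual" load is just record data plus record IDs as a fraction of the total record space. A quick check of the arithmetic (assuming the statistic ignores other overheads, which these numbers suggest):

```python
# Figures from Chris' ANALYZE.FILE output for GENACCTRN_POSTED
record_data = 287394719   # Total size of record data (bytes)
record_ids  = 21508521    # Total size of record IDs (bytes)
total_space = 836227072   # Total space for records (bytes)

actual_load = 100 * (record_data + record_ids) / total_space
print(round(actual_load))   # 37, matching the "37% (actual)" line
```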
Re: [U2] RESIZE - dynamic files
From the System Description manual:

Important Considerations

Dynamic files are meant to make file management easier for users. The default parameters are set so that most dynamic files work efficiently. If you decide to change the parameters of a dynamic file, keep the following considerations in mind:

- Use the SEQ.NUM hashing algorithm only when your record IDs are numeric, sequential, and consecutive. Nonconsecutive numbers should not be hashed using the SEQ.NUM hashing algorithm.

- Use a group size of 2 only if you expect the average record size to be larger than 1000 bytes. If your record size is larger than 2000 bytes, consider using a nonhashed file, type 1 or 19.

- Large record size should generally not be changed. Storing the data of a large record in the overflow buffer causes that data not to be included in the split and merge calculations. Also, the extra data length does not slow access to subsequent records. By choosing a large record size of 0%, all the records are considered large. In this case, record IDs can be accessed extremely quickly by commands such as SELECT, but access to the actual data is much less efficient.

- A small split load causes less data to be stored in each group buffer, resulting in faster access time and less overflow at the expense of requiring extra memory. A large split load causes more data to be stored in each group buffer, resulting in better use of memory at the expense of slower access time and more overflow. A split load of 100% disables splits.

- The gap between merge load and split load should be large enough so that splits and merges do not occur too frequently. The split and merge processes take a significant amount of processing time. If you make the merge load too small, memory usage can be very poor. Also, selection time is increased when record IDs are distributed in more groups than are needed. A merge load of 0% disables merges.

- Consider increasing the minimum modulo if you intend to add a lot of initial data to the file. Much data-entry time can be saved by avoiding the initial splits that can occur if you enter a lot of initial data. You may want to readjust this value after

-Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Charles Stevenson Sent: Tuesday, July 03, 2012 3:34 PM To: U2 Users List Subject: Re: [U2] RESIZE - dynamic files

Chris, Let's back way up. I take it your original question is a general one, not specific to one poorly performing problematic file. Is that right? If so, generally speaking, you just don't get a lot out of fine-tuning dynamic files. Tweaking the default parameters doesn't usually make a whole lot of difference. Several people have said something similar in this thread. Other than deciding which hashing algorithm, I generally use the defaults and only tweak things once the file proves problematic, which usually means slow I/O. When a problem erupts, look carefully at how that specific file is used, as Susan & others have said. You might get hold of Fitzgerald & Long's paper on how dynamic files work. If you understand the fundamentals, you'll understand how to attack your problem file, applying the ideas Rick & others have talked about here. You may go several years without having to resort to that. Chuck Stevenson

On 7/2/2012 2:22 PM, Chris Austin wrote: > I was wondering if anyone had instructions on RESIZE with a dynamic file? For > example I have a file called 'TEST_FILE' > with the following: > > 01 ANALYZE.FILE TEST_FILE > File name .. TEST_FILE > Pathname ... TEST_FILE > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 83261 current ( minimum 31 ) > Large record size .. 3267 bytes > Group size . 4096 bytes > Load factors ... 80% (split), 50% (merge) and 80% (actual) > Total size . 
450613248 bytes > > How do you calculate what the modulus and separation should be? I can't use > HASH.HELP on a type 30 file to see the recommended settings > so I was wondering how best you figure out the file RESIZE. > > Thanks, > > Chris > ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
Chris, Let's back way up. I take it your original question is a general one, not specific to one poorly performing problematic file. Is that right? If so, generally speaking, you just don't get a lot out of fine-tuning dynamic files. Tweaking the default parameters doesn't usually make a whole lot of difference. Several people have said something similar in this thread. Other than deciding which hashing algorithm, I generally use the defaults and only tweak things once the file proves problematic, which usually means slow I/O. When a problem erupts, look carefully at how that specific file is used, as Susan & others have said. You might get hold of Fitzgerald&Long's paper on how dynamic files work. If you understand the fundamentals, you'll understand how to attack your problem file, applying the ideas Rick & others have talked about here. You may go several years without having to resort to that. Chuck Stevenson On 7/2/2012 2:22 PM, Chris Austin wrote: I was wondering if anyone had instructions on RESIZE with a dynamic file? For example I have a file called 'TEST_FILE' with the following: 01 ANALYZE.FILE TEST_FILE File name .. TEST_FILE Pathname ... TEST_FILE File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 83261 current ( minimum 31 ) Large record size .. 3267 bytes Group size . 4096 bytes Load factors ... 80% (split), 50% (merge) and 80% (actual) Total size . 450613248 bytes How do you calculate what the modulus and separation should be? I can't use HASH.HELP on a type 30 file to see the recommended settings so I was wondering how best you figure out the file RESIZE. Thanks, Chris ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
Unless the minimum modulus is configured high enough to artificially lower the actual load, the actual load will rise to the designated split.load as the file grows. The split.load indicates nothing about the specific load of any given group; so if it is set to 90%, then on average each group will be 90% full, and adding a (400-byte) record to a group will send it into overflow, but since 400 bytes is a trivial percentage of your overall file load, many groups will be overflowed before the total load factor exceeds 90%. Okay, there is a slight distortion with the numbers there, but the idea is that all buckets are not loaded equally, so if the average is "almost full" the reality is "many overflowed".

-Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin Sent: Tuesday, July 03, 2012 2:52 PM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files

I set the split load based on what Dan suggested: "I'd take the merge down a little, to maybe 30% or even less, and maybe knock the split up a bit - say, 90% - to cut down on the splitting." I thought this would cut down on splitting. Is there a certain formula, or way to calculate the split.load? What should my SPLIT.LOAD be around, and how do you come up with that %? Chris

> From: r...@lynden.com > To: u2-users@listserver.u2ug.org > Date: Tue, 3 Jul 2012 14:45:28 -0700 > Subject: Re: [U2] RESIZE - dynamic files > > 37% is a very low load. Reading disk records takes much longer than parsing > the records out of a disk record. With variable record size and moderately > poor hashing, overflow is inevitable. So, do you want 80,000 extra groups, > or 20,000 overflow buffers? I would go with the smaller number. But for the > love of Knuth, do not set your split.load to 90% unless you have a perfectly > hashed file with uniformly sized records. 
> > -Original Message- > From: u2-users-boun...@listserver.u2ug.org > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > Sent: Tuesday, July 03, 2012 2:38 PM > To: u2-users@listserver.u2ug.org > Subject: Re: [U2] RESIZE - dynamic files > > > This is why I'm confused.. Is the goal here to reduce 'overflow' or to > keep the 'Total size' of the disk down? If the goal is to keep the total > disk size down then it would appear > you would want your actual load % a lot higher than 37%.. and then ignore > 'some' of the overflow.. > > Chris > > > > But the total size of your file is up 60%. Reading in 60% more records in > > a full select of the file is going to be much slower than a few more > > overflows. > > > > > > -Original Message- > > From: u2-users-boun...@listserver.u2ug.org > > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > > Sent: Tuesday, July 03, 2012 2:15 PM > > To: u2-users@listserver.u2ug.org > > Subject: Re: [U2] RESIZE - dynamic files > > > > > > Dan, > > > > I changed the MINIMUM.MODULUS to the value of 23 as you suggested and > > my Actual Load has really gone down (as well as overflow). See below for > > the results: > > > > File name .. GENACCTRN_POSTED > > Pathname ... GENACCTRN_POSTED > > File type .. DYNAMIC > > File style and revision 32BIT Revision 12 > > Hashing Algorithm .. GENERAL > > No. of groups (modulus) 23 current ( minimum 23, 5263 empty, > > 3957 overflowed, 207 badly ) > > Number of records .. 1290469 > > Large record size .. 3267 bytes > > Number of large records 180 > > Group size . 4096 bytes > > Load factors ... 90% (split), 50% (merge) and 37% (actual) > > Total size . 836235264 bytes > > Total size of record data .. 287394719 bytes > > Total size of record IDs ... 21508521 bytes > > Unused space ... 
527323832 bytes > > Total space for records 836227072 bytes > > > > My overflow is now @ 2% > > My Load is @ 37% (actual) > > > > granted my empty groups are now up to almost 3% but I hope that won't be a > > big factor. How does this look? > > > > Chris > > > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users > ___
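[Editor's illustration] Rick's point above - that an average load of 90% means "many overflowed" because hashing spreads records unevenly - is easy to demonstrate with a toy simulation. The group count, record size, and capacity below are arbitrary assumptions chosen to mirror his 400-byte example, not U2 internals:

```python
import random

random.seed(42)
groups = 1000
group_capacity = 4096          # bytes per group
record_size = 400              # uniformly sized records, per Rick's example
target_load = 0.90             # average load, i.e. a 90% split.load

# Number of 400-byte records needed to fill the file to 90% on average:
n_records = int(target_load * groups * group_capacity / record_size)

# Hash each record to a pseudo-random group and total the bytes per group:
fill = [0] * groups
for _ in range(n_records):
    fill[random.randrange(groups)] += record_size

overflowed = sum(1 for b in fill if b > group_capacity)
print(f"average load {sum(fill) / (groups * group_capacity):.0%}, "
      f"{overflowed} of {groups} groups overflowed")
```

Even though the file is only 90% full on average, roughly a third of the groups have tipped into overflow, which is exactly the argument against pushing split.load to 90% on an imperfectly hashed file.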
Re: [U2] RESIZE - dynamic files
I set the split load based on what Dan suggested: "I'd take the merge down a little, to maybe 30% or even less, and maybe knock the split up a bit - say, 90% - to cut down on the splitting." I thought this would cut down on splitting. Is there a certain formula, or way to calculate the split.load? What should my SPLIT.LOAD be around, and how do you come up with that %? Chris > From: r...@lynden.com > To: u2-users@listserver.u2ug.org > Date: Tue, 3 Jul 2012 14:45:28 -0700 > Subject: Re: [U2] RESIZE - dynamic files > > 37% is a very low load. Reading disk records takes much longer than parsing > the records out of a disk record. With variable record size and moderately > poor hashing, overflow is inevitable. So, do you want 80,000 extra groups, > or 20,000 overflow buffers? I would go with the smaller number. But for the > love of Knuth, do not set your split.load to 90% unless you have a perfectly > hashed file with uniformly sized records. > > -Original Message- > From: u2-users-boun...@listserver.u2ug.org > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > Sent: Tuesday, July 03, 2012 2:38 PM > To: u2-users@listserver.u2ug.org > Subject: Re: [U2] RESIZE - dynamic files > > > This is why I'm confused.. Is the goal here to reduce 'overflow' or to > keep the 'Total size' of the disk down? If the goal is to keep the total > disk size down then it would appear > you would want your actual load % a lot higher than 37%.. and then ignore > 'some' of the overflow.. > > Chris > > > > But the total size of your file is up 60%. Reading in 60% more records in > > a full select of the file is going to be much slower than a few more > > overflows. 
> > > > > > -Original Message- > > From: u2-users-boun...@listserver.u2ug.org > > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > > Sent: Tuesday, July 03, 2012 2:15 PM > > To: u2-users@listserver.u2ug.org > > Subject: Re: [U2] RESIZE - dynamic files > > > > > > Dan, > > > > I changed the MINIMUM.MODULUS to the value of 23 as you suggested and > > my Actual Load has really gone down (as well as overflow). See below for > > the results: > > > > File name .. GENACCTRN_POSTED > > Pathname ... GENACCTRN_POSTED > > File type .. DYNAMIC > > File style and revision 32BIT Revision 12 > > Hashing Algorithm .. GENERAL > > No. of groups (modulus) 23 current ( minimum 23, 5263 empty, > > 3957 overflowed, 207 badly ) > > Number of records .. 1290469 > > Large record size .. 3267 bytes > > Number of large records 180 > > Group size . 4096 bytes > > Load factors ... 90% (split), 50% (merge) and 37% (actual) > > Total size . 836235264 bytes > > Total size of record data .. 287394719 bytes > > Total size of record IDs ... 21508521 bytes > > Unused space ... 527323832 bytes > > Total space for records 836227072 bytes > > > > My overflow is now @ 2% > > My Load is @ 37% (actual) > > > > granted my empty groups are now up to almost 3% but I hope that won't be a > > big factor. How does this look? > > > > Chris > > > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
Chris,

This is why file-sizing is something that requires careful thought. As some of the other responders have indicated, sometimes you want to keep overflow to a minimum, because accessing individual records that are in overflow takes extra disk reads, which slows down your system, and adding new records to a group that is already in overflow will inevitably be slower than adding them to a group which is not. And sometimes you don't - e.g. if you have a file that is primarily read in a sequential fashion, where you do a BASIC SELECT and then loop through the file reading every single record.

Because most of the files I have supported in my career have been read and written primarily as single-record reads, I have always chosen to minimize overflow as my default criterion, and only sized things for sequential reads when the file is rarely written and rarely read as anything but a 'read them all in no particular order' fashion - and that happens rarely in my experience. However, as other responders have written, 'your mileage may vary'! Look at how the file is used. Look at what resources you have. Then decide...

Susan M. Lynch
F. W. Davison & Company, Inc.

-Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin Sent: 07/03/2012 5:38 PM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files

This is why I'm confused.. Is the goal here to reduce 'overflow' or to keep the 'Total size' of the disk down? If the goal is to keep the total disk size down then it would appear you would want your actual load % a lot higher than 37%.. and then ignore 'some' of the overflow..

Chris

> But the total size of your file is up 60%. Reading in 60% more records in a full select of the file is going to be much slower than a few more overflows. 
> > > -Original Message- > From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > Sent: Tuesday, July 03, 2012 2:15 PM > To: u2-users@listserver.u2ug.org > Subject: Re: [U2] RESIZE - dynamic files > > > Dan, > > I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my Actual Load has really gone down (as well as overflow). See below for the results: > > File name .. GENACCTRN_POSTED > Pathname ... GENACCTRN_POSTED > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 23 current ( minimum 23, 5263 empty, > 3957 overflowed, 207 badly ) > Number of records .. 1290469 > Large record size .. 3267 bytes > Number of large records 180 > Group size . 4096 bytes > Load factors ... 90% (split), 50% (merge) and 37% (actual) > Total size . 836235264 bytes > Total size of record data .. 287394719 bytes > Total size of record IDs ... 21508521 bytes > Unused space ... 527323832 bytes > Total space for records 836227072 bytes > > My overflow is now @ 2% > My Load is @ 37% (actual) > > granted my empty groups are now up to almost 3% but I hope that won't be a big factor. How does this look? > > Chris ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
37% is a very low load. Reading disk records takes much longer than parsing the records out of a disk record. With variable record size and moderately poor hashing, overflow is inevitable. So, do you want 80,000 extra groups, or 20,000 overflow buffers? I would go with the smaller number. But for the love of Knuth, do not set your split.load to 90% unless you have a perfectly hashed file with uniformly sized records. -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin Sent: Tuesday, July 03, 2012 2:38 PM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files This is why I'm confused.. Is the goal here to reduce 'overflow' or to keep the 'Total size' of the disk down? If the goal is to keep the total disk size down then it would appear you would want your actual load % a lot higher than 37%.. and then ignore 'some' of the overflow.. Chris > But the total size of your file is up 60%. Reading in 60% more records in a > full select of the file is going to be much slower than a few more overflows. > > > -Original Message- > From: u2-users-boun...@listserver.u2ug.org > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > Sent: Tuesday, July 03, 2012 2:15 PM > To: u2-users@listserver.u2ug.org > Subject: Re: [U2] RESIZE - dynamic files > > > Dan, > > I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my > Actual Load has really gone down (as well as overflow). See below for the > results: > > File name .. GENACCTRN_POSTED > Pathname ... GENACCTRN_POSTED > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 23 current ( minimum 23, 5263 empty, > 3957 overflowed, 207 badly ) > Number of records .. 1290469 > Large record size .. 3267 bytes > Number of large records 180 > Group size . 4096 bytes > Load factors ... 90% (split), 50% (merge) and 37% (actual) > Total size . 
836235264 bytes > Total size of record data .. 287394719 bytes > Total size of record IDs ... 21508521 bytes > Unused space ... 527323832 bytes > Total space for records 836227072 bytes > > My overflow is now @ 2% > My Load is @ 37% (actual) > > granted my empty groups are now up to almost 3% but I hope that won't be a > big factor. How does this look? > > Chris ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
Disks get "bigger" much faster than the rate they get "faster". So the overflow is the thing to minimize. -Original Message- From: Chris Austin To: u2-users Sent: Tue, Jul 3, 2012 2:38 pm Subject: Re: [U2] RESIZE - dynamic files his is why I'm confused.. Is the goal here to reduce 'overflow' or to eep the 'Total size' of the disk down? If the goal is to keep the total disk size down then it would appear ou would want your actual load % a lot higher than 37%.. and then ignore 'some' f the overflow.. Chris But the total size of your file is up 60%. Reading in 60% more records in a ull select of the file is going to be much slower than a few more overflows. -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] n Behalf Of Chris Austin Sent: Tuesday, July 03, 2012 2:15 PM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files Dan, I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my ctual Load has really gone down (as well as overflow). See below for the esults: File name .. GENACCTRN_POSTED Pathname ... GENACCTRN_POSTED File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 23 current ( minimum 23, 5263 empty, 3957 overflowed, 207 badly ) Number of records .. 1290469 Large record size .. 3267 bytes Number of large records 180 Group size . 4096 bytes Load factors ... 90% (split), 50% (merge) and 37% (actual) Total size . 836235264 bytes Total size of record data .. 287394719 bytes Total size of record IDs ... 21508521 bytes Unused space ... 527323832 bytes Total space for records 836227072 bytes My overflow is now @ 2% My Load is @ 37% (actual) granted my empty groups are now up to almost 3% but I hope that won't be a big actor. How does this look? 
Chris ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
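The 37% "actual" load Chris reports can be reproduced from the FILE.STAT numbers quoted in this thread: actual load is simply record data plus record IDs divided by the total file size. A quick sanity check in plain Python, using the figures above:

```python
# Actual load = (record data + record IDs) / total size,
# using the GENACCTRN_POSTED FILE.STAT figures quoted in this thread.
record_data = 287_394_719   # Total size of record data
record_ids = 21_508_521     # Total size of record IDs
total_size = 836_235_264    # Total size

actual_load = (record_data + record_ids) / total_size
print(f"{actual_load:.0%}")  # prints "37%", matching the reported actual load
```

This also makes the disk-space objection concrete: at 37% load, roughly two-thirds of the file is unused space that a full sequential read still has to pass over.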
Re: [U2] RESIZE - dynamic files
This is why I'm confused.. Is the goal here to reduce 'overflow' or to keep the 'Total size' of the disk down? If the goal is to keep the total disk size down then it would appear you would want your actual load % a lot higher than 37%.. and then ignore 'some' of the overflow.. Chris > But the total size of your file is up 60%. Reading in 60% more records in a > full select of the file is going to be much slower than a few more overflows. > > > -Original Message- > From: u2-users-boun...@listserver.u2ug.org > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > Sent: Tuesday, July 03, 2012 2:15 PM > To: u2-users@listserver.u2ug.org > Subject: Re: [U2] RESIZE - dynamic files > > > Dan, > > I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my > Actual Load has really gone down (as well as overflow). See below for the > results: > > File name .. GENACCTRN_POSTED > Pathname ... GENACCTRN_POSTED > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 23 current ( minimum 23, 5263 empty, > 3957 overflowed, 207 badly ) > Number of records .. 1290469 > Large record size .. 3267 bytes > Number of large records 180 > Group size . 4096 bytes > Load factors ... 90% (split), 50% (merge) and 37% (actual) > Total size . 836235264 bytes > Total size of record data .. 287394719 bytes > Total size of record IDs ... 21508521 bytes > Unused space ... 527323832 bytes > Total space for records 836227072 bytes > > My overflow is now @ 2% > My Load is @ 37% (actual) > > granted my empty groups are now up to almost 3% but I hope that won't be a > big factor. How does this look? > > Chris ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
I should have said "60% more disk records", to be clear. -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Rick Nuckolls Sent: Tuesday, July 03, 2012 2:24 PM To: 'U2 Users List' Subject: Re: [U2] RESIZE - dynamic files But the total size of your file is up 60%. Reading in 60% more records in a full select of the file is going to be much slower than a few more overflows. -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin Sent: Tuesday, July 03, 2012 2:15 PM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files Dan, I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my Actual Load has really gone down (as well as overflow). See below for the results: File name .. GENACCTRN_POSTED Pathname ... GENACCTRN_POSTED File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 23 current ( minimum 23, 5263 empty, 3957 overflowed, 207 badly ) Number of records .. 1290469 Large record size .. 3267 bytes Number of large records 180 Group size . 4096 bytes Load factors ... 90% (split), 50% (merge) and 37% (actual) Total size . 836235264 bytes Total size of record data .. 287394719 bytes Total size of record IDs ... 21508521 bytes Unused space ... 527323832 bytes Total space for records 836227072 bytes My overflow is now @ 2% My Load is @ 37% (actual) granted my empty groups are now up to almost 3% but I hope that won't be a big factor. How does this look? Chris > From: dangf...@hotmail.com > To: u2-users@listserver.u2ug.org > Date: Tue, 3 Jul 2012 16:57:34 -0400 > Subject: Re: [U2] RESIZE - dynamic files > > > One rule of thumb is to make sure that you have an average of 10 or less > items in each group. Going by that, you'd want a minimum mod of 130k or more. 
> I've also noticed that files approach the "sweet spot" for minimizing > overflow without having excessive empty groups when the total size is pretty > nearly twice the data size. > > The goal can vary according to your situation. I'm personally not all that > afraid of making the modulus a little too large, as overflow is a pretty bad > performance hit (overflow means at least two disk reads to retrieve your > data, "badly" means at least 2 extra disk reads, and I've seen files where > that was thousands (this file isn't that bad, but 20% of your data is forcing > at least one extra disk read). Empty groups contribute to overhead on a > sequential search, so you'd want to consider how often you do a sequential > search on a file - usually, that's a pretty inefficient way to retrieve data, > but, again, your mileage may vary. > > To me, 20% is too much overflow, and 114 empty groups is trivial; much less > than 0.2%. I'd be tempted to go to 23 as a minimum Mod, just to see what > it looks like there. That'll give you an average of 6 records per group, not > unreasonably shallow, and it's likely to be a while before you have to resize > again. > > > From: cjausti...@hotmail.com > > To: u2-users@listserver.u2ug.org > > Date: Tue, 3 Jul 2012 15:23:23 -0500 > > Subject: Re: [U2] RESIZE - dynamic files > > > > > > I guess what I need to know is what's an acceptable % of overflow for a > > dynamic file? For example, when I change the SPLIT LOAD to 90% (while using > > the calculated min modulus) > > I'm still left with ~ 20% of overflow (see below). Is 20% overflow > > considered acceptable on average or should I keep tinkering with it to > > reach a lower overflow %? > > > > Correct me if I'm wrong but it seems the goal here is to REDUCE the > > overflow % while not creating too many modulus (groups). > > > > Chris > > > > > > File name .. GENACCTRN_POSTED > > Pathname ... GENACCTRN_POSTED > > File type .. 
DYNAMIC > > File style and revision 32BIT Revision 12 > > Hashing Algorithm .. GENERAL > > No. of groups (modulus) 105715 current ( minimum 103889, 114 empty, > > 21092 overflowed, 1452 badly ) > > Number of records .. 1290469 > > Large record size .. 3267 bytes > > Number of large records 180 > > Group size .
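Dan's rule of thumb (an average of 10 or fewer items per group) yields the "minimum mod of 130k or more" figure directly. A minimal check, using the record count from the FILE.STAT output:

```python
import math

records = 1_290_469   # Number of records, from FILE.STAT
per_group = 10        # rule of thumb: average of 10 or fewer items per group

min_modulus = math.ceil(records / per_group)
print(min_modulus)    # 129047 -- i.e. a minimum modulus of "130k or more"
```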
Re: [U2] RESIZE - dynamic files
But the total size of your file is up 60%. Reading in 60% more records in a full select of the file is going to be much slower than a few more overflows. -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin Sent: Tuesday, July 03, 2012 2:15 PM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files Dan, I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my Actual Load has really gone down (as well as overflow). See below for the results: File name .. GENACCTRN_POSTED Pathname ... GENACCTRN_POSTED File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 23 current ( minimum 23, 5263 empty, 3957 overflowed, 207 badly ) Number of records .. 1290469 Large record size .. 3267 bytes Number of large records 180 Group size . 4096 bytes Load factors ... 90% (split), 50% (merge) and 37% (actual) Total size . 836235264 bytes Total size of record data .. 287394719 bytes Total size of record IDs ... 21508521 bytes Unused space ... 527323832 bytes Total space for records 836227072 bytes My overflow is now @ 2% My Load is @ 37% (actual) granted my empty groups are now up to almost 3% but I hope that won't be a big factor. How does this look? Chris > From: dangf...@hotmail.com > To: u2-users@listserver.u2ug.org > Date: Tue, 3 Jul 2012 16:57:34 -0400 > Subject: Re: [U2] RESIZE - dynamic files > > > One rule of thumb is to make sure that you have an average of 10 or less > items in each group. Going by that, you'd want a minimum mod of 130k or more. > I've also noticed that files approach the "sweet spot" for minimizing > overflow without having excessive empty groups when the total size is pretty > nearly twice the data size. > > The goal can vary according to your situation. 
I'm personally not all that > afraid of making the modulus a little too large, as overflow is a pretty bad > performance hit (overflow means at least two disk reads to retrieve your > data, "badly" means at least 2 extra disk reads, and I've seen files where > that was thousands (this file isn't that bad, but 20% of your data is forcing > at least one extra disk read). Empty groups contribute to overhead on a > sequential search, so you'd want to consider how often you do a sequential > search on a file - usually, that's a pretty inefficient way to retrieve data, > but, again, your mileage may vary. > > To me, 20% is too much overflow, and 114 empty groups is trivial; much less > than 0.2%. I'd be tempted to go to 23 as a minimum Mod, just to see what > it looks like there. That'll give you an average of 6 records per group, not > unreasonably shallow, and it's likely to be a while before you have to resize > again. > > > From: cjausti...@hotmail.com > > To: u2-users@listserver.u2ug.org > > Date: Tue, 3 Jul 2012 15:23:23 -0500 > > Subject: Re: [U2] RESIZE - dynamic files > > > > > > I guess what I need to know is what's an acceptable % of overflow for a > > dynamic file? For example, when I change the SPLIT LOAD to 90% (while using > > the calculated min modulus) > > I'm still left with ~ 20% of overflow (see below). Is 20% overflow > > considered acceptable on average or should I keep tinkering with it to > > reach a lower overflow %? > > > > Correct me if I'm wrong but it seems the goal here is to REDUCE the > > overflow % while not creating too many modulus (groups). > > > > Chris > > > > > > File name .. GENACCTRN_POSTED > > Pathname ... GENACCTRN_POSTED > > File type .. DYNAMIC > > File style and revision 32BIT Revision 12 > > Hashing Algorithm .. GENERAL > > No. of groups (modulus) 105715 current ( minimum 103889, 114 empty, > > 21092 overflowed, 1452 badly ) > > Number of records .. 1290469 > > Large record size .. 
3267 bytes > > Number of large records 180 > > Group size . 4096 bytes > > Load factors ... 90% (split), 50% (merge) and 70% (actual) > > Total size ......... 522260480 bytes > > Total size of record data .. 287400239 bytes > > Total size of record IDs ... 21508521 bytes > > Unused space ...
Re: [U2] RESIZE - dynamic files
Dan, I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my Actual Load has really gone down (as well as overflow). See below for the results: File name .. GENACCTRN_POSTED Pathname ... GENACCTRN_POSTED File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 23 current ( minimum 23, 5263 empty, 3957 overflowed, 207 badly ) Number of records .. 1290469 Large record size .. 3267 bytes Number of large records 180 Group size . 4096 bytes Load factors ... 90% (split), 50% (merge) and 37% (actual) Total size . 836235264 bytes Total size of record data .. 287394719 bytes Total size of record IDs ... 21508521 bytes Unused space ... 527323832 bytes Total space for records 836227072 bytes My overflow is now @ 2% My Load is @ 37% (actual) granted my empty groups are now up to almost 3% but I hope that won't be a big factor. How does this look? Chris > From: dangf...@hotmail.com > To: u2-users@listserver.u2ug.org > Date: Tue, 3 Jul 2012 16:57:34 -0400 > Subject: Re: [U2] RESIZE - dynamic files > > > One rule of thumb is to make sure that you have an average of 10 or less > items in each group. Going by that, you'd want a minimum mod of 130k or more. > I've also noticed that files approach the "sweet spot" for minimizing > overflow without having excessive empty groups when the total size is pretty > nearly twice the data size. > > The goal can vary according to your situation. I'm personally not all that > afraid of making the modulus a little too large, as overflow is a pretty bad > performance hit (overflow means at least two disk reads to retrieve your > data, "badly" means at least 2 extra disk reads, and I've seen files where > that was thousands (this file isn't that bad, but 20% of your data is forcing > at least one extra disk read). 
Empty groups contribute to overhead on a > sequential search, so you'd want to consider how often you do a sequential > search on a file - usually, that's a pretty inefficient way to retrieve data, > but, again, your mileage may vary. > > To me, 20% is too much overflow, and 114 empty groups is trivial; much less > than 0.2%. I'd be tempted to go to 23 as a minimum Mod, just to see what > it looks like there. That'll give you an average of 6 records per group, not > unreasonably shallow, and it's likely to be a while before you have to resize > again. > > > From: cjausti...@hotmail.com > > To: u2-users@listserver.u2ug.org > > Date: Tue, 3 Jul 2012 15:23:23 -0500 > > Subject: Re: [U2] RESIZE - dynamic files > > > > > > I guess what I need to know is what's an acceptable % of overflow for a > > dynamic file? For example, when I change the SPLIT LOAD to 90% (while using > > the calculated min modulus) > > I'm still left with ~ 20% of overflow (see below). Is 20% overflow > > considered acceptable on average or should I keep tinkering with it to > > reach a lower overflow %? > > > > Correct me if I'm wrong but it seems the goal here is to REDUCE the > > overflow % while not creating too many modulus (groups). > > > > Chris > > > > > > File name .. GENACCTRN_POSTED > > Pathname ... GENACCTRN_POSTED > > File type .. DYNAMIC > > File style and revision 32BIT Revision 12 > > Hashing Algorithm .. GENERAL > > No. of groups (modulus) 105715 current ( minimum 103889, 114 empty, > > 21092 overflowed, 1452 badly ) > > Number of records .. 1290469 > > Large record size .. 3267 bytes > > Number of large records 180 > > Group size . 4096 bytes > > Load factors ... 90% (split), 50% (merge) and 70% (actual) > > Total size ..... 522260480 bytes > > Total size of record data .. 287400239 bytes > > Total size of record IDs ... 21508521 bytes > > Unused space ... 
213343528 bytes > > Total space for records 522252288 bytes > > > > > From: r...@lynden.com > > > To: u2-users@listserver.u2ug.org > > > Date: Tue, 3 Jul 2012 13:10:43 -0700 > > > Subject: Re: [U2] RESIZE - dynamic files > > > > > > The split load is not affecting anything here, since it is more than the > >
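The "~20% overflow" figure being debated is just overflowed groups as a fraction of the current modulus. From the 105715-group stats above:

```python
modulus = 105_715     # No. of groups (modulus), current
overflowed = 21_092   # groups with at least one overflow buffer
badly = 1_452         # groups with two or more overflow buffers

print(f"{overflowed / modulus:.0%}")  # prints "20%" -- one extra read for these groups
print(f"{badly / modulus:.1%}")       # prints "1.4%" -- two or more extra reads
```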
Re: [U2] RESIZE - dynamic files
One rule of thumb is to make sure that you have an average of 10 or less items in each group. Going by that, you'd want a minimum mod of 130k or more. I've also noticed that files approach the "sweet spot" for minimizing overflow without having excessive empty groups when the total size is pretty nearly twice the data size. The goal can vary according to your situation. I'm personally not all that afraid of making the modulus a little too large, as overflow is a pretty bad performance hit (overflow means at least two disk reads to retrieve your data, "badly" means at least 2 extra disk reads, and I've seen files where that was thousands (this file isn't that bad, but 20% of your data is forcing at least one extra disk read). Empty groups contribute to overhead on a sequential search, so you'd want to consider how often you do a sequential search on a file - usually, that's a pretty inefficient way to retrieve data, but, again, your mileage may vary. To me, 20% is too much overflow, and 114 empty groups is trivial; much less than 0.2%. I'd be tempted to go to 23 as a minimum Mod, just to see what it looks like there. That'll give you an average of 6 records per group, not unreasonably shallow, and it's likely to be a while before you have to resize again. > From: cjausti...@hotmail.com > To: u2-users@listserver.u2ug.org > Date: Tue, 3 Jul 2012 15:23:23 -0500 > Subject: Re: [U2] RESIZE - dynamic files > > > I guess what I need to know is what's an acceptable % of overflow for a > dynamic file? For example, when I change the SPLIT LOAD to 90% (while using > the calculated min modulus) > I'm still left with ~ 20% of overflow (see below). Is 20% overflow considered > acceptable on average or should I keep tinkering with it to reach a lower > overflow %? > > Correct me if I'm wrong but it seems the goal here is to REDUCE the overflow > % while not creating too many modulus (groups). > > Chris > > > File name .. GENACCTRN_POSTED > Pathname ... 
GENACCTRN_POSTED > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 105715 current ( minimum 103889, 114 empty, > 21092 overflowed, 1452 badly ) > Number of records .. 1290469 > Large record size .. 3267 bytes > Number of large records 180 > Group size . 4096 bytes > Load factors ... 90% (split), 50% (merge) and 70% (actual) > Total size . 522260480 bytes > Total size of record data .. 287400239 bytes > Total size of record IDs ... 21508521 bytes > Unused space ... 213343528 bytes > Total space for records 522252288 bytes > > > From: r...@lynden.com > > To: u2-users@listserver.u2ug.org > > Date: Tue, 3 Jul 2012 13:10:43 -0700 > > Subject: Re: [U2] RESIZE - dynamic files > > > > The split load is not affecting anything here, since it is more than the > > actual load. What your overflow suggests is that you lower the split.load > > value to 70% or below. You could go ahead and set the merge.load to an > > arbitrarily low number ("1"), and it will probably never do a merge, which > > would be the same as specifying a minimum.modulus equal to "as large as it > > ever gets". The exception to this is during file creation & clear.file, > > when the minimum.modulus value determines the initial disk allocation. > > Short of going to an arbitrarily large minimum.modulus, and a very low > > split.load, you are going to have some overflow (unless you have sequential > > keys & like sized records). > > > > -Rick > > > > -Original Message- > > From: u2-users-boun...@listserver.u2ug.org > > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > > Sent: Tuesday, July 03, 2012 12:54 PM > > To: u2-users@listserver.u2ug.org > > Subject: Re: [U2] RESIZE - dynamic files > > > > > > Using the formula below, and changing the split to 90% I get the following: > > > > File name .. GENACCTRN_POSTED > > Pathname ... GENACCTRN_POSTED > > File type .. 
DYNAMIC > > File style and revision 32BIT Revision 12 > > Hashing Algorithm .. GENERAL > > No. of groups (modulus) 103889 current ( minimum 103889, 114 empty, > > 22249 overflowed, 1764 badly ) > > Number of records ...
Re: [U2] RESIZE - dynamic files
The actual load is 70% on your file. The split.load of 90 was set after the file was loaded. If you leave it at that value, and add another 100,000 records, your modulus will not grow, but the number of overflowed groups will. Perhaps you need to look at it as "80% not overflowed". Despite the output, I doubt that any of those overflows are that bad. -Rick On Jul 3, 2012, at 1:23 PM, "Chris Austin" wrote: > > I guess what I need to know is what's an acceptable % of overflow for a > dynamic file? For example, when I change the SPLIT LOAD to 90% (while using > the calculated min modulus) > I'm still left with ~ 20% of overflow (see below). Is 20% overflow considered > acceptable on average or should I keep tinkering with it to reach a lower > overflow %? > > Correct me if I'm wrong but it seems the goal here is to REDUCE the overflow > % while not creating too many modulus (groups). > > Chris > > > File name .. GENACCTRN_POSTED > Pathname ... GENACCTRN_POSTED > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 105715 current ( minimum 103889, 114 empty, >21092 overflowed, 1452 badly ) > Number of records .. 1290469 > Large record size .. 3267 bytes > Number of large records 180 > Group size . 4096 bytes > Load factors ... 90% (split), 50% (merge) and 70% (actual) > Total size . 522260480 bytes > Total size of record data .. 287400239 bytes > Total size of record IDs ... 21508521 bytes > Unused space ... 213343528 bytes > Total space for records 522252288 bytes > >> From: r...@lynden.com >> To: u2-users@listserver.u2ug.org >> Date: Tue, 3 Jul 2012 13:10:43 -0700 >> Subject: Re: [U2] RESIZE - dynamic files >> >> The split load is not affecting anything here, since it is more than the >> actual load. What your overflow suggests is that you lower the split.load >> value to 70% or below. 
You could go ahead and set the merge.load to an >> arbitrarily low number ("1"), and it will probably never do a merge, which >> would be the same as specifying a minimum.modulus equal to "as large as it >> ever gets". The exception to this is during file creation & clear.file, >> when the minimum.modulus value determines the initial disk allocation. >> Short of going to an arbitrarily large minimum.modulus, and a very low >> split.load, you are going to have some overflow (unless you have sequential >> keys & like sized records). >> >> -Rick >> >> -Original Message- >> From: u2-users-boun...@listserver.u2ug.org >> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin >> Sent: Tuesday, July 03, 2012 12:54 PM >> To: u2-users@listserver.u2ug.org >> Subject: Re: [U2] RESIZE - dynamic files >> >> >> Using the formula below, and changing the split to 90% I get the following: >> >> File name .. GENACCTRN_POSTED >> Pathname ... GENACCTRN_POSTED >> File type .. DYNAMIC >> File style and revision 32BIT Revision 12 >> Hashing Algorithm .. GENERAL >> No. of groups (modulus) 103889 current ( minimum 103889, 114 empty, >>22249 overflowed, 1764 badly ) >> Number of records .. 1290469 >> Large record size .. 3267 bytes >> Number of large records 180 >> Group size . 4096 bytes >> Load factors ... 90% (split), 50% (merge) and 72% (actual) >> Total size . 519921664 bytes >> Total size of record data .. 287400591 bytes >> Total size of record IDs ... 21508497 bytes >> Unused space ... 211004384 bytes >> Total space for records 519913472 bytes >> >> How does this look in terms of performance? >> >> My Actual load went down 8% as well as some overflow but it looks like my >> load % is still high at 72% I'm wondering if I should raise the >> MINIMUM.MODULUS even more >> since I still have a decent amount of overflow and not many large records. 
>> >> Chris >> >> >>> From: r...@lynden.com >>> To: u2-users@listserver.u2ug.org >>> Date: Tue, 3 Jul 2012 10:21:16 -0700 >>> Subject: Re: [U2] RESIZE - dynamic files >>> >>>
Re: [U2] RESIZE - dynamic files
I guess what I need to know is what's an acceptable % of overflow for a dynamic file? For example, when I change the SPLIT LOAD to 90% (while using the calculated min modulus) I'm still left with ~ 20% of overflow (see below). Is 20% overflow considered acceptable on average or should I keep tinkering with it to reach a lower overflow %? Correct me if I'm wrong but it seems the goal here is to REDUCE the overflow % while not creating too many modulus (groups). Chris File name .. GENACCTRN_POSTED Pathname ... GENACCTRN_POSTED File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 105715 current ( minimum 103889, 114 empty, 21092 overflowed, 1452 badly ) Number of records .. 1290469 Large record size .. 3267 bytes Number of large records 180 Group size . 4096 bytes Load factors ... 90% (split), 50% (merge) and 70% (actual) Total size . 522260480 bytes Total size of record data .. 287400239 bytes Total size of record IDs ... 21508521 bytes Unused space ... 213343528 bytes Total space for records 522252288 bytes > From: r...@lynden.com > To: u2-users@listserver.u2ug.org > Date: Tue, 3 Jul 2012 13:10:43 -0700 > Subject: Re: [U2] RESIZE - dynamic files > > The split load is not affecting anything here, since it is more than the > actual load. What your overflow suggests is that you lower the split.load > value to 70% or below. You could go ahead and set the merge.load to an > arbitrarily low number ("1"), and it will probably never do a merge, which > would be the same as specifying a minimum.modulus equal to "as large as it > ever gets". The exception to this is during file creation & clear.file, > when the minimum.modulus value determines the initial disk allocation. Short > of going to an arbitrarily large minimum.modulus, and a very low split.load, > you are going to have some overflow (unless you have sequential keys & like > sized records). 
> > -Rick > > -Original Message- > From: u2-users-boun...@listserver.u2ug.org > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > Sent: Tuesday, July 03, 2012 12:54 PM > To: u2-users@listserver.u2ug.org > Subject: Re: [U2] RESIZE - dynamic files > > > Using the formula below, and changing the split to 90% I get the following: > > File name .. GENACCTRN_POSTED > Pathname ... GENACCTRN_POSTED > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 103889 current ( minimum 103889, 114 empty, > 22249 overflowed, 1764 badly ) > Number of records .. 1290469 > Large record size .. 3267 bytes > Number of large records 180 > Group size . 4096 bytes > Load factors ... 90% (split), 50% (merge) and 72% (actual) > Total size . 519921664 bytes > Total size of record data .. 287400591 bytes > Total size of record IDs ... 21508497 bytes > Unused space ... 211004384 bytes > Total space for records 519913472 bytes > > How does this look in terms of performance? > > My Actual load went down 8% as well as some overflow but it looks like my > load % is still high at 72% I'm wondering if I should raise the > MINIMUM.MODULUS even more > since I still have a decent amount of overflow and not many large records. > > Chris > > > > From: r...@lynden.com > > To: u2-users@listserver.u2ug.org > > Date: Tue, 3 Jul 2012 10:21:16 -0700 > > Subject: Re: [U2] RESIZE - dynamic files > > > > (record + id / 4096 or 2048) > > > > You need to factor in overhead & the split factor: (records + ids) * 1.1 > > * 1.25 / 4096(for 80%) > > > > If you use a 20% merge factor and a 80% split factor, the file will start > > merging unless you delete 60 percent of your records. If you use 90% split > > factor, you will have more overflowed groups. These numbers refer to the > > total amount of data in the file, not to any individual group. 
> > > > For records of the size that you have, I do not see any advantage to using > > a larger, 4096, group size. You will end up with twice the number of > > records per group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access. > > > > -Rick > > > &
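Rick's sizing formula — (records + ids) * 1.1 * 1.25 / 4096 for an 80% split load — can be checked against the stats Chris posted. Note the 1.1 is Rick's allowance for per-record and per-group overhead and the 1.25 is 1/0.80 headroom for the split load; both are his rules of thumb, not file-system constants:

```python
import math

record_data = 287_035_391  # Total size of record data, from FILE.STAT
record_ids = 21_508_449    # Total size of record IDs
group_size = 4096
overhead = 1.10            # Rick's allowance for record/group overhead
headroom = 1.25            # 1 / 0.80, headroom for an 80% split load

min_modulus = math.ceil((record_data + record_ids) * overhead * headroom
                        / group_size)
print(min_modulus)         # ~103,600 -- close to the minimum of 103889 used above
```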
Re: [U2] RESIZE - dynamic files
The split load is not affecting anything here, since it is more than the actual load. What your overflow suggests is that you lower the split.load value to 70% or below. You could go ahead and set the merge.load to an arbitrarily low number ("1"), and it will probably never do a merge, which would be the same as specifying a minimum.modulus equal to "as large as it ever gets". The exception to this is during file creation & clear.file, when the minimum.modulus value determines the initial disk allocation. Short of going to an arbitrarily large minimum.modulus, and a very low split.load, you are going to have some overflow (unless you have sequential keys & like sized records). -Rick -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin Sent: Tuesday, July 03, 2012 12:54 PM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files Using the formula below, and changing the split to 90% I get the following: File name .. GENACCTRN_POSTED Pathname ... GENACCTRN_POSTED File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 103889 current ( minimum 103889, 114 empty, 22249 overflowed, 1764 badly ) Number of records .. 1290469 Large record size .. 3267 bytes Number of large records 180 Group size . 4096 bytes Load factors ... 90% (split), 50% (merge) and 72% (actual) Total size . 519921664 bytes Total size of record data .. 287400591 bytes Total size of record IDs ... 21508497 bytes Unused space ... 211004384 bytes Total space for records 519913472 bytes How does this look in terms of performance? My Actual load went down 8% as well as some overflow but it looks like my load % is still high at 72% I'm wondering if I should raise the MINIMUM.MODULUS even more since I still have a decent amount of overflow and not many large records. 
Chris > From: r...@lynden.com > To: u2-users@listserver.u2ug.org > Date: Tue, 3 Jul 2012 10:21:16 -0700 > Subject: Re: [U2] RESIZE - dynamic files > > (record + id / 4096 or 2048) > > You need to factor in overhead & the split factor: (records + ids) * 1.1 * > 1.25 / 4096(for 80%) > > If you use a 20% merge factor and a 80% split factor, the file will start > merging unless you delete 60 percent of your records. If you use 90% split > factor, you will have more overflowed groups. These numbers refer to the > total amount of data in the file, not to any individual group. > > For records of the size that you have, I do not see any advantage to using a > larger, 4096, group size. You will end up with twice the number of records > per group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access. > > -Rick > > -Original Message- > From: u2-users-boun...@listserver.u2ug.org > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > Sent: Tuesday, July 03, 2012 9:48 AM > To: u2-users@listserver.u2ug.org > Subject: Re: [U2] RESIZE - dynamic files > > > File name .. GENACCTRN_POSTED > Pathname ... GENACCTRN_POSTED > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 92776 current ( minimum 31, 89 empty, > 28229 overflowed, 2485 badly ) > Number of records .. 1290469 > Large record size .. 3267 bytes > Number of large records 180 > Group size . 4096 bytes > Load factors ... 80% (split), 50% (merge) and 80% (actual) > Total size . 500600832 bytes > Total size of record data .. 287035391 bytes > Total size of record IDs ... 21508449 bytes > Unused space ... 192048800 bytes > Total space for records 500592640 bytes > > Using the record above, how would I calculate the following? > > 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the > current number)? > 2) SPLIT - would 90% seem about right? > 3) MERGE - would 20% seem about right? 
> 4) Large Record Size - does 3276 seem right? > 5) Group Size - should I be using 4096? > > I'm just a bit confused as to how to set these, I saw the formula to > calculate the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I > always get a lower number > than my current modulus.. > > I also saw where it said to simply take your current modulus #
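Rick's "~13 vs ~7 records per group" comparison for 4096- vs 2048-byte groups follows from the average record-plus-ID size. A rough check, using the figures from the FILE.STAT output above (this ignores per-record header overhead, so it is an estimate, not an exact UniVerse calculation):

```python
record_data = 287_035_391  # Total size of record data, from FILE.STAT
record_ids = 21_508_449    # Total size of record IDs
records = 1_290_469        # Number of records
split_load = 0.80          # 80% split load

avg_bytes = (record_data + record_ids) / records  # ~239 bytes per record + ID
for group_size in (4096, 2048):
    print(group_size, group_size * split_load / avg_bytes)
# roughly 13.7 records per 4096-byte group vs 6.9 per 2048-byte group,
# matching Rick's "~13 vs ~7"
```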
Re: [U2] RESIZE - dynamic files
I would recommend that if you intend to do resizing on a regular basis and you want to improve the performance of the file, you might consider resizing the file to a static file type so that you can have more control over the hashing algorithm, separation and modulo. Chris Austin wrote: Using the formula below, and changing the split to 90% I get the following: File name .. GENACCTRN_POSTED Pathname ... GENACCTRN_POSTED File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 103889 current ( minimum 103889, 114 empty, 22249 overflowed, 1764 badly ) Number of records .. 1290469 Large record size .. 3267 bytes Number of large records 180 Group size . 4096 bytes Load factors ... 90% (split), 50% (merge) and 72% (actual) Total size . 519921664 bytes Total size of record data .. 287400591 bytes Total size of record IDs ... 21508497 bytes Unused space ... 211004384 bytes Total space for records 519913472 bytes How does this look in terms of performance? My Actual load went down 8% as well as some overflow but it looks like my load % is still high at 72% I'm wondering if I should raise the MINIMUM.MODULUS even more since I still have a decent amount of overflow and not many large records. Chris From: r...@lynden.com To: u2-users@listserver.u2ug.org Date: Tue, 3 Jul 2012 10:21:16 -0700 Subject: Re: [U2] RESIZE - dynamic files (record + id / 4096 or 2048) You need to factor in overhead & the split factor: (records + ids) * 1.1 * 1.25 / 4096 (for 80%) If you use a 20% merge factor and a 80% split factor, the file will start merging unless you delete 60 percent of your records. If you use 90% split factor, you will have more overflowed groups. These numbers refer to the total amount of data in the file, not to any individual group. For records of the size that you have, I do not see any advantage to using a larger, 4096, group size. 
You will end up with twice the number of records per group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access. -Rick -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin Sent: Tuesday, July 03, 2012 9:48 AM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files File name .. GENACCTRN_POSTED Pathname ... GENACCTRN_POSTED File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 92776 current ( minimum 31, 89 empty, 28229 overflowed, 2485 badly ) Number of records .. 1290469 Large record size .. 3267 bytes Number of large records 180 Group size . 4096 bytes Load factors ... 80% (split), 50% (merge) and 80% (actual) Total size . 500600832 bytes Total size of record data .. 287035391 bytes Total size of record IDs ... 21508449 bytes Unused space ... 192048800 bytes Total space for records 500592640 bytes Using the record above, how would I calculate the following? 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the current number)? 2) SPLIT - would 90% seem about right? 3) MERGE - would 20% seem about right? 4) Large Record Size - does 3276 seem right? 5) Group Size - should I be using 4096? I'm just a bit confused as to how to set these, I saw the formula to calculate the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I always get a lower number than my current modulus.. I also saw where it said to simply take your current modulus # and add 10-20% and set the MINIMUM.MODULUS based on that.. Based on the table above I'm just trying to get an idea of what these should be set at. Thanks, Chris From: cjausti...@hotmail.com To: u2-users@listserver.u2ug.org Date: Tue, 3 Jul 2012 10:28:17 -0500 Subject: Re: [U2] RESIZE - dynamic files Doug, When I do the math I come up with a different # (see below): File name .. TEST_FILE Pathname ... TEST_FILE File type .. 
DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 82850 current ( minimum 24, 104 empty, 26225 overflowed, 1441 badly ) Number of records .. 1157122 Large record size .. 2036 bytes Number of large records 576 Group size . 4096 bytes Load factors ... 80% (split), 50% (merge) and 80% (actual) Total size . 44
Re: [U2] RESIZE - dynamic files
Using the formula below, and changing the split to 90% I get the following: File name .. GENACCTRN_POSTED Pathname ... GENACCTRN_POSTED File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 103889 current ( minimum 103889, 114 empty, 22249 overflowed, 1764 badly ) Number of records .. 1290469 Large record size .. 3267 bytes Number of large records 180 Group size . 4096 bytes Load factors ... 90% (split), 50% (merge) and 72% (actual) Total size . 519921664 bytes Total size of record data .. 287400591 bytes Total size of record IDs ... 21508497 bytes Unused space ... 211004384 bytes Total space for records 519913472 bytes How does this look in terms of performance? My Actual load went down 8% as well as some overflow but it looks like my load % is still high at 72% I'm wondering if I should raise the MINIMUM.MODULUS even more since I still have a decent amount of overflow and not many large records. Chris > From: r...@lynden.com > To: u2-users@listserver.u2ug.org > Date: Tue, 3 Jul 2012 10:21:16 -0700 > Subject: Re: [U2] RESIZE - dynamic files > > (record + id / 4096 or 2048) > > You need to factor in overhead & the split factor: (records + ids) * 1.1 * > 1.25 / 4096(for 80%) > > If you use a 20% merge factor and a 80% split factor, the file will start > merging unless you delete 60 percent of your records. If you use 90% split > factor, you will have more overflowed groups. These numbers refer to the > total amount of data in the file, not to any individual group. > > For records of the size that you have, I do not see any advantage to using a > larger, 4096, group size. You will end up with twice the number of records > per group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access. 
> > -Rick > > -Original Message- > From: u2-users-boun...@listserver.u2ug.org > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > Sent: Tuesday, July 03, 2012 9:48 AM > To: u2-users@listserver.u2ug.org > Subject: Re: [U2] RESIZE - dynamic files > > > File name .. GENACCTRN_POSTED > Pathname ... GENACCTRN_POSTED > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 92776 current ( minimum 31, 89 empty, > 28229 overflowed, 2485 badly ) > Number of records .. 1290469 > Large record size .. 3267 bytes > Number of large records 180 > Group size . 4096 bytes > Load factors ... 80% (split), 50% (merge) and 80% (actual) > Total size . 500600832 bytes > Total size of record data .. 287035391 bytes > Total size of record IDs ... 21508449 bytes > Unused space ... 192048800 bytes > Total space for records 500592640 bytes > > Using the record above, how would I calculate the following? > > 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the > current number)? > 2) SPLIT - would 90% seem about right? > 3) MERGE - would 20% seem about right? > 4) Large Record Size - does 3276 seem right? > 5) Group Size - should I be using 4096? > > I'm just a bit confused as to how to set these, I saw the formula to > calculate the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I > always get a lower number > than my current modulus.. > > I also saw where it said to simply take your current modulus # and add 10-20% > and set the MINIMUM.MODULUS based on that.. > > Based on the table above I'm just trying to get an idea of what these should > be set at. > > Thanks, > > Chris > > > > From: cjausti...@hotmail.com > > To: u2-users@listserver.u2ug.org > > Date: Tue, 3 Jul 2012 10:28:17 -0500 > > Subject: Re: [U2] RESIZE - dynamic files > > > > > > Doug, > > > > When I do the math I come up with a different # (see below): > > > > File name .. TEST_FILE > > Pathname ... 
TEST_FILE > > File type .. DYNAMIC > > File style and revision 32BIT Revision 12 > > Hashing Algorithm .. GENERAL > > No. of groups (modulus) 82850 current ( minimum 24, 104 empty, > > 26225 overflowed, 1441 badly ) > > Number of records .
Re: [U2] RESIZE - dynamic files
Doug, The data is growing over time with this file. Does that mean I should ignore the formula? Or should I still use a lower MINIMUM.MODULO than the actual modulo #.. Is the idea to reduce overflow by lowering the split? What is this 'overflow' referring to? > > 2) SPLIT - would 90% seem about right? > > > Depends on the history of the file. Is the data growing over time? The > way the file looks now the split should be reduced because you have 31% in > overflow. So basically don't spend much time worrying about large record size? > 4) Large Record Size - does 3276 seem right? > > > Can be calculated with a lot of effort, but yield little gain. This seems like a moot point as well, as long as the ratio in regards to the MINIMUM.MODULO is set proportionally? > 5) Group Size - should I be using 4096? > > > You have two group sizes on dynamic files 2048 and 4096. If you lower it > you need to double your modulo, roughly. If you keep it the same you need > to increase your modulo because 31% of your file is in overflow. Chris ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
So for this example what would be a good SPLIT level and what would be a good MERGE level to use? It was my understanding that I wanted to lower my merge to something below 50% and increase the split to reduce splitting. Chris > From: r...@lynden.com > To: u2-users@listserver.u2ug.org > Date: Tue, 3 Jul 2012 10:21:16 -0700 > Subject: Re: [U2] RESIZE - dynamic files > > (record + id / 4096 or 2048) > > You need to factor in overhead & the split factor: (records + ids) * 1.1 * > 1.25 / 4096(for 80%) > > If you use a 20% merge factor and a 80% split factor, the file will start > merging unless you delete 60 percent of your records. If you use 90% split > factor, you will have more overflowed groups. These numbers refer to the > total amount of data in the file, not to any individual group. > > For records of the size that you have, I do not see any advantage to using a > larger, 4096, group size. You will end up with twice the number of records > per group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access. > > -Rick > > -Original Message- > From: u2-users-boun...@listserver.u2ug.org > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > Sent: Tuesday, July 03, 2012 9:48 AM > To: u2-users@listserver.u2ug.org > Subject: Re: [U2] RESIZE - dynamic files > > > File name .. GENACCTRN_POSTED > Pathname ... GENACCTRN_POSTED > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 92776 current ( minimum 31, 89 empty, > 28229 overflowed, 2485 badly ) > Number of records .. 1290469 > Large record size .. 3267 bytes > Number of large records 180 > Group size . 4096 bytes > Load factors ... 80% (split), 50% (merge) and 80% (actual) > Total size . 500600832 bytes > Total size of record data .. 287035391 bytes > Total size of record IDs ... 21508449 bytes > Unused space ... 
192048800 bytes > Total space for records 500592640 bytes > > Using the record above, how would I calculate the following? > > 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the > current number)? > 2) SPLIT - would 90% seem about right? > 3) MERGE - would 20% seem about right? > 4) Large Record Size - does 3276 seem right? > 5) Group Size - should I be using 4096? > > I'm just a bit confused as to how to set these, I saw the formula to > calculate the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I > always get a lower number > than my current modulus.. > > I also saw where it said to simply take your current modulus # and add 10-20% > and set the MINIMUM.MODULUS based on that.. > > Based on the table above I'm just trying to get an idea of what these should > be set at. > > Thanks, > > Chris > > > > From: cjausti...@hotmail.com > > To: u2-users@listserver.u2ug.org > > Date: Tue, 3 Jul 2012 10:28:17 -0500 > > Subject: Re: [U2] RESIZE - dynamic files > > > > > > Doug, > > > > When I do the math I come up with a different # (see below): > > > > File name .. TEST_FILE > > Pathname ... TEST_FILE > > File type .. DYNAMIC > > File style and revision 32BIT Revision 12 > > Hashing Algorithm .. GENERAL > > No. of groups (modulus) 82850 current ( minimum 24, 104 empty, > > 26225 overflowed, 1441 badly ) > > Number of records .. 1157122 > > Large record size .. 2036 bytes > > Number of large records 576 > > Group size . 4096 bytes > > Load factors ... 80% (split), 50% (merge) and 80% (actual) > > Total size . 449605632 bytes > > Total size of record data .. 258687736 bytes > > Total size of record IDs ... 19283300 bytes > > Unused space ... 
171626404 bytes > > Total space for records 449597440 bytes > > > > -- > > 258,687,736 bytes - Total size of record data > > 19,283,300 bytes - Total size of record IDs > > === > > 277,971,036 bytes (record + id's) > > > > 277,971,036 / 4,084 = 68,063 bytes (minimum modulus) > > -- > > > > 68,063 is less than the current modulus of 82,850. Something with this > > formula doesn't seem right because if I use that
Re: [U2] RESIZE - dynamic files
> (record + id / 4096 or 2048)

You need to factor in overhead & the split factor: (records + ids) * 1.1 * 1.25 / 4096 (for 80%)

If you use a 20% merge factor and an 80% split factor, the file will not start merging until you delete 60 percent of your records. If you use a 90% split factor, you will have more overflowed groups. These numbers refer to the total amount of data in the file, not to any individual group.

For records of the size that you have, I do not see any advantage to using a larger, 4096, group size. You will end up with twice the number of records per group vs 2048 (~13 vs ~7), and a little slower keyed access.

-Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 9:48 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files

File name .. GENACCTRN_POSTED
Pathname ... GENACCTRN_POSTED
File type .. DYNAMIC
File style and revision 32BIT Revision 12
Hashing Algorithm .. GENERAL
No. of groups (modulus) 92776 current ( minimum 31, 89 empty, 28229 overflowed, 2485 badly )
Number of records .. 1290469
Large record size .. 3267 bytes
Number of large records 180
Group size . 4096 bytes
Load factors ... 80% (split), 50% (merge) and 80% (actual)
Total size . 500600832 bytes
Total size of record data .. 287035391 bytes
Total size of record IDs ... 21508449 bytes
Unused space ... 192048800 bytes
Total space for records 500592640 bytes

Using the record above, how would I calculate the following?

1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the current number)?
2) SPLIT - would 90% seem about right?
3) MERGE - would 20% seem about right?
4) Large Record Size - does 3276 seem right?
5) Group Size - should I be using 4096?

I'm just a bit confused as to how to set these. I saw the formula to calculate the MINIMUM.MODULUS, which is (record + id / 4096 or 2048), but I always get a lower number than my current modulus..
I also saw where it said to simply take your current modulus # and add 10-20% and set the MINIMUM.MODULUS based on that.. Based on the table above I'm just trying to get an idea of what these should be set at. Thanks, Chris > From: cjausti...@hotmail.com > To: u2-users@listserver.u2ug.org > Date: Tue, 3 Jul 2012 10:28:17 -0500 > Subject: Re: [U2] RESIZE - dynamic files > > > Doug, > > When I do the math I come up with a different # (see below): > > File name .. TEST_FILE > Pathname ... TEST_FILE > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 82850 current ( minimum 24, 104 empty, > 26225 overflowed, 1441 badly ) > Number of records .. 1157122 > Large record size .. 2036 bytes > Number of large records 576 > Group size . 4096 bytes > Load factors ... 80% (split), 50% (merge) and 80% (actual) > Total size . 449605632 bytes > Total size of record data .. 258687736 bytes > Total size of record IDs ... 19283300 bytes > Unused space ... 171626404 bytes > Total space for records 449597440 bytes > > -- > 258,687,736 bytes - Total size of record data > 19,283,300 bytes - Total size of record IDs > === > 277,971,036 bytes (record + id's) > > 277,971,036 / 4,084 = 68,063 bytes (minimum modulus) > -- > > 68,063 is less than the current modulus of 82,850. Something with this > formula doesn't seem right because if I use that formula I always calculate a > minimum modulus of less than the current modulus. > > Thanks, > > Chris > > > > > Date: Mon, 2 Jul 2012 16:08:16 -0600 > > From: dave...@gmail.com > > To: u2-users@listserver.u2ug.org > > Subject: Re: [U2] RESIZE - dynamic files > > > > Hi Chris: > > > > You cannot get away with not resizing dynamic files in my experience. The > > files do not split and merge like we are led to believe. The separator is > > not used on dynamic files. Your Universe file is badly sized. The math > > below will get you reasonably file size. 
> > > > Let's do the math: > > > > 258687736 (Record Size) > > 192283300 (Key Size) > > > > 450,971,036 (Data and Key Size) > > > > 4096 (Group Size) > > - 12 (32 Bit Overhe
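Rick's sizing rule of thumb above can be sketched as a small calculation. This is an illustrative sketch only: the function name is mine, and the 1.1 overhead and 1.25 split allowance are the figures from Rick's post, not an exact UniVerse-internal formula.

```python
import math

def rick_min_modulus(data_bytes, id_bytes, group_size=4096,
                     overhead=1.10, split_allowance=1.25):
    """Rick's rule of thumb: (records + ids) * 1.1 * 1.25 / group_size.
    The 1.1 covers per-record overhead; the 1.25 compensates for the 80%
    split load so groups are not already full when the resize finishes."""
    return math.ceil((data_bytes + id_bytes) * overhead * split_allowance
                     / group_size)

# GENACCTRN_POSTED: 287,035,391 bytes of data, 21,508,449 bytes of IDs
print(rick_min_modulus(287_035_391, 21_508_449))
# -> 103577, in the same ballpark as the 103889 modulus Chris arrived at
```

UniVerse's GENERAL hashing tends to behave better with a prime modulus, so in practice you would round the result up to the next prime, as Rick suggests elsewhere in the thread.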
Re: [U2] RESIZE - dynamic files
See comments interspersed...

> Using the record above, how would I calculate the following?
>
> 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the current number)?

Should be less than the current size, if you want the file to merge.

> 2) SPLIT - would 90% seem about right?

Depends on the history of the file. Is the data growing over time? The way the file looks now, the split should be reduced because you have 31% in overflow.

> 3) MERGE - would 20% seem about right?

Won't be used on a growth file, so the file history is important.

> 4) Large Record Size - does 3276 seem right?

Can be calculated with a lot of effort, but yields little gain.

> 5) Group Size - should I be using 4096?

You have two group sizes on dynamic files, 2048 and 4096. If you lower it you need to double your modulo, roughly. If you keep it the same you need to increase your modulo because 31% of your file is in overflow.
Re: [U2] RESIZE - dynamic files
No worries Doug. I'm just wondering if the calculation makes sense (if we use the example below): File name .. GENACCTRN_POSTED Pathname ... GENACCTRN_POSTED File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 92776 current ( minimum 31, 89 empty, 28229 overflowed, 2485 badly ) Number of records .. 1290469 Large record size .. 3267 bytes Number of large records 180 Group size . 4096 bytes Load factors ... 80% (split), 50% (merge) and 80% (actual) Total size . 500600832 bytes Total size of record data .. 287035391 bytes Total size of record IDs ... 21508449 bytes Unused space ... 192048800 bytes Total space for records 500592640 bytes FORMULA -> (287,035,391+21,508,449) / (4,084) = 75,549 MINIMUM.MODULUS The question I have is whether 75,549 makes sense for this record. I thought the MINIMUM.MODULUS was supposed to be bigger than the number of groups (92,776 in this case)? Chris > Date: Tue, 3 Jul 2012 11:04:53 -0600 > From: dave...@gmail.com > To: u2-users@listserver.u2ug.org > Subject: Re: [U2] RESIZE - dynamic files > > Yep, I added an extra 2 in the key value. Oh, the perils of cut and > paste... > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
Yep, I added an extra 2 in the key value. Oh, the perils of cut and paste...
Re: [U2] RESIZE - dynamic files
File name .. GENACCTRN_POSTED Pathname ... GENACCTRN_POSTED File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 92776 current ( minimum 31, 89 empty, 28229 overflowed, 2485 badly ) Number of records .. 1290469 Large record size .. 3267 bytes Number of large records 180 Group size . 4096 bytes Load factors ... 80% (split), 50% (merge) and 80% (actual) Total size . 500600832 bytes Total size of record data .. 287035391 bytes Total size of record IDs ... 21508449 bytes Unused space ... 192048800 bytes Total space for records 500592640 bytes Using the record above, how would I calculate the following? 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the current number)? 2) SPLIT - would 90% seem about right? 3) MERGE - would 20% seem about right? 4) Large Record Size - does 3276 seem right? 5) Group Size - should I be using 4096? I'm just a bit confused as to how to set these, I saw the formula to calculate the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I always get a lower number than my current modulus.. I also saw where it said to simply take your current modulus # and add 10-20% and set the MINIMUM.MODULUS based on that.. Based on the table above I'm just trying to get an idea of what these should be set at. Thanks, Chris > From: cjausti...@hotmail.com > To: u2-users@listserver.u2ug.org > Date: Tue, 3 Jul 2012 10:28:17 -0500 > Subject: Re: [U2] RESIZE - dynamic files > > > Doug, > > When I do the math I come up with a different # (see below): > > File name .. TEST_FILE > Pathname ... TEST_FILE > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 82850 current ( minimum 24, 104 empty, > 26225 overflowed, 1441 badly ) > Number of records .. 1157122 > Large record size .. 2036 bytes > Number of large records 576 > Group size . 4096 bytes > Load factors ... 
80% (split), 50% (merge) and 80% (actual) > Total size . 449605632 bytes > Total size of record data .. 258687736 bytes > Total size of record IDs ... 19283300 bytes > Unused space ... 171626404 bytes > Total space for records 449597440 bytes > > -- > 258,687,736 bytes - Total size of record data > 19,283,300 bytes - Total size of record IDs > === > 277,971,036 bytes (record + id's) > > 277,971,036 / 4,084 = 68,063 bytes (minimum modulus) > -- > > 68,063 is less than the current modulus of 82,850. Something with this > formula doesn't seem right because if I use that formula I always calculate a > minimum modulus of less than the current modulus. > > Thanks, > > Chris > > > > > Date: Mon, 2 Jul 2012 16:08:16 -0600 > > From: dave...@gmail.com > > To: u2-users@listserver.u2ug.org > > Subject: Re: [U2] RESIZE - dynamic files > > > > Hi Chris: > > > > You cannot get away with not resizing dynamic files in my experience. The > > files do not split and merge like we are led to believe. The separator is > > not used on dynamic files. Your Universe file is badly sized. The math > > below will get you reasonably file size. > > > > Let's do the math: > > > > 258687736 (Record Size) > > 192283300 (Key Size) > > > > 450,971,036 (Data and Key Size) > > > > 4096 (Group Size) > > - 12 (32 Bit Overhead) > > > > 4084 Usable Space > > > > 450971036/4084 = Minimum Modulo 110424 (Prime is 110431) > > > > > > [ad] > > I hate doing this math all of the time. I have a reasonably priced resize > > program called XLr8Resizer for $99.00 to do this for me. 
> > [/ad] > > > > Regards, > > Doug > > www.u2logic.com/tools.html > > "XLr8Resizer for the rest of us" > > ___ > > U2-Users mailing list > > U2-Users@listserver.u2ug.org > > http://listserver.u2ug.org/mailman/listinfo/u2-users > > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
Doug,

When I do the math I come up with a different # (see below):

File name .. TEST_FILE
Pathname ... TEST_FILE
File type .. DYNAMIC
File style and revision 32BIT Revision 12
Hashing Algorithm .. GENERAL
No. of groups (modulus) 82850 current ( minimum 24, 104 empty, 26225 overflowed, 1441 badly )
Number of records .. 1157122
Large record size .. 2036 bytes
Number of large records 576
Group size . 4096 bytes
Load factors ... 80% (split), 50% (merge) and 80% (actual)
Total size . 449605632 bytes
Total size of record data .. 258687736 bytes
Total size of record IDs ... 19283300 bytes
Unused space ... 171626404 bytes
Total space for records 449597440 bytes

--
258,687,736 bytes - Total size of record data
 19,283,300 bytes - Total size of record IDs
===
277,971,036 bytes (record + id's)

277,971,036 / 4,084 = 68,063 (minimum modulus)
--

68,063 is less than the current modulus of 82,850. Something with this formula doesn't seem right, because if I use that formula I always calculate a minimum modulus of less than the current modulus.

Thanks,

Chris

> Date: Mon, 2 Jul 2012 16:08:16 -0600
> From: dave...@gmail.com
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
>
> Hi Chris:
>
> You cannot get away with not resizing dynamic files in my experience. The files do not split and merge like we are led to believe. The separator is not used on dynamic files. Your Universe file is badly sized. The math below will get you a reasonably sized file.
>
> Let's do the math:
>
> 258687736 (Record Size)
> 192283300 (Key Size)
>
> 450,971,036 (Data and Key Size)
>
> 4096 (Group Size)
> - 12 (32 Bit Overhead)
>
> 4084 Usable Space
>
> 450971036/4084 = Minimum Modulo 110424 (Prime is 110431)
>
> [ad]
> I hate doing this math all of the time. I have a reasonably priced resize program called XLr8Resizer for $99.00 to do this for me.
> [/ad]
>
> Regards,
> Doug
> www.u2logic.com/tools.html
> "XLr8Resizer for the rest of us"
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
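The arithmetic in the two posts above can be checked directly. A quick sketch (the helper names are mine; the extra "2" is the slip Doug acknowledges upthread, where 19,283,300 bytes of IDs became 192,283,300):

```python
import math

def is_prime(n):
    """Trial division; fine at file-sizing magnitudes."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

def next_prime(n):
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n

usable = 4096 - 12  # group size minus 32-bit header overhead, per Doug

# Doug's figures (with the extra '2' in the key size):
doug = math.ceil((258_687_736 + 192_283_300) / usable)
print(doug, next_prime(doug))  # 110424 and 110431, as posted

# Corrected key size from the ANALYZE.FILE output:
print((258_687_736 + 19_283_300) // usable)  # 68063, matching Chris's figure
```

So both calculations are internally consistent; they differ only because of the mistyped key size, which is why Chris's corrected figure comes out below the current modulus.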
Re: [U2] RESIZE - dynamic files
Almost. Though the file will look after itself, it may not do so very well. Dynamic files, for best performance, do sometimes need periodic resizing. Having said that, it is true that some never resize dynamic files.

If the minimum modulo is much lower than the actual, this will cause constant splits to occur if the file is constantly growing. The 80% actual load is further indication of this. What can be even worse is if the file then shrinks dramatically, as very intensive merges will take place - not desirable if you expect the file to grow again. In this case I would choose a new modulo greater than the actual - how much bigger depends on the rate of growth expected. That is with the current separation - the best separation you will only determine by examining the size of the records.

"Martin Phillips" wrote in message news:<00f601cd588c$cd3d1310$67b73930$@ladybridge.com>... > Hi Chris, > > The whole point of dynamic files is that you don't do RESIZE. The file will > look after itself, automatically responding to > variations in the volume of data. > > There are "knobs to twiddle" but in most cases they can safely be left at > their defaults. A dynamic file will never perform as well > as a perfectly tuned static file but they are a heck of a lot better than > typical static files that haven't been reconfigured for > ages. > > > Martin Phillips > Ladybridge Systems Ltd > 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England > +44 (0)1604-709200 > > > > > -Original Message- > From: u2-users-boun...@listserver.u2ug.org > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > Sent: 02 July 2012 20:22 > To: u2-users@listserver.u2ug.org > Subject: [U2] RESIZE - dynamic files > > > I was wondering if anyone had instructions on RESIZE with a dynamic file? For > example I have a file called 'TEST_FILE' > with the following: > > 01 ANALYZE.FILE TEST_FILE > File name .. TEST_FILE > Pathname ... TEST_FILE > File type ..
DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 83261 current ( minimum 31 ) > Large record size .. 3267 bytes > Group size . 4096 bytes > Load factors ... 80% (split), 50% (merge) and 80% (actual) > Total size . 450613248 bytes > > How do you calculate what the modulus and separation should be? I can't use > HASH.HELP on a type 30 file to see the recommended > settings > so I was wondering how best you figure out the file RESIZE. > > Thanks, > > Chris > > > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users > > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users > This message contains information that may be privileged or confidential and is the property of GPM Development Ltd. It is intended only for the person to whom it is addressed. If you are not the intended recipient ,you are not authorized to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message. This e-mail was sent to you by GPM Development Ltd. We are incorporated under the laws of England and Wales (company no. 2292156 and VAT registration no. 523 5622 63). Our registered office is 6th Floor, AMP House, Croydon, Surrey CR0 2LX. ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
Chris,

I second the thought that, because of the splitting and merging of groups, it can be a waste of effort to overwork the "sizing" of a dynamic file.

One problem with your TEST_FILE below is that the Large Record Size is spec'ed at less than 50% of the group size. Each record larger than the "large record size" is given at least one full-sized buffer in the overflow file, so a record of 2037 bytes, in your example, would occupy 4096 bytes of space. The ID and a pointer are left in the primary data group. It appears that your records average 250 bytes, so this probably is not a large factor, but it would also suggest that you stick to a GROUP.SIZE of 1 (2048 bytes) rather than 2. Btw, each of your 576 large records probably counts toward the "overflowed badly" column, though, from an access point of view, the group might be in optimal shape.

The effective modulo of a dynamic file is based on the space used by the non-large records, but the "Total size of record data" includes the full buffer size of the overflow records, I believe, and so should not be used to compute the total size of your data.

For record sizes like yours, I would compute the total of the ids+records, add about 10% for overhead, divide by the group size (2048, if you use the default), multiply by 1.25 (to allow for the 80% splitting factor), and then set the minimum modulus to the next larger prime number.

In the example below, you can see 50 large records in a single group of a dynamic file, but only the ids are in the primary buffer. If you do the math, you will find that each 1001-byte record is using up a 4096-byte overflow buffer.

File name ..................... BIGD
Pathname ...................... BIGD
File type ..................... DYNAMIC
Hashing Algorithm ............. GENERAL
No. of groups (modulus) ....... 1 current ( minimum 1, 0 empty, 1 overflowed, 1 badly )
Number of records ............. 50
Large record size ............. 1000 bytes
Number of large records ....... 50
Group size .................... 4096 bytes
Load factors .................. 80% (split), 50% (merge) and 30% (actual)
Total size .................... 217088 bytes
Total size of record data ..... 205466 bytes
Total size of record IDs ...... 534 bytes
Unused space .................. 2896 bytes
Total space for records ....... 208896 bytes

LIST BIGD TOTAL EVAL "LEN(@ID)" TOTAL EVAL "LEN(@RECORD)" DET.SUP 18:03:29 07-02-12 PAGE 1
LEN(@ID)..  LEN(@RECORD)
       134         50050
50 records listed.

Note that if I stuck to the defaults and used sequential ids, I would have saved more than half of the disk space, but still have used 150% of the total id+record size.

File name ..................... BIGD
Pathname ...................... BIGD
File type ..................... DYNAMIC
Hashing Algorithm ............. GENERAL
No. of groups (modulus) ....... 31 current ( minimum 1, 3 empty, 4 overflowed, 0 badly )
Number of records ............. 50
Large record size ............. 1628 bytes
Number of large records ....... 0
Group size .................... 2048 bytes
Load factors .................. 80% (split), 50% (merge) and 80% (actual)
Total size .................... 79872 bytes
Total size of record data ..... 50709 bytes
Total size of record IDs ...... 91 bytes
Unused space .................. 24976 bytes
Total space for records ....... 75776 bytes

Rick Nuckolls
Lynden Inc

-----Original Message-----
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Monday, July 02, 2012 2:07 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files

The dynamic file I'm working with is below. What do 'overflowed' and 'badly' refer to under MODULUS? Is the goal of the RESIZE to eliminate that overflow? Any ideas what I should change to achieve this?

File name ..................... TEST_FILE
Pathname ...................... TEST_FILE
File type ..................... DYNAMIC
File style and revision ....... 32BIT Revision 12
Hashing Algorithm ............. GENERAL
No. of groups (modulus) ....... 82850 current ( minimum 24, 104 empty, 26225 overflowed, 1441 badly )
Number of records ............. 1157122
Large record size ............. 2036 bytes
Number of large records ....... 576
Group size .................... 4096 bytes
Load factors .................. 80% (split), 50% (merge) and 80% (actual)
Total size .................... 449605632 bytes
Total size of record data ..... 258687736 bytes
Total size of record IDs ...... 19283300 bytes
Unused space .................. 171626404 bytes
Total space for records ....... 449597440 bytes

Thanks,

Chris

> From: cjausti...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Mon, 2 Jul 2012 14:
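[Editor's note] Rick's recipe (total ids+records, plus ~10% overhead, divided by the group size, times 1.25 to allow for the 80% split load, rounded up to the next prime) can be sketched in Python. The helper names are mine, not part of any U2 utility:

```python
# Sketch of Rick's minimum-modulus recipe; names are hypothetical.

def is_prime(n):
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

def next_prime(n):
    # Smallest prime >= n (trial division; fine at these magnitudes).
    while not is_prime(n):
        n += 1
    return n

def minimum_modulus(total_id_bytes, total_record_bytes, group_size=2048):
    data = (total_id_bytes + total_record_bytes) * 1.10  # ~10% overhead
    groups = (data / group_size) * 1.25                  # allow for 80% split load
    return next_prime(int(groups) + 1)
```

With the TEST_FILE figures quoted above (19283300 id bytes, 258687736 data bytes, 4096-byte groups), this lands a little above 93,000 before the prime search.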
Re: [U2] RESIZE - dynamic files
Hi Chris:

You cannot get away with not resizing dynamic files, in my experience. The files do not split and merge like we are led to believe. (The separation parameter is not used on dynamic files.) Your UniVerse file is badly sized. The math below will get you a reasonably sized file.

Let's do the math:

   258687736  (Record Size)
 +  19283300  (Key Size)
   ---------
   277971036  (Data and Key Size)

   4096 (Group Size) - 12 (32-bit Overhead) = 4084 Usable Space

   277971036 / 4084 = Minimum Modulo 68064 (Prime is 68071)

[ad] I hate doing this math all of the time. I have a reasonably priced resize program called XLr8Resizer for $99.00 to do this for me. [/ad]

Regards,
Doug
www.u2logic.com/tools.html
"XLr8Resizer for the rest of us"

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
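[Editor's note] Doug's arithmetic can be sketched as a small Python helper. Function names are mine; the 12-byte group overhead matches his 32-bit figure:

```python
import math

# Sketch of the minimum-modulo math above (hypothetical helper names).

def next_prime(n):
    # Smallest prime >= n, by trial division.
    def is_prime(k):
        return k > 1 and all(k % f for f in range(2, math.isqrt(k) + 1))
    while not is_prime(n):
        n += 1
    return n

def minimum_modulo(record_bytes, key_bytes, group_size=4096, overhead=12):
    usable = group_size - overhead                      # 4084 usable bytes per group
    groups = math.ceil((record_bytes + key_bytes) / usable)
    return next_prime(groups)
```

For example, minimum_modulo(258687736, 19283300) reproduces the calculation above for TEST_FILE.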
Re: [U2] RESIZE - dynamic files
I guess my main question is regarding the 'overflowed' and 'badly' #'s which you can see when you do an ANALYZE.FILE STATISTICS. Is the goal not to have any overflow? And what is 'badly'? After playing around with RESIZE on this file, I was able to come up with the following:

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 24
    82850 current ( minimum 24, 104 empty, 26225 overflowed, 1441 badly )
RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 1000
    82850 current ( minimum 1000, 104 empty, 26225 overflowed, 1441 badly )
RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 99420
    99420 current ( minimum 99420, 182 empty, 18725 overflowed, 1054 badly )
RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 119304
   119304 current ( minimum 119304, 247 empty, 9511 overflowed, 406 badly )
RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 143165
   143165 current ( minimum 143165, 1328 empty, 4333 overflowed, 259 badly )
RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 171799
   171799 current ( minimum 171799, 3814 empty, 3063 overflowed, 237 badly )
RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 223339
   223339 current ( minimum 223339, 9215 empty, 1810 overflowed, 222 badly )

As you can see, as I increase my MINIMUM.MODULUS, my 'overflowed' and 'badly' #'s go down. Is this the goal when tuning a dynamic file?

Chris

> From: martinphill...@ladybridge.com
> To: u2-users@listserver.u2ug.org
> Date: Mon, 2 Jul 2012 20:56:40 +0100
> Subject: Re: [U2] RESIZE - dynamic files
>
> Hi Chris,
>
> The whole point of dynamic files is that you don't do RESIZE. The file will
> look after itself, automatically responding to variations in the volume of data.
>
> There are "knobs to twiddle" but in most cases they can safely be left at
> their defaults. A dynamic file will never perform as well as a perfectly
> tuned static file but they are a heck of a lot better than typical static
> files that haven't been reconfigured for ages.
> > > Martin Phillips > Ladybridge Systems Ltd > 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England > +44 (0)1604-709200 > > > > > -Original Message- > From: u2-users-boun...@listserver.u2ug.org > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin > Sent: 02 July 2012 20:22 > To: u2-users@listserver.u2ug.org > Subject: [U2] RESIZE - dynamic files > > > I was wondering if anyone had instructions on RESIZE with a dynamic file? For > example I have a file called 'TEST_FILE' > with the following: > > 01 ANALYZE.FILE TEST_FILE > File name .. TEST_FILE > Pathname ... TEST_FILE > File type .. DYNAMIC > File style and revision 32BIT Revision 12 > Hashing Algorithm .. GENERAL > No. of groups (modulus) 83261 current ( minimum 31 ) > Large record size .. 3267 bytes > Group size . 4096 bytes > Load factors ... 80% (split), 50% (merge) and 80% (actual) > Total size . 450613248 bytes > > How do you calculate what the modulus and separation should be? I can't use > HASH.HELP on a type 30 file to see the recommended > settings > so I was wondering how best you figure out the file RESIZE. > > Thanks, > > Chris > > > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users > > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
Group size appears adequate, although any time anything hashes into the group containing the largest record [3267b] you'll split: 3267 is 79.8% of 4096, so if you have a lot of records up in the 3K range, you may want to increase the group size and decrease the minimum modulus accordingly. The minimum modulus, though, should be a prime north of the current modulus, with a padding factor based on growth expectations.

The sweet spot is where you have enough data in each group to avoid merging (I'd argue that 50% is a bit high for the merge, but that's because I'm unafraid of unused space, while I'm averse to file-maintenance overhead), but not so much that you do a lot of splitting. You should do a count on the number of records, too; it almost never makes sense to have the modulus exceed the number of records by a substantial percentage.

So, you should increase the minimum modulus to 83267 or higher, unless you double the group size to 8K, in which case something around 50K as a modulus sounds good. I'd take the merge down a little, to maybe 30% or even less, and maybe knock the split up a bit - say, 90% - to cut down on the splitting.

> From: cjausti...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Mon, 2 Jul 2012 14:55:21 -0500
> Subject: [U2] RESIZE - dynamic files
>
> I was wondering if anyone had instructions on RESIZE with a dynamic file?
> For example I have a file called 'TEST_FILE' with the following:
>
> 01 ANALYZE.FILE TEST_FILE
> File name .................... TEST_FILE
> Pathname ..................... TEST_FILE
> File type .................... DYNAMIC
> File style and revision ...... 32BIT Revision 12
> Hashing Algorithm ............ GENERAL
> No. of groups (modulus) ...... 83261 current ( minimum 31 )
> Large record size ............ 3267 bytes
> Group size ................... 4096 bytes
> Load factors ................. 80% (split), 50% (merge) and 80% (actual)
> Total size ................... 450613248 bytes
>
> How do you calculate what the modulus and separation should be?
I can't use > HASH.HELP on a type 30 file to see the recommended settings > so I was wondering how best you figure out the file RESIZE. > > Thanks, > > Chris > ___ > U2-Users mailing list > U2-Users@listserver.u2ug.org > http://listserver.u2ug.org/mailman/listinfo/u2-users ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
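[Editor's note] The "prime north of the current modulus, with padding for growth" advice above can be sketched as a tiny helper (hypothetical name, Python used only for illustration):

```python
def padded_prime_modulus(current_modulus, growth_factor=1.10):
    # Next prime strictly above current_modulus * growth_factor.
    # Trial division is plenty fast for moduli in this range.
    n = int(current_modulus * growth_factor) + 1
    while any(n % f == 0 for f in range(2, int(n ** 0.5) + 1)):
        n += 1
    return n
```

For example, padded_prime_modulus(83261) would give a prime minimum modulus with roughly 10% growth headroom over the current 83261 groups.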
Re: [U2] RESIZE - dynamic files
The dynamic file I'm working with is below. What do 'overflowed' and 'badly' refer to under MODULUS? Is the goal of the RESIZE to eliminate that overflow? Any ideas what I should change to achieve this?

File name ..................... TEST_FILE
Pathname ...................... TEST_FILE
File type ..................... DYNAMIC
File style and revision ....... 32BIT Revision 12
Hashing Algorithm ............. GENERAL
No. of groups (modulus) ....... 82850 current ( minimum 24, 104 empty, 26225 overflowed, 1441 badly )
Number of records ............. 1157122
Large record size ............. 2036 bytes
Number of large records ....... 576
Group size .................... 4096 bytes
Load factors .................. 80% (split), 50% (merge) and 80% (actual)
Total size .................... 449605632 bytes
Total size of record data ..... 258687736 bytes
Total size of record IDs ...... 19283300 bytes
Unused space .................. 171626404 bytes
Total space for records ....... 449597440 bytes

Thanks,

Chris

> From: cjausti...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Mon, 2 Jul 2012 14:55:21 -0500
> Subject: [U2] RESIZE - dynamic files
>
> I was wondering if anyone had instructions on RESIZE with a dynamic file?
> For example I have a file called 'TEST_FILE' with the following:
>
> 01 ANALYZE.FILE TEST_FILE
> File name .................... TEST_FILE
> Pathname ..................... TEST_FILE
> File type .................... DYNAMIC
> File style and revision ...... 32BIT Revision 12
> Hashing Algorithm ............ GENERAL
> No. of groups (modulus) ...... 83261 current ( minimum 31 )
> Large record size ............ 3267 bytes
> Group size ................... 4096 bytes
> Load factors ................. 80% (split), 50% (merge) and 80% (actual)
> Total size ................... 450613248 bytes
>
> How do you calculate what the modulus and separation should be? I can't use
> HASH.HELP on a type 30 file to see the recommended settings
> so I was wondering how best you figure out the file RESIZE.
>
> Thanks,
>
> Chris
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE - dynamic files
Hi Chris, The whole point of dynamic files is that you don't do RESIZE. The file will look after itself, automatically responding to variations in the volume of data. There are "knobs to twiddle" but in most cases they can safely be left at their defaults. A dynamic file will never perform as well as a perfectly tuned static file but they are a heck of a lot better than typical static files that haven't been reconfigured for ages. Martin Phillips Ladybridge Systems Ltd 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England +44 (0)1604-709200 -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin Sent: 02 July 2012 20:22 To: u2-users@listserver.u2ug.org Subject: [U2] RESIZE - dynamic files I was wondering if anyone had instructions on RESIZE with a dynamic file? For example I have a file called 'TEST_FILE' with the following: 01 ANALYZE.FILE TEST_FILE File name .. TEST_FILE Pathname ... TEST_FILE File type .. DYNAMIC File style and revision 32BIT Revision 12 Hashing Algorithm .. GENERAL No. of groups (modulus) 83261 current ( minimum 31 ) Large record size .. 3267 bytes Group size . 4096 bytes Load factors ... 80% (split), 50% (merge) and 80% (actual) Total size . 450613248 bytes How do you calculate what the modulus and separation should be? I can't use HASH.HELP on a type 30 file to see the recommended settings so I was wondering how best you figure out the file RESIZE. Thanks, Chris ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users ___ U2-Users mailing list U2-Users@listserver.u2ug.org http://listserver.u2ug.org/mailman/listinfo/u2-users
[U2] RESIZE - dynamic files
I was wondering if anyone had instructions on RESIZE with a dynamic file? For example I have a file called 'TEST_FILE' with the following:

01 ANALYZE.FILE TEST_FILE
File name ..................... TEST_FILE
Pathname ...................... TEST_FILE
File type ..................... DYNAMIC
File style and revision ....... 32BIT Revision 12
Hashing Algorithm ............. GENERAL
No. of groups (modulus) ....... 83261 current ( minimum 31 )
Large record size ............. 3267 bytes
Group size .................... 4096 bytes
Load factors .................. 80% (split), 50% (merge) and 80% (actual)
Total size .................... 450613248 bytes

How do you calculate what the modulus and separation should be? I can't use HASH.HELP on a type 30 file to see the recommended settings, so I was wondering how best you figure out the file RESIZE.

Thanks,

Chris

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
Re: [U2] RESIZE DYNAMIC FILES
[EMAIL PROTECTED] wrote on 06/12/2006 12:57:03 PM:

> What does the guide -r option do ?
>
> We have been using the -a option.

The -r option sends guide output to a hashed file. This makes it very easy to select for files that are undersized, or that have corruption. So I'll often do a CREATE.FILE DATA UDT_GUIDE 101, then edit the VOC entry of UDT_GUIDE so attribute 3 points to @UDTHOME/sys/D_UDT_GUIDE. Then I can do something like this from ECL:

!guide /some_dir/some_file -na -ne -ns -r UDT_GUIDE

This will create a record in UDT_GUIDE keyed as /some_dir/some_file. With that information for all of your files, you can do something like this:

list UDT_GUIDE WITH STATUS LIKE ...2     (to find files with level 2 overflow)
list UDT_GUIDE WITH STATUS LIKE Err...   (to find files with corruption)
list UDT_GUIDE MAXSIZ AVGSIZ DEVSIZ COUNT AVGKEY   (to get the info for the dynamic file sizing calculations)

It's SO much easier than writing code to parse through the text output of guide.

Tim Snyder
Consulting I/T Specialist, U2 Professional Services
North American Lab Services
DB2 Information Management, IBM Software Group
717-545-6403
[EMAIL PROTECTED]

---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/
RE: [U2] RESIZE DYNAMIC FILES
We have used the product here before. I think our license on it lapsed. I have been using the guide for several years instead of using fast. "Hennessey, Mark F." <[EMAIL PROTECTED]> wrote: Should I put [AD] in the subject line for an "unsolicited testimonial"? :) The best advice I can give you is to buy a product called FAST: http://www.fitzlong.com/ A great tool for analyzing and resizing files, be they dynamic or standard hashed files. Excellent support from excellent people at a great price. There might be more expensive utilities out there, but I can't imagine that there is anything better. Mark Hennessey (a FAST Customer since 2002) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Dave S Sent: Monday, June 12, 2006 10:25 AM To: u2-users@listserver.u2ug.org Subject: [U2] RESIZE DYNAMIC FILES Does anyone have any tech tips on how to select parameters when resizing dynamic files ? __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/ --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/ __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/
Re: [U2] RESIZE DYNAMIC FILES
What does the guide -r option do ?

We have been using the -a option.

Timothy Snyder <[EMAIL PROTECTED]> wrote:

[EMAIL PROTECTED] wrote on 06/12/2006 10:24:51 AM:

> Does anyone have any tech tips on how to select parameters when
> resizing dynamic files ?

The following is from a published tech tip. It provides guidelines, but of course the nature of MV files makes it difficult to predict optimal sizing. To get the appropriate input data, run guide with the -r option to send the output to a hashed file. Point the dictionary of that file as directed, and you'll have what you need. It's important to note that this only applies to KEYONLY files.

===

Formula for determining base modulo, block size, SPLIT_LOAD, and MERGE_LOAD for UniData KEYONLY Dynamic Files

Note that the variables used are the same as the DICT items in $UDTHOME/sys/D_UDT_GUIDE.

Considerations:
a) The following does not take into account the Unix disk record (frame) size, so it is best to select a block size based on the number of items you'd like in a group.
b) No one method will provide absolute results, but these calculations will minimize level one overflow caused by a high SPLIT_LOAD value.
c) Type 0 works best for most Dynamic Files, but it is best to check a small sample via the GROUP.STAT command.

Step 1: Determine the block size. (Use 4096 unless the items per group is larger than 35 or fewer than 2.)

A) If MAXSIZ < 1K:         ITEMSIZE = 10 * MAXSIZ
B) If 1K <= MAXSIZ <= 3K:  ITEMSIZE = 5 * MAXSIZ
C) If MAXSIZ > 3K:         ITEMSIZE = 5 * (AVGSIZ + DEVSIZ)

Once you determine the item size, use it to determine NEWBLOCKSIZE.

A) ITEMSIZE < 1024:           NEWBLOCKSIZE = 1024
B) 1024 <= ITEMSIZE < 2048:   NEWBLOCKSIZE = 2048
C) 2048 <= ITEMSIZE < 4096:   NEWBLOCKSIZE = 4096
D) 4096 <= ITEMSIZE < 8192:   NEWBLOCKSIZE = 8192
E) 8192 <= ITEMSIZE < 16384:  NEWBLOCKSIZE = 16384

Step 2: Determine the actual number of items per group.

ITEMS_PER_GROUP = (NEWBLOCKSIZE - 32) / AVGSIZ

Step 3: Determine the base modulo.

BASEMODULO = COUNT / ITEMS_PER_GROUP

Step 4: Determine SPLIT_LOAD.

SPLIT_LOAD = INT((((AVGKEY + 9) * ITEMS_PER_GROUP) / NEWBLOCKSIZE) * 100) + 1

If the SPLIT_LOAD is less than ten, then: SPLIT_LOAD = 10

Step 5: Determine MERGE_LOAD.

MERGE_LOAD = SPLIT_LOAD / 2 (rounded up)

Tim Snyder
Consulting I/T Specialist, U2 Professional Services
North American Lab Services
DB2 Information Management, IBM Software Group
717-545-6403
[EMAIL PROTECTED]

---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/

__
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/
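[Editor's note] Tim's five KEYONLY sizing steps can be sketched in Python. Variable names follow the D_UDT_GUIDE dictionary items (MAXSIZ, AVGSIZ, DEVSIZ, AVGKEY, COUNT); the function names and rounding choices are my assumptions, not part of the guide utility:

```python
import math

# Sketch of the KEYONLY dynamic-file sizing formula (hypothetical helpers).

def block_size(maxsiz, avgsiz, devsiz):
    # Step 1: estimate an item size, then round up to a block size.
    if maxsiz < 1024:
        itemsize = 10 * maxsiz
    elif maxsiz <= 3072:
        itemsize = 5 * maxsiz
    else:
        itemsize = 5 * (avgsiz + devsiz)
    for candidate in (1024, 2048, 4096, 8192):
        if itemsize < candidate:
            return candidate
    return 16384

def keyonly_sizing(maxsiz, avgsiz, devsiz, avgkey, count):
    blk = block_size(maxsiz, avgsiz, devsiz)
    items_per_group = (blk - 32) // avgsiz                 # Step 2
    base_modulo = math.ceil(count / items_per_group)       # Step 3
    split_load = int((((avgkey + 9) * items_per_group) / blk) * 100) + 1  # Step 4
    split_load = max(split_load, 10)                       # floor of 10
    merge_load = math.ceil(split_load / 2)                 # Step 5
    return blk, base_modulo, split_load, merge_load
```

For instance, 1000 records averaging 200 bytes with a 500-byte max and 10-byte keys works out to an 8192-byte block, 40 items per group, and a base modulo of 25.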
Re: [U2] RESIZE DYNAMIC FILES
[EMAIL PROTECTED] wrote on 06/12/2006 10:24:51 AM:

> Does anyone have any tech tips on how to select parameters when
> resizing dynamic files ?

The following is from a published tech tip. It provides guidelines, but of course the nature of MV files makes it difficult to predict optimal sizing. To get the appropriate input data, run guide with the -r option to send the output to a hashed file. Point the dictionary of that file as directed, and you'll have what you need. It's important to note that this only applies to KEYONLY files.

===

Formula for determining base modulo, block size, SPLIT_LOAD, and MERGE_LOAD for UniData KEYONLY Dynamic Files

Note that the variables used are the same as the DICT items in $UDTHOME/sys/D_UDT_GUIDE.

Considerations:
a) The following does not take into account the Unix disk record (frame) size, so it is best to select a block size based on the number of items you'd like in a group.
b) No one method will provide absolute results, but these calculations will minimize level one overflow caused by a high SPLIT_LOAD value.
c) Type 0 works best for most Dynamic Files, but it is best to check a small sample via the GROUP.STAT command.

Step 1: Determine the block size. (Use 4096 unless the items per group is larger than 35 or fewer than 2.)

A) If MAXSIZ < 1K:         ITEMSIZE = 10 * MAXSIZ
B) If 1K <= MAXSIZ <= 3K:  ITEMSIZE = 5 * MAXSIZ
C) If MAXSIZ > 3K:         ITEMSIZE = 5 * (AVGSIZ + DEVSIZ)

Once you determine the item size, use it to determine NEWBLOCKSIZE.

A) ITEMSIZE < 1024:           NEWBLOCKSIZE = 1024
B) 1024 <= ITEMSIZE < 2048:   NEWBLOCKSIZE = 2048
C) 2048 <= ITEMSIZE < 4096:   NEWBLOCKSIZE = 4096
D) 4096 <= ITEMSIZE < 8192:   NEWBLOCKSIZE = 8192
E) 8192 <= ITEMSIZE < 16384:  NEWBLOCKSIZE = 16384

Step 2: Determine the actual number of items per group.

ITEMS_PER_GROUP = (NEWBLOCKSIZE - 32) / AVGSIZ

Step 3: Determine the base modulo.

BASEMODULO = COUNT / ITEMS_PER_GROUP

Step 4: Determine SPLIT_LOAD.

SPLIT_LOAD = INT((((AVGKEY + 9) * ITEMS_PER_GROUP) / NEWBLOCKSIZE) * 100) + 1

If the SPLIT_LOAD is less than ten, then: SPLIT_LOAD = 10

Step 5: Determine MERGE_LOAD.

MERGE_LOAD = SPLIT_LOAD / 2 (rounded up)

Tim Snyder
Consulting I/T Specialist, U2 Professional Services
North American Lab Services
DB2 Information Management, IBM Software Group
717-545-6403
[EMAIL PROTECTED]

---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/
RE: [U2] RESIZE DYNAMIC FILES
Should I put [AD] in the subject line for an "unsolicited testimonial"? :) The best advice I can give you is to buy a product called FAST: http://www.fitzlong.com/ A great tool for analyzing and resizing files, be they dynamic or standard hashed files. Excellent support from excellent people at a great price. There might be more expensive utilities out there, but I can't imagine that there is anything better. Mark Hennessey (a FAST Customer since 2002) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Dave S Sent: Monday, June 12, 2006 10:25 AM To: u2-users@listserver.u2ug.org Subject: [U2] RESIZE DYNAMIC FILES Does anyone have any tech tips on how to select parameters when resizing dynamic files ? __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/ --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/
[U2] RESIZE DYNAMIC FILES
Does anyone have any tech tips on how to select parameters when resizing dynamic files ? __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com --- u2-users mailing list u2-users@listserver.u2ug.org To unsubscribe please visit http://listserver.u2ug.org/