Re: [U2] UniVerse LIST statement question [not-secure]
Thanks for all of the responses.

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Hennessey, Mark F.
Sent: Monday, July 02, 2012 9:53 AM
To: 'U2 Users List'
Subject: [U2] UniVerse LIST statement question [not-secure]

I need to write a UniVerse LIST statement that populates a column only if its contents meet certain criteria. For example, suppose we have a file with details of telephone usage, and that 3 associated multivalued fields contain the date the call was made, its duration, and whether the call was a toll call. Is it possible to limit the output of the date-call-made column and its associated columns to a date range without that being a selection criterion? If I were to do something like:

LIST CALLS EMP.NAME EMP.LOCATION WITH DATE.CALL GE 2012-06-01 AND WITH DATE.CALL LE 2012-06-30 DURATION TOLL WITH @ID EQ '123456'

I would get zero records if employee 123456 did not make any calls in June. What I would like to see is the employee name and location returned, with the date, duration and toll columns empty.

I'm trying to do this in a LIST statement because it will be run by U2 Web Services (and for the time being a subroutine is off the table...). Any advice, or an authoritative "NO, it cannot be done", would be greatly appreciated.

Mark Hennessey
State of Connecticut Department of Social Services
Information Technology Services
Child Support Systems
Voice: 860-424-5261
Fax: 860-424-4813

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
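To make the distinction in the question concrete, here is a minimal Python model of the two behaviours. It is not UniVerse code and does not show the RetrieVe syntax that would answer the question; it only contrasts record-level selection (what a WITH clause does) with the value-level filtering Mark is asking for, where the record always appears but only the associated multivalues inside the date range are shown. The record contents (SMITH, HARTFORD, the call values) and the function names are hypothetical.

```python
from datetime import date

# Hypothetical stand-in for one CALLS record: single-valued employee
# fields plus three associated multivalued fields (one entry per call).
calls = {
    "123456": {
        "EMP.NAME": "SMITH",
        "EMP.LOCATION": "HARTFORD",
        "DATE.CALL": [date(2012, 5, 14), date(2012, 7, 2)],
        "DURATION": [12, 3],
        "TOLL": ["Y", "N"],
    },
}

ASSOCIATED = ("DATE.CALL", "DURATION", "TOLL")


def record_level(records, lo, hi):
    """Selection-style filter: a record survives only if at least one call
    date is in range (a simplification of how WITH selects records)."""
    return {
        rid: rec
        for rid, rec in records.items()
        if any(lo <= d <= hi for d in rec["DATE.CALL"])
    }


def value_level(records, lo, hi):
    """Desired output: every record survives; only the associated values
    whose DATE.CALL is in range are kept, so the columns may be empty."""
    out = {}
    for rid, rec in records.items():
        keep = [i for i, d in enumerate(rec["DATE.CALL"]) if lo <= d <= hi]
        row = dict(rec)  # shallow copy; single-valued fields carried over
        for name in ASSOCIATED:
            row[name] = [rec[name][i] for i in keep]
        out[rid] = row
    return out


june = (date(2012, 6, 1), date(2012, 6, 30))
print(record_level(calls, *june))                        # {}
print(value_level(calls, *june)["123456"]["DATE.CALL"])  # []
```

With no June calls, the selection-style filter returns nothing, while the value-level version still returns SMITH and HARTFORD with empty call columns, which is the output described in the question.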
Re: [U2] RESIZE - dynamic files
Disk space is not a factor, as we are a smaller shop and disk space comes cheap. However, one thing I did notice is that when I increased the modulus to a very large number, which increased my disk space to about 3-4x my record data, my SELECT queries were slower.

When deciding based on HOW the file is used, are the 2 factors whether you're using:
1) a lot of SELECTs (then looping through the records), or
2) grabbing individual records (not using a SELECT)?

With this file we really do a lot of SELECTs (option 1), then loop through the records. With that said, and based on the reading I've done here, it would appear it's better for this application to have a little overflow and not use up so much disk space on modulus (groups), since we do use a lot of SELECT queries. Is this correct?

Most of my records are ~250 bytes; there's a handful that are up to 512 bytes. It would seem to me that I would want to REDUCE my split load to ~70% to reduce overflow, and maybe increase my MINIMUM.MODULUS to a number a little bigger than my current modulus (~10% bigger), since this will be a growing file and will never merge. In my case using the formula might not make sense, since this file will never merge. Does this make sense?

File name .................. GENACCTRN_POSTED
Pathname ................... GENACCTRN_POSTED
File type .................. DYNAMIC
File style and revision .... 32BIT Revision 12
Hashing Algorithm .......... GENERAL
No. of groups (modulus) .... 92903 current (minimum 31, 87 empty, 28248 overflowed, 2510 badly)
Number of records .......... 1292377
Large record size .......... 3267 bytes
Number of large records .... 180
Group size ................. 4096 bytes
Load factors ............... 80% (split), 50% (merge) and 80% (actual)
Total size ................. 501219328 bytes
Total size of record data .. 287426366 bytes
Total size of record IDs ... 21539682 bytes
Unused space ............... 192245088 bytes
Total space for records .... 501211136 bytes

With all that said, if I change the following:
1) SPLIT.LOAD to 70%
2) MINIMUM.MODULUS to 130,000
that's all I should really need to do to 'tweak' the performance of this file. If this doesn't sound right, I would be interested to hear how it should be tweaked instead. Thanks for all the help so far; I think this is all starting to make sense.

Chris

From: ro...@stamina.com.au
To: u2-users@listserver.u2ug.org
Date: Wed, 4 Jul 2012 01:36:26 +
Subject: Re: [U2] RESIZE - dynamic files

I would suggest that the actual goal is to achieve maximum performance for your system, so knowing HOW the file is used on a daily basis can also influence decisions. Disk is a cheap commodity, so having some wastage in file utilization shouldn't factor.

Ross Ferris
Stamina Software
Visage Better by Design!
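As a rough cross-check of the statistics above, the sketch below recomputes the overflow percentage and a naive load figure, and applies the sizing rule of thumb Chris uses in his follow-up, (record data + IDs) x 1.1 x 1.42857 / 4096, where 1.42857 is simply 1 / 0.70. It assumes load is just record data plus IDs divided by primary group space; that gives about 81% here rather than the 80% UniVerse reports, so treat it as an approximation, not UniVerse's own calculation. The function names are hypothetical.

```python
import math

GROUP_SIZE = 4096  # bytes, from the ANALYZE.FILE output above


def overflow_pct(overflowed: int, modulus: int) -> float:
    """Share of groups that have spilled into overflow, in percent."""
    return 100.0 * overflowed / modulus


def approx_load_pct(data_bytes: int, id_bytes: int, modulus: int,
                    group_size: int = GROUP_SIZE) -> float:
    """Naive load: record data plus IDs over primary group space, in percent.
    Only an approximation of the 'actual' load UniVerse reports."""
    return 100.0 * (data_bytes + id_bytes) / (modulus * group_size)


def sized_modulus(data_bytes: int, id_bytes: int, split_load: float,
                  headroom: float = 1.1, group_size: int = GROUP_SIZE) -> int:
    """Rule of thumb: (data + IDs) * headroom / split load / group size,
    rounded up to a whole number of groups."""
    return math.ceil((data_bytes + id_bytes) * headroom / split_load / group_size)


DATA, IDS = 287_426_366, 21_539_682  # figures before any change

print(round(overflow_pct(28_248, 92_903), 1))         # 30.4
print(round(approx_load_pct(DATA, IDS, 92_903), 1))   # 81.2
print(round(approx_load_pct(DATA, IDS, 130_000), 1))  # 58.0
print(sized_modulus(DATA, IDS, 0.70))                 # 118535
```

On these figures a MINIMUM.MODULUS of 130,000 would start the file at a naive load of about 58%, while the rule of thumb with a 70% split load suggests roughly 118,500 groups.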
Re: [U2] RESIZE - dynamic files
I was able to drop from 30% overflow to 12% by making 2 changes:
1) changed the split load from 80% to 70% (that alone reduced overflow by 10%)
2) changed the MINIMUM.MODULUS to 118,681 (calculated as [(record data + IDs) * 1.1 * 1.42857 (70% split load)] / 4096)

My disk size only went up 8%. My file looks like this now:

File name .................. GENACCTRN_POSTED
Pathname ................... GENACCTRN_POSTED
File type .................. DYNAMIC
File style and revision .... 32BIT Revision 12
Hashing Algorithm .......... GENERAL
No. of groups (modulus) .... 118681 current (minimum 118681, 140 empty, 14431 overflowed, 778 badly)
Number of records .......... 1292377
Large record size .......... 3267 bytes
Number of large records .... 180
Group size ................. 4096 bytes
Load factors ............... 70% (split), 50% (merge) and 63% (actual)
Total size ................. 546869248 bytes
Total size of record data .. 287789178 bytes
Total size of record IDs ... 21539538 bytes
Unused space ............... 237532340 bytes
Total space for records .... 546861056 bytes

Chris

From: keith.john...@datacom.co.nz
To: u2-users@listserver.u2ug.org
Date: Wed, 4 Jul 2012 14:05:02 +1200
Subject: Re: [U2] RESIZE - dynamic files

Doug may have had a key bounce in his input. Let's do the math:
258687736 (Record Size)
192283300 (Key Size)
The key size is actually 19283300 in Chris' figures.

Regarding 68,063 being less than the current modulus of 82,850: I think the answer may lie in the splitting process. As I understand it, the first time a split occurs, group 1 is split and its contents are divided between new group 1 and new group 2. All the other groups effectively get 1 added to their number. The next split is group 3 (which was 2) into 3 and 4, and so forth. A pointer is kept to say where the next split will take place, and also to help adjust the algorithm to identify which group matches a given key. Based on this, if you started with 1000 groups, by the time you have split the 500th time you will have 1500 groups.
The first 1000 will be relatively empty; the last 500 will probably be overflowed, but not terribly badly. By the time you get to the 1000th split, you will have 2000 groups and they will, one hopes, be quite reasonably spread with very little overflow. So I expect the average access times would drift up and down in a cycle. The cycle time would get longer as the file gets bigger, but the worst time would be roughly the same each cycle. Given the power of two introduced into the algorithm by the before/after-the-split thing, I wonder whether there is really a need to start off with a prime?

Regards,
Keith

PS I'm getting a bit Tony^H^H^H^Hverbose nowadays.
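Keith's description of splitting can be explored with a minimal model of textbook linear hashing (Litwin's scheme). This is a sketch under stated assumptions, not UniVerse's actual implementation: in the textbook version the group under the split pointer divides into itself and a new group appended at pointer + N, rather than the renumbering Keith describes, but the group counts come out the same. Splits are called by hand here instead of being triggered by SPLIT.LOAD, and sequential integers stand in for well-hashed record IDs. The class and method names are hypothetical.

```python
class LinearHashFile:
    """Textbook linear hashing (assumed model, not UniVerse internals)."""

    def __init__(self, modulus: int) -> None:
        self.base = modulus           # N: group count at the start of this round
        self.split_ptr = 0            # next group to split in this round
        self.groups = [[] for _ in range(modulus)]

    @property
    def modulus(self) -> int:
        return len(self.groups)

    def group_for(self, key: int) -> int:
        # Address with mod N; groups already split this round use mod 2N.
        g = key % self.base
        if g < self.split_ptr:
            g = key % (2 * self.base)
        return g

    def insert(self, key: int) -> None:
        self.groups[self.group_for(key)].append(key)

    def split(self) -> None:
        # Split the group under the pointer into itself and group ptr + N.
        victim = self.split_ptr
        keys, self.groups[victim] = self.groups[victim], []
        self.groups.append([])        # becomes group number victim + N
        self.split_ptr += 1
        for key in keys:              # rehash the old contents with mod 2N
            self.insert(key)
        if self.split_ptr == self.base:
            # Round complete: every group has split once, modulus doubled.
            self.base *= 2
            self.split_ptr = 0


f = LinearHashFile(1000)
for key in range(200_000):
    f.insert(key)

for _ in range(500):
    f.split()
print(f.modulus)                                                  # 1500
print(len(f.groups[0]), len(f.groups[500]), len(f.groups[1000]))  # 100 200 100

for _ in range(500):
    f.split()
print(f.modulus, f.base, f.split_ptr)                             # 2000 2000 0
```

Halfway through a round, the 1000 groups produced by splitting hold half as many records as the 500 groups still waiting to split, which matches the cyclical unevenness Keith expects; once the round completes, every group is back to the same size.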
Re: [U2] RESIZE - dynamic files
Hi,

The various suggestions about setting the minimum modulus to reduce overflow are all very well, but effectively you are turning a dynamic file into a static one, complete with all the continual maintenance work needed to keep the parameters in step with the data.

In most cases, the only parameter that is worth tuning is the group size, to try to pack things nicely. Even this is often fine left alone, though getting it to match the underlying O/S page size is helpful.

I missed the start of this thread but, unless you have a performance problem or are seriously short of space, my recommendation would be to leave dynamic files to look after themselves.

A file without overflow is not necessarily the best solution. Winding the split load down to 70% means that at least 30% of the file is dead space. The implication of this is that the file is larger and will take more disk reads to process sequentially from one end to the other.

Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
+44 (0)1604-709200

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: 05 July 2012 15:19
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files
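Martin's point about sequential reads can be put into rough numbers using the total file sizes Chris posted before and after his change. This is a simplified model: it assumes a front-to-back pass reads every 4096-byte block of the file once, and it ignores caching, read-ahead and file headers. The helper name is hypothetical.

```python
GROUP_SIZE = 4096  # bytes per block, assumed equal to the group size


def blocks_to_scan(total_bytes: int, block: int = GROUP_SIZE) -> int:
    """Blocks read by one front-to-back pass (ceiling division).
    Simplified: ignores caching, read-ahead and file headers."""
    return -(-total_bytes // block)


before = blocks_to_scan(501_219_328)  # SPLIT.LOAD 80%, modulus 92903
after = blocks_to_scan(546_869_248)   # SPLIT.LOAD 70%, modulus 118681

print(before, after)                              # 122368 133513
print(round(100 * (after - before) / before, 1))  # 9.1
print(round(100 * (after - before) / after, 1))   # 8.3
```

On this model a full pass reads about 11,000 more blocks, an increase of about 9.1% over the original size; measured against the new size the difference is about 8.3%, close to the 8% Chris quotes.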
Re: [U2] RESIZE - dynamic files
Chris,

I am still wondering what is prompting you to continue using the larger group size. I think that Martin and the UV documentation are correct in this case; you would be as well or better off with the defaults.

-Rick

On Jul 5, 2012, at 9:13 AM, Martin Phillips martinphill...@ladybridge.com wrote:
Re: [U2] RESIZE - dynamic files
Rick,

You are correct, I should be using the smaller size (I just haven't changed it yet). Based on the reading I have done, you should only use the larger group size when the average record size is greater than 1000 bytes.

As far as being better off with the defaults, that's basically what I'm trying to test (as well as learn how linear hashing works). I was able to reduce my overflow by 18 percentage points (from 30% to 12%), while only increasing my empty groups by a very small amount and my file size by 8%. In theory this should be better for reads/writes than what I had before. To test the performance I need to write a ton of records, then capture the output and compare it using timestamps.

Chris

From: r...@lynden.com
To: u2-users@listserver.u2ug.org
Date: Thu, 5 Jul 2012 09:22:02 -0700
Subject: Re: [U2] RESIZE - dynamic files
Re: [U2] RESIZE - dynamic files
Chris,

For the type of use that you described earlier (BASIC selects and reads), reducing overflow will have negligible performance benefit, especially compared to changing the GROUP.SIZE back to 1 (2048 bytes).

If you purge the file in relatively small percentages, then it will never merge anyway (because you would need to delete 20-30% of the file for that to happen with the merge load at 50%), so your optimum minimum modulus solution will probably be however large it grows.

The overhead for a group split is not as bad as it sounds unless your updates/sec count is extremely high, such as during a copy.

If you do regular SELECTs and scans of the entire file, then your goal should be to reduce the total disk size of the file, and not worry much about common overflow.

The important thing is that the file is dynamic, so you will never encounter the issues that undersized statically hashed files develop. We have thousands of dynamically hashed files on our (Solaris) systems, with an extremely low problem rate.

Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Thursday, July 05, 2012 11:21 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files
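Rick's 20-30% figure can be checked with simple arithmetic. The sketch assumes merging begins once the load, with the modulus unchanged, falls to MERGE.LOAD, and it ignores MINIMUM.MODULUS, overflow and large records. The function name is hypothetical.

```python
def purge_needed_to_merge(actual_load: float, merge_load: float = 0.50) -> float:
    """Fraction of the current data to delete, with the modulus unchanged,
    before the load falls to MERGE.LOAD and groups begin to merge."""
    if actual_load <= merge_load:
        return 0.0  # already at or below the merge threshold
    return 1.0 - merge_load / actual_load


for load in (0.63, 0.70, 0.80):
    print(f"actual load {load:.0%}: purge about {purge_needed_to_merge(load):.1%}")
```

At the 63% to 70% loads discussed in this thread, that works out to roughly 21% to 29% of the data, consistent with Rick's estimate.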
Re: [U2] RESIZE - dynamic files
The hardware look-ahead of the disk drive reader will grab consecutive frames into memory, since it assumes you'll want the next frame next. So the less overflow you have, the faster a full file scan will become. At least that's my theory ;)

-Original Message-
From: Rick Nuckolls r...@lynden.com
To: 'U2 Users List' u2-users@listserver.u2ug.org
Sent: Thu, Jul 5, 2012 2:29 pm
Subject: Re: [U2] RESIZE - dynamic files
Re: [U2] RESIZE - dynamic files
Chris,

I can appreciate what you are doing as an academic exercise. You seem happy with how it looks at this moment, where, because you set MINIMUM.MODULUS 118681, you ended up with a current load of 63%. But think about it: as you add records, the load will reach 70%, per SPLIT.LOAD 70; then splits will keep occurring and the current modulus will grow past 118681. MINIMUM.MODULUS will never matter again. (This was described as an ever-growing file.)

If the current configuration is what you want, why not just set SPLIT.LOAD 63 MINIMUM.MODULUS 1? That way the ratio that you like today will stay like this forever.

MINIMUM.MODULUS will not matter unless data is deleted. It says not to shrink the file structure below that minimally allocated disk space, even if there is no data to occupy it. That's really all MINIMUM.MODULUS is for.

Play with it all you want, because it puts you in a good place when some crisis happens. At the end of the day, with this file, you'll find your tuning won't matter much. Not a lot of help, but not much harm if you tweak it wrong, either.
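The point that MINIMUM.MODULUS stops mattering can be illustrated with a rough model: a growing dynamic file keeps splitting so that its load stays at or below SPLIT.LOAD, and the minimum only acts as a floor. The sketch assumes load is record data plus IDs divided by (modulus x 4096 bytes), ignores overflow and large records, and uses Chris's post-change figures. The names are hypothetical.

```python
import math

GROUP_SIZE = 4096                        # bytes, from the posted statistics
MIN_MODULUS = 118_681                    # Chris's MINIMUM.MODULUS
DATA_AND_IDS = 287_789_178 + 21_539_538  # record data + IDs after the change


def modulus_for(total_bytes: float, split_load: float,
                min_modulus: int = MIN_MODULUS) -> int:
    """Approximate modulus of a growing dynamic file: splitting keeps the
    naive load at or below SPLIT.LOAD; MINIMUM.MODULUS is only a floor."""
    needed = math.ceil(total_bytes / (split_load * GROUP_SIZE))
    return max(min_modulus, needed)


for growth in (1.00, 1.05, 1.10, 1.25, 1.50, 2.00):
    size = DATA_AND_IDS * growth
    print(f"data x{growth:.2f}: modulus ~{modulus_for(size, 0.70):,}")

# The alternative suggested above: let SPLIT.LOAD alone hold the ratio.
print(f"SPLIT.LOAD 63, MINIMUM.MODULUS 1: ~{modulus_for(DATA_AND_IDS, 0.63, 1):,}")
```

On these assumptions the 118,681 floor stops mattering after roughly 10% data growth, and SPLIT.LOAD 63 with MINIMUM.MODULUS 1 lands on a modulus close to today's.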
Re: [U2] RESIZE - dynamic files
On 05/07/12 16:12, Martin Phillips wrote: A file without overflow is not necessarily the best solution. Winding the split load down to 70% means that at least 30% of the file is dead space. The implication of this is that the file is larger and will take more disk reads to process sequentially from one end to the other. Whoops Martin, I think you've made the classic percentages mistake here ... The file is 30/70, or 42% dead space at least. A file with the default 80% split is at least 25% dead space. Cheers, Wol
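Martin and Wol are measuring the same empty space against different denominators: Martin against the whole file, Wol against the live data. A minimal Python sketch, assuming the file sits right at its split load (the most densely packed it gets before a split). The function name is hypothetical.

```python
from fractions import Fraction


def dead_space(split_load_pct: int) -> tuple[Fraction, Fraction]:
    """Minimum empty space in a file held at split_load_pct, returned as
    (fraction of the whole file, ratio to the live data)."""
    empty = 100 - split_load_pct
    return Fraction(empty, 100), Fraction(empty, split_load_pct)


for split in (80, 70):
    of_file, vs_data = dead_space(split)
    print(f"SPLIT.LOAD {split}: {float(of_file):.0%} of the file, "
          f"{float(vs_data):.1%} relative to the data")
```

This prints 20% / 25.0% for SPLIT.LOAD 80 and 30% / 42.9% for SPLIT.LOAD 70, so both figures in the exchange are right for their own denominator.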
[U2] The words to the Pick systems rap
http://books.google.com/books?id=ShGYef744mgCpg=PA41 To add a little humour
Re: [U2] RESIZE - dynamic files
On 05/07/12 14:49, Chris Austin wrote: Disk space is not a factor, as we are a smaller shop and disk space comes cheap. However, one thing I did notice is when I increased the modulus to a very large number which then increased my disk space to about 3-4x of my record data, my SELECT queries were slower. Are the 2 factors when choosing HOW the file is used based on whether you're using? 1) a lot of SELECTS (then looping through the records) Is that a BASIC select, or a RETRIEVE select? 2) grabbing individual records (not using a SELECT) With this file we really do a lot of SELECTS (option 1), then loop through the records. With that being said and based on the reading I've done here it would appear it's better to have a little overflow and not use up so much disk space for modulus (groups) for this application since we do use a lot of SELECT queries. Is this correct? If your selects are BASIC selects, then you won't notice too much difference. If they are RETRIEVE selects, then reducing SPLIT will increase the cost of the SELECT. In both cases, if the RETRIEVE select is not BY, then the cost of processing the list should not be seriously impacted. (On a SELECT WITH index, however, reducing overflow will speed things up a bit, probably not an awful lot.) Most of my records are ~250 bytes, there's a handful that are 'up to 512 bytes'. It would seem to me that I would want to REDUCE my split to ~70% to reduce overflow, and maybe increase my MINIMUM.MODULUS to a number a little bit bigger than my current modulus (~10% bigger) since this will be a growing file and will never merge. In my case using the formula might not make sense since this file will never merge. Does this make sense? If the file will only ever grow, then MINIMUM.MODULUS is probably a waste of time.
You are best using that in one of two circumstances, either (a) you are populating a file with a large number of initial records and you are forcing the modulus to what it's likely to end up anyway, or (b) your file grows and shrinks violently in size, and you are forcing it to its typical state. The first scenario simply avoids a bunch of inevitable splits, the second avoids a yoyo split/merge/split scenario. I'd just leave the settings at 80/20, and only use MINIMUM.MODULUS if I was creating a copy of the file (setting the new minimum at the current modulo of the existing file). Cheers, Wol
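Wol's scenario (b), the yoyo, can be illustrated by counting restructures. This is a toy Python model, not UniVerse internals. It uses the same simplifications as any back-of-envelope load calculation (load = bytes / (modulus x group size), one group split or merged at a time) with the default SPLIT.LOAD 80 and MERGE.LOAD 50. The workload (fill with 10,000 records of 250 bytes, purge them all, repeat three times) is hypothetical.

```python
def count_restructures(changes, minimum_modulus, group_size=4096,
                       split_load=80, merge_load=50):
    """Toy model: count group splits plus merges for a sequence of byte
    deltas (+ for a record written, - for a record deleted)."""
    modulus, total, events = minimum_modulus, 0, 0
    for delta in changes:
        total += delta
        if delta > 0:
            while total * 100 > split_load * modulus * group_size:
                modulus += 1
                events += 1
        else:
            while (modulus > minimum_modulus
                   and total * 100 < merge_load * modulus * group_size):
                modulus -= 1
                events += 1
    return events


cycle = [250] * 10_000 + [-250] * 10_000   # fill, then purge everything
workload = cycle * 3

print(count_restructures(workload, minimum_modulus=1))    # 4572
print(count_restructures(workload, minimum_modulus=763))  # 0
```

763 is the modulus this toy file reaches at its peak. Pinning MINIMUM.MODULUS there removes every split and merge from the cycle, which is the "forcing it to its typical state" case.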
Re: [U2] RESIZE - dynamic files
Most disks and disk systems cache huge amounts of information these days, and, depending on 20 factors or so, one solution will be better than another for a given file. For the wholesale SELECT F WITH, the fewest disk records will almost always win. For files that have ~10 records/group and have ~10% of the groups overflowed, perhaps 1% of record reads will do a second read for the overflow buffer because the target key was not in the primary group. Writing a new record would possibly hit the 10% mark for reading overflow buffers. But lowering the split.load will increase the number of splits slightly, and increase the total number of groups considerably. What you have shown is that you need to increase the modulus (and select time) of a large file by more than 10% in order to decrease the read and update times for your records 0.5% of the time (assuming that you have only reduced the number of overflow groups by ~50%). As Charles suggests, this is an interesting exercise, but your actual results will rapidly change if you actually add/remove records from your file, change the load or number of files on your system, put in a new drive, CPU, memory board, or install a new release of Universe, move to RAID, etc. -Rick -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wjhonson Sent: Thursday, July 05, 2012 2:38 PM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files The hardware look-ahead of the disk drive reader will grab consecutive frames into memory, since it assumes you'll want the next frame next. So the less overflow you have, the faster a full file scan will become.
At least that's my theory ;) -Original Message- From: Rick Nuckolls r...@lynden.com To: 'U2 Users List' u2-users@listserver.u2ug.org Sent: Thu, Jul 5, 2012 2:29 pm Subject: Re: [U2] RESIZE - dynamic files Chris, For the type of use that you described earlier (BASIC selects and reads), reducing overflow will have negligible performance benefit, especially compared to changing the GROUP.SIZE back to 1 (2048 bytes). If you purge the file in relatively small percentages, then it will never merge anyway (because you will need to delete 20-30% of the file for that to happen with the merge load at 50%), so your optimum minimum modulus solution will probably be however large it grows. The overhead for a group split is not as bad as it sounds unless your updates/sec count is extremely high, such as during a copy. If you do regular SELECTs and SCANs of the entire file, then your goal should be to reduce the total disk size of the file, and not worry much about common overflow. The important thing is that the file is dynamic, so you will never encounter the issues that undersized statically hashed files develop. We have thousands of dynamically hashed files on our (Solaris) systems, with an extremely low problem rate. Rick -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin Sent: Thursday, July 05, 2012 11:21 AM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files Rick, You are correct, I should be using the smaller size (I just haven't changed it yet). Based on the reading I have done you should only use the larger group size when the average record size is greater than 1000 bytes. As far as being better off with the defaults that's basically what I'm trying to test (as well as learn how linear hashing works). I was able to reduce my overflow by 18% and I only increased my empty groups by a very small amount as well as only increased my file size by 8%.
This in theory should be better for reads/writes than what I had before. To test the performance I need to write a ton of records and then capture the output and compare the output using timestamps. Chris From: r...@lynden.com To: u2-users@listserver.u2ug.org Date: Thu, 5 Jul 2012 09:22:02 -0700 Subject: Re: [U2] RESIZE - dynamic files Chris, I still am wondering what is prompting you to continue using the larger group size. I think that Martin and the UV documentation are correct in this case; you would be as well or better off with the defaults. -Rick On Jul 5, 2012, at 9:13 AM, Martin Phillips martinphill...@ladybridge.com wrote: Hi, The various suggestions about setting the minimum modulus to reduce overflow are all very well but effectively you are turning a dynamic file into a static one, complete with all the continual maintenance work needed to keep the parameters in step with the data. In most cases, the only parameter that is worth tuning is the group size to try to pack things nicely. Even this is often fine left alone though getting it to match the underlying o/s page size is helpful. I missed the start of this thread but, unless you have a performance problem or are seriously short
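Rick's back-of-envelope figures (about 1% of reads needing a second read, about 0.5% saved by halving overflow) can be reproduced under one extra assumption he does not state: that an overflowed group keeps roughly one of its ~10 records in the overflow buffer, and that every record is equally likely to be read. A minimal Python sketch with a hypothetical function name:

```python
from fractions import Fraction


def second_read_fraction(overflowed_groups: Fraction, records_per_group: int,
                         records_in_overflow: int = 1) -> Fraction:
    """Share of random record reads that also have to read an overflow
    buffer, assuming each overflowed group holds `records_in_overflow`
    of its records in overflow and all records are read equally often."""
    return overflowed_groups * Fraction(records_in_overflow, records_per_group)


before = second_read_fraction(Fraction(10, 100), records_per_group=10)
after = second_read_fraction(Fraction(5, 100), records_per_group=10)  # overflow halved
print(f"before: {float(before):.1%}, after: {float(after):.1%}, "
      f"saved: {float(before - after):.1%} of reads")
```

If badly overflowed groups hold more than one record in overflow, the fractions scale up accordingly. The point of the sketch is only that the saving stays small next to the extra groups a lower split load creates.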
Re: [U2] RESIZE - dynamic files
A BASIC SELECT cannot use criteria at all. It is going to walk through every record in the file, in order. And that's the sticky wicket: that whole "in order" business. The disk drive controller has no clue about linked frames, but it *will* do optimistic look-aheads for you. So for BASIC SELECTs you are much better off having nothing in overflow at all. :) That way, when you go to ask for the *next* frame, it will always be contiguous, and already sitting in memory. -Original Message- From: Rick Nuckolls r...@lynden.com To: 'U2 Users List' u2-users@listserver.u2ug.org Sent: Thu, Jul 5, 2012 4:43 pm Subject: Re: [U2] RESIZE - dynamic files
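The read-ahead argument comes down to how many reads in a full scan are not simply "the next block". The Python sketch below uses a hypothetical layout: primary groups stored side by side, with each overflowed group continuing in blocks allocated elsewhere. It counts the reads a sequential look-ahead could not have prefetched. It does not model OS or controller caching, or the file-system fragmentation Rick raises in reply, both of which can break contiguity even with no overflow at all.

```python
def scan_order(modulus: int, overflow: dict[int, list[int]]) -> list[int]:
    """Block numbers visited by a full-file scan in this toy layout:
    primary groups 0..modulus-1 sit side by side, and an overflowed group
    continues in the separately allocated blocks listed in `overflow`."""
    order = []
    for group in range(modulus):
        order.append(group)
        order.extend(overflow.get(group, []))
    return order


def count_jumps(blocks: list[int]) -> int:
    """Reads that are not the block right after the previous one, i.e.
    reads a simple sequential read-ahead would not have prefetched."""
    return sum(1 for prev, nxt in zip(blocks, blocks[1:]) if nxt != prev + 1)


# Hypothetical 10-group file; groups 3 and 7 spill into blocks 10 and 11.
print(count_jumps(scan_order(10, {})))                  # 0
print(count_jumps(scan_order(10, {3: [10], 7: [11]})))  # 4
```

In this model each overflowed group costs two out-of-sequence reads, one out to the overflow block and one back to the next primary group.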
Re: [U2] RESIZE - dynamic files
This will be mostly true if the full extent of the file was allocated at one time as a contiguous block, which could be a big plus. As a file grows, sectors will be allocated piecemeal and when the hardware reads ahead, it will not necessarily be reading sectors in the same file. Curiously, an old Pr1me CAM file had a trick around this, though it was late coming onto the scene. Unix also has a few tricks, but they are only partial solutions to file fragmentation. And Windows Rick -Original Message- From: u2-users-boun...@listserver.u2ug.org [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wjhonson Sent: Thursday, July 05, 2012 5:12 PM To: u2-users@listserver.u2ug.org Subject: Re: [U2] RESIZE - dynamic files
Re: [U2] RESIZE - dynamic files
That's what we use, 'BASIC SELECT' statements for this table, looping through records to build reports. It's an accounting table that has about 200-300 record WRITES a day, with an average of ~250 bytes per record. We obviously have more READ operations since we are always building up these reports, so I was hoping my numbers looked right: 1) I reduced overflow by 18%. 2) I only increased file size ~8%. So we do a combination of BASIC SELECTs and WRITEs. Everything is done in the latest version of Rocket's Universe, PICK, using BASIC for our programs that contain the SELECTs. Chris To: u2-users@listserver.u2ug.org From: wjhon...@aol.com Date: Thu, 5 Jul 2012 20:12:21 -0400 Subject: Re: [U2] RESIZE - dynamic files
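Chris's figures (200-300 writes a day, ~250 bytes per record) give a feel for how often this file will split, which bears on Rick's remark that split overhead only matters at very high update rates. A minimal Python sketch, assuming every write creates a new record (updates to existing records would add less), SPLIT.LOAD 70 and 4096-byte groups, and ignoring per-record overhead. The function name is hypothetical.

```python
from fractions import Fraction


def groups_added_per_day(writes_per_day: int, avg_record_bytes: int,
                         split_load_pct: int = 70,
                         group_size: int = 4096) -> Fraction:
    """Rough group splits per day for an ever-growing dynamic file: new
    bytes per day divided by the bytes one group holds at the split load."""
    bytes_per_day = writes_per_day * avg_record_bytes
    return Fraction(bytes_per_day * 100, split_load_pct * group_size)


for writes in (200, 300):
    splits = groups_added_per_day(writes, avg_record_bytes=250)
    print(writes, "writes/day ->", round(float(splits), 1), "splits/day")
```

Under these assumptions the answer is a few dozen splits a day (about 17 to 26), spread across the day.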