Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Israel, John R.
The best thing I can say about MINIMUM.MODULUS is that if you set it close 
to what you expect the file to need (at least for a while), when you start 
populating it from scratch, you will not lose performance as the file keeps 
splitting.  This can make an amazing difference in how long it takes to 
initially populate the file.
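A toy model makes John's point concrete (my own sketch, not UniVerse internals; SPLIT.LOAD, group size, and data volume are assumed figures): every split adds one group, so a file grown from modulus 1 splits roughly 100,000 times while absorbing ~300 MB, whereas one preallocated near its final size barely splits at all.

```python
# Toy model of dynamic-file growth: split whenever the actual load
# (data bytes / total group bytes) exceeds SPLIT.LOAD. Not UniVerse's
# real algorithm -- just an illustration of the split count.

GROUP_SIZE = 4096   # bytes per group, as in the thread
SPLIT_LOAD = 0.70   # split when the file is more than 70% full

def splits_during_load(total_bytes, minimum_modulus, record_bytes=250):
    """Count group splits while an empty file is filled with total_bytes."""
    modulus, splits, written = minimum_modulus, 0, 0
    while written < total_bytes:
        written += record_bytes                      # write one record
        while written > SPLIT_LOAD * modulus * GROUP_SIZE:
            modulus += 1                             # one split = one new group
            splits += 1
    return splits

data = 300_000_000  # ~300 MB of record data, roughly Chris's file
print(splits_during_load(data, 1))        # ~105,000 splits from scratch
print(splits_during_load(data, 118_681))  # 0 splits when preallocated
```

Every one of those splits is work done mid-write, which is why preallocation speeds up the initial population so dramatically.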

John

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Charles Stevenson
Sent: Thursday, July 05, 2012 5:41 PM
To: U2 Users List
Subject: Re: [U2] RESIZE - dynamic files

Chris,

I can appreciate what you are doing as an academic exercise.

You seem happy with how it looks at this moment, where, because you set 
MINIMUM.MODULUS to 118681, you ended up with a current load of 63%.
But think about it:  as you add records, the load will reach 70%, per 
SPLIT.LOAD 70, then splits will keep occurring and the current modulus will 
grow past 118681.  MINIMUM.MODULUS will never matter again.  (This was 
described as an ever-growing file.)

If the current config is what you want, why not just set SPLIT.LOAD 63 and 
MINIMUM.MODULUS 1?   That way the ratio that you like today will stay 
like this forever.

MINIMUM.MODULUS will not matter unless data is deleted.  It says to not shrink 
the file structure below that minimally allocated disk space, even if there is 
no data to occupy it.  That's really all MINIMUM.MODULUS is for.
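That shrink floor can be sketched with the same kind of toy model (assumed mechanics, not UniVerse's actual merge code: merge one group whenever the load drops below MERGE.LOAD, never going below MINIMUM.MODULUS):

```python
# Toy model of merges after deletions: the modulus shrinks while the load
# is under MERGE.LOAD, but MINIMUM.MODULUS acts as a hard floor.
GROUP_SIZE = 4096
MERGE_LOAD = 0.50

def modulus_after_deletes(data_bytes, modulus, minimum_modulus):
    while (modulus > minimum_modulus
           and data_bytes < MERGE_LOAD * modulus * GROUP_SIZE):
        modulus -= 1          # merge the last group away
    return modulus

# 10 MB of data left in a 118,681-group file:
print(modulus_after_deletes(10_000_000, 118_681, 118_681))  # floor holds: 118681
print(modulus_after_deletes(10_000_000, 118_681, 1))        # shrinks to ~4882
```

With a high floor, the disk space stays allocated even after mass deletes; with MINIMUM.MODULUS 1, the file gives the space back.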

Play with it all you want, because it puts you in a good place when some crisis 
happens.  At the end of the day, with this file, you'll find your tuning won't 
matter much.  Not a lot of help, but not much harm if you tweak it wrong, 
either.


On 7/5/2012 1:20 PM, Chris Austin wrote:
 Rick,

 You are correct, I should be using the smaller size (I just haven't 
 changed it yet). Based on the reading I have done you should only use the 
 larger group size when the average record size is greater than 1000 bytes.

 As far as being better off with the defaults that's basically what I'm 
 trying to test (as well as learn how linear hashing works). I was able 
 to reduce my overflow by 18% and I only increased my empty groups by a very 
 small amount as well as only increased my file size by 8%. This in theory 
 should be better for reads/writes than what I had before.

 To test the performance I need to write a ton of records and then capture the 
 output and compare the output using timestamps.

 Chris

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users



Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Wols Lists
On 05/07/12 23:58, Rick Nuckolls wrote:
 Oops, I would have thought that if a file had, say 100,000 bytes, @ 70 percent 
 full, there would be 30,000 bytes empty or dead. Are you suggesting that 
 there would be 70,000 bytes of data and 42,000 bytes of dead space?

Do you mean 100,000 bytes of disk space, or 100,000 bytes of data?

I guess you are thinking that the file occupies, on disk, 100K. In which
case it will have 70K of data and 30K of empty space.

But if you are thinking that the file contains 100K of data, it will
actually occupy 142K of disk space.

So this particular option wastes 30% of the disk space it uses, but
uses 42% more space than it would optimally need assuming perfect packing.
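The two readings differ only in which quantity you call 100K; a two-line check:

```python
load = 0.70   # the file is "70 percent full"

print(100_000 * load)   # reading 1: 100K of disk holds 70000.0 bytes of data
print(100_000 / load)   # reading 2: 100K of data needs ~142857 bytes of disk
```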

I know, it's fun to try to get your head round it :-)

Cheers,
Wol


Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Chris Austin

So is there a performance increase in BASIC SELECTs from reducing overflow? Some 
people are saying to reduce disk space to speed up the BASIC SELECT,
while others say to reduce overflow... I'm a bit confused. All of our programs 
that read that table use a BASIC SELECT WITH...

For a BASIC select, do you gain anything by reducing overflow?

Chris


 To: u2-users@listserver.u2ug.org
 From: wjhon...@aol.com
 Date: Thu, 5 Jul 2012 20:12:21 -0400
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 A BASIC SELECT cannot use criteria at all.
 It is going to walk through every record in the file, in order.
 And that's the sticky wicket. That whole in order business.
 The disk drive controller has no clue on linked frames, but it *will* do 
 optimistic look aheads for you.
 So you are much better off, for BASIC SELECTs having nothing in overflow, at 
 all. :)
 That way, when you go to ask for the *next* frame, it will always be 
 contiguous, and already sitting in memory.
 
 
 
 
 
 
 
 
 -Original Message-
 From: Rick Nuckolls r...@lynden.com
 To: 'U2 Users List' u2-users@listserver.u2ug.org
 Sent: Thu, Jul 5, 2012 4:43 pm
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 Most disks and disk systems cache huge amounts of information these days,
 and, depending on 20 factors or so, one solution will be better than another
 for a given file.
 For the wholesale SELECT F WITH..., the fewest disk records will almost
 always win. For files that have ~10 records/group and have ~10% of the groups
 overflowed, then perhaps 1% of record reads will do a second read for the
 overflow buffer because the target key was not in the primary group.  Writing
 a new record would possibly hit the 10% mark for reading overflow buffers.
 But lowering the split.load will increase the number of splits slightly, and
 increase the total number of groups considerably.  What you have shown is
 that you need to increase the modulus (and select time) of a large file more
 than 20% in order to decrease the read and update times for your records 0.5%
 of the time (assuming that you have only reduced the number of overflow
 groups by 50%).
 As Charles suggests, this is an interesting exercise, but your actual results
 will rapidly change if you actually add/remove records from your file, change
 the load or number of files on your system, put in a new drive, cpu, memory
 board, or install a new release of Universe, move to raid, etc.
 -Rick
 -Original Message-
 From: u2-users-boun...@listserver.u2ug.org 
 [mailto:u2-users-boun...@listserver.u2ug.org] 
 On Behalf Of Wjhonson
 Sent: Thursday, July 05, 2012 2:38 PM
 To: u2-users@listserver.u2ug.org
 Subject: Re: [U2] RESIZE - dynamic files
 
 The hardware look-ahead of the disk drive reader will grab consecutive 
 frames into memory, since it assumes you'll want the next frame next.
 So the less overflow you have, the faster a full file scan will become.
 At least that's my theory ;)
 
 
 -Original Message-
 From: Rick Nuckolls r...@lynden.com
 To: 'U2 Users List' u2-users@listserver.u2ug.org
 Sent: Thu, Jul 5, 2012 2:29 pm
 Subject: Re: [U2] RESIZE - dynamic files
 
 Chris,
 For the type of use that you described earlier; BASIC selects and reads,
 reducing overflow will have negligible performance benefit, especially
 compared to changing the GROUP.SIZE back to 1 (2048 bytes).  If you purge the
 file in relatively small percentages, then it will never merge anyway
 (because you will need to delete 20-30% of the file for that to happen with
 the merge load at 50%), so your optimum minimum modulus solution will
 probably be however large it grows.  The overhead for a group split is not as
 bad as it sounds unless your updates/sec count is extremely high, such as
 during a copy.
 If you do regular SELECTs and SCANs of the entire file, then your goal should
 be to reduce the total disk size of the file, and not worry much about common
 overflow.  The important thing is that the file is dynamic, so you will never
 encounter the issues that undersized statically hashed files develop.
 We have thousands of dynamically hashed files on our (Solaris) systems, with
 an extremely low problem rate.
 Rick

Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Wjhonson

What do you mean a BASIC SELECT WITH...

If you mean you are doing EXECUTE SELECT CUSTOMER WITH...
that is not a BASIC SELECT, whose syntax is only

OPEN CUSTOMER TO F.CUSTOMER 
SELECT F.CUSTOMER

no WITH
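As a rough analogy only (Python, not UniVerse BASIC; the customer data is made up): a BASIC SELECT hands you every record id in storage order, and a WITH-style condition has to be your own test inside the READNEXT loop.

```python
# Stand-in for a hashed file: ids map to records (a Python dict keeps
# insertion order, standing in for group/record storage order).
customers = {"A1": {"STATE": "WA"}, "A2": {"STATE": "OR"}, "A3": {"STATE": "WA"}}

def basic_select(f):
    """Like SELECT F.CUSTOMER: every id, in order, no criteria."""
    yield from f

# The "WITH STATE = 'WA'" part is manual filtering in your own loop:
matches = [rid for rid in basic_select(customers)
           if customers[rid]["STATE"] == "WA"]
print(matches)  # ['A1', 'A3']
```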









-Original Message-
From: Chris Austin cjausti...@hotmail.com
To: u2-users u2-users@listserver.u2ug.org
Sent: Fri, Jul 6, 2012 10:23 am
Subject: Re: [U2] RESIZE - dynamic files



So is there a performance increase in BASIC SELECTS by reducing overflow? Some 
people are saying to reduce disk space to speed up the BASIC SELECT
while others say to reduce overflow.. I'm a bit confused. All of our programs 
that read that table use a BASIC SELECT WITH.. 
for a BASIC select do you gain anything by reducing overflow?
Chris


Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Rick Nuckolls
Logically, the graphed solution to varying the split.load value with an 
x-axis=modulus, y-axis=time_to_select_and_read_the_whole_file is going to be 
parabolic, having very slow performance at modulus=1 and modulus = # of records.

If you actually want to find the precise low point, ignore all this bs, create 
a bunch of files with copies of the same data, but different moduli, restart 
your system (including all disk drives and raid devices) in order to purge all 
buffers, and then run the same program against each file.  I think that we 
would all be curious about the results.
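A minimal harness along those lines might look like this (a Python sketch; the copy names are hypothetical, and as Rick notes, without genuinely cold caches the times mostly measure RAM rather than disk):

```python
import time

def time_full_scan(path, chunk=1 << 20):
    """Read a file end to end in 1 MB chunks and return the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    return time.perf_counter() - start

# Hypothetical copies of the same data built at different moduli:
for path in ["COPY.MOD.92903", "COPY.MOD.118681"]:
    try:
        print(path, f"{time_full_scan(path):.3f}s")
    except FileNotFoundError:
        print(path, "not found -- create the test copies first")
```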

Easier yet, just ignore the bs and use the defaults. :)

-Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Friday, July 06, 2012 9:56 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


So is there a performance increase in BASIC SELECTS by reducing overflow? Some 
people are saying to reduce disk space to speed up the BASIC SELECT
while others say to reduce overflow.. I'm a bit confused. All of our programs 
that read that table use a BASIC SELECT WITH.. 

for a BASIC select do you gain anything by reducing overflow?

Chris



Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Susan Lynch
Chris,

10 years ago, when I was administering a UniVerse system, the answer would
have been "minimize both to the best of your ability".  But I don't know
how UniVerse has changed in the interim, during which time I have been
working on UniData systems, which are enormously different in their
handling of records in groups from any other Pick-type system I have ever
worked on (all of which were much more similar to UniVerse at that time).
And when last I administered a UniVerse system, there were no dynamic
files.

With that caveat, here are the factors:

1) a record in a UniVerse file that is stored in overflow is going to take
2 or more disk reads to retrieve if you are retrieving it by id.  However,
in a Basic select (structured as in Will's example, with no quotes, no
WITH criteria), the system will walk through the file group by group,
and will read each record, so yes, it will take 2 (or more, depending on
how deeply that group is in overflow) reads to get the data, but it will
have done the first read anyway to read those records - so for the Basic
SELECT, you probably want to minimize the number of groups read to the
extent that you can do so without putting many of the groups into
overflow.

2) to add records to the file, you have to access the file by the record
id, which means hashing the id to the group, then walking through the
group to see if the id is already in use, and if not, adding the record to
the end of the data area in use.  So for that, you absolutely want to
minimize the amount of overflow, because overflow slows you down on the
'adds'.
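Both points reduce to counting buffer reads down a group's chain; a toy model (the capacity and keys are my assumptions, not UniVerse's actual layout):

```python
GROUP_CAPACITY = 8  # records per buffer, an assumed figure

def group_chain(records, capacity=GROUP_CAPACITY):
    """One group's records laid out as a primary buffer plus overflow buffers."""
    return [records[i:i + capacity] for i in range(0, len(records), capacity)] or [[]]

def reads_to_find(chain, key):
    """Buffer reads to locate key (or, for an add, to prove it is absent)."""
    for reads, buf in enumerate(chain, start=1):
        if key in buf:
            return reads
    return len(chain)  # walked every buffer in the group

healthy = group_chain([f"K{i}" for i in range(6)])      # fits in one buffer
overflowed = group_chain([f"K{i}" for i in range(20)])  # two overflow buffers

print(reads_to_find(healthy, "K5"))      # 1 read
print(reads_to_find(overflowed, "K17"))  # 3 reads
print(reads_to_find(overflowed, "NEW"))  # 3 reads just to start an add
```

The last case is Susan's point 2: an add always walks the whole chain first, so overflow taxes every write.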

3) any sort/select or query read of the database will be slowed down
significantly by overflow, but you said you don't do much of that anyway.

Susan M. Lynch
F. W. Davison  Company, Inc.
-Original Message-
From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: 07/06/2012 12:56 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


So is there a performance increase in BASIC SELECTS by reducing overflow?
Some people are saying to reduce disk space to speed up the BASIC SELECT
while others say to reduce overflow.. I'm a bit confused. All of our
programs that read that table use a BASIC SELECT WITH..

for a BASIC select do you gain anything by reducing overflow?

Chris



Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Wjhonson

You forgot the need to defragment, since someone suggested that my idea of 
using the intrinsic look-ahead ability is hampered by hard fragmentation.




-Original Message-
From: Rick Nuckolls r...@lynden.com
To: 'U2 Users List' u2-users@listserver.u2ug.org
Sent: Fri, Jul 6, 2012 11:20 am
Subject: Re: [U2] RESIZE - dynamic files



Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Chris Austin

Disk space is not a factor, as we are a smaller shop and disk space comes 
cheap. However, one thing I did notice is when I increased the modulus to a 
very large
number which then increased my disk space to about 3-4x of my record data, my 
SELECT queries were slower. 

Are the 2 factors, when choosing HOW the file is used, based on whether you're 
using:

1) a lot of SELECTS (then looping through the records) 
2) grabbing individual records (not using a SELECT)

With this file we really do a lot of SELECTs (option 1), then loop through the 
records. With that being said, and based on the reading I've done here, it would 
appear it's better to have a little overflow and not use up so much disk space 
on modulus (groups) for this application, since we do use a lot of SELECT 
queries. Is this correct?

Most of my records are ~ 250 bytes, there's a handful that are 'up to 512 
bytes'. 

It would seem to me that I would want to REDUCE my split to ~70% to reduce 
overflow, and maybe increase my MINIMUM.MODULUS to a # a little bit bigger than 
my current modulus (~10% bigger) since this
will be a growing file and will never merge. In my case using the formula might 
not make sense since this file will never merge. Does this make sense?


File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    92903 current ( minimum 31, 87 empty,
28248 overflowed, 2510 badly )
Number of records ..   1292377
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   501219328 bytes
Total size of record data ..   287426366 bytes
Total size of record IDs ...   21539682 bytes
Unused space ...   192245088 bytes
Total space for records    501211136 bytes


With all that being said if I change the following:

1) SPLIT.LOAD to 70%
2) MINIMUM.MODULUS to 130,000

That's all I should really need to do to 'tweak' the performance of this file.. 
If this doesn't sound right I would be interested to hear how it should be 
tweaked instead. Thanks for all the help so far, I think
this is all starting to make sense.

Chris


 From: ro...@stamina.com.au
 To: u2-users@listserver.u2ug.org
 Date: Wed, 4 Jul 2012 01:36:26 +
 Subject: Re: [U2] RESIZE - dynamic files
 
 I would suggest that then actual goal is to achieve maximum performance for 
 your system, so knowing HOW the file is used on a daily basis can also 
 influence decisions. Disk is a cheap commodity, so having some wastage in 
 file utilization shouldn't factor. 
 
 
 Ross Ferris
 Stamina Software
 Visage  Better by Design!

  


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Chris Austin

I was able to drop from 30% overflow to 12% by making 2 changes:

1) changed the split from 80% to 70% (that alone reduced overflow by 10%)
2) changed the MINIMUM.MODULUS to 118,681 (calculated this way: [ (record 
data + id) * 1.1 * 1.42857 (for the 70% split load) ] / 4096 )

My disk size only went up 8%..

My file looks like this now:

File name ..................   GENACCTRN_POSTED
Pathname ...................   GENACCTRN_POSTED
File type ..................   DYNAMIC
File style and revision ....   32BIT Revision 12
Hashing Algorithm ..........   GENERAL
No. of groups (modulus) ....   118681 current ( minimum 118681, 140 empty,
                               14431 overflowed, 778 badly )
Number of records ..........   1292377
Large record size ..........   3267 bytes
Number of large records ....   180
Group size .................   4096 bytes
Load factors ...............   70% (split), 50% (merge) and 63% (actual)
Total size .................   546869248 bytes
Total size of record data ..   287789178 bytes
Total size of record IDs ...   21539538 bytes
Unused space ...............   237532340 bytes
Total space for records ....   546861056 bytes

Chris



 From: keith.john...@datacom.co.nz
 To: u2-users@listserver.u2ug.org
 Date: Wed, 4 Jul 2012 14:05:02 +1200
 Subject: Re: [U2] RESIZE - dynamic files
 
 Doug may have had a key bounce in his input
 
  Let's do the math:
 
  258687736 (Record Size)
  192283300 (Key Size)
 
 The key size is actually 19283300 in Chris' figures
 
 Regarding 68,063 being less than the current modulus of 82,850.  I think the 
 answer may lie in the splitting process.
 
 As I understand it, the first time a split occurs group 1 is split and its 
 contents are split between new group 1 and new group 2. All the other groups 
 effectively get 1 added to their number. The next split is group 3 (which was 
 2) into 3 and 4 and so forth. A pointer is kept to say where the next split 
 will take place and also to help sort out how to adjust the algorithm to 
 identify which group matches a given key.
 
 Based on this, if you started with 1000 groups, by the time you have split 
 the 500th time you will have 1500 groups.  The first 1000 will be relatively 
 empty, the last 500 will probably be overflowed, but not terribly badly.  By 
 the time you get to the 1000th split, you will have 2000 groups and they 
 will, one hopes, be quite reasonably spread with very little overflow.
 
 So I expect the average access times would drift up and down in a cycle.  The 
 cycle time would get longer as the file gets bigger but the worst time would 
 be roughly the same each cycle.
 
 Given the power of two introduced into the algorithm by the before/after the 
 split thing, I wonder if there is such a need to start off with a prime?
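Keith's walk-through can be modelled as a toy linear-hashing table (a simplified, hypothetical Python sketch, not UniVerse internals; note that textbook linear hashing appends the new group at the end of the file rather than renumbering, which is equivalent to the shifting Keith describes):

```python
class ToyLinearHash:
    """Toy linear hashing: one group splits at a time, in order."""

    def __init__(self, min_modulus=4, split_load=0.70, group_capacity=4):
        self.base = min_modulus        # modulus at the start of this doubling cycle
        self.split_ptr = 0             # index of the next group to split
        self.groups = [[] for _ in range(min_modulus)]
        self.split_load = split_load
        self.capacity = group_capacity

    def group_for(self, key):
        g = hash(key) % self.base
        if g < self.split_ptr:         # that group already split this cycle,
            g = hash(key) % (2 * self.base)  # so use the doubled modulus
        return g

    def insert(self, key):
        self.groups[self.group_for(key)].append(key)
        load = sum(map(len, self.groups)) / (len(self.groups) * self.capacity)
        if load > self.split_load:
            self.split()

    def split(self):
        sp, m = self.split_ptr, self.base
        old = self.groups[sp]
        self.groups.append([])         # new group lands at index sp + m
        # rehash only the split group's records over the doubled modulus
        self.groups[sp] = [k for k in old if hash(k) % (2 * m) == sp]
        self.groups[sp + m] = [k for k in old if hash(k) % (2 * m) == sp + m]
        self.split_ptr += 1
        if self.split_ptr == self.base:  # cycle complete: modulus has doubled
            self.base *= 2
            self.split_ptr = 0
```

Stepping through inserts shows the cyclic behaviour described above: groups before the split pointer are freshly split and lightly loaded, groups after it are fuller, and the pattern repeats each time the modulus doubles.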
 
 Regards, Keith
 
 PS I'm getting a bit Tony^H^H^H^Hverbose nowadays.
 


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Martin Phillips
Hi,

The various suggestions about setting the minimum modulus to reduce overflow 
are all very well but effectively you are turning a
dynamic file into a static one, complete with all the continual maintenance 
work needed to keep the parameters in step with the
data.

In most cases, the only parameter that is worth tuning is the group size, to try 
to pack things nicely. Even this is often fine left
alone, though getting it to match the underlying o/s page size is helpful.

I missed the start of this thread but, unless you have a performance problem or 
are seriously short of space, my recommendation
would be to leave the dynamic files to look after themselves.

A file without overflow is not necessarily the best solution. Winding the split 
load down to 70% means that at least 30% of the file
is dead space. The implication of this is that the file is larger and will take 
more disk reads to process sequentially from one end
to the other.


Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
+44 (0)1604-709200





Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Rick Nuckolls
Chris,

I still am wondering what is prompting you to continue using the larger group 
size.

 I think that Martin, and the UV documentation, are correct in this case; you 
 would be as well or better off with the defaults.

-Rick

On Jul 5, 2012, at 9:13 AM, Martin Phillips martinphill...@ladybridge.com 
wrote:
Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Chris Austin

Rick,

You are correct, I should be using the smaller size (I just haven't changed it 
yet). Based on the reading I have done you should
only use the larger group size when the average record size is greater than 
1000 bytes. 

As far as being better off with the defaults, that's basically what I'm trying 
to test (as well as learning how linear hashing works). I was able
to reduce my overflow by 18%, and I only increased my empty groups by a very 
small amount and my file size
by 8%. This in theory should be better for reads/writes than what I had before. 

To test the performance I need to write a ton of records and then capture the 
output and compare the output using timestamps. 

Chris


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Rick Nuckolls
Chris,

For the type of use that you described earlier (BASIC selects and reads), 
reducing overflow will have negligible performance benefit, especially compared 
to changing the GROUP.SIZE back to 1 (2048 bytes).  If you purge the file in 
relatively small percentages, then it will never merge anyway (you would need 
to delete 20-30% of the file for that to happen with the merge load at 50%), 
so your optimum minimum modulus will probably be however large it 
grows.  The overhead for a group split is not as bad as it sounds unless your 
updates/sec count is extremely high, such as during a copy.

If you do regular SELECT and SCANS of the entire file, then your goal should be 
to reduce the total disk size of the file, and not worry much about common 
overflow. The important thing is that the file is dynamic, so you will never 
encounter the issues that undersized statically hashed files develop.

We have thousands of dynamically hashed files on our (Solaris) systems, with an 
extremely low problem rate.

Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Thursday, July 05, 2012 11:21 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


Rick,

You are correct, I should be using the smaller size (I just haven't changed it 
yet). Based on the reading I have done you should
only use the larger group size when the average record size is greater than 
1000 bytes. 

As far as being better off with the defaults that's basically what I'm trying 
to test (as well as learn how linear hashing works). I was able
to reduce my overflow by 18% and I only increased my empty groups by a very 
small amount as well as only increased my file size
by 8%. This in theory should be better for reads/writes than what I had before. 

To test the performance I need to write a ton of records and then capture the 
output and compare the output using timestamps. 

Chris


 From: r...@lynden.com
 To: u2-users@listserver.u2ug.org
 Date: Thu, 5 Jul 2012 09:22:02 -0700
 Subject: Re: [U2] RESIZE - dynamic files
 
 Chis,
 
 I still am wondering what is prompting you to continue using the larger group 
 size.
 
 I think that Martin, and the UV documentation is correct in this case; you 
 would be as well or better off with the defaults.
 
 -Rick
 
 On Jul 5, 2012, at 9:13 AM, Martin Phillips martinphill...@ladybridge.com 
 wrote:
 coming
  Hi,
  
  The various suggestions about setting the minimum modulus to reduce 
  overflow are all very well but effectively you are turning a
  dynamic file into a static one, complete with all the continual maintenance 
  work needed to keep the parameters in step with the
  data.
  
  In most cases, the only parameter that is worth tuning is the group size to 
  try to pack things nicely. Even this is often fine left
  alone though getting it to match the underlying o/s page size is helpful.
  
  I missed the start of this thread but, unless you have a performance 
  problem or are seriously short of space, my recommendation
  would be to leave the dynamic files to look after themselves.
  
  A file without overflow is not necessarily the best solution. Winding the 
  split load down to 70% means that at least 30% of the file
  is dead space. The implication of this is that the file is larger and will 
  take more disk reads to process sequentially from one end
  to the other.
  
  
  Martin Phillips
  Ladybridge Systems Ltd
  17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
  +44 (0)1604-709200
  
  
  
  -Original Message-
  From: u2-users-boun...@listserver.u2ug.org 
  [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
  Sent: 05 July 2012 15:19
  To: u2-users@listserver.u2ug.org
  Subject: Re: [U2] RESIZE - dynamic files
  
  
  I was able to drop from 30% overflow to 12% by making 2 changes:
  
  1) changed the split from 80% to 70% (that alone reduce 10% overflow)
  2) changed the MINIMUM.MODULUS to 118,681 (calculated this way - [ (record 
  data + id) * 1.1 * 1.42857 (70% split load)] / 4096 )
  
  My disk size only went up 8%..
  
  My file looks like this now:
  
  File name ..   GENACCTRN_POSTED
  Pathname ...   GENACCTRN_POSTED
  File type ..   DYNAMIC
  File style and revision    32BIT Revision 12
  Hashing Algorithm ..   GENERAL
  No. of groups (modulus)    118681 current ( minimum 118681, 140 empty,
 14431 overflowed, 778 badly )
  Number of records ..   1292377
  Large record size ..   3267 bytes
  Number of large records    180
  Group size .   4096 bytes
  Load factors ...   70% (split), 50% (merge) and 63% (actual)
  Total size .   546869248 bytes
  Total size of record data ..   287789178 bytes
  Total size of record IDs

Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Wjhonson

The hardware look-ahead of the disk drive will grab consecutive 
frames into memory, since it assumes you'll want the next frame next.
So the less overflow you have, the faster a full file scan will become.
At least that's my theory ;)





Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Charles Stevenson

Chris,

I can appreciate what you are doing as an academic exercise.

You seem happy with how it looks at this moment, where, because you set 
MINIMUM.MODULUS to 118681, you ended up with a current load of 63%.
But think about it:  as you add records, the load will reach 70%, per 
SPLIT.LOAD 70, then splits will keep occurring and the current modulus will 
grow past 118681.  MINIMUM.MODULUS will never matter again.  (This was 
described as an ever-growing file.)


If the current config is what you want, why not just set SPLIT.LOAD 63 and 
MINIMUM.MODULUS 1.   That way the ratio that you like today will stay 
like this forever.


MINIMUM.MODULUS will not matter unless data is deleted.  It says to not 
shrink the file structure below that minimally allocated disk space, 
even if there is no data to occupy it.  That's really all 
MINIMUM.MODULUS is for.


Play with it all you want, because it puts you in a good place when some 
crisis happens.  At the end of the day, with this file, you'll find your 
tuning won't matter much.  Not a lot of help, but not much harm if you 
tweak it wrong, either.



On 7/5/2012 1:20 PM, Chris Austin wrote:





Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Wols Lists
On 05/07/12 16:12, Martin Phillips wrote:
 A file without overflow is not necessarily the best solution. Winding the 
 split load down to 70% means that at least 30% of the file
 is dead space. The implication of this is that the file is larger and will 
 take more disk reads to process sequentially from one end
 to the other.

Whoops Martin, I think you've made the classic percentages mistake here ...

The file is 30/70, or 42% dead space at least. A file with the default
80% split is at least 25% dead space.
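The two figures are the same space seen through different denominators: at a 70% split load, unused space is ~30% of the total file, and ~43% when expressed relative to the data it holds. A quick check (illustrative Python, not from the thread):

```python
split_load = 0.70
dead_over_file = 1 - split_load                 # fraction of the whole file
dead_over_data = (1 - split_load) / split_load  # same space, relative to the data

print(round(dead_over_file, 2))  # 0.3  -> "30% of the file is dead space"
print(round(dead_over_data, 2))  # 0.43 -> "30/70, or 42% dead space"
```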

Cheers,
Wol


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Wols Lists
On 05/07/12 14:49, Chris Austin wrote:
 
 Disk space is not a factor, as we are a smaller shop and disk space comes 
 cheap. However, one thing I did notice is when I increased the modulus to a 
 very large
 number which then increased my disk space to about 3-4x of my record data, my 
 SELECT queries were slower. 
 
 Are the 2 factors when choosing HOW the file is used based on whether your 
 using?
 
 1) a lot of SELECTS (then looping through the records) 

Is that a BASIC select, or a RETRIEVE select?

 2) grabbing individual records (not using a SELECT)
 
 With this file we really do a lot of SELECTS (option 1), then loop through 
 the records. With that being said and based on the reading I've done here it 
 would appear it's better to have a little overflow
 and not use up so much disk space for modulus (groups) for this application 
 since we do use a lot of SELECT queries. Is this correct?

If your selects are BASIC selects, then you won't notice too much
difference. If they are RETRIEVE selects, then reducing SPLIT will
increase the cost of the SELECT.

In both cases, if the RETRIEVE select is not BY, then the cost of
processing the list should not be seriously impacted.

(On a SELECT WITH index, however, reducing overflow will speed things up
a bit, probably not an awful lot.)
 
 Most of my records are ~ 250 bytes, there's a handful that are 'up to 512 
 bytes'. 
 
 It would seem to me that I would want to REDUCE my split to ~70% to reduce 
 overflow, and maybe increase my MINIMUM.MODULUS to a # a little bit bigger 
 than my current modulus (~10% bigger) since this
 will be a growing file and will never merge. In my case using the formula 
 might not make sense since this file will never merge. Does this make sense?
 
If the file will only ever grow, then MINIMUM.MODULUS is probably a
waste of time. You are best using that in one of two circumstances,
either (a) you are populating a file with a large number of initial
records and you are forcing the modulus to what it's likely to end up
anyway, or (b) your file grows and shrinks violently in size, and you
are forcing it to its typical state.

The first scenario simply avoids a bunch of inevitable splits, the
second avoids a yoyo split/merge/split scenario.

I'd just leave the settings at 80/20, and only use MINIMUM.MODULUS if I
was creating a copy of the file (setting the new minimum at the current
modulo of the existing file).

Cheers,
Wol


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Rick Nuckolls
Most disks and disk systems cache huge amounts of information these days, and, 
depending on 20 factors or so, one solution will be better than another for a 
given file.

For the wholesale "SELECT F WITH ..." case, the fewest disk reads will almost 
always win. For files that have ~10 records/group and ~10% of the groups 
overflowed, perhaps 1% of record reads will do a second read for the 
overflow buffer because the target key was not in the primary group.  Writing a 
new record would possibly hit the 10% mark for reading overflow buffers. But 
lowering the SPLIT.LOAD will increase the number of splits slightly, and 
increase the total number of groups considerably.  What you have shown is that 
you need to increase the modulus (and select time) of a large file by more 
than 10% in order to decrease the read and update times for your records 0.5% of 
the time (assuming that you have only reduced the number of overflow groups by 
~50%).
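Rick's back-of-envelope figure can be reproduced with a short, purely illustrative calculation; the function name and the independence assumptions are mine, not Rick's:

```python
def second_read_fraction(records_per_group, overflowed_groups_fraction):
    """Rough estimate of the fraction of random record reads that need a
    second disk read for an overflow buffer.

    Assumes an overflowed group holds roughly one group's worth of records
    plus one more in overflow, so about 1 in (records_per_group + 1) reads
    against such a group lands in the overflow buffer.  Illustrative only.
    """
    return overflowed_groups_fraction / (records_per_group + 1)

# Rick's figures: ~10 records/group, ~10% of groups overflowed
print(f"{second_read_fraction(10, 0.10):.1%}")  # roughly 1% of reads
```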

As Charles suggests, this is an interesting exercise, but your actual results 
will rapidly change if you actually add/remove records from your file, change 
the load or number of files on your system, put in a new drive, CPU, memory 
board, install a new release of UniVerse, move to RAID, etc.

-Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wjhonson
Sent: Thursday, July 05, 2012 2:38 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


The hardware look-ahead of the disk drive reader will grab consecutive 
frames into memory, since it assumes you'll want the next frame next.
So the less overflow you have, the faster a full file scan will become.
At least that's my theory ;)




-Original Message-
From: Rick Nuckolls r...@lynden.com
To: 'U2 Users List' u2-users@listserver.u2ug.org
Sent: Thu, Jul 5, 2012 2:29 pm
Subject: Re: [U2] RESIZE - dynamic files


Chris,
For the type of use that you described earlier; BASIC selects and reads, 
reducing overflow will have negligible performance benefit, especially compared 
to changing the GROUP.SIZE back to 1 (2048 bytes).  If you purge the file in 
relatively small percentages, then it will never merge anyway (because you will 
need to delete 20-30% of the file for that to happen with the merge load at 50%), 
so your optimum minimum modulus solution will probably be however large it 
grows.  The overhead for a group split is not as bad as it sounds unless your 
updates/sec count is extremely high, such as during a copy.
If you do regular SELECTs and SCANs of the entire file, then your goal should be 
to reduce the total disk size of the file, and not worry much about common 
overflow. The important thing is that the file is dynamic, so you will never 
encounter the issues that undersized statically hashed files develop.
We have thousands of dynamically hashed files on our (Solaris) systems, with an 
extremely low problem rate.
Rick
-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Thursday, July 05, 2012 11:21 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files

Rick,
You are correct, I should be using the smaller size (I just haven't changed it 
yet). Based on the reading I have done you should
only use the larger group size when the average record size is greater than 1000 
bytes. 
As far as being better off with the defaults, that's basically what I'm trying to 
test (as well as learn how linear hashing works). I was able
to reduce my overflow by 18% and I only increased my empty groups by a very 
small amount, as well as only increased my file size
by 8%. This in theory should be better for reads/writes than what I had before. 
To test the performance I need to write a ton of records and then capture the 
output and compare the output using timestamps. 
Chris

 From: r...@lynden.com
 To: u2-users@listserver.u2ug.org
 Date: Thu, 5 Jul 2012 09:22:02 -0700
 Subject: Re: [U2] RESIZE - dynamic files
 
 Chris,
 
 I still am wondering what is prompting you to continue using the larger group 
 size.
 
 I think that Martin, and the UV documentation, is correct in this case; you 
 would be as well or better off with the defaults.
 
 -Rick
 
 On Jul 5, 2012, at 9:13 AM, Martin Phillips martinphill...@ladybridge.com 
 wrote:
 
  Hi,
  
  The various suggestions about setting the minimum modulus to reduce overflow 
  are all very well but effectively you are turning a
  dynamic file into a static one, complete with all the continual maintenance 
  work needed to keep the parameters in step with the
  data.
  
  In most cases, the only parameter that is worth tuning is the group size to 
  try to pack things nicely. Even this is often fine left
  alone though getting it to match the underlying o/s page size is helpful.
  
  I missed the start of this thread but, unless you have a performance problem 
  or are seriously short

Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Wjhonson

A BASIC SELECT cannot use criteria at all.
It is going to walk through every record in the file, in order.
And that's the sticky wicket: that whole "in order" business.
The disk drive controller has no clue about linked frames, but it *will* do 
optimistic look-aheads for you.
So you are much better off, for BASIC SELECTs, having nothing in overflow at 
all. :)
That way, when you go to ask for the *next* frame, it will always be 
contiguous, and already sitting in memory.









Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Rick Nuckolls
This will be mostly true if the full extent of the file was allocated at one 
time as a contiguous block, which could be a big plus.
As a file grows, sectors will be allocated piecemeal, and when the hardware 
reads ahead it will not necessarily be reading sectors in the same file.
Curiously, an old Pr1me CAM file had a trick around this, though it was late 
coming onto the scene.  Unix also has a few tricks, but they are only partial 
solutions to file fragmentation.  And Windows...

Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wjhonson
Sent: Thursday, July 05, 2012 5:12 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


A BASIC SELECT cannot use criteria at all.
It is going to walk through every record in the file, in order.
And that's the sticky wicket. That whole in order business.
The disk drive controller has no clue on linked frames, but it *will* do 
optimistic look aheads for you.
So you are much better off, for BASIC SELECTs having nothing in overflow, at 
all. :)
That way, when you go to ask for the *next* frame, it will always be 
contiguous, and already sitting in memory.








-Original Message-
From: Rick Nuckolls r...@lynden.com
To: 'U2 Users List' u2-users@listserver.u2ug.org
Sent: Thu, Jul 5, 2012 4:43 pm
Subject: Re: [U2] RESIZE - dynamic files


Most disks and disk systems cache huge amounts of information these days, and, 
epending on 20 factors or so, one solution will be better than another for a 
iven file.
For the wholesale, SELECT F WITH, The fewest disk records will almost 
always 
in. For files that have ~10 records/group and have ~10% of the groups 
verflowed, then perhaps 1% of record reads will do a second read for the 
verflow buffer because the target key was not in the primary group.  Writing a 
ew record would possibly hit the 10% mark for reading overflow buffers. But 
owering the split.load will increase the number of splits slightly, and 
ncrease the total number of groups considerably.  What you have shown is that 
ou need to increase the the modulus (and select time) of a large file more than 
0% in order to decrease the read and update times for you records 0.5% of the 
ime (assuming, that you have only reduced the number of overflow groups by 
50%.)
As Charles suggests, this is an interesting exercise, but your actual results 
ill rapidly change if you actually add /remove records from your file, change 
he load or number of files on your system, put in a new drive, cpu, memory 
oard, or install a new release of Universe, move to raid, etc.
-Rick
-Original Message-
rom: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] 
n Behalf Of Wjhonson
ent: Thursday, July 05, 2012 2:38 PM
o: u2-users@listserver.u2ug.org
ubject: Re: [U2] RESIZE - dynamic files

he hardward look ahead of the disk drive reader will grab consecutive 
frames into memory, since it assumes you'll want the next frame next.
o the less overflow you have, the faster a full file scan will become.
t least that's my theory ;)


Original Message-
rom: Rick Nuckolls r...@lynden.com
o: 'U2 Users List' u2-users@listserver.u2ug.org
ent: Thu, Jul 5, 2012 2:29 pm
ubject: Re: [U2] RESIZE - dynamic files

hris,
or the type of use that you described earlier; BASIC selects and reads, 
ducing overflow will have negligible performance benefit, especially compared 
 changing the GROUP.SIZE back to 1 (2048) bytes.  If you purge the file in 
latively small percentages, then it will never merge anyway (because you will 
ed to delete 20-30% of the file for that to happen with the mergeload at 50%, 
 your optimum minimum modulus solution will probably be how ever large it 
ows  The overhead for a group split is not as bad as it sounds unless your 
dates/sec count is extremely high, such as during a copy.
f you do regular SELECT and SCANS of the entire file, then your goal should be 
 reduce the total disk size of the file, and not worry much about common 
erflow. The important thing is that the file is dynamic, so you will never 
counter the issues that undersized statically hashed files develop.
e have thousands of dynamically hashed files on our (Solaris) systems, with an 
tremely low problem rate.
ick
Original Message-
om: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] 
n Behalf Of Chris Austin
nt: Thursday, July 05, 2012 11:21 AM
: u2-users@listserver.u2ug.org
bject: Re: [U2] RESIZE - dynamic files
ick,
ou are correct, I should be using the smaller size (I just haven't changed it 
t). Based on the reading I have done you should
ly use the larger group size when the average record size is greater than 1000 
tes. 
s far as being better off with the defaults that's basically what I'm trying to 
est (as well as learn how linear hashing works). I was able
 reduce my overflow by 18% and I only

Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Chris Austin

That's what we use, 'BASIC SELECT' statements for this table, looping through 
records to build reports. It's an accounting table that gets about 200-300 
record WRITEs a day, with an average
of about ~250 bytes per record. We obviously have more READ operations since we 
are always building up these reports, so I was hoping my numbers looked right. 

1) I reduced overflow by 18%.
2) I only increased file size ~8%.

So we do a combination of BASIC SELECTs and WRITEs. Everything is done in the 
latest version of Rocket's UniVerse, PICK flavour, using BASIC for our programs 
that contain the SELECTs.

Chris



Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Brian Leach
 All the other groups effectively get 1 added to their number

Not exactly.

Sorry to those who already know this, but maybe it's time to go over linear
hashing in theory ..

Linear hashing was a system devised by Litwin, originally only for
in-memory lists. In fact there are some good implementations in C# that
provide better handling of Dictionary types. Applying it to a file system
adds some complexity but it's basically the same theory.

Let's start with a file that has 100 groups initially defined (that's 0
through 99). That is your minimum starting point and should ensure that it
never shrinks below that, so it doesn't begin its life with loads of splits
right from the start as you populate the file. You would size this similarly
to the way you size a regular hashed file for your initial content: no point
making work for yourself (or the database).

As data gets added, because the content is allocated unevenly, some of that
load will be in primary and some in overflow: that's just the way of the
world. No hashing is perfect. Unlike a static file, the overflow can't be
added to the end of the file as a linked list (* why nobody has done managed
overflow is beyond me), it has to sit in a separate file.

At some point the amount of data held in respect of the available space
reaches a critical level and the file needs to reorganize. Rather than split
the most heavily populated group - which would be the obvious thing - linear
hashing works on the basis of a split pointer that moves incrementally
through the file. So the first split breaks group 0 and adds group 100 to
the end of the file, hopefully moving around half the content of group 0 to
this new group. Of course, there is no guarantee that it will (depending on
key structure) and also no guarantee that this will help anything, if group
0 isn't overflowed or populated anyway. So the next write may also cause a
split, except now to split group 1 into a new group 101, and so forth.

Eventually the pointer will reach the end and all the initial 100 groups
will have been split, and the whole process restarts with the split pointer
moving back to zero. You now have 200 groups and by this time everything
should in theory have levelled out, but in the meantime there is still
overloading and stuff will still be in overflow. The next split will create
group 200 and split half of group 0 into it, and the whole process repeats
for ever.

Oversized records (> buffer size) also get moved out because they stuff up
the block allocation.

So why this crazy system, rather than hitting the filled groups as they get
overstuffed? Because it makes finding a record easy. Because linear hashing
is based on a power of 2, the maths is simple - if the group is after the
split point, the record MUST be in that group (or its overflow). If it is
before the split point, it could be in the original group or the split
group: so you can just rehash with double the modulus to check which one
without even having to scan the groups.
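Brian's split-pointer scheme can be sketched in a few dozen lines of Python. This is a hypothetical in-memory toy after Litwin, with my own names throughout; real UniVerse groups are fixed-size disk buffers with separate overflow chains and concurrency control, none of which is modelled here:

```python
class LinearHashFile:
    """Toy linear-hashed 'file': a split pointer sweeps the groups in
    order, doubling the modulus once per cycle.  Groups are plain dicts;
    disk buffers, overflow and locking are deliberately not modelled."""

    def __init__(self, minimum_modulus=4, group_capacity=4, split_load=0.8):
        self.basemod = minimum_modulus   # modulus when the current cycle began
        self.modulus = minimum_modulus   # current number of groups
        self.split_ptr = 0               # next group to be split
        self.capacity = group_capacity   # nominal records per group
        self.split_load = split_load     # split when load exceeds this
        self.groups = [dict() for _ in range(minimum_modulus)]

    def _group(self, key):
        # Hash with the doubled modulus first; if that group does not
        # exist yet this cycle, rehash with the smaller modulus.
        g = hash(key) % (2 * self.basemod)
        return g if g < self.modulus else hash(key) % self.basemod

    def _load(self):
        total = sum(len(g) for g in self.groups)
        return total / (self.modulus * self.capacity)

    def _split(self):
        # Split the group at the split pointer into a new group at the end.
        old = self.split_ptr
        self.groups.append({})
        self.modulus += 1
        self.split_ptr += 1
        new = self.modulus - 1
        moved = [k for k in self.groups[old]
                 if hash(k) % (2 * self.basemod) == new]
        for k in moved:
            self.groups[new][k] = self.groups[old].pop(k)
        if self.split_ptr == self.basemod:   # cycle complete: start over
            self.basemod *= 2
            self.split_ptr = 0

    def write(self, key, value):
        self.groups[self._group(key)][key] = value
        while self._load() > self.split_load:
            self._split()

    def read(self, key):
        return self.groups[self._group(key)].get(key)
```

Note that, exactly as described above, the group split is the one at the pointer, which is not necessarily the fullest group; the unevenness only levels out over a full sweep of the pointer.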

What makes the implementation difficult is that Litwin et al were all
assuming single-threaded access to an in-memory list. Concurrent access
whilst maintaining the load factor, split pointer and splitting all add a
lot more complexity, unless you lock the whole file for the duration of an
IO operation and kill the performance.

And coming back to the manual, storing large numbers of data items - even
large ones - in a type 19 file is a bad idea. Traversing directories is
slow, especially in Windows, and locking is done against the whole
directory.

Brian






Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Charles Stevenson

Good explanation, Brian!
To anyone who skipped it because it looked long:  read it anyway.
cds



Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Rick Nuckolls
This makes it sound as if you might need to search two groups for a record, 
which is not correct.  If the initial hash is based on the larger modulo, and 
the group exists, then the key will be in the higher number group.  If the 
result of the first hash is larger than the modulus of the table, then 
you rehash with the smaller modulus.

And the modulo used for hashing is always a power of two. 

So if the initial hash function on a key is f(x), then the key will either be 
in f(x) mod 2**n or, if that group has not been created, then in f(x) mod 
2**(n-1).  n increases each time the modulus grows past a power of two. So

For a modulus of 3 or 4, n = 2; for 5,6,7,8, n =3.

For instance:

If your groups are numbered 0-5 (6 groups), and your hash value is 5, then you 
are in the last (6th) group because 5 mod 8 is 5.  Likewise 6 mod 8 is 6, 
but this would be beyond the highest group we have (5).  6 mod 4 is 2, and that 
is the group where 6 should fall. Likewise 7 should fall into group 3.  After 
two more splits (of groups 2 & 3) the modulus will be 8, and no rehashing is 
necessary until we next split group 0 and add a 9th group, at which point we 
start with mod 16, and use mod 8 if the first result is over 8 (8 would go into 
the 9th group; 9 would hash into the second group, #1 -- 9 mod 8 = 1).


Admittedly, this is probably at least as confusing as every other explanation 
of the process ;-)

-Rick


Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Charles Stevenson
SMAT -d  (or ANALYZE.SHM -d)   see uv/bin/smat[.exe] & 
uv/bin/analyze.shm[.exe]


Dynamic Files:
Slot #  Inode Device Ref Count Htype Split Merge Curmod Basemod 
Largerec   Filesp Selects Nextsplit
 0 1285128087 209307792516208050 4001
2048 3267  2782736   0  1954
 1  153221440 151542860060208050 397040 
262144 1628 58641084   0134897
 2 1155376080  317006236 6208050 81  64 
1628   133616   018
 3  924071961  976405761 2208050 957 
512 1628  1249180   0   446
 4  619894993 1297457141 1208050 1157
1024 1628  3837400   0   134
 5 1401440370  656655020 6218050 213429 
131072 1628 54052576   0 82358
 6 1053905064 1350670129 2208050 365 
256 1628   529956   0   110
 7  963519080 1084306943 2208050 2564
2048 1628  4019040   0   517
 8 1909033200  47372346598208050 3851
2048 3267 12775756   0  1804

   etc.

Because of the concurrency difficulties that Brian mentioned . . .

On 7/4/2012 5:26 AM, Brian Leach wrote:

What makes the implementation difficult is that Litwin et al were all assuming 
a single threaded access to an in-memory list. Concurrent access whilst 
maintaining the load factor, split pointer and splitting all add a lot more 
complexity, unless you lock the whole file for the duration of an IO operation 
and kill the performance.
. . . is why UV reserves a table in shared memory for dynamic files, per 
SMAT -d.
The 1st user that opens the file causes the control info in the file 
header to be loaded to shared memory, where it remains until Ref Count 
drops to 0.
(It also gets written to the file whenever there is a change.  At least 
on modern versions.)


Rick's post makes good sense if you work the numbers in the SMAT table.
Notice that (Curmod - Basemod) + 1 = Nextsplit  (off by 1 because groups 
start at 0.)
As Rick pointed out, Basemod is always a power of 2.  It is used by the 
hashing algorithms.  E.g., that 64 will eventually change to 128 or 32, 
once enough splits or merges happen.


Notice also that the future Nextsplit group number is set, i.e., 
predictable.  Remember Brian & Rick (& others?) saying that split/merge 
decisions are determined by the entire file load, not by whichever 
individual group happens to be in heavy overflow?  They were right: it is 
methodical.


Chris,
Notice that every number in the Split, Merge, & Largerec columns is the 
default value.
Although I do have exceptions, any random grab of 9 files like this 
would likely show straight default values.   Generally, fine-tuning 
isn't worth the bother.  It's more bang for the IT buck to buy more 
memory and disk than to pay Brian or Rick to squeeze performance out of 
type-30 files.



cds
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Wols Lists
On 04/07/12 11:26, Brian Leach wrote:
 All the other groups effectively get 1 added to their number
 Not exactly.
 
 Sorry to those who already know this, but maybe it's time to go over linear
 hashing in theory ..
 
 Linear hashing was a system devised by Litwin and originally only for
 in-memory lists. In fact there's some good implementations in C# that
 provide better handling of Dictionary types. Applying it to a file system
 adds some complexity but it's basically the same theory.
 
 Let's start with a file that has 100 groups initially defined (that's 0
 through 99). That is your minimum starting point and should ensure that it
 never shrinks below that, so it doesn't begin its life with loads of splits
 right from the start as you populate the file. You would size this similarly
 to the way you size a regular hashed file for your initial content: no point
 making work for yourself (or the database).
 
 As data gets added, because the content is allocated unevenly, some of that
 load will be in primary and some in overflow: that's just the way of the
 world. No hashing is perfect. Unlike a static file, the overflow can't be
 added to the end of the file as a linked list (* why nobody has done managed
 overflow is beyond me), it has to sit in a separate file.

I don't know what the definition of "badly overflowed" is, but assuming
that a "badly overflowed" group has two blocks of overflow, then those
file stats seem perfectly okay. As Brian has explained, the distribution
of records is lumpy and as a percentage of the file, there aren't many
badly overflowed groups.

You've got roughly 1/3 of groups overflowed - with an 80% split that
doesn't seem at all out of order - on average each group is 80% full, so
a third of them being more than 100% full is fine.

You've got (in thousands) one and a half groups badly overflowed out of
eighty-three. That's less than two percent. That's nothing.
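
As a quick arithmetic check of those percentages, using the ANALYZE.FILE numbers Chris posted for TEST_FILE later in the thread (82850 groups, 26225 overflowed, 1441 badly overflowed):

```python
# Percentages behind Wol's "roughly 1/3 overflowed" and
# "less than two percent badly overflowed" remarks.
groups, overflowed, badly = 82850, 26225, 1441
print(f"overflowed:       {overflowed / groups:.1%}")  # roughly a third
print(f"badly overflowed: {badly / groups:.1%}")       # under two percent
```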

As for why no-one has done managed overflow, I think there are various
reasons. The first successful implementation (Prime INFORMATION) didn't
need it. It used a peculiar type of file called a Segmented Directory
and while I don't know for certain what PI did, I strongly suspect each
group had its own normal file so if a group overflowed, it just created
a new block at the end of the file. Same with large records, it
allocated a bunch of overflow blocks. This file structure was far more
evident with PI-Open - at the OS level a dynamic file was a OS directory
with lots of numbered files in it.

The UV implementation of one file for data, one file for overflow may
be unique to UV. I don't know. What little I know of UD tells me it's
different, and others like QM could well be different again. I wouldn't
actually be surprised if QM is like PI.

Cheers,
Wol


Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Rick Nuckolls
I believe PiOpen used a directory with two files in it ‘$0’ and ‘$1’ 
corresponding to DATA.30 and OVER.30.  If the numbers went up from there, I 
think that they corresponded to alternate keys, ie ‘$2’ and ‘$3’ represented 
DATA.30 and OVER.30 for the first alternate key.

I do not think that PiOpen supported statically hashed files.  (Pr1me 
Information did)

All of that is a few years ago

Unidata uses dat001 and over001 with the number increasing to allow for very 
large files (I think).

-Rick

On Jul 4, 2012, at 10:51 AM, Wols Lists wrote:

 On 04/07/12 11:26, Brian Leach wrote:
 All the other groups effectively get 1 added to their number
 Not exactly.
 
 Sorry to those who already know this, but maybe it's time to go over linear
 hashing in theory ..
 
 Linear hashing was a system devised by Litwin and originally only for
 in-memory lists. In fact there's some good implementations in C# that
 provide better handling of Dictionary types. Applying it to a file system
 adds some complexity but it's basically the same theory.
 
 Let's start with a file that has 100 groups initially defined (that's 0
 through 99). That is your minimum starting point and should ensure that it
 never shrinks below that, so it doesn't begin its life with loads of splits
 right from the start as you populate the file. You would size this similarly
 to the way you size a regular hashed file for your initial content: no point
 making work for yourself (or the database).
 
 As data gets added, because the content is allocated unevenly, some of that
 load will be in primary and some in overflow: that's just the way of the
 world. No hashing is perfect. Unlike a static file, the overflow can't be
 added to the end of the file as a linked list (* why nobody has done managed
 overflow is beyond me), it has to sit in a separate file.
 
 I don't know what the definition of badly overflowed is, but assuming
 that a badly overflowed group has two blocks of overflow, then those
 file stats seem perfectly okay. As Brian has explained, the distribution
 of records is lumpy and as a percentage of the file, there aren't many
 badly overflowed groups.
 
 You've got roughly 1/3 of groups overflowed - with an 80% split that
 doesn't seem at all out of order - on average each group is 80% full so
 1/3rd more than 100% full is fine.
 
 You've got (in thousands) one and a half groups badly overflowed out of
 eighty-three. That's less than two percent. That's nothing.
 
 As for why no-one has done managed overflow, I think there are various
 reasons. The first successful implementation (Prime INFORMATION) didn't
 need it. It used a peculiar type of file called a Segmented Directory
 and while I don't know for certain what PI did, I strongly suspect each
 group had its own normal file so if a group overflowed, it just created
 a new block at the end of the file. Same with large records, it
 allocated a bunch of overflow blocks. This file structure was far more
 evident with PI-Open - at the OS level a dynamic file was a OS directory
 with lots of numbered files in it.
 
 The UV implementation of one file for data, one file for overflow may
 be unique to UV. I don't know. What little I know of UD tells me it's
 different, and others like QM could well be different again. I wouldn't
 actually be surprised if QM is like PI.
 
 Cheers,
 Wol



Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Wols Lists
On 04/07/12 19:59, Rick Nuckolls wrote:
 I believe PiOpen used a directory with two files in it ‘$0’ and ‘$1’ 
 corresponding to DATA.30 and OVER.30.  If the numbers went up from there, I 
 think that they corresponded to alternate keys, ie ‘$2’ and ‘$3’ 
 represented DATA.30 and OVER.30 for the first alternate key.
 
And $2, and $3, and the rest, iirc ...

 I do not think that PiOpen supported statically hashed files.  (Pr1me 
 Information did)

I'd be very surprised if it didn't. I might look up the manuals in my
garage and check ...

Or try to boot my EXL7330 and actually see what it does :-)
 
 All of that is a few years ago

Agreed. But I dug into that at the time, and I'm pretty certain there
were a lot more than just two files in most dynamic file directories...
I might even have a CD somewhere with a tape-dump on it ...
 
 Unidata uses dat001 and over001 with the number increasing to allow for very 
 large files (I think).
 
 -Rick
 
Cheers,
Wol


Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Wols Lists
On 04/07/12 17:44, Charles Stevenson wrote:
SMAT -d  (or ANALYZE.SHM -d)   see uv/bin/smat[.exe]
 uv/bin/analyze.shm[.exe]
 
 Dynamic Files:
 Slot #  Inode Device Ref Count Htype Split Merge Curmod Basemod
 Largerec   Filesp Selects Nextsplit
  0 1285128087 209307792516208050 4001   
 2048 3267  2782736   0  1954
  1  153221440 151542860060208050 397040
 262144 1628 58641084   0134897
  2 1155376080  317006236 6208050 81  64
 1628   133616   018
  3  924071961  976405761 2208050 957 512
 1628  1249180   0   446
  4  619894993 1297457141 1208050 1157   
 1024 1628  3837400   0   134
  5 1401440370  656655020 6218050 213429
 131072 1628 54052576   0 82358
  6 1053905064 1350670129 2208050 365 256
 1628   529956   0   110
  7  963519080 1084306943 2208050 2564   
 2048 1628  4019040   0   517
  8 1909033200  47372346598208050 3851   
 2048 3267 12775756   0  1804
etc.
 
 Because of the concurrency difficulties that Brian mentioned . . .
 
 On 7/4/2012 5:26 AM, Brian Leach wrote:
 What makes the implementation difficult is that Litwin et al were all
 assuming a single threaded access to an in-memory list. Concurrent
 access whilst maintaining the load factor, split pointer and splitting
 all add a lot more complexity, unless you lock the whole file for the
 duration of an IO operation and kill the performance.
 . . . is why UV reserves a table in shared memory for dynamic files, per
 SMAT -d.
 The 1st user that opens the file causes the control info in the file
 header to be loaded to shared memory, where it remains until Ref Count
 drops to 0.
 (It also gets written to the file whenever there is a change.  At least
 on modern versions.)

Actually, thinking about it, why do you need to lock the entire file
when splitting or merging?

A merge actually could be done very quickly, to merge groups 10 and 2
you just chain 10 on to the end of 2 and don't bother actually
consolidating them.

But to split 2 into 2 and 10, you just need an exclusive lock on both of
them. Any attempt to access 1 or 3 or 9 can just sail right on by - only
if a process wants to access the group being split do you need to stall
it until you've finished. Although that is a problem if you're
sequentially scanning the file - which does block split/merge while
you're doing it.
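
Wol's idea of locking only the two groups involved can be sketched as follows. This is purely hypothetical (it is not how UniVerse is implemented, and it ignores the deadlock and sequential-scan issues mentioned above); the class and method names are my own:

```python
import threading

class Groups:
    """Toy model of per-group locking during a split."""
    def __init__(self, n: int):
        # One exclusive lock per group; readers of other groups
        # "sail right on by" while a split is in progress.
        self.locks = [threading.Lock() for _ in range(n)]

    def split(self, src: int, dst: int, move_records) -> None:
        # Take exclusive locks only on the group being split and its
        # new sibling; any other group remains freely accessible.
        with self.locks[src], self.locks[dst]:
            move_records(src, dst)
```

A caller would pass the function that actually rehashes and moves roughly half of `src`'s records into `dst`; only accesses to those two groups stall while it runs.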

I remember coming across a very badly sized dynamic file where that had
obviously happened - I guess someone had left a program half way through
a BASIC SELECT for a week or so and the file had grown somewhat
horrendously. It slowly corrected itself though. (I found it because our
client's system was horribly slow and I was looking for the cause. This
wasn't it though - it turned out to be some nasty code somewhere else,
can't remember exactly what.)

Cheers,
Wol


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Brett Callacher
Almost.  Though the file will look after itself, it may not do so very well.  
Dynamic files, for best performance, do sometimes need periodic resizing.  
Having said that, it is true that some never resize dynamic files.

If the minimum modulo is much lower than the actual, then this will cause 
constant splits to occur if the file is constantly growing.  The 80% actual 
load is further indication of this.  What can be even worse is if the file then 
shrinks dramatically, as very intensive merges will take place - 
not desirable if you expect the file to grow again.

In this case I would choose a new modulo greater than the actual - how much 
bigger depends on the rate of growth expected.  That is with the current 
separation - the best separation you will only determine by examining the size 
of the records.

Martin Phillips martinphill...@ladybridge.com wrote in message 
news:00f601cd588c$cd3d1310$67b73930$@ladybridge.com...
 Hi Chris,
 
 The whole point of dynamic files is that you don't do RESIZE. The file will 
 look after itself, automatically responding to
 variations in the volume of data.
 
 There are knobs to twiddle but in most cases they can safely be left at 
 their defaults. A dynamic file will never perform as well
 as a perfectly tuned static file but they are a heck of a lot better than 
 typical static files that haven't been reconfigured for
 ages.
 
 
 Martin Phillips
 Ladybridge Systems Ltd
 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
 +44 (0)1604-709200
 
 
 
 
 -Original Message-
 From: u2-users-boun...@listserver.u2ug.org 
 [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
 Sent: 02 July 2012 20:22
 To: u2-users@listserver.u2ug.org
 Subject: [U2] RESIZE - dynamic files
 
 
 I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
 example I have a file called 'TEST_FILE'
 with the following:
 
 01 ANALYZE.FILE TEST_FILE
 File name ..   TEST_FILE
 Pathname ...   TEST_FILE
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    83261 current ( minimum 31 )
 Large record size ..   3267 bytes
 Group size .   4096 bytes
 Load factors ...   80% (split), 50% (merge) and 80% (actual)
 Total size .   450613248 bytes
 
 How do you calculate what the modulus and separation should be? I can't use 
 HASH.HELP on a type 30 file to see the recommended
 settings
 so I was wondering how best you figure out the file RESIZE.
 
 Thanks,
 
 Chris
 
 
 


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    92776 current ( minimum 31, 89 empty,
28229 overflowed, 2485 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   500600832 bytes
Total size of record data ..   287035391 bytes
Total size of record IDs ...   21508449 bytes
Unused space ...   192048800 bytes
Total space for records    500592640 bytes

Using the record above, how would I calculate the following?

1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the 
current number)?
2) SPLIT - would 90% seem about right?
3) MERGE - would 20% seem about right? 
4) Large Record Size - does 3276 seem right? 
5) Group Size - should I be using 4096?

I'm just a bit confused as to how to set these, I saw the formula to calculate 
the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I always get a 
lower number
than my current modulus.. 

I also saw where it said to simply take your current modulus # and add 10-20% 
and set the MINIMUM.MODULUS based on that..

Based on the table above I'm just trying to get an idea of what these should be 
set at.

Thanks,

Chris


 From: cjausti...@hotmail.com
 To: u2-users@listserver.u2ug.org
 Date: Tue, 3 Jul 2012 10:28:17 -0500
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 Doug,
 
 When I do the math I come up with a different # (see below):
 
 File name ..   TEST_FILE
 Pathname ...   TEST_FILE
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
 26225 overflowed, 1441 badly )
 Number of records ..   1157122
 Large record size ..   2036 bytes
 Number of large records    576
 Group size .   4096 bytes
 Load factors ...   80% (split), 50% (merge) and 80% (actual)
 Total size .   449605632 bytes
 Total size of record data ..   258687736 bytes
 Total size of record IDs ...   19283300 bytes
 Unused space ...   171626404 bytes
 Total space for records    449597440 bytes
 
 -- 
 258,687,736 bytes - Total size of record data
 19,283,300 bytes - Total size of record IDs
 ===
 277,971,036 bytes (record + id's)
 
  277,971,036 / 4,084 = 68,063 groups (minimum modulus)
 -- 
 
 68,063 is less than the current modulus of 82,850. Something with this 
 formula doesn't seem right because if I use that formula I always calculate a 
 minimum modulus of less than the current modulus.
 
 Thanks,
 
 Chris
 
 
 
  Date: Mon, 2 Jul 2012 16:08:16 -0600
  From: dave...@gmail.com
  To: u2-users@listserver.u2ug.org
  Subject: Re: [U2] RESIZE - dynamic files
  
  Hi Chris:
  
  You cannot get away with not resizing dynamic files in my experience.  The
  files do not split and merge like we are led to believe.  The separator is
  not used on dynamic files.  Your Universe file is badly sized.  The math
  below will get you reasonably file size.
  
  Let's do the math:
  
  258687736 (Record Size)
  192283300 (Key Size)
  
  450,971,036 (Data and Key Size)
  
  4096 (Group Size)
  - 12   (32 Bit Overhead)
  
  4084 Usable Space
  
  450971036/4084 = Minimum Modulo 110424 (Prime is 110431)
  
  
  [ad]
  I hate doing this math all of the time.  I have a reasonably priced resize
  program called XLr8Resizer for $99.00 to do this for me.
  [/ad]
  
  Regards,
  Doug
  www.u2logic.com/tools.html
  XLr8Resizer for the rest of us


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Doug Averch
Yep, I added an extra 2 in the key value.  Oh, the perils of cut and
paste...


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

No worries Doug. I'm just wondering if the calculation makes sense (if we use 
the example below):

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    92776 current ( minimum 31, 89 empty,
28229 overflowed, 2485 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   500600832 bytes
Total size of record data ..   287035391 bytes
Total size of record IDs ...   21508449 bytes
Unused space ...   192048800 bytes
Total space for records    500592640 bytes

FORMULA - (287,035,391+21,508,449) / (4,084) = 75,549  MINIMUM.MODULUS

The question I have is whether 75,549 makes sense for this record. I thought 
the MINIMUM.MODULUS was supposed to be bigger than the number of groups (92,776 
in this case)?

Chris


 Date: Tue, 3 Jul 2012 11:04:53 -0600
 From: dave...@gmail.com
 To: u2-users@listserver.u2ug.org
 Subject: Re: [U2] RESIZE - dynamic files
 
 Yep, I added an extra 2 in the key value.  Oh, the perils of cut and
 paste...


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Doug Averch
See comment interspersed...

Using the record above, how would I calculate the following?

 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the
 current number)?

It should be less than the current size, if you want the file to merge.


 2) SPLIT - would 90% seem about right?

Depends on the history of the file.  Is the data growing over time?  The
way the file looks now the split should be reduced because you have 31% in
overflow.

3) MERGE - would 20% seem about right?

Won't be used on a growth file, so the file history is important.

4) Large Record Size - does 3276 seem right?

Can be calculated with a lot of effort, but yield little gain.

5) Group Size - should I be using 4096?

You have two group sizes on dynamic files 2048 and 4096.  If you lower it
you need to double your modulo, roughly.  If you keep it the same you need
to increase your modulo because 31% of your file is in overflow.





Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
(record + id / 4096 or 2048)

You need to factor in overhead & the split factor:   (records + ids) * 1.1 * 
1.25 / 4096  (for 80%) 

If you use a 20% merge factor and an 80% split factor, the file will not start 
merging unless you delete 60 percent of your records.  If you use a 90% split 
factor, you will have more overflowed groups.  These numbers refer to the total 
amount of data in the file, not to any individual group.
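
Applied to the GENACCTRN_POSTED figures quoted below, Rick's rule of thumb works out roughly as follows. This is a sketch: the 1.1 overhead factor and 1.25 (the inverse of the 80% split load) come from his post; the function name is mine:

```python
def minimum_modulus(data_bytes: int, id_bytes: int,
                    group_size: int = 4096,
                    split_load: float = 0.80) -> int:
    """Rick's sizing rule: (records + ids) * 1.1 overhead,
    scaled up by the split load (1/0.8 = 1.25), per group."""
    return int((data_bytes + id_bytes) * 1.1 / split_load / group_size)

# Total size of record data and record IDs from ANALYZE.FILE
print(minimum_modulus(287035391, 21508449))  # roughly 103,000 groups
```

Note this comes out above the file's current modulus of 92,776, unlike the bare (data + ids) / 4096 formula, which is why Chris's version always produced a number smaller than the current modulus.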

For records of the size that you have, I do not see any advantage to using a 
larger, 4096, group size. You will end up with twice the number of records per 
group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access.

-Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 9:48 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    92776 current ( minimum 31, 89 empty,
28229 overflowed, 2485 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   500600832 bytes
Total size of record data ..   287035391 bytes
Total size of record IDs ...   21508449 bytes
Unused space ...   192048800 bytes
Total space for records    500592640 bytes

Using the record above, how would I calculate the following?

1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the 
current number)?
2) SPLIT - would 90% seem about right?
3) MERGE - would 20% seem about right? 
4) Large Record Size - does 3276 seem right? 
5) Group Size - should I be using 4096?

I'm just a bit confused as to how to set these, I saw the formula to calculate 
the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I always get a 
lower number
than my current modulus.. 

I also saw where it said to simply take your current modulus # and add 10-20% 
and set the MINIMUM.MODULUS based on that..

Based on the table above I'm just trying to get an idea of what these should be 
set at.

Thanks,

Chris


 From: cjausti...@hotmail.com
 To: u2-users@listserver.u2ug.org
 Date: Tue, 3 Jul 2012 10:28:17 -0500
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 Doug,
 
 When I do the math I come up with a different # (see below):
 
 File name ..   TEST_FILE
 Pathname ...   TEST_FILE
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
 26225 overflowed, 1441 badly )
 Number of records ..   1157122
 Large record size ..   2036 bytes
 Number of large records    576
 Group size .   4096 bytes
 Load factors ...   80% (split), 50% (merge) and 80% (actual)
 Total size .   449605632 bytes
 Total size of record data ..   258687736 bytes
 Total size of record IDs ...   19283300 bytes
 Unused space ...   171626404 bytes
 Total space for records    449597440 bytes
 
 -- 
 258,687,736 bytes - Total size of record data
 19,283,300 bytes - Total size of record IDs
 ===
 277,971,036 bytes (record + id's)
 
 277,971,036 / 4,084 = 68,063 bytes (minimum modulus)
 -- 
 
 68,063 is less than the current modulus of 82,850. Something with this 
 formula doesn't seem right because if I use that formula I always calculate a 
 minimum modulus of less than the current modulus.
 
 Thanks,
 
 Chris
 
 
 
  Date: Mon, 2 Jul 2012 16:08:16 -0600
  From: dave...@gmail.com
  To: u2-users@listserver.u2ug.org
  Subject: Re: [U2] RESIZE - dynamic files
  
  Hi Chris:
  
  You cannot get away with not resizing dynamic files in my experience.  The
  files do not split and merge like we are led to believe.  The separator is
  not used on dynamic files.  Your Universe file is badly sized.  The math
  below will get you reasonably file size.
  
  Let's do the math:
  
  258687736 (Record Size)
  192283300 (Key Size)
  
  450,971,036 (Data and Key Size)
  
  4096 (Group Size)
  - 12   (32 Bit Overhead)
  
  4084 Usable Space
  
  450971036/4084 = Minimum Modulo 110424 (Prime is 110431)
  
  
  [ad]
  I hate doing this math all of the time.  I have a reasonably priced resize
  program called XLr8Resizer for $99.00 to do this for me.
  [/ad]
  
  Regards,
  Doug
  www.u2logic.com/tools.html
  XLr8Resizer for the rest of us

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

So for this example what would be a good SPLIT level and what would be a good 
MERGE level to use? It was my understanding that I wanted to lower my merge to 
something below 50% and
increase the split to reduce splitting.

Chris


 From: r...@lynden.com
 To: u2-users@listserver.u2ug.org
 Date: Tue, 3 Jul 2012 10:21:16 -0700
 Subject: Re: [U2] RESIZE - dynamic files
 
 (record + id / 4096 or 2048)
 
 You need to factor in overhead & the split factor:   (records + ids) * 1.1 * 
 1.25 / 4096  (for 80%) 
 
 If you use a 20% merge factor and an 80% split factor, the file will not start 
 merging unless you delete 60 percent of your records.  If you use a 90% split 
 factor, you will have more overflowed groups.  These numbers refer to the 
 total amount of data in the file, not to any individual group.
 
 For records of the size that you have, I do not see any advantage to using a 
 larger, 4096, group size. You will end up with twice the number of records 
 per group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access.
 
 -Rick
 
 -Original Message-
 From: u2-users-boun...@listserver.u2ug.org 
 [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
 Sent: Tuesday, July 03, 2012 9:48 AM
 To: u2-users@listserver.u2ug.org
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 File name ..   GENACCTRN_POSTED
 Pathname ...   GENACCTRN_POSTED
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    92776 current ( minimum 31, 89 empty,
 28229 overflowed, 2485 badly )
 Number of records ..   1290469
 Large record size ..   3267 bytes
 Number of large records    180
 Group size .   4096 bytes
 Load factors ...   80% (split), 50% (merge) and 80% (actual)
 Total size .   500600832 bytes
 Total size of record data ..   287035391 bytes
 Total size of record IDs ...   21508449 bytes
 Unused space ...   192048800 bytes
 Total space for records    500592640 bytes
 
 Using the record above, how would I calculate the following?
 
 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the 
 current number)?
 2) SPLIT - would 90% seem about right?
 3) MERGE - would 20% seem about right? 
 4) Large Record Size - does 3276 seem right? 
 5) Group Size - should I be using 4096?
 
 I'm just a bit confused as to how to set these, I saw the formula to 
 calculate the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I 
 always get a lower number
 than my current modulus.. 
 
 I also saw where it said to simply take your current modulus # and add 10-20% 
 and set the MINIMUM.MODULUS based on that..
 
 Based on the table above I'm just trying to get an idea of what these should 
 be set at.
 
 Thanks,
 
 Chris
 
 
  From: cjausti...@hotmail.com
  To: u2-users@listserver.u2ug.org
  Date: Tue, 3 Jul 2012 10:28:17 -0500
  Subject: Re: [U2] RESIZE - dynamic files
  
  
  Doug,
  
  When I do the math I come up with a different # (see below):
  
  File name ..   TEST_FILE
  Pathname ...   TEST_FILE
  File type ..   DYNAMIC
  File style and revision    32BIT Revision 12
  Hashing Algorithm ..   GENERAL
  No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
  26225 overflowed, 1441 badly )
  Number of records ..   1157122
  Large record size ..   2036 bytes
  Number of large records    576
  Group size .   4096 bytes
  Load factors ...   80% (split), 50% (merge) and 80% (actual)
  Total size .   449605632 bytes
  Total size of record data ..   258687736 bytes
  Total size of record IDs ...   19283300 bytes
  Unused space ...   171626404 bytes
  Total space for records    449597440 bytes
  
  -- 
  258,687,736 bytes - Total size of record data
  19,283,300 bytes - Total size of record IDs
  ===
  277,971,036 bytes (record + id's)
  
  277,971,036 / 4,084 = 68,063 groups (minimum modulus)
  -- 
  
  68,063 is less than the current modulus of 82,850. Something with this 
  formula doesn't seem right, because whenever I use it I calculate a 
  minimum modulus lower than the current modulus.
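Chris's arithmetic is easy to reproduce. Note that the result is a group count, not bytes, and that the naive formula leaves no headroom below the split threshold, so for a file that has been splitting it will always come out under the current modulus. A minimal sketch (treating the 4,084 divisor as 4,096 minus an assumed 12-byte group header — an assumption, not stated in the thread):

```python
# Naive minimum-modulus estimate from the TEST_FILE FILE.STAT figures above.
data_bytes = 258_687_736   # Total size of record data
id_bytes   = 19_283_300    # Total size of record IDs
usable_per_group = 4096 - 12   # assumed ~12 bytes of group header overhead

min_modulus = (data_bytes + id_bytes) // usable_per_group
print(min_modulus)  # 68063 -- below the current modulus of 82850
```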
  
  Thanks,
  
  Chris
  
  
  
   Date: Mon, 2 Jul 2012 16:08:16 -0600
   From: dave...@gmail.com
   To: u2-users@listserver.u2ug.org
   Subject: Re: [U2] RESIZE - dynamic files
   
   Hi Chris:
   
   You cannot get away with not resizing dynamic files in my experience.  The
   files do not split and merge like we are led to believe.  The separator is
   not used on dynamic files.  Your UniVerse file is badly sized.  The math
   below will get you a reasonably sized file.
   
   Let's do the math:
   
   258687736 (Record Size

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin


Doug,

The data is growing over time with this file. Does that mean I should ignore 
the formula? Or should I still use a lower MINIMUM.MODULO than the actual 
modulo?

Is the idea to reduce overflow by lowering the split? What is this 'overflow' 
referring to?

  2) SPLIT - would 90% seem about right?
 
 Depends on the history of the file.  Is the data growing over time?  The
 way the file looks now the split should be reduced because you have 31% in
 overflow.

So basically don't spend much time worrying about large record size?

 4) Large Record Size - does 3276 seem right?
 
 Can be calculated with a lot of effort, but yield little gain.

This seems like a moot point as well, as long as the ratio in regards to the 
MINIMUM.MODULO is set proportionally?

 5) Group Size - should I be using 4096?
 
 You have two group sizes on dynamic files: 2048 and 4096.  If you lower it
 you need to double your modulo, roughly.  If you keep it the same you need
 to increase your modulo because 31% of your file is in overflow.
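Doug's doubling rule can be sanity-checked with the GENACCTRN_POSTED byte counts from this thread. A rough sketch that ignores header overhead and split headroom:

```python
import math

# Halving the group size from 4096 to 2048 roughly doubles the modulus
# needed to hold the same data (GENACCTRN_POSTED byte counts from FILE.STAT).
total_bytes = 287_035_391 + 21_508_449   # record data + record IDs

mod_4096 = math.ceil(total_bytes / 4096)
mod_2048 = math.ceil(total_bytes / 2048)
print(mod_4096, mod_2048)  # 75329 150657
```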

Chris


  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

Using the formula below, and changing the split to 90% I get the following:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    103889 current ( minimum 103889, 114 empty,
22249 overflowed, 1764 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 72% (actual)
Total size .   519921664 bytes
Total size of record data ..   287400591 bytes
Total size of record IDs ...   21508497 bytes
Unused space ...   211004384 bytes
Total space for records    519913472 bytes

How does this look in terms of performance? 

My Actual load went down 8% as well as some overflow but it looks like my load 
% is still high at 72% I'm wondering if I should raise the MINIMUM.MODULUS even 
more 
since I still have a decent amount of overflow and not many large records. 

Chris


 From: r...@lynden.com
 To: u2-users@listserver.u2ug.org
 Date: Tue, 3 Jul 2012 10:21:16 -0700
 Subject: Re: [U2] RESIZE - dynamic files
 
 (record + id / 4096 or 2048)
 
 You need to factor in overhead & the split factor:   (records + ids) * 1.1 * 
 1.25  / 4096    (for 80%) 
 
 If you use a 20% merge factor and an 80% split factor, the file will not 
 start merging unless you delete 60 percent of your records.  If you use a 
 90% split factor, you will have more overflowed groups.  These numbers refer 
 to the total amount of data in the file, not to any individual group.
 
 For records of the size that you have, I do not see any advantage to using a 
 larger, 4096, group size. You will end up with twice the number of records 
 per group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access.
 
 -Rick
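For the GENACCTRN_POSTED numbers, Rick's adjusted formula works out as follows (a sketch; the 1.1 overhead and 1.25 headroom multipliers are his rules of thumb, not fixed UniVerse constants):

```python
import math

# Rick's adjusted estimate: (records + ids) * 1.1 (overhead) * 1.25
# (headroom for an 80% split load) / group size.
data_bytes = 287_035_391   # Total size of record data (GENACCTRN_POSTED)
id_bytes   = 21_508_449    # Total size of record IDs
group_size = 4096

min_modulus = math.ceil((data_bytes + id_bytes) * 1.1 * 1.25 / group_size)
print(min_modulus)  # 103577 -- close to the 103889 minimum Chris sets later
```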
 

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Jeff Schasny
I would recommend that if you intend to do resizing on a regular basis 
and you want to improve the performance of the file, you might consider 
resizing the file to a static file type so that you can have more 
control over the hashing algorithm, separation and modulo.


Chris Austin wrote:

Using the formula below, and changing the split to 90% I get the following:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    103889 current ( minimum 103889, 114 empty,
22249 overflowed, 1764 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 72% (actual)
Total size .   519921664 bytes
Total size of record data ..   287400591 bytes
Total size of record IDs ...   21508497 bytes
Unused space ...   211004384 bytes
Total space for records    519913472 bytes

How does this look in terms of performance? 

My Actual load went down 8% as well as some overflow but it looks like my load % is still high at 72% I'm wondering if I should raise the MINIMUM.MODULUS even more 
since I still have a decent amount of overflow and not many large records. 


Chris


  


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
The split load is not affecting anything here, since it is more than the actual 
load.  What your overflow suggests is that you lower the split.load value to 
70% or below.  You could go ahead and set the merge.load to an arbitrarily low 
number (1), and it will probably never do a merge, which would be the same as 
specifying a minimum.modulus as large as the modulus ever gets.  The 
exception to this is during file creation & CLEAR.FILE, when the 
minimum.modulus value determines the initial disk allocation.  Short of going 
to an arbitrarily large minimum.modulus and a very low split.load, you are 
going to have some overflow (unless you have sequential keys & like-sized 
records).

-Rick
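Rick's point about SPLIT.LOAD and MERGE.LOAD can be sketched as simple threshold logic: both apply to the file's overall load, so a MERGE.LOAD of 1 effectively disables merging. The function below is illustrative only, not a UniVerse API:

```python
# Conceptual sketch of dynamic-file split/merge behavior: the file splits
# when the actual load rises above SPLIT.LOAD and merges only when it
# falls below MERGE.LOAD. With merge_load=1, a merge almost never happens.
def resize_action(actual_load_pct, split_load=90, merge_load=1):
    if actual_load_pct > split_load:
        return "split"   # modulus grows
    if actual_load_pct < merge_load:
        return "merge"   # modulus shrinks (practically never at 1%)
    return "none"

print(resize_action(72))  # none
print(resize_action(95))  # split
```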


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

I guess what I need to know is what's an acceptable % of overflow for a dynamic 
file? For example, when I change the SPLIT LOAD to 90% (while using the 
calculated min modulus)
I'm still left with ~ 20% of overflow (see below). Is 20% overflow considered 
acceptable on average or should I keep tinkering with it to reach a lower 
overflow %?

Correct me if I'm wrong, but it seems the goal here is to REDUCE the overflow % 
while not creating too large a modulus (too many groups). 

Chris


File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    105715 current ( minimum 103889, 114 empty,
21092 overflowed, 1452 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 70% (actual)
Total size .   522260480 bytes
Total size of record data ..   287400239 bytes
Total size of record IDs ...   21508521 bytes
Unused space ...   213343528 bytes
Total space for records    522252288 bytes
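Chris's ~20% figure comes straight from the ratio of overflowed groups to current modulus. A quick check against the FILE.STAT numbers above:

```python
# Overflow percentages from the FILE.STAT output quoted above.
modulus    = 105_715   # No. of groups (current)
overflowed = 21_092
badly      = 1_452

print(round(100 * overflowed / modulus))  # 20 -- the ~20% Chris mentions
print(round(100 * badly / modulus, 1))    # 1.4 -- badly overflowed groups
```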


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
The actual load is 70% on your file. The split.load of 90 was set after the 
file was loaded. If you leave it at that value, and add another 100,000 
records, your modulus will not grow, but the number of overflowed groups will. 

Perhaps you need to look at it as 80% not overflowed.  Despite the output, I 
doubt that any of those overflows are that bad. 




-Rick

On Jul 3, 2012, at 1:23 PM, Chris Austin cjausti...@hotmail.com wrote:

 
 I guess what I need to know is what's an acceptable % of overflow for a 
 dynamic file? For example, when I change the SPLIT LOAD to 90% (while using 
 the calculated min modulus)
 I'm still left with ~ 20% of overflow (see below). Is 20% overflow considered 
 acceptable on average or should I keep tinkering with it to reach a lower 
 overflow %?
 
 Correct me if I'm wrong but it seems the goal here is to REDUCE the overflow 
 % while not creating too many modulus (groups).
 
 Chris
 
 

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Dan Fitzgerald

One rule of thumb is to make sure that you have an average of 10 or less items 
in each group. Going by that, you'd want a minimum mod of 130k or more. I've 
also noticed that files approach the sweet spot for minimizing overflow 
without having excessive empty groups when the total size is pretty nearly 
twice the data size.
 
The goal can vary according to your situation. I'm personally not all that 
afraid of making the modulus a little too large, as overflow is a pretty bad 
performance hit: overflowed means at least two disk reads to retrieve your 
data, badly overflowed means at least two extra disk reads, and I've seen 
files where that was thousands. This file isn't that bad, but 20% of your 
data is forcing at least one extra disk read. Empty groups contribute to 
overhead on a sequential search, so you'd want to consider how often you do a 
sequential search on a file - usually, that's a pretty inefficient way to 
retrieve data, but, again, your mileage may vary. 
 
To me, 20% is too much overflow, and 114 empty groups is trivial; much less 
than 0.2%. I'd be tempted to go to 23 as a minimum Mod, just to see what it 
looks like there. That'll give you an average of 6 records per group, not 
unreasonably shallow, and it's likely to be a while before you have to resize 
again.
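Dan's rules of thumb translate into simple arithmetic (a sketch; the targets of 10 and 6 records per group are his, and the record count is taken from the FILE.STAT output quoted earlier):

```python
import math

# Size the minimum modulus from a target average records-per-group
# rather than from byte counts (Dan's rule of thumb).
n_records = 1_290_469   # Number of records (GENACCTRN_POSTED)

print(math.ceil(n_records / 10))  # 129047 -> Dan's "130k or more"
print(math.ceil(n_records / 6))   # 215079 -> ~6 records per group
```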
 

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

Dan,

I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
Actual Load has really gone down (as well as overflow). See below for the 
results:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
3957 overflowed, 207 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 37% (actual)
Total size .   836235264 bytes
Total size of record data ..   287394719 bytes
Total size of record IDs ...   21508521 bytes
Unused space ...   527323832 bytes
Total space for records    836227072 bytes

My overflow is now @ 2% 
My Load is @ 37% (actual)

Granted, my empty groups are now up to almost 3%, but I hope that won't be a 
big factor. How does this look?

Chris


 From: dangf...@hotmail.com
 To: u2-users@listserver.u2ug.org
 Date: Tue, 3 Jul 2012 16:57:34 -0400
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 One rule of thumb is to make sure that you have an average of 10 or fewer 
 items in each group. Going by that, you'd want a minimum mod of 130k or more. 
 I've also noticed that files approach the sweet spot for minimizing 
 overflow without having excessive empty groups when the total size is pretty 
 nearly twice the data size.

 The goal can vary according to your situation. I'm personally not all that 
 afraid of making the modulus a little too large, as overflow is a pretty bad 
 performance hit (overflow means at least two disk reads to retrieve your 
 data, 'badly' means at least two extra disk reads, and I've seen files where 
 that was thousands; this file isn't that bad, but 20% of your data is forcing 
 at least one extra disk read). Empty groups contribute to overhead on a 
 sequential search, so you'd want to consider how often you do a sequential 
 search on a file - usually, that's a pretty inefficient way to retrieve data, 
 but, again, your mileage may vary. 
  
 To me, 20% is too much overflow, and 114 empty groups is trivial; much less 
 than 0.2%. I'd be tempted to go to 23 as a minimum Mod, just to see what 
 it looks like there. That'll give you an average of 6 records per group, not 
 unreasonably shallow, and it's likely to be a while before you have to resize 
 again.
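[Editor's note: the 10-records-per-group rule of thumb above is easy to turn into a quick calculation. A sketch in Python rather than UniVerse BASIC; the 10-item target is a heuristic from this thread, not a hard U2 limit:]

```python
# Estimate a minimum modulus from the heuristic of keeping the average
# number of records per group at or below a target.
def min_modulus_estimate(record_count, records_per_group=10):
    # Ceiling division, so the average never exceeds the target.
    return -(-record_count // records_per_group)

# The file in this thread holds 1,290,469 records:
print(min_modulus_estimate(1290469))  # 129047, matching "130k or more"
```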
  
  From: cjausti...@hotmail.com
  To: u2-users@listserver.u2ug.org
  Date: Tue, 3 Jul 2012 15:23:23 -0500
  Subject: Re: [U2] RESIZE - dynamic files
  
  
  I guess what I need to know is what's an acceptable % of overflow for a 
  dynamic file? For example, when I change the SPLIT LOAD to 90% (while using 
  the calculated min modulus)
  I'm still left with ~ 20% of overflow (see below). Is 20% overflow 
  considered acceptable on average or should I keep tinkering with it to 
  reach a lower overflow %?
  
  Correct me if I'm wrong but it seems the goal here is to REDUCE the 
  overflow % while not creating too many modulus (groups). 
  
  Chris
  
  
  File name ..   GENACCTRN_POSTED
  Pathname ...   GENACCTRN_POSTED
  File type ..   DYNAMIC
  File style and revision    32BIT Revision 12
  Hashing Algorithm ..   GENERAL
  No. of groups (modulus)    105715 current ( minimum 103889, 114 empty,
  21092 overflowed, 1452 badly )
  Number of records ..   1290469
  Large record size ..   3267 bytes
  Number of large records    180
  Group size .   4096 bytes
  Load factors ...   90% (split), 50% (merge) and 70% (actual)
  Total size .   522260480 bytes
  Total size of record data ..   287400239 bytes
  Total size of record IDs ...   21508521 bytes
  Unused space ...   213343528 bytes
  Total space for records    522252288 bytes
  
   From: r...@lynden.com
   To: u2-users@listserver.u2ug.org
   Date: Tue, 3 Jul 2012 13:10:43 -0700
   Subject: Re: [U2] RESIZE - dynamic files
   
   The split load is not affecting anything here, since it is more than the 
   actual load.  What your overflow suggests is that you lower the 
   split.load value to 70% or below.  You could go ahead and set the 
   merge.load to an arbitrarily low number (1), and it will probably never 
   do a merge, which would be the same as specifying a minimum.modulus equal 
   to as large as it ever gets.  The exception to this is during file 
   creation & clear.file,  when the minimum.modulus value determines the 
   initial disk allocation.  Short of going to an arbitrarily large 
   minimum.modulus, and a very low split.load, you are going to have some 
   overflow

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
But the total size of your file is up 60%.  Reading in 60% more records in a 
full select of the file is going to be much slower than a few more overflows.
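[Editor's note: the tradeoff Rick describes can be put in rough numbers. A back-of-the-envelope sketch in Python; it counts 4 KB group buffers only and ignores OS caching, so treat the results as relative costs, not timings:]

```python
GROUP_SIZE = 4096  # group size reported by ANALYZE.FILE in this thread

def full_select_buffers(total_size):
    # A full sequential select touches every allocated buffer once.
    return total_size // GROUP_SIZE

def avg_keyed_reads(modulus, overflowed):
    # One read for the primary group, plus one more when it has overflowed.
    return 1 + overflowed / modulus

# The two configurations discussed in the thread:
small = full_select_buffers(522260480)  # 70% actual load, ~20% overflow
large = full_select_buffers(836235264)  # 37% actual load, ~2% overflow
# large / small is about 1.6: the roomier file costs ~60% more buffer reads
# on a full select, in exchange for saving a fraction of a read per keyed
# access (avg_keyed_reads drops from ~1.2 toward ~1.02).
```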


-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 2:15 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


Dan,

I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
Actual Load has really gone down (as well as overflow). See below for the 
results:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
3957 overflowed, 207 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 37% (actual)
Total size .   836235264 bytes
Total size of record data ..   287394719 bytes
Total size of record IDs ...   21508521 bytes
Unused space ...   527323832 bytes
Total space for records    836227072 bytes

My overflow is now @ 2%
My Load is @ 37% (actual)

granted my empty groups are now up to almost 3% but I hope that won't be a big 
factor. How does this look?

Chris


 From: dangf...@hotmail.com
 To: u2-users@listserver.u2ug.org
 Date: Tue, 3 Jul 2012 16:57:34 -0400
 Subject: Re: [U2] RESIZE - dynamic files


 One rule of thumb is to make sure that you have an average of 10 or fewer 
 items in each group. Going by that, you'd want a minimum mod of 130k or more. 
 I've also noticed that files approach the sweet spot for minimizing 
 overflow without having excessive empty groups when the total size is pretty 
 nearly twice the data size.

 The goal can vary according to your situation. I'm personally not all that 
 afraid of making the modulus a little too large, as overflow is a pretty bad 
 performance hit (overflow means at least two disk reads to retrieve your 
 data, 'badly' means at least two extra disk reads, and I've seen files where 
 that was thousands; this file isn't that bad, but 20% of your data is forcing 
 at least one extra disk read). Empty groups contribute to overhead on a 
 sequential search, so you'd want to consider how often you do a sequential 
 search on a file - usually, that's a pretty inefficient way to retrieve data, 
 but, again, your mileage may vary.

 To me, 20% is too much overflow, and 114 empty groups is trivial; much less 
 than 0.2%. I'd be tempted to go to 23 as a minimum Mod, just to see what 
 it looks like there. That'll give you an average of 6 records per group, not 
 unreasonably shallow, and it's likely to be a while before you have to resize 
 again.

  From: cjausti...@hotmail.com
  To: u2-users@listserver.u2ug.org
  Date: Tue, 3 Jul 2012 15:23:23 -0500
  Subject: Re: [U2] RESIZE - dynamic files
 
 
  I guess what I need to know is what's an acceptable % of overflow for a 
  dynamic file? For example, when I change the SPLIT LOAD to 90% (while using 
  the calculated min modulus)
  I'm still left with ~ 20% of overflow (see below). Is 20% overflow 
  considered acceptable on average or should I keep tinkering with it to 
  reach a lower overflow %?
 
  Correct me if I'm wrong but it seems the goal here is to REDUCE the 
  overflow % while not creating too many modulus (groups).
 
  Chris
 
 
  File name ..   GENACCTRN_POSTED
  Pathname ...   GENACCTRN_POSTED
  File type ..   DYNAMIC
  File style and revision    32BIT Revision 12
  Hashing Algorithm ..   GENERAL
  No. of groups (modulus)    105715 current ( minimum 103889, 114 empty,
  21092 overflowed, 1452 badly )
  Number of records ..   1290469
  Large record size ..   3267 bytes
  Number of large records    180
  Group size .   4096 bytes
  Load factors ...   90% (split), 50% (merge) and 70% (actual)
  Total size .   522260480 bytes
  Total size of record data ..   287400239 bytes
  Total size of record IDs ...   21508521 bytes
  Unused space ...   213343528 bytes
  Total space for records    522252288 bytes
 
   From: r...@lynden.com
   To: u2-users@listserver.u2ug.org
   Date: Tue, 3 Jul 2012 13:10:43 -0700
   Subject: Re: [U2] RESIZE - dynamic files
  
   The split load is not affecting anything here, since it is more than the 
   actual load.  What your overflow suggests is that you lower the 
   split.load value to 70% or below.  You could go ahead and set the 
   merge.load to an arbitrarily low number (1

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
I should have said 60% more disk records, to be clear.

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Rick Nuckolls
Sent: Tuesday, July 03, 2012 2:24 PM
To: 'U2 Users List'
Subject: Re: [U2] RESIZE - dynamic files

But the total size of your file is up 60%.  Reading in 60% more records in a 
full select of the file is going to be much slower than a few more overflows.


-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 2:15 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


Dan,

I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
Actual Load has really gone down (as well as overflow). See below for the 
results:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
3957 overflowed, 207 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 37% (actual)
Total size .   836235264 bytes
Total size of record data ..   287394719 bytes
Total size of record IDs ...   21508521 bytes
Unused space ...   527323832 bytes
Total space for records    836227072 bytes

My overflow is now @ 2%
My Load is @ 37% (actual)

granted my empty groups are now up to almost 3% but I hope that won't be a big 
factor. How does this look?

Chris


 From: dangf...@hotmail.com
 To: u2-users@listserver.u2ug.org
 Date: Tue, 3 Jul 2012 16:57:34 -0400
 Subject: Re: [U2] RESIZE - dynamic files


 One rule of thumb is to make sure that you have an average of 10 or fewer 
 items in each group. Going by that, you'd want a minimum mod of 130k or more. 
 I've also noticed that files approach the sweet spot for minimizing 
 overflow without having excessive empty groups when the total size is pretty 
 nearly twice the data size.

 The goal can vary according to your situation. I'm personally not all that 
 afraid of making the modulus a little too large, as overflow is a pretty bad 
 performance hit (overflow means at least two disk reads to retrieve your 
 data, 'badly' means at least two extra disk reads, and I've seen files where 
 that was thousands; this file isn't that bad, but 20% of your data is forcing 
 at least one extra disk read). Empty groups contribute to overhead on a 
 sequential search, so you'd want to consider how often you do a sequential 
 search on a file - usually, that's a pretty inefficient way to retrieve data, 
 but, again, your mileage may vary.

 To me, 20% is too much overflow, and 114 empty groups is trivial; much less 
 than 0.2%. I'd be tempted to go to 23 as a minimum Mod, just to see what 
 it looks like there. That'll give you an average of 6 records per group, not 
 unreasonably shallow, and it's likely to be a while before you have to resize 
 again.

  From: cjausti...@hotmail.com
  To: u2-users@listserver.u2ug.org
  Date: Tue, 3 Jul 2012 15:23:23 -0500
  Subject: Re: [U2] RESIZE - dynamic files
 
 
  I guess what I need to know is what's an acceptable % of overflow for a 
  dynamic file? For example, when I change the SPLIT LOAD to 90% (while using 
  the calculated min modulus)
  I'm still left with ~ 20% of overflow (see below). Is 20% overflow 
  considered acceptable on average or should I keep tinkering with it to 
  reach a lower overflow %?
 
  Correct me if I'm wrong but it seems the goal here is to REDUCE the 
  overflow % while not creating too many modulus (groups).
 
  Chris
 
 
  File name ..   GENACCTRN_POSTED
  Pathname ...   GENACCTRN_POSTED
  File type ..   DYNAMIC
  File style and revision    32BIT Revision 12
  Hashing Algorithm ..   GENERAL
  No. of groups (modulus)    105715 current ( minimum 103889, 114 empty,
  21092 overflowed, 1452 badly )
  Number of records ..   1290469
  Large record size ..   3267 bytes
  Number of large records    180
  Group size .   4096 bytes
  Load factors ...   90% (split), 50% (merge) and 70% (actual)
  Total size .   522260480 bytes
  Total size of record data ..   287400239 bytes
  Total size of record IDs ...   21508521 bytes
  Unused space ...   213343528 bytes
  Total space for records    522252288 bytes
 
   From: r...@lynden.com
   To: u2-users@listserver.u2ug.org
   Date: Tue, 3 Jul 2012 13:10:43 -0700
   Subject: Re: [U2

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

This is why I'm confused.. Is the goal here to reduce 'overflow' or to 
keep the 'Total size' of the disk down? If the goal is to keep the total
 disk size down then it would appear
you would want your actual load % a lot higher than 37%.. and then ignore 
'some' of the overflow..

Chris


 But the total size of your file is up 60%.  Reading in 60% more records in a 
 full select of the file is going to be much slower than a few more overflows.
 
 
 -Original Message-
 From: u2-users-boun...@listserver.u2ug.org 
 [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
 Sent: Tuesday, July 03, 2012 2:15 PM
 To: u2-users@listserver.u2ug.org
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 Dan,
 
 I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
 Actual Load has really gone down (as well as overflow). See below for the 
 results:
 
 File name ..   GENACCTRN_POSTED
 Pathname ...   GENACCTRN_POSTED
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
 3957 overflowed, 207 badly )
 Number of records ..   1290469
 Large record size ..   3267 bytes
 Number of large records    180
 Group size .   4096 bytes
 Load factors ...   90% (split), 50% (merge) and 37% (actual)
 Total size .   836235264 bytes
 Total size of record data ..   287394719 bytes
 Total size of record IDs ...   21508521 bytes
 Unused space ...   527323832 bytes
 Total space for records    836227072 bytes
 
 My overflow is now @ 2%
 My Load is @ 37% (actual)
 
 granted my empty groups are now up to almost 3% but I hope that won't be a 
 big factor. How does this look?
 
 Chris

  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Wjhonson

Disk capacity grows much faster than disk speed does.
So the overflow is the thing to minimize.



-Original Message-
From: Chris Austin cjausti...@hotmail.com
To: u2-users u2-users@listserver.u2ug.org
Sent: Tue, Jul 3, 2012 2:38 pm
Subject: Re: [U2] RESIZE - dynamic files



This is why I'm confused.. Is the goal here to reduce 'overflow' or to 
keep the 'Total size' of the disk down? If the goal is to keep the total
 disk size down then it would appear
you would want your actual load % a lot higher than 37%.. and then ignore 'some' 
of the overflow..
Chris

 But the total size of your file is up 60%.  Reading in 60% more records in a 
 full select of the file is going to be much slower than a few more overflows.
 
 
 -Original Message-
 From: u2-users-boun...@listserver.u2ug.org 
 [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
 Sent: Tuesday, July 03, 2012 2:15 PM
 To: u2-users@listserver.u2ug.org
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 Dan,
 
 I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
 Actual Load has really gone down (as well as overflow). See below for the 
 results:
 
 File name ..   GENACCTRN_POSTED
 Pathname ...   GENACCTRN_POSTED
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
 3957 overflowed, 207 badly )
 Number of records ..   1290469
 Large record size ..   3267 bytes
 Number of large records    180
 Group size .   4096 bytes
 Load factors ...   90% (split), 50% (merge) and 37% (actual)
 Total size .   836235264 bytes
 Total size of record data ..   287394719 bytes
 Total size of record IDs ...   21508521 bytes
 Unused space ...   527323832 bytes
 Total space for records    836227072 bytes
 
 My overflow is now @ 2%
 My Load is @ 37% (actual)
 
 granted my empty groups are now up to almost 3% but I hope that won't be a big 
 factor. How does this look?
 
 Chris
  



Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
37% is a very low load.  Reading disk records takes much longer than parsing 
the records out of a disk record.  With variable record size and moderately 
poor hashing, overflow is inevitable.  So, do you want 80,000 extra groups, or 
20,000 overflow buffers? I would go with the smaller number.  But for the love 
of Knuth, do not set your split.load to 90% unless you have a perfectly hashed 
file with uniformly sized records.

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 2:38 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


This is why I'm confused.. Is the goal here to reduce 'overflow' or to 
keep the 'Total size' of the disk down? If the goal is to keep the total
 disk size down then it would appear
you would want your actual load % a lot higher than 37%.. and then ignore 
'some' of the overflow..

Chris


 But the total size of your file is up 60%.  Reading in 60% more records in a 
 full select of the file is going to be much slower than a few more overflows.
 
 
 -Original Message-
 From: u2-users-boun...@listserver.u2ug.org 
 [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
 Sent: Tuesday, July 03, 2012 2:15 PM
 To: u2-users@listserver.u2ug.org
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 Dan,
 
 I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
 Actual Load has really gone down (as well as overflow). See below for the 
 results:
 
 File name ..   GENACCTRN_POSTED
 Pathname ...   GENACCTRN_POSTED
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
 3957 overflowed, 207 badly )
 Number of records ..   1290469
 Large record size ..   3267 bytes
 Number of large records    180
 Group size .   4096 bytes
 Load factors ...   90% (split), 50% (merge) and 37% (actual)
 Total size .   836235264 bytes
 Total size of record data ..   287394719 bytes
 Total size of record IDs ...   21508521 bytes
 Unused space ...   527323832 bytes
 Total space for records    836227072 bytes
 
 My overflow is now @ 2%
 My Load is @ 37% (actual)
 
 granted my empty groups are now up to almost 3% but I hope that won't be a 
 big factor. How does this look?
 
 Chris

  


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Susan Lynch
Chris,

This is why file-sizing is something that requires careful thought.  As
some of the other responders have indicated, sometimes you want to keep
overflow to a minimum (because accessing individual records that are in
overflow takes extra disk reads, which slow down your system, and adding
new records to a group that is already in overflow will inevitably be
slower than adding a new record to a group which is not in overflow), and
sometimes you don't (e.g., if you have a file that is primarily read in a
sequential fashion where you do a Basic SELECT, and then loop through the
file reading every single record).   Because most of the files that I have
supported in my career have been read and written primarily as
single-record reads, I have always chosen to minimize overflow as my
default criteria, and only sized things for sequential reads when the file
is rarely written, rarely read as anything but a 'read them all in no
particular order' fashion, and that happens rarely in my experience.
However, as other responders have written, 'your mileage may vary'!

Look at how the file is used.  Look at what resources you have.  Then
decide...


Susan M. Lynch
F. W. Davison & Company, Inc.
-Original Message-
From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: 07/03/2012 5:38 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


This is why I'm confused.. Is the goal here to reduce 'overflow' or to
keep the 'Total size' of the disk down? If the goal is to keep the total
 disk size down then it would appear
you would want your actual load % a lot higher than 37%.. and then ignore
'some' of the overflow..

Chris


 But the total size of your file is up 60%.  Reading in 60% more records
in a full select of the file is going to be much slower than a few more
overflows.


 -Original Message-
 From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
 Sent: Tuesday, July 03, 2012 2:15 PM
 To: u2-users@listserver.u2ug.org
 Subject: Re: [U2] RESIZE - dynamic files


 Dan,

 I changed the MINIMUM.MODULUS to the value of 23 as you suggested
and my Actual Load has really gone down (as well as overflow). See below
for the results:

 File name ..   GENACCTRN_POSTED
 Pathname ...   GENACCTRN_POSTED
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    23 current ( minimum 23, 5263
empty,
 3957 overflowed, 207 badly )
 Number of records ..   1290469
 Large record size ..   3267 bytes
 Number of large records    180
 Group size .   4096 bytes
 Load factors ...   90% (split), 50% (merge) and 37% (actual)
 Total size .   836235264 bytes
 Total size of record data ..   287394719 bytes
 Total size of record IDs ...   21508521 bytes
 Unused space ...   527323832 bytes
 Total space for records    836227072 bytes

 My overflow is now @ 2%
 My Load is @ 37% (actual)

 granted my empty groups are now up to almost 3% but I hope that won't be
a big factor. How does this look?

 Chris




Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

I set the split load based on what Dan suggested:

I'd take the merge down a little, to maybe 30% or even less, and maybe knock 
the split up a bit - say, 90% - to cut down on the splitting.

I thought this would cut down on splitting. Is there a certain formula, or way 
to calculate the split.load? What should my SPLIT.LOAD be around,
and how do you come up with that %?

Chris

 From: r...@lynden.com
 To: u2-users@listserver.u2ug.org
 Date: Tue, 3 Jul 2012 14:45:28 -0700
 Subject: Re: [U2] RESIZE - dynamic files
 
 37% is a very low load.  Reading disk records takes much longer than parsing 
 the records out of a disk record.  With variable record size and moderately 
 poor hashing, overflow is inevitable.  So, do you want 80,000 extra groups, 
 or 20,000 overflow buffers? I would go with the smaller number.  But for the 
 love of Knuth, do not set your split.load to 90% unless you have a perfectly 
 hashed file with uniformly sized records.
 
 -Original Message-
 From: u2-users-boun...@listserver.u2ug.org 
 [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
 Sent: Tuesday, July 03, 2012 2:38 PM
 To: u2-users@listserver.u2ug.org
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 This is why I'm confused.. Is the goal here to reduce 'overflow' or to 
 keep the 'Total size' of the disk down? If the goal is to keep the total
  disk size down then it would appear
 you would want your actual load % a lot higher than 37%.. and then ignore 
 'some' of the overflow..
 
 Chris
 
 
  But the total size of your file is up 60%.  Reading in 60% more records in 
  a full select of the file is going to be much slower than a few more 
  overflows.
  
  
  -Original Message-
  From: u2-users-boun...@listserver.u2ug.org 
  [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
  Sent: Tuesday, July 03, 2012 2:15 PM
  To: u2-users@listserver.u2ug.org
  Subject: Re: [U2] RESIZE - dynamic files
  
  
  Dan,
  
  I changed the MINIMUM.MODULUS to the value of 23 as you suggested and 
  my Actual Load has really gone down (as well as overflow). See below for 
  the results:
  
  File name ..   GENACCTRN_POSTED
  Pathname ...   GENACCTRN_POSTED
  File type ..   DYNAMIC
  File style and revision    32BIT Revision 12
  Hashing Algorithm ..   GENERAL
  No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
  3957 overflowed, 207 badly )
  Number of records ..   1290469
  Large record size ..   3267 bytes
  Number of large records    180
  Group size .   4096 bytes
  Load factors ...   90% (split), 50% (merge) and 37% (actual)
  Total size .   836235264 bytes
  Total size of record data ..   287394719 bytes
  Total size of record IDs ...   21508521 bytes
  Unused space ...   527323832 bytes
  Total space for records    836227072 bytes
  
  My overflow is now @ 2%
  My Load is @ 37% (actual)
  
  granted my empty groups are now up to almost 3% but I hope that won't be a 
  big factor. How does this look?
  
  Chris
 
 


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
Unless the minimum modulus is configured high enough to artificially lower the 
actual load, the actual load will rise to the designated split.load as the file 
grows. The split.load indicates nothing about the specific load of any given 
group; so if it is set to 90%, then on average, each group will be 90% full, 
and adding a (400-byte) record to a group will send it into overflow, but since 
400 bytes is a trivial percentage of your overall file load, many groups will 
be overflowed before the total load factor exceeds 90%.  

Okay, there is a slight distortion with the numbers there, but the idea is that 
all buckets are not loaded equally, so if the average is almost full the 
reality is many overflowed.
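[Editor's note: the point above, that an average load equal to the split threshold still leaves many individual groups overflowed, shows up even in a toy simulation. A sketch only: random placement of variable-length records stands in for UniVerse's real hashing.]

```python
import random

random.seed(1)                       # deterministic for illustration
GROUP_SIZE = 4096
GROUPS = 1000
TARGET_LOAD = 0.90                   # hold the average at split.load 90%

# Scatter variable-length records across groups until the average group
# is 90% of one buffer full.
loads = [0] * GROUPS
placed = 0
while placed < GROUPS * GROUP_SIZE * TARGET_LOAD:
    size = random.randint(100, 800)  # variable-length records
    loads[random.randrange(GROUPS)] += size
    placed += size

overflowed = sum(1 for b in loads if b > GROUP_SIZE)
# Even though the average group is only 90% full, a large share of
# groups (typically a third or more in this toy setup) exceed one buffer.
print(overflowed, "of", GROUPS, "groups overflowed")
```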

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 2:52 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


I set the split load based on what Dan suggested:

I'd take the merge down a little, to maybe 30% or even less, and maybe knock 
the split up a bit - say, 90% - to cut down on the splitting.

I thought this would cut down on splitting. Is there a certain formula, or way 
to calculate the split.load? What should my SPLIT.LOAD be around,
and how do you come up with that %?

Chris

 From: r...@lynden.com
 To: u2-users@listserver.u2ug.org
 Date: Tue, 3 Jul 2012 14:45:28 -0700
 Subject: Re: [U2] RESIZE - dynamic files
 
 37% is a very low load.  Reading disk records takes much longer than parsing 
 the records out of a disk record.  With variable record size and moderately 
 poor hashing, overflow is inevitable.  So, do you want 80,000 extra groups, 
 or 20,000 overflow buffers? I would go with the smaller number.  But for the 
 love of Knuth, do not set your split.load to 90% unless you have a perfectly 
 hashed file with uniformly sized records.
 
 -Original Message-
 From: u2-users-boun...@listserver.u2ug.org 
 [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
 Sent: Tuesday, July 03, 2012 2:38 PM
 To: u2-users@listserver.u2ug.org
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 This is why I'm confused.. Is the goal here to reduce 'overflow' or to 
 keep the 'Total size' of the disk down? If the goal is to keep the total
  disk size down then it would appear
 you would want your actual load % a lot higher than 37%.. and then ignore 
 'some' of the overflow..
 
 Chris
 
 
  But the total size of your file is up 60%.  Reading in 60% more records in 
  a full select of the file is going to be much slower than a few more 
  overflows.
  
  
  -Original Message-
  From: u2-users-boun...@listserver.u2ug.org 
  [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
  Sent: Tuesday, July 03, 2012 2:15 PM
  To: u2-users@listserver.u2ug.org
  Subject: Re: [U2] RESIZE - dynamic files
  
  
  Dan,
  
  I changed the MINIMUM.MODULUS to the value of 23 as you suggested and 
  my Actual Load has really gone down (as well as overflow). See below for 
  the results:
  
  File name ..   GENACCTRN_POSTED
  Pathname ...   GENACCTRN_POSTED
  File type ..   DYNAMIC
  File style and revision    32BIT Revision 12
  Hashing Algorithm ..   GENERAL
  No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
  3957 overflowed, 207 badly )
  Number of records ..   1290469
  Large record size ..   3267 bytes
  Number of large records    180
  Group size .   4096 bytes
  Load factors ...   90% (split), 50% (merge) and 37% (actual)
  Total size .   836235264 bytes
  Total size of record data ..   287394719 bytes
  Total size of record IDs ...   21508521 bytes
  Unused space ...   527323832 bytes
  Total space for records    836227072 bytes
  
  My overflow is now @ 2%
  My Load is @ 37% (actual)
  
  granted my empty groups are now up to almost 3% but I hope that won't be a 
  big factor. How does this look?
  
  Chris
 
 


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Charles Stevenson

Chris,
Let's back way up.   I take it your original question is a general one,  
not specific to one poorly performing problematic file.  Is that right?


If so, generally speaking, you just don't get a lot out of fine-tuning 
dynamic files.
Tweaking the default parameters doesn't usually make a whole lot of 
difference.

Several people have said something similar in this thread.

Other than deciding which hashing algorithm,  I generally use the 
defaults and only tweak things once the file proves problematic, which 
usually means slow I/O.


When a problem erupts, look carefully at how that specific file is used, 
as Susan & others have said.   You might get hold of Fitzgerald & Long's 
paper on how dynamic files work.  If you understand the fundamentals, 
you'll understand how to attack your problem file, applying the ideas 
Rick & others have talked about here.


You may go several years without having to resort to that.

Chuck Stevenson


On 7/2/2012 2:22 PM, Chris Austin wrote:

I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
example I have a file called 'TEST_FILE'
with the following:

01 ANALYZE.FILE TEST_FILE
File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    83261 current ( minimum 31 )
Large record size ..   3267 bytes
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   450613248 bytes

How do you calculate what the modulus and separation should be? I can't use 
HASH.HELP on a type 30 file to see the recommended settings
so I was wondering how best you figure out the file RESIZE.

Thanks,

Chris



___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
From the System Description manual:

Important Considerations

Dynamic files are meant to make file management easier for users. The default
parameters are set so that most dynamic files work efficiently. If you decide 
to change
the parameters of a dynamic file, keep the following considerations in mind:

- Use the SEQ.NUM hashing algorithm only when your record IDs are
numeric, sequential, and consecutive. Nonconsecutive numbers should not
be hashed using the SEQ.NUM hashing algorithm.

- Use a group size of 2 only if you expect the average record size to be larger
than 1000 bytes. If your record size is larger than 2000 bytes, consider using
a nonhashed file—type 1 or 19.

- Large record size should generally not be changed. Storing the data of a
large record in the overflow buffer causes that data not to be included in the
split and merge calculations. Also, the extra data length does not slow access
to subsequent records. By choosing a large record size of 0%, all the records
are considered large. In this case, record IDs can be accessed extremely
quickly by commands such as SELECT, but access to the actual data is
much less efficient.

- A small split load causes less data to be stored in each group buffer, 
resulting
in faster access time and less overflow at the expense of requiring extra
memory. A large split load causes more data to be stored in each group
buffer, resulting in better use of memory at the expense of slower access
time and more overflow. A split load of 100% disables splits.

- The gap between merge load and split load should be large enough so that
splits and merges do not occur too frequently. The split and merge processes
take a significant amount of processing time. If you make the merge load too
small, memory usage can be very poor. Also, selection time is increased
when record IDs are distributed in more groups than are needed. A merge
load of 0% disables merges.

- Consider increasing the minimum modulo if you intend to add a lot of initial
data to the file. Much data-entry time can be saved by avoiding the initial
splits that can occur if you enter a lot of initial data. You may want to
readjust this value after
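The manual's rules of thumb for choosing a group size can be encoded as a quick sanity check. This is a sketch of the guidance quoted above, not a vendor tool; `suggest_group_size` is a hypothetical helper using the 1000- and 2000-byte thresholds from the excerpt:

```python
def suggest_group_size(avg_record_bytes):
    """Rule of thumb from the manual excerpt above: group size 2 only
    when the average record exceeds 1000 bytes; past 2000 bytes,
    consider a non-hashed file (type 1 or 19) instead."""
    if avg_record_bytes > 2000:
        return "consider a type 1 or 19 (non-hashed) file"
    if avg_record_bytes > 1000:
        return 2
    return 1

print(suggest_group_size(250))   # small records -> group size 1
print(suggest_group_size(1500))  # -> group size 2
```

By this rule, a file whose records average a few hundred bytes (as Rick later estimates for Chris's TEST_FILE) would stay at group size 1.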

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Charles Stevenson
Sent: Tuesday, July 03, 2012 3:34 PM
To: U2 Users List
Subject: Re: [U2] RESIZE - dynamic files

Chris,
Let's back way up.   I take it your original question is a general one,  
not specific to one poorly performing problematic file.  Is that right?

If so, generally speaking, you just don't get a lot out of fine-tuning 
dynamic files.
Tweaking the default parameters doesn't usually make a whole lot of 
difference.
Several people have said something similar in this thread.

Other than deciding which hashing algorithm,  I generally use the 
defaults and only tweak things once the file proves problematic, which 
usually means slow I/O.

When a problem erupts, look carefully at how that specific file is used, 
as Susan & others have said.   You might get hold of Fitzgerald & Long's 
paper on how dynamic files work.  If you understand the fundamentals, 
you'll understand how to attack your problem file, applying the ideas 
Rick & others have talked about here.

You may go several years without having to resort to that.

Chuck Stevenson


On 7/2/2012 2:22 PM, Chris Austin wrote:
 I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
 example I have a file called 'TEST_FILE'
 with the following:

 01 ANALYZE.FILE TEST_FILE
 File name ..   TEST_FILE
 Pathname ...   TEST_FILE
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    83261 current ( minimum 31 )
 Large record size ..   3267 bytes
 Group size .   4096 bytes
 Load factors ...   80% (split), 50% (merge) and 80% (actual)
 Total size .   450613248 bytes

 How do you calculate what the modulus and separation should be? I can't use 
 HASH.HELP on a type 30 file to see the recommended settings
 so I was wondering how best you figure out the file RESIZE.

 Thanks,

 Chris


___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Ross Ferris
I would suggest that the actual goal is to achieve maximum performance for 
your system, so knowing HOW the file is used on a daily basis can also 
influence decisions. Disk is a cheap commodity, so some wastage in 
file utilization shouldn't be a deciding factor. 


Ross Ferris
Stamina Software
Visage  Better by Design!


-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Wednesday, 4 July 2012 7:38 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


This is why I'm confused.. Is the goal here to reduce 'overflow' or to keep the 
'Total size' of the disk down? If the goal is to keep the total  disk size down 
then it would appear you would want your actual load % a lot higher than 37%.. 
and then ignore 'some' of the overflow..

Chris


 But the total size of your file is up 60%.  Reading in 60% more records in a 
 full select of the file is going to be much slower than a few more overflows.
 
 
 -Original Message-
 From: u2-users-boun...@listserver.u2ug.org 
 [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris 
 Austin
 Sent: Tuesday, July 03, 2012 2:15 PM
 To: u2-users@listserver.u2ug.org
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 Dan,
 
 I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
 Actual Load has really gone down (as well as overflow). See below for the 
 results:
 
 File name ..   GENACCTRN_POSTED
 Pathname ...   GENACCTRN_POSTED
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
 3957 overflowed, 207 badly )
 Number of records ..   1290469
 Large record size ..   3267 bytes
 Number of large records    180
 Group size .   4096 bytes
 Load factors ...   90% (split), 50% (merge) and 37% (actual)
 Total size .   836235264 bytes
 Total size of record data ..   287394719 bytes
 Total size of record IDs ...   21508521 bytes
 Unused space ...   527323832 bytes
 Total space for records    836227072 bytes
 
 My overflow is now @ 2%
 My Load is @ 37% (actual)
 
 granted my empty groups are now up to almost 3% but I hope that won't be a 
 big factor. How does this look?
 
 Chris

  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Keith Johnson [DATACOM]
Doug may have had a key bounce in his input

 Let's do the math:

 258687736 (Record Size)
 192283300 (Key Size)
 

The key size is actually 19283300 in Chris' figures

Regarding 68,063 being less than the current modulus of 82,850.  I think the 
answer may lie in the splitting process.

As I understand it, the first time a split occurs group 1 is split and its 
contents are split between new group 1 and new group 2. All the other groups 
effectively get 1 added to their number. The next split is group 3 (which was 
2) into 3 and 4 and so forth. A pointer is kept to say where the next split 
will take place and also to help sort out how to adjust the algorithm to 
identify which group matches a given key.

Based on this, if you started with 1000 groups, by the time you have split the 
500th time you will have 1500 groups.  The first 1000 will be relatively empty, 
the last 500 will probably be overflowed, but not terribly badly.  By the time 
you get to the 1000th split, you will have 2000 groups and they will, one 
hopes, be quite reasonably spread with very little overflow.
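Keith's walk-through of the split pointer can be sketched as a toy linear-hashing model. This illustrates the general technique he describes, not UniVerse's actual on-disk logic; `modulus_after` and `locate_group` are hypothetical helpers:

```python
def modulus_after(base_mod, splits):
    """Each split converts one group into two, so the file
    gains exactly one group per split."""
    return base_mod + splits

def locate_group(key_hash, base_mod, splits):
    """Toy linear-hashing lookup: groups below the split pointer
    have already been split, so they hash with the doubled modulus."""
    group = key_hash % base_mod
    if group < splits:
        group = key_hash % (2 * base_mod)
    return group

# Keith's example: a file that starts with 1000 groups.
print(modulus_after(1000, 500))   # after the 500th split -> 1500 groups
print(modulus_after(1000, 1000))  # after the 1000th split -> 2000 groups
```

By the 1000th split every original group has been rehashed with the doubled modulus, which is the point in the cycle where the spread should be most even, matching the up-and-down access-time cycle Keith describes.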

So I expect the average access times would drift up and down in a cycle.  The 
cycle time would get longer as the file gets bigger, but the worst time would be 
roughly the same each cycle.

Given the power of two introduced into the algorithm by the before/after the 
split thing, I wonder if there is such a need to start off with a prime?

Regards, Keith

PS I'm getting a bit Tony^H^H^H^Hverbose nowadays.

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


[U2] RESIZE - dynamic files

2012-07-02 Thread Chris Austin

I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
example I have a file called 'TEST_FILE'
with the following:

01 ANALYZE.FILE TEST_FILE
File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    83261 current ( minimum 31 )
Large record size ..   3267 bytes
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   450613248 bytes

How do you calculate what the modulus and separation should be? I can't use 
HASH.HELP on a type 30 file to see the recommended settings
so I was wondering how best you figure out the file RESIZE.

Thanks,

Chris

  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users



Re: [U2] RESIZE - dynamic files

2012-07-02 Thread Martin Phillips
Hi Chris,

The whole point of dynamic files is that you don't do RESIZE. The file will 
look after itself, automatically responding to
variations in the volume of data.

There are knobs to twiddle but in most cases they can safely be left at their 
defaults. A dynamic file will never perform as well
as a perfectly tuned static file but they are a heck of a lot better than 
typical static files that haven't been reconfigured for
ages.


Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
+44 (0)1604-709200




-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: 02 July 2012 20:22
To: u2-users@listserver.u2ug.org
Subject: [U2] RESIZE - dynamic files


I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
example I have a file called 'TEST_FILE'
with the following:

01 ANALYZE.FILE TEST_FILE
File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    83261 current ( minimum 31 )
Large record size ..   3267 bytes
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   450613248 bytes

How do you calculate what the modulus and separation should be? I can't use 
HASH.HELP on a type 30 file to see the recommended
settings
so I was wondering how best you figure out the file RESIZE.

Thanks,

Chris

  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-02 Thread Chris Austin

The dynamic file I'm working with is below. What do 'overflowed' and 'badly' 
refer to under MODULUS? Is the goal of the RESIZE to eliminate that
overflow? Any ideas what I should change to achieve this?


File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
26225 overflowed, 1441 badly )
Number of records ..   1157122
Large record size ..   2036 bytes
Number of large records    576
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   449605632 bytes
Total size of record data ..   258687736 bytes
Total size of record IDs ...   19283300 bytes
Unused space ...   171626404 bytes
Total space for records    449597440 bytes

Thanks,

Chris


 From: cjausti...@hotmail.com
 To: u2-users@listserver.u2ug.org
 Date: Mon, 2 Jul 2012 14:55:21 -0500
 Subject: [U2] RESIZE - dynamic files
 
 
 I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
 example I have a file called 'TEST_FILE'
 with the following:
 
 01 ANALYZE.FILE TEST_FILE
 File name ..   TEST_FILE
 Pathname ...   TEST_FILE
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    83261 current ( minimum 31 )
 Large record size ..   3267 bytes
 Group size .   4096 bytes
 Load factors ...   80% (split), 50% (merge) and 80% (actual)
 Total size .   450613248 bytes
 
 How
  do you calculate what the modulus and separation should be? I can't use
  HASH.HELP on a type 30 file to see the recommended settings
 so I was wondering how best you figure out the file RESIZE.
 
 Thanks,
 
 Chris   
 ___
 U2-Users mailing list
 U2-Users@listserver.u2ug.org
 http://listserver.u2ug.org/mailman/listinfo/u2-users
  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-02 Thread Dan Fitzgerald

Group size appears adequate (although anytime anything hashes into the group(s) 
with the largest record [3267b], you'll split: 3267 is 79.8% of 4096, so if you 
have a lot of records up in the 3K range, you may want to increase group size 
and decrease min modulus accordingly), but the minimum modulus should be a 
prime north of the current modulus, with a padding factor based on growth 
expectations. The sweet spot is where you have enough data in each group to 
avoid merging (I'd argue that 50% is a bit high for the merge; but that's 
because I'm unafraid of unused space, while I'm averse to file maintenance 
overhead), but not so much that you do a lot of splitting. You should do a 
count on the number of records, too. It almost never makes sense to have the 
modulus exceed the number of records by a substantial percentage.
 
So, you should increase minimum modulus to 83267 or higher, unless you double 
the group size to 8K, in which case something around 50K as a modulus sounds 
good. I'd take the merge down a little, to maybe 30% or even less, and maybe 
knock the split up a bit - say, 90% - to cut down on the splitting.
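Dan's 79.8% figure is easy to verify. The helper below is hypothetical; the 4096-byte group size and 80% split load are taken from Chris's ANALYZE.FILE listing:

```python
def group_load_pct(bytes_in_group, group_bytes=4096):
    """Percentage of a group buffer occupied (ignores per-group overhead)."""
    return bytes_in_group / group_bytes * 100

big_record = group_load_pct(3267)
print(round(big_record, 1))             # -> 79.8, as Dan says
# A group holding one 3267-byte record plus anything else is
# already past the 80% split load:
print(group_load_pct(3267 + 100) > 80)  # -> True
```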
 
 From: cjausti...@hotmail.com
 To: u2-users@listserver.u2ug.org
 Date: Mon, 2 Jul 2012 14:55:21 -0500
 Subject: [U2] RESIZE - dynamic files
 
 
 I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
 example I have a file called 'TEST_FILE'
 with the following:
 
 01 ANALYZE.FILE TEST_FILE
 File name ..   TEST_FILE
 Pathname ...   TEST_FILE
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    83261 current ( minimum 31 )
 Large record size ..   3267 bytes
 Group size .   4096 bytes
 Load factors ...   80% (split), 50% (merge) and 80% (actual)
 Total size .   450613248 bytes
 
 How
  do you calculate what the modulus and separation should be? I can't use
  HASH.HELP on a type 30 file to see the recommended settings
 so I was wondering how best you figure out the file RESIZE.
 
 Thanks,
 
 Chris   
 ___
 U2-Users mailing list
 U2-Users@listserver.u2ug.org
 http://listserver.u2ug.org/mailman/listinfo/u2-users
  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-02 Thread Chris Austin

I guess my main question is regarding the 'overflow' and 'badly' #'s which you 
can see when you do an ANALYZE.FILE filename STATISTICS. 
Is the goal not to have any overflow #? And what is 'badly'?

After playing around with RESIZE on this file, I was able to come up with the 
following:

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 24
    82850 current ( minimum 24, 104 empty, 26225 overflowed, 1441 badly )

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 1000
    82850 current ( minimum 1000, 104 empty, 26225 overflowed, 1441 badly )

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 99420
    99420 current ( minimum 99420, 182 empty, 18725 overflowed, 1054 badly )

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 119304
   119304 current ( minimum 119304, 247 empty, 9511 overflowed, 406 badly )

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 143165
   143165 current ( minimum 143165, 1328 empty, 4333 overflowed, 259 badly )

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 171799
   171799 current ( minimum 171799, 3814 empty, 3063 overflowed, 237 badly )

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 223339
   223339 current ( minimum 223339, 9215 empty, 1810 overflowed, 222 badly )

As you can see as I increase my MINIMUM.MODULUS, my 'overflowed' and 'badly' 
#'s go down. Is this the goal when tuning a dynamic file?

Chris


 From: martinphill...@ladybridge.com
 To: u2-users@listserver.u2ug.org
 Date: Mon, 2 Jul 2012 20:56:40 +0100
 Subject: Re: [U2] RESIZE - dynamic files
 
 Hi Chris,
 
 The whole point of dynamic files is that you don't do RESIZE. The file will 
 look after itself, automatically responding to
 variations in the volume of data.
 
 There are knobs to twiddle but in most cases they can safely be left at 
 their defaults. A dynamic file will never perform as well
 as a perfectly tuned static file but they are a heck of a lot better than 
 typical static files that haven't been reconfigured for
 ages.
 
 
 Martin Phillips
 Ladybridge Systems Ltd
 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
 +44 (0)1604-709200
 
 
 
 
 -Original Message-
 From: u2-users-boun...@listserver.u2ug.org 
 [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
 Sent: 02 July 2012 20:22
 To: u2-users@listserver.u2ug.org
 Subject: [U2] RESIZE - dynamic files
 
 
 I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
 example I have a file called 'TEST_FILE'
 with the following:
 
 01 ANALYZE.FILE TEST_FILE
 File name ..   TEST_FILE
 Pathname ...   TEST_FILE
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    83261 current ( minimum 31 )
 Large record size ..   3267 bytes
 Group size .   4096 bytes
 Load factors ...   80% (split), 50% (merge) and 80% (actual)
 Total size .   450613248 bytes
 
 How do you calculate what the modulus and separation should be? I can't use 
 HASH.HELP on a type 30 file to see the recommended
 settings
 so I was wondering how best you figure out the file RESIZE.
 
 Thanks,
 
 Chris
 
 
 ___
 U2-Users mailing list
 U2-Users@listserver.u2ug.org
 http://listserver.u2ug.org/mailman/listinfo/u2-users
 
 ___
 U2-Users mailing list
 U2-Users@listserver.u2ug.org
 http://listserver.u2ug.org/mailman/listinfo/u2-users
  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-02 Thread Doug Averch
Hi Chris:

You cannot get away with not resizing dynamic files in my experience.  The
files do not split and merge like we are led to believe.  The separator is
not used on dynamic files.  Your UniVerse file is badly sized.  The math
below will get you a reasonably sized file.

Let's do the math:

258687736 (Record Size)
192283300 (Key Size)

450,971,036 (Data and Key Size)

4096 (Group Size)
- 12   (32 Bit Overhead)

4084 Usable Space

450971036/4084 = Minimum Modulo 110424 (Prime is 110431)
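Doug's arithmetic reproduces directly in Python (note Keith's later correction of the key-size figure; `next_prime` is an illustrative helper, not a U2 utility):

```python
import math

def next_prime(n):
    """Smallest prime >= n (trial division is fine at this scale)."""
    def is_prime(k):
        if k < 2:
            return False
        return all(k % d for d in range(2, math.isqrt(k) + 1))
    while not is_prime(n):
        n += 1
    return n

# Doug's figures (Keith later notes the key size is actually 19283300):
data_and_keys = 258687736 + 192283300
usable = 4096 - 12               # group size minus the 32-bit overhead
min_modulo = math.ceil(data_and_keys / usable)
print(min_modulo, next_prime(min_modulo))  # -> 110424 110431
```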


[ad]
I hate doing this math all of the time.  I have a reasonably priced resize
program called XLr8Resizer for $99.00 to do this for me.
[/ad]

Regards,
Doug
www.u2logic.com/tools.html
XLr8Resizer for the rest of us
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-02 Thread Rick Nuckolls
Chris,

I second the thought that, because of the splitting and merging of groups, it 
can be a waste of effort to overwork the sizing of a dynamic file.

One problem with your TEST_FILE below is that the Large Record Size is spec'ed 
at less than 50% of the group size.  Each record that is larger than the large 
record size is given at least one full sized buffer in the overflow file, so a 
record of 2037 bytes, in your example, would occupy 4096 bytes of space.  The 
ID and pointer is left in the primary data group.  It appears that your records 
average 250 bytes, so this probably is not a large factor, but that would also 
suggest that you stick to a GROUP.SIZE of 1 (2048 bytes) rather than 2.  Btw, 
each of your 576 large records probably counts towards the overflowed badly 
column, though, from an access point of view, the group might be in optimal 
shape.

The effective modulo of a dynamic file is based on the space used by the 
not(large records), but the "Total size of record data" includes the full 
buffer size of the overflow records, I believe, and so should not be used to 
compute the total size of your data. For record sizes like you have, I would 
compute the total of the ids+records, add about 10% for overhead, divide by the 
group size (2048, if you use the default), multiply by 1.25 (allow for the 80% 
splitting factor), and then set the minimum modulus to the next larger prime 
number.
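Rick's recipe can be written out step by step. This is a sketch of his rule of thumb (the 10% overhead and 1.25 split allowance are his suggested factors); the figures in the example are from Chris's ANALYZE.FILE output earlier in the thread:

```python
import math

def suggested_min_modulus(id_bytes, data_bytes, group_bytes=2048,
                          overhead=0.10, split_factor=1.25):
    """Rick's rule of thumb: total ids+records, add ~10% overhead,
    divide by the group size, allow for the 80% split load, then
    round up to the next prime for MINIMUM.MODULUS."""
    target = math.ceil((id_bytes + data_bytes) * (1 + overhead)
                       / group_bytes * split_factor)
    while any(target % d == 0 for d in range(2, math.isqrt(target) + 1)):
        target += 1
    return target

# Chris's figures: 19283300 bytes of ids, 258687736 bytes of record data.
print(suggested_min_modulus(19283300, 258687736))
```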

In the example below, you can see 50 large records in a single group of a 
dynamic file, but only the id's are in the primary buffer.  If you do the math, 
you will find that each 1001-byte record is using up a 4096-byte overflow buffer. 


File name ..   BIGD
Pathname ...   BIGD
File type ..   DYNAMIC
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    1 current ( minimum 1, 0 empty,
1 overflowed, 1 badly )
Number of records ..   50
Large record size ..   1000 bytes
Number of large records    50
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 30% (actual)
Total size .   217088 bytes
Total size of record data ..   205466 bytes
Total size of record IDs ...   534 bytes
Unused space ...   2896 bytes
Total space for records    208896 bytes

LIST BIGD TOTAL EVAL "LEN(@ID)" TOTAL EVAL "LEN(@RECORD)" DET.SUP  18:03:29  07-02-12  PAGE 1
LEN(@ID)..  LEN(@RECORD)
==========  ============
       134         50050
50 records listed.

Note that if I stuck to the defaults and used sequential ids, I would have 
saved more than 1/2 of the disk space, but still have used 150% of the total 
id+record size.

File name ..   BIGD
Pathname ...   BIGD
File type ..   DYNAMIC
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    31 current ( minimum 1, 3 empty,
4 overflowed, 0 badly )
Number of records ..   50
Large record size ..   1628 bytes
Number of large records    0
Group size .   2048 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   79872 bytes
Total size of record data ..   50709 bytes
Total size of record IDs ...   91 bytes
Unused space ...   24976 bytes
Total space for records    75776 bytes

Rick Nuckolls
Lynden Inc


-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Monday, July 02, 2012 2:07 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


The dynamic file I'm working with is below. What do 'overflowed' and 'badly' 
refer to under MODULUS? Is the goal of the RESIZE to eliminate that
overflow? Any ideas what I should change to achieve this?


File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
26225 overflowed, 1441 badly )
Number of records ..   1157122
Large record size ..   2036 bytes
Number of large records    576
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   449605632 bytes
Total size of record data ..   258687736 bytes
Total size of record IDs ...   19283300 bytes
Unused space ...   171626404 bytes
Total space for records    449597440 bytes

Thanks,

Chris


 From: cjausti...@hotmail.com
 To: u2-users@listserver.u2ug.org
 Date: Mon, 2 Jul 2012 14:55:21 -0500
 Subject: [U2] RESIZE - dynamic files
 
 
 I was wondering if anyone had instructions on RESIZE

[U2] RESIZE DYNAMIC FILES

2006-06-12 Thread Dave S
Does anyone have any tech tips on how to select parameters when resizing 
dynamic files ?
 __
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/


RE: [U2] RESIZE DYNAMIC FILES

2006-06-12 Thread Hennessey, Mark F.
Should I put [AD] in the subject line for an unsolicited testimonial?  :)

The best advice I can give you is to buy a product called FAST:

http://www.fitzlong.com/

A great tool for analyzing and resizing files, be they dynamic or standard 
hashed files. Excellent support from excellent people at a great price.

There might be more expensive utilities out there, but I can't imagine that 
there is anything better.

Mark Hennessey
(a FAST Customer since 2002)

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Dave S
Sent: Monday, June 12, 2006 10:25 AM
To: u2-users@listserver.u2ug.org
Subject: [U2] RESIZE DYNAMIC FILES


Does anyone have any tech tips on how to select parameters when resizing 
dynamic files ?
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/


Re: [U2] RESIZE DYNAMIC FILES

2006-06-12 Thread Timothy Snyder
[EMAIL PROTECTED] wrote on 06/12/2006 10:24:51 AM:

 Does anyone have any tech tips on how to select parameters when
 resizing dynamic files ?

The following is from a published tech tip.  It provides guidelines, but
of course the nature of MV files makes it difficult to predict optimal
sizing.  To get the appropriate input data, run guide with the -r option
to send the output to a hashed file.  Point the dictionary of that file as
directed, and you'll have what you need.  It's important to note that this
only applies to KEYONLY files.
===
Formula for determining base modulo, block size, SPLIT_LOAD,
and MERGE_LOAD for UniData KEYONLY Dynamic Files

Note that the variables used are the same as the DICT items in
$UDTHOME/sys/D_UDT_GUIDE.

Considerations:
a) The following does not take into account the Unix disk
   record (frame) size so it is best to select a block
   size based on the number of items you'd like in a group.
b) No one method will provide absolute results but these
   calculations will minimize level one overflow caused
   by a high SPLIT_LOAD value.
c) Type 0 works best for most Dynamic Files but it is best
   to check a small sample via the GROUP.STAT command.

Step 1: Determine the blocksize.  (Use 4096 unless the Items
per group is larger than 35 or less than 2)
  A) If MAXSIZ < 1K, ITEMSIZE = 10 * MAXSIZ
  B) If 1K < MAXSIZ < 3K, ITEMSIZE = 5 * MAXSIZ
  C) If MAXSIZ > 3K, ITEMSIZE = 5 * (AVGSIZ + DEVSIZ)

Once you determine the item size, use it to determine the NEWBLOCKSIZE.
  A) ITEMSIZE <= 1024;  NEWBLOCKSIZE = 1024
  B) 1024 < ITEMSIZE <= 2048;   NEWBLOCKSIZE = 2048
  C) 2048 < ITEMSIZE <= 4096;   NEWBLOCKSIZE = 4096
  D) 4096 < ITEMSIZE <= 8192;   NEWBLOCKSIZE = 8192
  E) 8192 < ITEMSIZE <= 16384;  NEWBLOCKSIZE = 16384

Step 2: Determine the actual number of items per group.
  ITEMS_PER_GROUP = (NEWBLOCKSIZE - 32) / AVGSIZ

Step 3: Determine the base modulo.
  BASEMODULO = COUNT / ITEMS_PER_GROUP

Step 4: Determine SPLIT_LOAD.
  SPLIT_LOAD = INT((((AVGKEY + 9) * ITEMS_PER_GROUP) / NEWBLOCKSIZE) * 100) + 1
  If SPLIT_LOAD is less than ten, then: SPLIT_LOAD = 10

Step 5: Determine MERGE_LOAD.
  MERGE_LOAD = SPLIT_LOAD / 2 (rounded up)
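The steps above translate to a short Python sketch. Variable names follow the tech tip; the SPLIT_LOAD expression follows the tip's formula as best it can be read from this archive, so treat the whole thing as indicative rather than authoritative:

```python
import math

def keyonly_params(avgsiz, maxsiz, devsiz, avgkey, count):
    """Sketch of the UniData KEYONLY dynamic-file sizing steps above."""
    # Step 1: item size, then block size.
    if maxsiz < 1024:
        itemsize = 10 * maxsiz
    elif maxsiz < 3072:
        itemsize = 5 * maxsiz
    else:
        itemsize = 5 * (avgsiz + devsiz)
    newblocksize = 1024
    while newblocksize < itemsize and newblocksize < 16384:
        newblocksize *= 2
    # Step 2: items per group.
    items_per_group = (newblocksize - 32) // avgsiz
    # Step 3: base modulo.
    base_modulo = math.ceil(count / items_per_group)
    # Step 4: split load, with a floor of 10.
    split_load = int((avgkey + 9) * items_per_group / newblocksize * 100) + 1
    split_load = max(split_load, 10)
    # Step 5: merge load, rounded up.
    merge_load = math.ceil(split_load / 2)
    return newblocksize, base_modulo, split_load, merge_load

# Hypothetical file: 300-byte average items, 800-byte max, 10-byte keys.
print(keyonly_params(avgsiz=300, maxsiz=800, devsiz=50,
                     avgkey=10, count=100000))  # -> (8192, 3704, 10, 5)
```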


Tim Snyder
Consulting I/T Specialist , U2 Professional Services
North American Lab Services
DB2 Information Management, IBM Software Group
717-545-6403
[EMAIL PROTECTED]
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/


Re: [U2] RESIZE DYNAMIC FILES

2006-06-12 Thread Dave S
What does the guide -r option do ?
   
  We have been using the -a option.

Timothy Snyder [EMAIL PROTECTED] wrote:
  [EMAIL PROTECTED] wrote on 06/12/2006 10:24:51 AM:

 Does anyone have any tech tips on how to select parameters when
 resizing dynamic files ?

The following is from a published tech tip. It provides guidelines, but
of course the nature of MV files makes it difficult to predict optimal
sizing. To get the appropriate input data, run guide with the -r option
to send the output to a hashed file. Point the dictionary of that file as
directed, and you'll have what you need. It's important to note that this
only applies to KEYONLY files.
===
Formula for determining base modulo, block size, SPLIT_LOAD,
and MERGE_LOAD for UniData KEYONLY Dynamic Files

Note that the variables used are the same as the DICT items in
$UDTHOME/sys/D_UDT_GUIDE.

Considerations:
a) The following does not take into account the Unix disk
record (frame) size so it is best to select a block
size based on the number of items you?d like in a group.
b) No one method will provide absolute results but these
calculations will minimize level one overflow caused
by a high SPLIT_LOAD value.
c) Type 0 works best for most Dynamic Files but it is best
to check a small sample via the GROUP.STAT command.

Step 1: Determine the blocksize. (Use 4096 unless the Items
per group is larger then 35 or less then 2)
A) If the MAXSIZ  1K ITEMSIZE = 10 * MAXSIZ
B) If 1 K  MAXSIZ  3 K ITEMSIZE = 5 * MAXSIZ
C) If MAXSIZ  3 K ITEMSIZE = 5 * (AVGSIZ + DEVSIZ)

Once you determine the item size, use it to determine the NEWBLOCKSIZE.
A) ITEMSIZE  1024; NEWBLOCKSIZE = 1024
B) 1024  ITEMSIZE  2048; NEWBLOCKSIZE = 2048
C) 2048  ITEMSIZE  4096; NEWBLOCKSIZE = 4096
D) 4096  ITEMSIZE  8192; NEWBLOCKSIZE = 8192
E) 8192  ITEMSIZE  16384; NEWBLOCKSIZE = 16384

Step 2: Determine the actual number of items per group.
ITEMS_PER_GROUP = (NEWBLOCKSIZE - 32) / AVGSIZ

Step 3: Determine the base modulo
BASEMODULO = COUNT / ITEMS_PER_GROUP

Step 4: Determine SPLIT_LOAD
SPLIT_LOAD = INT((((AVGKEY + 9) * ITEMS_PER_GROUP) / NEW_BLOCKSIZE) * 100) + 1
If the SPLIT_LOAD is less than ten, then: SPLIT_LOAD = 10

Step 5: Determine MERGE_LOAD
MERGE_LOAD = SPLIT_LOAD / 2 (Rounded up)
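The five steps above can be sketched as a small Python function. This is an illustration of the tip, not part of the published formula: the function name is made up, and rounding the block size up to the next power-of-two size in Step 1 and rounding the base modulo up in Step 3 are my reading of the intent.

```python
import math

def size_dynamic_file(maxsiz, avgsiz, devsiz, count, avgkey):
    """Sketch of the KEYONLY dynamic-file sizing tip (Steps 1-5).

    maxsiz/avgsiz/devsiz are the max/average/std-dev record sizes in
    bytes, count is the record count, avgkey the average key length --
    the guide statistics named in D_UDT_GUIDE.
    """
    # Step 1: pick an item size from the record-size statistics.
    if maxsiz < 1024:
        itemsize = 10 * maxsiz
    elif maxsiz < 3072:
        itemsize = 5 * maxsiz
    else:
        itemsize = 5 * (avgsiz + devsiz)

    # ...then round the item size up to the next block size (1K-16K).
    for blk in (1024, 2048, 4096, 8192, 16384):
        if itemsize <= blk:
            newblocksize = blk
            break
    else:
        newblocksize = 16384

    # Step 2: items per group (32 bytes assumed reserved per group).
    items_per_group = (newblocksize - 32) // avgsiz

    # Step 3: base modulo, rounded up so every record has a home group.
    basemodulo = math.ceil(count / items_per_group)

    # Step 4: SPLIT_LOAD, floored at 10.
    split_load = int(((avgkey + 9) * items_per_group / newblocksize) * 100) + 1
    split_load = max(split_load, 10)

    # Step 5: MERGE_LOAD is half of SPLIT_LOAD, rounded up.
    merge_load = (split_load + 1) // 2

    return newblocksize, basemodulo, split_load, merge_load
```

For example, a file of 100,000 records averaging 300 bytes with a 500-byte maximum works out to an 8K block size; the key-based SPLIT_LOAD comes in under the minimum, so it is floored at 10.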


Tim Snyder
Consulting I/T Specialist , U2 Professional Services
North American Lab Services
DB2 Information Management, IBM Software Group
717-545-6403
[EMAIL PROTECTED]
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/




RE: [U2] RESIZE DYNAMIC FILES

2006-06-12 Thread Dave S
We have used the product here before.

I think our license on it lapsed.

I have been using guide for several years instead of using FAST.

Hennessey, Mark F. [EMAIL PROTECTED] wrote:
  Should I put [AD] in the subject line for an unsolicited testimonial? :)

The best advice I can give you is to buy a product called FAST:

http://www.fitzlong.com/

A great tool for analyzing and resizing files, be they dynamic or standard 
hashed files. Excellent support from excellent people at a great price.

There might be more expensive utilities out there, but I can't imagine that 
there is anything better.

Mark Hennessey
(a FAST Customer since 2002)

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Dave S
Sent: Monday, June 12, 2006 10:25 AM
To: u2-users@listserver.u2ug.org
Subject: [U2] RESIZE DYNAMIC FILES


Does anyone have any tech tips on how to select parameters when resizing 
dynamic files ?




Re: [U2] RESIZE DYNAMIC FILES

2006-06-12 Thread Timothy Snyder
[EMAIL PROTECTED] wrote on 06/12/2006 12:57:03 PM:

 What does the guide -r option do?

 We have been using the -a option.

The -r option sends guide output to a hashed file. This makes it very easy 
to select for files that are undersized, or that have corruption.  So I'll 
often do a CREATE.FILE DATA UDT_GUIDE 101, then edit the VOC entry of 
UDT_GUIDE so attribute 3 points to @UDTHOME/sys/D_UDT_GUIDE.  Then I can 
do something like this from ECL:
  !guide /some_dir/some_file -na -ne -ns -r UDT_GUIDE

This will create a record in UDT_GUIDE keyed as /some_dir/some_file.  With 
that information for all of your files, you can do something like this:
  list UDT_GUIDE WITH STATUS LIKE "...2" (to find files with level 2 
overflow)
  list UDT_GUIDE WITH STATUS LIKE "Err..." (to find files with corruption)
  list UDT_GUIDE MAXSIZ AVGSIZ DEVSIZ COUNT AVGKEY (to get the info for 
the dynamic file sizing calculations)

It's SO much easier than writing code to parse through the text output of 
guide.

Tim Snyder
Consulting I/T Specialist , U2 Professional Services
North American Lab Services
DB2 Information Management, IBM Software Group
717-545-6403
[EMAIL PROTECTED]