Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Wjhonson

You forgot the need to defragment, since someone suggested that my idea of 
using the intrinsic look-ahead ability is hampered by hard fragmentation.





Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Susan Lynch
Chris,

10 years ago, when I was administering a UniVerse system, the answer would
have been "minimize both to the best of your ability".  But I don't know
how UniVerse has changed in the interim, during which time I have been
working on UniData systems, which are enormously different in their
handling of records in groups from any other Pick-type system I have ever
worked on (all of which were much more similar to UniVerse at that time).
And when last I administered a UniVerse system, there were no dynamic
files..

With that caveat, here are the factors:

1) a record in a UniVerse file that is stored in overflow is going to take
2 or more disk reads to retrieve if you are retrieving it by id.  However,
in a Basic select (structured as in Will's example, with no quotes, no
"WITH" criteria), the system will walk through the file group by group,
and will read each record, so yes, it will take 2 (or more, depending on
how deeply that group is in overflow) reads to get the data, but it will
have done the first read anyway to read those records - so for the Basic
SELECT, you probably want to minimize the number of groups read to the
extent that you can do so without putting many of the groups into
overflow.

2) to add records to the file, you have to access the file by the record
id, which means hashing the id to the group, then walking through the
group to see if the id is already in use, and if not, adding the record to
the end of the data area in use.  So for that, you absolutely want to
minimize the amount of overflow, because overflow slows you down on the
'adds' (a short sketch of this keyed path follows point 3).

3) any sort/select or query read of the database will be slowed down
significantly by overflow, but you said you don't do much of that anyway.
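
A minimal sketch of the keyed path from point 2, for contrast with the
group-by-group Basic SELECT in point 1 (the record id and attribute values are
made up; the file name is the one discussed elsewhere in this thread):

    OPEN 'GENACCTRN_POSTED' TO F.GL ELSE STOP
    ID = 'SOME.KEY'
    * READ hashes ID to one group, then scans that group (plus any overflow
    * buffers chained to it) for the key - those extra buffer reads are the
    * cost described above.
    READ REC FROM F.GL, ID THEN
       REC<1> = 'updated'       ;* key found: update in place
    END ELSE
       REC = ''                 ;* key not found: the record will be added
       REC<1> = 'new'           ;* to the group that ID hashes to
    END
    WRITE REC TO F.GL, ID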

Susan M. Lynch
F. W. Davison & Company, Inc.

Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Rick Nuckolls
Logically, the graphed solution to varying the split.load value with an 
x-axis=modulus, y-axis=time_to_select_&_read_the_whole_file is going to be 
parabolic, having very slow performance at modulus=1 and modulus = # of records.

If you actually want to find the precise low point, ignore all this bs, create 
a bunch of files with copies of the same data, but different moduli, restart 
your system (including all disk drives & raid devices) in order to purge all 
buffers, and then run the same program against each file.  I think that we 
would all be curious about the results.
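
A minimal sketch of the comparison program Rick describes, run once against each
test copy (the file name is a placeholder, and TIME() only gives whole seconds,
so a large file or repeated runs are needed for meaningful numbers):

    OPEN 'TESTFILE.MOD1' TO F.TEST ELSE STOP
    START.TIME = TIME()            ;* seconds since midnight
    RECORDS = 0
    SELECT F.TEST
    LOOP
       READNEXT ID ELSE EXIT
       READ REC FROM F.TEST, ID THEN RECORDS = RECORDS + 1 ELSE NULL
    REPEAT
    PRINT RECORDS:' records read in ':(TIME() - START.TIME):' seconds'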

Easier yet, just ignore the bs and use the defaults. :)

-Rick


Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Wjhonson

What do you mean by a BASIC SELECT WITH...?

If you mean you are doing EXECUTE "SELECT CUSTOMER WITH...",
that is not a BASIC SELECT, whose syntax is only

OPEN "CUSTOMER" TO F.CUSTOMER 
SELECT F.CUSTOMER

no WITH
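
To make the distinction concrete, a short sketch of both forms (CUSTOMER comes
from the example above; the WITH clause and the STATUS field in the second form
are made up for illustration):

    * 1) BASIC SELECT: no criteria; walks every group of the file in turn
    OPEN 'CUSTOMER' TO F.CUSTOMER ELSE STOP
    SELECT F.CUSTOMER
    LOOP
       READNEXT ID ELSE EXIT
       READ REC FROM F.CUSTOMER, ID ELSE CONTINUE
       * ...apply any selection criteria yourself, here...
    REPEAT

    * 2) EXECUTEd RetrieVe query: the WITH clause is handled by the query
    *    processor, which leaves an active select list for READNEXT
    EXECUTE 'SELECT CUSTOMER WITH STATUS = "A"'
    LOOP
       READNEXT ID ELSE EXIT
       * ...only the matching ids come back...
    REPEAT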










Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Chris Austin

So is there a performance increase in BASIC SELECTS by reducing overflow? Some 
people are saying to reduce disk space to speed up the BASIC SELECT
while others say to reduce overflow.. I'm a bit confused. All of our programs 
that read that table use a BASIC SELECT WITH.. 

for a BASIC select do you gain anything by reducing overflow?

Chris



Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Wols Lists
On 05/07/12 23:58, Rick Nuckolls wrote:
> Oops, I would have thought that if a file had, say 100,000 bytes, @ 70 percent 
> full, there would be 30,000 bytes "empty" or dead. Are you suggesting that 
> there would be 70,000 bytes of data and 42,000 bytes of dead space?

Do you mean 100,000 bytes of disk space, or 100,000 bytes of data?

I guess you are thinking that the file occupies, on disk, 100K. In which
case it will have 70K of data and 30K of empty space.

But if you are thinking that the file contains 100K of data, it will
actually occupy 142K of disk space.

So this particular option "wastes" 30% of the disk space it uses, but
uses 42% more space than it would optimally need, assuming perfect packing.

I know, it's fun to try to get your head round it :-)

Cheers,
Wol


Re: [U2] RESIZE - dynamic files

2012-07-06 Thread Israel, John R.
The best thing I can say about MINIMUM.MODULUS is that if you set it close 
to what you expect the file to need (at least for a while), then when you start 
populating it from scratch you will not lose performance to the file repeatedly 
splitting.  This can make an amazing difference in how long it takes to 
initially populate the file.

John

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Charles Stevenson
Sent: Thursday, July 05, 2012 5:41 PM
To: U2 Users List
Subject: Re: [U2] RESIZE - dynamic files

Chris,

I can appreciate what you are doing as an academic exercise.

You seem happy how it looks at this moment, where, because you set 
"MINIMUM.MODULUS  118681", you ended up with a current load of 63%.
But think about it:  as you add records, the load will reach 70%, per 
"SPLIT.LOAD 70",  then splits will keep occuring and current modlus with grow 
past 118681.  MINIMUM.MODULUS will never matter again.  (This was described as 
an ever-growing file.)

If the current config is what you want, why not just set "SPLIT.LOAD 63  
MINIMUM.MODULUS 1".   That way the ratio that you like today will stay 
like this forever.

MINIMUM.MODULUS will not matter unless data is deleted.  It says to not shrink 
the file structure below that minimally allocated disk space, even if there is 
no data to occupy it.  That's really all MINIMUM.MODULUS is for.

Play with it all you want, because it puts you in a good place when some crisis 
happens.  At the end of the day, with this file, you'll find your tuning won't 
matter much.  Not a lot of help, but not much harm if you tweak it wrong, 
either.


On 7/5/2012 1:20 PM, Chris Austin wrote:
> Rick,
>
> You are correct, I should be using the smaller size (I just haven't 
> changed it yet). Based on the reading I have done you should only use the 
> larger group size when the average record size is greater than 1000 bytes.
>
> As far as being better off with the defaults that's basically what I'm 
> trying to test (as well as learn how linear hashing works). I was able 
> to reduce my overflow by 18% and I only increased my empty groups by a very 
> small amount as well as only increased my file size by 8%. This in theory 
> should be better for reads/writes than what I had before.
>
> To test the performance I need to write a ton of records and then capture the 
> output and compare the output using timestamps.
>
> Chris

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Chris Austin

That's what we use: 'BASIC SELECT' statements for this table, looping through 
records to build reports. It's an accounting table that gets about 200-300 
record WRITEs a day, with an average of ~250 bytes per record. We obviously 
have more READ operations since we are always building up these reports, so I 
was hoping my #'s looked right. 

1) I reduced overflow by 18%.
2) I only increased file size ~8%.

So we do a combination of BASIC SELECTs and WRITEs. Everything is done in the 
latest version of Rocket's UniVerse (PICK flavor), using BASIC for the programs 
that contain the SELECTs.

Chris
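
A rough sizing check for that workload, using the record size above and the
group sizes and split loads mentioned earlier in the thread (per-record header
overhead inside the group is ignored):

    AVG.REC = 250                  ;* bytes, from the estimate above
    GROUP.1 = 2048                 ;* GROUP.SIZE 1
    GROUP.2 = 4096                 ;* GROUP.SIZE 2
    SPLIT   = 0.70                 ;* SPLIT.LOAD 70
    PRINT 'Records per 2K group at 70% load: ':INT(GROUP.1 * SPLIT / AVG.REC)   ;* about 5
    PRINT 'Records per 4K group at 70% load: ':INT(GROUP.2 * SPLIT / AVG.REC)   ;* about 11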



Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Rick Nuckolls
This will be mostly true if the full extent of the file was allocated at one 
time as a contiguous block, which could be a big plus.
As a file grows, sectors will be allocated piecemeal and when the hardware 
reads ahead, it will not necessarily be reading sectors in the same file.
Curiously, an old Pr1me CAM file had a trick around this, though it was late 
coming onto the scene.  Unix also has a few tricks, but they are only partial 
solutions to file fragmentation.  And Windows

Rick


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Wjhonson

A BASIC SELECT cannot use criteria at all.
It is going to walk through every record in the file, in order.
And that's the sticky wicket. That whole "in order" business.
The disk drive controller has no clue on linked frames, but it *will* do 
optimistic look aheads for you.
So you are much better off, for BASIC SELECTs having nothing in overflow, at 
all. :)
That way, when you go to ask for the *next* frame, it will always be 
contiguous, and already sitting in memory.









Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Rick Nuckolls
Most disks and disk systems cache huge amounts of information these days, and, 
depending on 20 factors or so, one solution will be better than another for a 
given file.

For the wholesale, SELECT F WITH, The fewest disk records will almost 
always win. For files that have ~10 records/group and have ~10% of the groups 
overflowed, then perhaps 1% of record reads will do a second read for the 
overflow buffer because the target key was not in the primary group.  Writing a 
new record would possibly hit the 10% mark for reading overflow buffers. But 
lowering the split.load will increase the number of splits slightly, and 
increase the total number of groups considerably.  What you have shown is that 
you need to increase the modulus (and select time) of a large file more 
than 10% in order to decrease the read and update times for your records 0.5% of 
the time (assuming that you have only reduced the number of overflow groups by 
~50%).
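
The "perhaps 1%" is just the product of the two fractions in Rick's example
(the second factor - the share of an overflowed group's records that sit in the
overflow buffer - is an inferred reading of his "perhaps"):

    P.GROUP.OVERFLOWED = 0.10     ;* ~10% of groups have an overflow buffer
    P.REC.IN.OVERFLOW  = 0.10     ;* ~10% of such a group's records live in it
    PRINT P.GROUP.OVERFLOWED * P.REC.IN.OVERFLOW   ;* 0.01, i.e. about 1% of keyed reads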

As Charles suggests, this is an interesting exercise, but your actual results 
will rapidly change if you actually add /remove records from your file, change 
the load or number of files on your system, put in a new drive, cpu, memory 
board, or install a new release of Universe, move to raid, etc.

-Rick


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Rick Nuckolls
Oops, I would have thought that if a file had, say 100,000 bytes, @ 70 percent 
full, there would be 30,000 bytes "empty" or dead. Are you suggesting that there 
would be 70,000 bytes of data and 42,000 bytes of dead space?



Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Wols Lists
On 05/07/12 14:49, Chris Austin wrote:
> 
> Disk space is not a factor, as we are a smaller shop and disk space comes 
> cheap. However, one thing I did notice is when I increased the modulus to a 
> very large
> number which then increased my disk space to about 3-4x of my record data, my 
> SELECT queries were slower. 
> 
> Are the 2 factors when choosing HOW the file is used based on whether you're 
> using?
> 
> 1) a lot of SELECTS (then looping through the records) 

Is that a BASIC select, or a RETRIEVE select?

> 2) grabbing individual records (not using a SELECT)
> 
> With this file we really do a lot of SELECTS (option 1), then loop through 
> the records. With that being said and based on the reading I've done here it 
> would appear it's better to have a little overflow
> and not use up so much disk space for modulus (groups) for this application 
> since we do use a lot of SELECT queries. Is this correct?

If your selects are BASIC selects, then you won't notice too much
difference. If they are RETRIEVE selects, then reducing SPLIT will
increase the cost of the SELECT.

In both cases, if the RETRIEVE select is not BY, then the cost of
processing the list should not be seriously impacted.

(On a SELECT WITH index, however, reducing overflow will speed things up
a bit, probably not an awful lot.)
> 
> Most of my records are ~ 250 bytes, there's a handful that are 'up to 512 
> bytes'. 
> 
> It would seem to me that I would want to REDUCE my split to ~70% to reduce 
> overflow, and maybe increase my MINIMUM.MODULUS to a # a little bit bigger 
> than my current modulus (~10% bigger) since this
> will be a growing file and will never merge. In my case using the formula 
> might not make sense since this file will never merge. Does this make sense?
> 
If the file will only ever grow, then MINIMUM.MODULUS is probably a
waste of time. You are best using that in one of two circumstances,
either (a) you are populating a file with a large number of initial
records and you are forcing the modulus to what it's likely to end up
anyway, or (b) your file grows and shrinks violently in size, and you
are forcing it to its typical state.

The first scenario simply avoids a bunch of inevitable splits, the
second avoids a yoyo split/merge/split scenario.

I'd just leave the settings at 80/20, and only use MINIMUM.MODULUS if I
was creating a copy of the file (setting the new minimum at the current
modulo of the existing file).
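
A hedged sketch of that copy scenario, assuming UniVerse's CREATE.FILE and COPY
verbs and the parameter keywords quoted elsewhere in this thread; the file names
are placeholders, and 118681 stands in for whatever the source file's current
modulus turns out to be (e.g. from ANALYZE.FILE output):

    CUR.MOD = 118681
    EXECUTE 'CREATE.FILE NEWFILE DYNAMIC MINIMUM.MODULUS ':CUR.MOD:' GROUP.SIZE 1'
    EXECUTE 'COPY FROM OLDFILE TO NEWFILE ALL'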

Cheers,
Wol


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Wols Lists
On 05/07/12 16:12, Martin Phillips wrote:
> A file without overflow is not necessarily the best solution. Winding the 
> split load down to 70% means that at least 30% of the file
> is dead space. The implication of this is that the file is larger and will 
> take more disk reads to process sequentially from one end
> to the other.

Whoops Martin, I think you've made the classic percentages mistake here ...

The file is 30/70, or 42% dead space at least. A file with the default
80% split is at least 25% dead space.

Cheers,
Wol


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Charles Stevenson

Chris,

I can appreciate what you are doing as an academic exercise.

You seem happy with how it looks at this moment, where, because you set  
"MINIMUM.MODULUS  118681", you ended up with a current load of 63%.
But think about it:  as you add records, the load will reach 70%, per 
"SPLIT.LOAD 70", then splits will keep occurring and the current modulus will 
grow past 118681.  MINIMUM.MODULUS will never matter again.  (This was 
described as an ever-growing file.)


If the current config is what you want, why not just set "SPLIT.LOAD 63  
MINIMUM.MODULUS 1".   That way the ratio that you like today will stay 
like this forever.
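
If it helps to see it spelled out, a hedged sketch of applying that suggestion,
assuming UniVerse's CONFIGURE.FILE verb with the parameter keywords already
quoted above (versus the values Chris actually set):

    EXECUTE 'CONFIGURE.FILE GENACCTRN_POSTED SPLIT.LOAD 63 MINIMUM.MODULUS 1'
    * ...as opposed to the current tuning:
    EXECUTE 'CONFIGURE.FILE GENACCTRN_POSTED SPLIT.LOAD 70 MINIMUM.MODULUS 118681'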


MINIMUM.MODULUS will not matter unless data is deleted.  It says to not 
shrink the file structure below that minimally allocated disk space, 
even if there is no data to occupy it.  That's really all 
MINIMUM.MODULUS is for.


Play with it all you want, because it puts you in a good place when some 
crisis happens.  At the end of the day, with this file, you'll find your 
tuning won't matter much.  Not a lot of help, but not much harm if you 
tweak it wrong, either.



On 7/5/2012 1:20 PM, Chris Austin wrote:

Rick,

You are correct, I should be using the smaller size (I just haven't changed it 
yet). Based on the reading I have done you should
only use the larger group size when the average record size is greater than 
1000 bytes.

As far as being better off with the defaults that's basically what I'm trying 
to test (as well as learn how linear hashing works). I was able
to reduce my overflow by 18% and I only increased my empty groups by a very 
small amount as well as only increased my file size
by 8%. This in theory should be better for reads/writes than what I had before.

To test the performance I need to write a ton of records and then capture the 
output and compare the output using timestamps.

Chris
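
A minimal sketch of the write half of that test, with made-up keys and record
layout; TIME() is whole seconds, so N needs to be large enough to register, and
the target should be a scratch copy of the file, not the live one:

    OPEN 'GENACCTRN_POSTED.TEST' TO F.TEST ELSE STOP
    N = 100000
    START.TIME = TIME()
    FOR I = 1 TO N
       ID = 'BENCH*':I
       REC = ''
       REC<1> = 'benchmark row'
       REC<2> = I
       WRITE REC TO F.TEST, ID
    NEXT I
    PRINT N:' writes in ':(TIME() - START.TIME):' seconds'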




Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Wjhonson

The hardware "look ahead" of the disk drive reader will grab consecutive 
"frames" into memory, since it assumes you'll want the "next" frame next.
So the less overflow you have, the faster a full file scan will become.
At least that's my theory ;)





 From: r...@lynden.com
 To: u2-users@listserver.u2ug.org
 Date: Thu, 5 Jul 2012 09:22:02 -0700
 Subject: Re: [U2] RESIZE - dynamic files
 
Chris,

I still am wondering what is prompting you to continue using the larger group 
size.

I think that Martin, and the UV documentation, is correct in this case; you 
would be as well or better off with the defaults.

-Rick

On Jul 5, 2012, at 9:13 AM, "Martin Phillips"  wrote:

> Hi,
> 
> The various suggestions about setting the minimum modulus to reduce overflow 
> are all very well but effectively you are turning a dynamic file into a 
> static one, complete with all the continual maintenance work needed to keep 
> the parameters in step with the data.
> 
> In most cases, the only parameter that is worth tuning is the group size to 
> try to pack things nicely. Even this is often fine left alone though getting 
> it to match the underlying o/s page size is helpful.
> 
> I missed the start of this thread but, unless you have a performance problem 
> or are seriously short of space, my recommendation would be to leave the 
> dynamic files to look after themselves.
> 
> A file without overflow is not necessarily the best solution. Winding the 
> split load down to 70% means that at least 30% of the file is dead space. 
> The implication of this is that the file is larger and will take more disk 
> reads to process sequentially from one end to the other.
> 
> 
> Martin Phillips
> Ladybridge Systems Ltd
> 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
> +44 (0)1604-709200
> 
> 
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] 
> On Behalf Of Chris Austin
> Sent: 05 July 2012 15:19
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> I was able to drop from 30% overflow to 12% by making 2 changes:
> 
> 1) changed the split from 80% to 70% (that alone reduced overflow by 10%)
> 2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record 
> data + id) * 1.1 * 1.42857 (70% split load)] / 4096 )
> 
> My disk size only went up 8%..
> 
> My file looks like this now:
> 
> File name ..   GENACCTRN_POSTED
> Pathname ...   GENACCTRN_POSTED
> File type ..   DYNAMIC
> File style and revision    32B

Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Rick Nuckolls
Chris,

For the type of use that you described earlier; BASIC selects and reads, 
reducing overflow will have negligible performance benefit, especially compared 
to changing the GROUP.SIZE back to 1 (2048) bytes.  If you purge the file in 
relatively small percentages, then it will never merge anyway (because you will 
need to delete 20-30% of the file for that to happen with the mergeload at 50%), 
so your optimum minimum modulus solution will probably be "how ever large it 
grows".  The overhead for a group split is not as bad as it sounds unless your 
updates/sec count is extremely high, such as during a copy.

If you do regular SELECT and SCANS of the entire file, then your goal should be 
to reduce the total disk size of the file, and not worry much about common 
overflow. The important thing is that the file is dynamic, so you will never 
encounter the issues that undersized statically hashed files develop.

We have thousands of dynamically hashed files on our (Solaris) systems, with an 
extremely low problem rate.

Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Thursday, July 05, 2012 11:21 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


Rick,

You are correct, I should be using the smaller size (I just haven't changed it 
yet). Based on the reading I have done you should
only use the larger group size when the average record size is greater than 
1000 bytes. 

As far as being better off with the defaults that's basically what I'm trying 
to test (as well as learn how linear hashing works). I was able
to reduce my overflow by 18% and I only increased my empty groups by a very 
small amount as well as only increased my file size
by 8%. This in theory should be better for reads/writes than what I had before. 

To test the performance I need to write a ton of records and then capture the 
output and compare the output using timestamps. 

Chris


> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Thu, 5 Jul 2012 09:22:02 -0700
> Subject: Re: [U2] RESIZE - dynamic files
> 
> Chris,
> 
> I still am wondering what is prompting you to continue using the larger group 
> size.
> 
> I think that Martin and the UV documentation are correct in this case; you 
> would be as well or better off with the defaults.
> 
> -Rick
> 
> On Jul 5, 2012, at 9:13 AM, "Martin Phillips"  
> wrote:
> > Hi,
> > 
> > The various suggestions about setting the minimum modulus to reduce 
> > overflow are all very well but effectively you are turning a
> > dynamic file into a static one, complete with all the continual maintenance 
> > work needed to keep the parameters in step with the
> > data.
> > 
> > In most cases, the only parameter that is worth tuning is the group size to 
> > try to pack things nicely. Even this is often fine left
> > alone though getting it to match the underlying o/s page size is helpful.
> > 
> > I missed the start of this thread but, unless you have a performance 
> > problem or are seriously short of space, my recommendation
> > would be to leave the dynamic files to look after themselves.
> > 
> > A file without overflow is not necessarily the best solution. Winding the 
> > split load down to 70% means that at least 30% of the file
> > is dead space. The implication of this is that the file is larger and will 
> > take more disk reads to process sequentially from one end
> > to the other.
> > 
> > 
> > Martin Phillips
> > Ladybridge Systems Ltd
> > 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
> > +44 (0)1604-709200
> > 
> > 
> > 
> > -Original Message-
> > From: u2-users-boun...@listserver.u2ug.org 
> > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> > Sent: 05 July 2012 15:19
> > To: u2-users@listserver.u2ug.org
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > 
> > I was able to drop from 30% overflow to 12% by making 2 changes:
> > 
> > 1) changed the split from 80% to 70% (that alone reduced overflow by 10%)
> > 2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record 
> > data + id) * 1.1 * 1.42857 (70% split load)] / 4096 )
> > 
> > My disk size only went up 8%..
> > 
> > My file looks like this now:
> > 
> > File name ..   GENACCTRN_POSTED
> > Pathname ...   GENACCTRN_POSTED
> > File type ..   DYNAMIC
> > File style and revision    32BIT Revision 12
> > Hashing Algorithm ..   GENERAL
> > N

Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Chris Austin

Rick,

You are correct, I should be using the smaller size (I just haven't changed it 
yet). Based on the reading I have done you should
only use the larger group size when the average record size is greater than 
1000 bytes. 

As far as being better off with the defaults that's basically what I'm trying 
to test (as well as learn how linear hashing works). I was able
to reduce my overflow by 18% and I only increased my empty groups by a very 
small amount as well as only increased my file size
by 8%. This in theory should be better for reads/writes than what I had before. 

To test the performance I need to write a ton of records and then capture the 
output and compare the output using timestamps. 
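
A minimal UniVerse BASIC sketch of that sort of timing test might look like the 
one below; it is only an illustration (the record count, the key format, and the 
'GENACCTRN_POSTED.TEST' copy it writes to are all assumed, and TIME() only gives 
whole-second resolution), but it shows the shape of the comparison:

* Sketch: time a bulk write, then a full BASIC SELECT scan, of a TEST copy.
OPEN '', 'GENACCTRN_POSTED.TEST' TO F.TEST ELSE STOP 201, 'GENACCTRN_POSTED.TEST'
N.RECS = 100000                          ;* assumed test volume
WRITE.START = TIME()
FOR I = 1 TO N.RECS
   REC = ''
   REC<1> = 'TEST DATA ':I               ;* dummy record body goes here
   WRITE REC ON F.TEST, 'TEST*':I
NEXT I
WRITE.SECS = TIME() - WRITE.START
*
SCAN.START = TIME()
CNT = 0
SELECT F.TEST
LOOP
   READNEXT ID ELSE EXIT
   READ REC FROM F.TEST, ID THEN CNT = CNT + 1 ELSE NULL
REPEAT
SCAN.SECS = TIME() - SCAN.START
PRINT 'Wrote ':N.RECS:' records in ':WRITE.SECS:' seconds'
PRINT 'Scanned ':CNT:' records in ':SCAN.SECS:' seconds'
END

Running the same scan against copies resized with different split load and 
minimum modulus settings is what would make the timestamps comparable.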

Chris


> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Thu, 5 Jul 2012 09:22:02 -0700
> Subject: Re: [U2] RESIZE - dynamic files
> 
> Chris,
> 
> I still am wondering what is prompting you to continue using the larger group 
> size.
> 
> I think that Martin and the UV documentation are correct in this case; you 
> would be as well or better off with the defaults.
> 
> -Rick
> 
> On Jul 5, 2012, at 9:13 AM, "Martin Phillips"  
> wrote:
> > Hi,
> > 
> > The various suggestions about setting the minimum modulus to reduce 
> > overflow are all very well but effectively you are turning a
> > dynamic file into a static one, complete with all the continual maintenance 
> > work needed to keep the parameters in step with the
> > data.
> > 
> > In most cases, the only parameter that is worth tuning is the group size to 
> > try to pack things nicely. Even this is often fine left
> > alone though getting it to match the underlying o/s page size is helpful.
> > 
> > I missed the start of this thread but, unless you have a performance 
> > problem or are seriously short of space, my recommendation
> > would be to leave the dynamic files to look after themselves.
> > 
> > A file without overflow is not necessarily the best solution. Winding the 
> > split load down to 70% means that at least 30% of the file
> > is dead space. The implication of this is that the file is larger and will 
> > take more disk reads to process sequentially from one end
> > to the other.
> > 
> > 
> > Martin Phillips
> > Ladybridge Systems Ltd
> > 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
> > +44 (0)1604-709200
> > 
> > 
> > 
> > -Original Message-
> > From: u2-users-boun...@listserver.u2ug.org 
> > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> > Sent: 05 July 2012 15:19
> > To: u2-users@listserver.u2ug.org
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > 
> > I was able to drop from 30% overflow to 12% by making 2 changes:
> > 
> > 1) changed the split from 80% to 70% (that alone reduced overflow by 10%)
> > 2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record 
> > data + id) * 1.1 * 1.42857 (70% split load)] / 4096 )
> > 
> > My disk size only went up 8%..
> > 
> > My file looks like this now:
> > 
> > File name ..   GENACCTRN_POSTED
> > Pathname ...   GENACCTRN_POSTED
> > File type ..   DYNAMIC
> > File style and revision    32BIT Revision 12
> > Hashing Algorithm ..   GENERAL
> > No. of groups (modulus)    118681 current ( minimum 118681, 140 empty,
> >14431 overflowed, 778 badly )
> > Number of records ..   1292377
> > Large record size ..   3267 bytes
> > Number of large records    180
> > Group size .   4096 bytes
> > Load factors ...   70% (split), 50% (merge) and 63% (actual)
> > Total size .   546869248 bytes
> > Total size of record data ..   287789178 bytes
> > Total size of record IDs ...   21539538 bytes
> > Unused space ...   237532340 bytes
> > Total space for records    546861056 bytes
> > 
> > Chris
> > 
> > 
> > 
> >> From: keith.john...@datacom.co.nz
> >> To: u2-users@listserver.u2ug.org
> >> Date: Wed, 4 Jul 2012 14:05:02 +1200
> >> Subject: Re: [U2] RESIZE - dynamic files
> >> 
> >> Doug may have had a key bounce in his input
> >> 
> >>> Let's do the math:
> >>> 
> >>> 258687736 (Record Size)
> >>> 192283300 (Key Size)
> >>> 
> >> 
> >> The key size is actually 19283300 in Chris' 

Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Rick Nuckolls
Chris,

I still am wondering what is prompting you to continue using the larger group 
size.

I think that Martin and the UV documentation are correct in this case; you 
would be as well or better off with the defaults.

-Rick

On Jul 5, 2012, at 9:13 AM, "Martin Phillips"  
wrote:
> Hi,
> 
> The various suggestions about setting the minimum modulus to reduce overflow 
> are all very well but effectively you are turning a
> dynamic file into a static one, complete with all the continual maintenance 
> work needed to keep the parameters in step with the
> data.
> 
> In most cases, the only parameter that is worth tuning is the group size to 
> try to pack things nicely. Even this is often fine left
> alone though getting it to match the underlying o/s page size is helpful.
> 
> I missed the start of this thread but, unless you have a performance problem 
> or are seriously short of space, my recommendation
> would be to leave the dynamic files to look after themselves.
> 
> A file without overflow is not necessarily the best solution. Winding the 
> split load down to 70% means that at least 30% of the file
> is dead space. The implication of this is that the file is larger and will 
> take more disk reads to process sequentially from one end
> to the other.
> 
> 
> Martin Phillips
> Ladybridge Systems Ltd
> 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
> +44 (0)1604-709200
> 
> 
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: 05 July 2012 15:19
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> I was able to drop from 30% overflow to 12% by making 2 changes:
> 
> 1) changed the split from 80% to 70% (that alone reduced overflow by 10%)
> 2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record 
> data + id) * 1.1 * 1.42857 (70% split load)] / 4096 )
> 
> My disk size only went up 8%..
> 
> My file looks like this now:
> 
> File name ..   GENACCTRN_POSTED
> Pathname ...   GENACCTRN_POSTED
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    118681 current ( minimum 118681, 140 empty,
>14431 overflowed, 778 badly )
> Number of records ..   1292377
> Large record size ..   3267 bytes
> Number of large records    180
> Group size .   4096 bytes
> Load factors ...   70% (split), 50% (merge) and 63% (actual)
> Total size .   546869248 bytes
> Total size of record data ..   287789178 bytes
> Total size of record IDs ...   21539538 bytes
> Unused space ...   237532340 bytes
> Total space for records    546861056 bytes
> 
> Chris
> 
> 
> 
>> From: keith.john...@datacom.co.nz
>> To: u2-users@listserver.u2ug.org
>> Date: Wed, 4 Jul 2012 14:05:02 +1200
>> Subject: Re: [U2] RESIZE - dynamic files
>> 
>> Doug may have had a key bounce in his input
>> 
>>> Let's do the math:
>>> 
>>> 258687736 (Record Size)
>>> 192283300 (Key Size)
>>> 
>> 
>> The key size is actually 19283300 in Chris' figures
>> 
>> Regarding 68,063 being less than the current modulus of 82,850.  I think the 
>> answer may lie in the splitting process.
>> 
>> As I understand it, the first time a split occurs group 1 is split and its 
>> contents are split between new group 1 and new group 2.
> All the other groups effectively get 1 added to their number. The next split 
> is group 3 (which was 2) into 3 and 4 and so forth. A
> pointer is kept to say where the next split will take place and also to help 
> sort out how to adjust the algorithm to identify which
> group matches a given key.
>> 
>> Based on this, if you started with 1000 groups, by the time you have split 
>> the 500th time you will have 1500 groups.  The first
> 1000 will be relatively empty, the last 500 will probably be overflowed, but 
> not terribly badly.  By the time you get to the 1000th
> split, you will have 2000 groups and they will, one hopes, be quite 
> reasonably spread with very little overflow.
>> 
>> So I expect the average access times would drift up and down in a cycle.  
>> The cycle time would get longer as the file gets bigger
> but the worst time would be roughly the same each cycle.
>> 
>> Given the power of two introduced into the algorithm by the before/af

Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Martin Phillips
Hi,

The various suggestions about setting the minimum modulus to reduce overflow 
are all very well but effectively you are turning a
dynamic file into a static one, complete with all the continual maintenance 
work needed to keep the parameters in step with the
data.

In most cases, the only parameter that is worth tuning is the group size to try 
to pack things nicely. Even this is often fine left
alone though getting it to match the underlying o/s page size is helpful.

I missed the start of this thread but, unless you have a performance problem or 
are seriously short of space, my recommendation
would be to leave the dynamic files to look after themselves.

A file without overflow is not necessarily the best solution. Winding the split 
load down to 70% means that at least 30% of the file
is dead space. The implication of this is that the file is larger and will take 
more disk reads to process sequentially from one end
to the other.
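
To put rough numbers on that trade-off, here is a back-of-envelope UniVerse 
BASIC sketch (the ~309 MB figure is taken from the FILE.STAT output elsewhere 
in this thread, and treating each group as sitting at about the split load is 
a simplification):

* Back-of-envelope: approximate on-disk size of the primary file at a given split load.
DATA.BYTES = 309000000                   ;* approx record data + record IDs
FOR SPLIT.PCT = 70 TO 90 STEP 10
   EST.BYTES = INT(DATA.BYTES / (SPLIT.PCT / 100))
   PRINT SPLIT.PCT:'% split load -> roughly ':EST.BYTES:' bytes on disk'
NEXT SPLIT.PCT
* 70% -> about 1.43 x the data (roughly 30% dead space); 80% -> 1.25 x; 90% -> 1.11 x.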


Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
+44 (0)1604-709200



-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: 05 July 2012 15:19
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


I was able to drop from 30% overflow to 12% by making 2 changes:

1) changed the split from 80% to 70% (that alone reduced overflow by 10%)
2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record 
data + id) * 1.1 * 1.42857 (70% split load)] / 4096 )

My disk size only went up 8%..

My file looks like this now:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    118681 current ( minimum 118681, 140 empty,
14431 overflowed, 778 badly )
Number of records ..   1292377
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   70% (split), 50% (merge) and 63% (actual)
Total size .   546869248 bytes
Total size of record data ..   287789178 bytes
Total size of record IDs ...   21539538 bytes
Unused space ...   237532340 bytes
Total space for records    546861056 bytes

Chris



> From: keith.john...@datacom.co.nz
> To: u2-users@listserver.u2ug.org
> Date: Wed, 4 Jul 2012 14:05:02 +1200
> Subject: Re: [U2] RESIZE - dynamic files
> 
> Doug may have had a key bounce in his input
> 
> > Let's do the math:
> >
> > 258687736 (Record Size)
> > 192283300 (Key Size)
> > 
> 
> The key size is actually 19283300 in Chris' figures
> 
> Regarding 68,063 being less than the current modulus of 82,850.  I think the 
> answer may lie in the splitting process.
> 
> As I understand it, the first time a split occurs group 1 is split and its 
> contents are split between new group 1 and new group 2.
All the other groups effectively get 1 added to their number. The next split is 
group 3 (which was 2) into 3 and 4 and so forth. A
pointer is kept to say where the next split will take place and also to help 
sort out how to adjust the algorithm to identify which
group matches a given key.
> 
> Based on this, if you started with 1000 groups, by the time you have split 
> the 500th time you will have 1500 groups.  The first
1000 will be relatively empty, the last 500 will probably be overflowed, but 
not terribly badly.  By the time you get to the 1000th
split, you will have 2000 groups and they will, one hopes, be quite reasonably 
spread with very little overflow.
> 
> So I expect the average access times would drift up and down in a cycle.  The 
> cycle time would get longer as the file gets bigger
but the worst time would be roughly the same each cycle.
> 
> Given the power of two introduced into the algorithm by the before/after the 
> split thing, I wonder if there is such a need to
start off with a prime?
> 
> Regards, Keith
> 
> PS I'm getting a bit Tony^H^H^H^Hverbose nowadays.
> 
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Chris Austin

I was able to drop from 30% overflow to 12% by making 2 changes:

1) changed the split from 80% to 70% (that alone reduced overflow by 10%)
2) changed the MINIMUM.MODULUS to 118,681 (calculated this way -> [ (record 
data + id) * 1.1 * 1.42857 (70% split load)] / 4096; a worked sketch of this 
calculation follows below)
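
As a rough worked version of that calculation in UniVerse BASIC (a sketch only: 
the byte counts are the 'Total size of record data' and 'Total size of record 
IDs' figures from FILE.STAT, and the rounding here is an assumption, so the 
result lands near, not exactly at, 118,681):

* Sketch of the MINIMUM.MODULUS estimate: (data + ids) * 1.1 * (100/split load) / group size
DATA.BYTES  = 287426366                  ;* Total size of record data (FILE.STAT)
ID.BYTES    = 21539682                   ;* Total size of record IDs (FILE.STAT)
GROUP.BYTES = 4096                       ;* Group size
SPLIT.PCT   = 70                         ;* target split load
HEADROOM    = 1.1                        ;* 10% growth allowance
RAW = (DATA.BYTES + ID.BYTES) * HEADROOM * (100 / SPLIT.PCT) / GROUP.BYTES
MIN.MOD = INT(RAW) + 1                   ;* round up to whole groups
PRINT 'Suggested MINIMUM.MODULUS: ':MIN.MOD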

My disk size only went up 8%..

My file looks like this now:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    118681 current ( minimum 118681, 140 empty,
14431 overflowed, 778 badly )
Number of records ..   1292377
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   70% (split), 50% (merge) and 63% (actual)
Total size .   546869248 bytes
Total size of record data ..   287789178 bytes
Total size of record IDs ...   21539538 bytes
Unused space ...   237532340 bytes
Total space for records    546861056 bytes

Chris



> From: keith.john...@datacom.co.nz
> To: u2-users@listserver.u2ug.org
> Date: Wed, 4 Jul 2012 14:05:02 +1200
> Subject: Re: [U2] RESIZE - dynamic files
> 
> Doug may have had a key bounce in his input
> 
> > Let's do the math:
> >
> > 258687736 (Record Size)
> > 192283300 (Key Size)
> > 
> 
> The key size is actually 19283300 in Chris' figures
> 
> Regarding 68,063 being less than the current modulus of 82,850.  I think the 
> answer may lie in the splitting process.
> 
> As I understand it, the first time a split occurs group 1 is split and its 
> contents are split between new group 1 and new group 2. All the other groups 
> effectively get 1 added to their number. The next split is group 3 (which was 
> 2) into 3 and 4 and so forth. A pointer is kept to say where the next split 
> will take place and also to help sort out how to adjust the algorithm to 
> identify which group matches a given key.
> 
> Based on this, if you started with 1000 groups, by the time you have split 
> the 500th time you will have 1500 groups.  The first 1000 will be relatively 
> empty, the last 500 will probably be overflowed, but not terribly badly.  By 
> the time you get to the 1000th split, you will have 2000 groups and they 
> will, one hopes, be quite reasonably spread with very little overflow.
> 
> So I expect the average access times would drift up and down in a cycle.  The 
> cycle time would get longer as the file gets bigger but the worst time would 
> be roughly the same each cycle.
> 
> Given the power of two introduced into the algorithm by the before/after the 
> split thing, I wonder if there is such a need to start off with a prime?
> 
> Regards, Keith
> 
> PS I'm getting a bit Tony^H^H^H^Hverbose nowadays.
> 
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Chris Austin

Disk space is not a factor, as we are a smaller shop and disk space comes 
cheap. However, one thing I did notice is when I increased the modulus to a 
very large
number which then increased my disk space to about 3-4x of my record data, my 
SELECT queries were slower. 

Are these the 2 factors to consider when looking at HOW the file is used, i.e. 
whether you're doing:

1) a lot of SELECTS (then looping through the records) 
2) grabbing individual records (not using a SELECT)

With this file we really do a lot of SELECTS (option 1), then loop through the 
records. With that being said and based on the reading I've done here it would 
appear it's better to have a little overflow
and not use up so much disk space for modulus (groups) for this application 
since we do use a lot of SELECT queries. Is this correct?

Most of my records are ~ 250 bytes, there's a handful that are 'up to 512 
bytes'. 

It would seem to me that I would want to REDUCE my split to ~70% to reduce 
overflow, and maybe increase my MINIMUM.MODULUS to a # a little bit bigger than 
my current modulus (~10% bigger) since this
will be a growing file and will never merge. In my case using the formula might 
not make sense since this file will never merge. Does this make sense?


File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    92903 current ( minimum 31, 87 empty,
28248 overflowed, 2510 badly )
Number of records ..   1292377
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   501219328 bytes
Total size of record data ..   287426366 bytes
Total size of record IDs ...   21539682 bytes
Unused space ...   192245088 bytes
Total space for records    501211136 bytes


With all that being said if I change the following:

1) SPLIT.LOAD to 70%
2) MINIMUM.MODULUS > 130,000

That's all I should really need to do to 'tweak' the performance of this file.. 
If this doesn't sound right I would be interested to hear how it should be 
tweaked instead. Thanks for all the help so far, I think
this is all starting to make sense.

Chris


> From: ro...@stamina.com.au
> To: u2-users@listserver.u2ug.org
> Date: Wed, 4 Jul 2012 01:36:26 +
> Subject: Re: [U2] RESIZE - dynamic files
> 
> I would suggest that the actual goal is to achieve maximum performance for 
> your system, so knowing HOW the file is used on a daily basis can also 
> influence decisions. Disk is a cheap commodity, so having some "wastage" in 
> file utilization shouldn't factor. 
> 
> 
> Ross Ferris
> Stamina Software
> Visage > Better by Design!

  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-05 Thread Martin Phillips
Hi all,

 

> I wouldn't actually be surprised if QM is like PI.

 

Drifting away from U2 but the question was asked

 

The initial implementation of dynamic files in QM was fairly close to that of 
PI/open but it was totally reworked long before the
product went onto general release, resulting in some useful performance gains.

 

Like UV, a QM dynamic file is represented by a directory but the DATA.30 and 
OVER.30 items become %0 and %1. Further items may exist
to hold alternate key indices.

 

The underlying mechanism of dynamic files is common to PI, PI/open, UV and QM 
but UniData goes its own way. Although a couple of the
numbers have to be changed for UV, I think that the technical note at 
http://www.openqm.org/downloads/dynamic_files.pdf is largely
applicable to UV, at least in principle. There are some substantial differences 
in how the two products perform split/merge
operations, especially with regard to management of locking tables, but this is 
not the right forum to discuss this further.

 

Interestingly, the UV Internals course used to state that the dynamic file 
hashing algorithm was the same one as static file type
18. Experiments suggest that this is not true and it looks as though UV uses 
the same public domain hashing algorithm that we chose
for QM.

 

As a useful tip for users running UV (or QM) on Windows systems, getting the 
Windows memory caching parameters set correctly can
make a massive difference to performance.

 

 

Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
+44 (0)1604-709200

 

 

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Wols Lists
On 04/07/12 17:44, Charles Stevenson wrote:
>>SMAT -d  (or ANALYZE.SHM -d)   see uv/bin/smat[.exe]
> uv/bin/analyze.shm[.exe]
> 
> Dynamic Files:
> Slot #  Inode Device Ref Count Htype Split Merge Curmod Basemod
> Largerec   Filesp Selects Nextsplit
>  0 1285128087 209307792516208050 4001   
> 2048 3267  2782736   0  1954
>  1  153221440 151542860060208050 397040
> 262144 1628 58641084   0134897
>  2 1155376080  317006236 6208050 81  64
> 1628   133616   018
>  3  924071961  976405761 2208050 957 512
> 1628  1249180   0   446
>  4  619894993 1297457141 1208050 1157   
> 1024 1628  3837400   0   134
>  5 1401440370  656655020 6218050 213429
> 131072 1628 54052576   0 82358
>  6 1053905064 1350670129 2208050 365 256
> 1628   529956   0   110
>  7  963519080 1084306943 2208050 2564   
> 2048 1628  4019040   0   517
>  8 1909033200  47372346598208050 3851   
> 2048 3267 12775756   0  1804
>etc.
> 
> Because of the concurrency difficulties that Brian mentioned . . .
> 
> On 7/4/2012 5:26 AM, Brian Leach wrote:
>> What makes the implementation difficult is that Litwin et al were all
>> assuming a single threaded access to an in-memory list. Concurrent
>> access whilst maintaining the load factor, split pointer and splitting
>> all add a lot more complexity, unless you lock the whole file for the
>> duration of an IO operation and kill the performance.
> . . . is why UV reserves a table in shared memory for dynamic files, per
> SMAT -d.
> The 1st user that opens the file causes the control info in the file
> header to be loaded to shared memory, where it remains until Ref Count
> drops to 0.
> (It also gets written to the file whenever there is a change.  At least
> on modern versions.)

Actually, thinking about it, why do you need to lock the entire file
when splitting or merging?

A merge actually could be done very quickly: to merge groups 10 and 2
you just chain 10 on to the end of 2 and don't bother actually
consolidating them.

But to split 2 into 2 and 10, you just need an exclusive lock on both of
them. Any attempt to access 1 or 3 or 9 can just sail right on by - only
if a process wants to access the group being split do you need to stall
it until you've finished. Although that is a problem if you're
sequentially scanning the file - which does block split/merge while
you're doing it.

I remember coming across a very badly sized dynamic file where that had
obviously happened - I guess someone had left a program half way through
a BASIC SELECT for a week or so and the file had grown somewhat
horrendously. It slowly corrected itself though. (I found it because our
client's system was horribly slow and I was looking for the cause. This
wasn't it though - it turned out to be some nasty code somewhere else,
can't remember exactly what.)

Cheers,
Wol
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Wols Lists
On 04/07/12 19:59, Rick Nuckolls wrote:
> I believe PiOpen used a directory with two files in it ‘&$0’ and ‘&$1’ 
> corresponding to DATA.30 and OVER.30.  If the numbers went up from there, I 
> think that they corresponded to alternate keys, ie ‘&$2’ and ‘&$3’ 
> represented DATA.30 and OVER.30 for the first alternate key.
> 
And &$2, and &$3, and the rest, iirc ...

> I do not think that PiOpen supported statically hashed files.  (Pr1me 
> Information did)

I'd be very surprised if it didn't. I might look up the manuals in my
garage and check ...

Or try to boot my EXL7330 and actually see what it does -)
> 
> All of that is a few years ago

Agreed. But I dug into that at the time, and I'm pretty certain there
were a lot more than just two files in most dynamic file directories...
I might even have a CD somewhere with a tape-dump on it ...
> 
> Unidata uses dat001 and over001 with the number increasing to allow for very 
> large files (I think).
> 
> -Rick
> 
Cheers,
Wol
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Rick Nuckolls
I believe PiOpen used a directory with two files in it ‘&$0’ and ‘&$1’ 
corresponding to DATA.30 and OVER.30.  If the numbers went up from there, I 
think that they corresponded to alternate keys, ie ‘&$2’ and ‘&$3’ represented 
DATA.30 and OVER.30 for the first alternate key.

I do not think that PiOpen supported statically hashed files.  (Pr1me 
Information did)

All of that is a few years ago

Unidata uses dat001 and over001 with the number increasing to allow for very 
large files (I think).

-Rick

On Jul 4, 2012, at 10:51 AM, Wols Lists wrote:

> On 04/07/12 11:26, Brian Leach wrote:
>>> All the other groups effectively get 1 added to their number
>> Not exactly.
>> 
>> Sorry to those who already know this, but maybe it's time to go over linear
>> hashing in theory ..
>> 
>> Linear hashing was a system devised by Litwin and originally only for
>> in-memory lists. In fact there's some good implementations in C# that
>> provide better handling of Dictionary types. Applying it to a file system
>> adds some complexity but it's basically the same theory.
>> 
>> Let's start with a file that has 100 groups initially defined (that's 0
>> through 99). That is your minimum starting point and should ensure that it
>> never shrinks below that, so it doesn't begin its life with loads of splits
>> right from the start as you populate the file. You would size this similarly
>> to the way you size a regular hashed file for your initial content: no point
>> making work for yourself (or the database).
>> 
>> As data gets added, because the content is allocated unevenly, some of that
>> load will be in primary and some in overflow: that's just the way of the
>> world. No hashing is perfect. Unlike a static file, the overflow can't be
>> added to the end of the file as a linked list (* why nobody has done managed
>> overflow is beyond me), it has to sit in a separate file.
> 
> I don't know what the definition of "badly overflowed" is, but assuming
> that a badly overflowed group has two blocks of overflow, then those
> file stats seem perfectly okay. As Brian has explained, the distribution
> of records is "lumpy" and as a percentage of the file, there aren't many
> badly overflowed groups.
> 
> You've got roughly 1/3 of groups overflowed - with an 80% split that
> doesn't seem at all out of order - on average each group is 80% full so
> 1/3rd more than 100% full is fine.
> 
> You've got (in thousands) one and a half groups badly overflowed out of
> eighty-three. That's less than two percent. That's nothing.
> 
> As for why no-one has done managed overflow, I think there are various
> reasons. The first successful implementation (Prime INFORMATION) didn't
> need it. It used a peculiar type of file called a "Segmented Directory"
> and while I don't know for certain what PI did, I strongly suspect each
> group had its own normal file so if a group overflowed, it just created
> a new block at the end of the file. Same with large records, it
> allocated a bunch of overflow blocks. This file structure was far more
> evident with PI-Open - at the OS level a dynamic file was a OS directory
> with lots of numbered files in it.
> 
> The UV implementation of "one file for data, one file for overflow" may
> be unique to UV. I don't know. What little I know of UD tells me it's
> different, and others like QM could well be different again. I wouldn't
> actually be surprised if QM is like PI.
> 
> Cheers,
> Wol
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Wols Lists
On 04/07/12 11:26, Brian Leach wrote:
>> All the other groups effectively get 1 added to their number
> Not exactly.
> 
> Sorry to those who already know this, but maybe it's time to go over linear
> hashing in theory ..
> 
> Linear hashing was a system devised by Litwin and originally only for
> in-memory lists. In fact there's some good implementations in C# that
> provide better handling of Dictionary types. Applying it to a file system
> adds some complexity but it's basically the same theory.
> 
> Let's start with a file that has 100 groups initially defined (that's 0
> through 99). That is your minimum starting point and should ensure that it
> never shrinks below that, so it doesn't begin its life with loads of splits
> right from the start as you populate the file. You would size this similarly
> to the way you size a regular hashed file for your initial content: no point
> making work for yourself (or the database).
> 
> As data gets added, because the content is allocated unevenly, some of that
> load will be in primary and some in overflow: that's just the way of the
> world. No hashing is perfect. Unlike a static file, the overflow can't be
> added to the end of the file as a linked list (* why nobody has done managed
> overflow is beyond me), it has to sit in a separate file.

I don't know what the definition of "badly overflowed" is, but assuming
that a badly overflowed group has two blocks of overflow, then those
file stats seem perfectly okay. As Brian has explained, the distribution
of records is "lumpy" and as a percentage of the file, there aren't many
badly overflowed groups.

You've got roughly 1/3 of groups overflowed - with an 80% split that
doesn't seem at all out of order - on average each group is 80% full so
1/3rd more than 100% full is fine.

You've got (in thousands) one and a half groups badly overflowed out of
eighty-three. That's less than two percent. That's nothing.

As for why no-one has done managed overflow, I think there are various
reasons. The first successful implementation (Prime INFORMATION) didn't
need it. It used a peculiar type of file called a "Segmented Directory"
and while I don't know for certain what PI did, I strongly suspect each
group had its own normal file so if a group overflowed, it just created
a new block at the end of the file. Same with large records, it
allocated a bunch of overflow blocks. This file structure was far more
evident with PI-Open - at the OS level a dynamic file was a OS directory
with lots of numbered files in it.

The UV implementation of "one file for data, one file for overflow" may
be unique to UV. I don't know. What little I know of UD tells me it's
different, and others like QM could well be different again. I wouldn't
actually be surprised if QM is like PI.

Cheers,
Wol
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Charles Stevenson
>SMAT -d  (or ANALYZE.SHM -d)   see uv/bin/smat[.exe] 
uv/bin/analyze.shm[.exe]


Dynamic Files:
Slot #  Inode Device Ref Count Htype Split Merge Curmod Basemod 
Largerec   Filesp Selects Nextsplit
 0 1285128087 209307792516208050 4001
2048 3267  2782736   0  1954
 1  153221440 151542860060208050 397040 
262144 1628 58641084   0134897
 2 1155376080  317006236 6208050 81  64 
1628   133616   018
 3  924071961  976405761 2208050 957 
512 1628  1249180   0   446
 4  619894993 1297457141 1208050 1157
1024 1628  3837400   0   134
 5 1401440370  656655020 6218050 213429 
131072 1628 54052576   0 82358
 6 1053905064 1350670129 2208050 365 
256 1628   529956   0   110
 7  963519080 1084306943 2208050 2564
2048 1628  4019040   0   517
 8 1909033200  47372346598208050 3851
2048 3267 12775756   0  1804

   etc.

Because of the concurrency difficulties that Brian mentioned . . .

On 7/4/2012 5:26 AM, Brian Leach wrote:

What makes the implementation difficult is that Litwin et al were all assuming 
a single threaded access to an in-memory list. Concurrent access whilst 
maintaining the load factor, split pointer and splitting all add a lot more 
complexity, unless you lock the whole file for the duration of an IO operation 
and kill the performance.
. . . is why UV reserves a table in shared memory for dynamic files, per 
SMAT -d.
The 1st user that opens the file causes the control info in the file 
header to be loaded to shared memory, where it remains until Ref Count 
drops to 0.
(It also gets written to the file whenever there is a change.  At least 
on modern versions.)


Rick's post makes good sense if you work the numbers in the SMAT table.
Notice that (Curmod - Basemod) + 1 = Nextsplit  (off by 1 because groups 
start at 0.)
As Rick pointed out, Basemod is always a power of 2.  It is used by the 
hashing algorithms.  E.g., that 64 will eventually change to 128 or 32, 
once enough splits or merges happen.
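
That relationship can be sanity-checked straight from the listing above; a 
trivial sketch using the slot 0 and slot 5 values (the only assumption being 
which digits belong to which column in the wrapped listing):

* Check: Nextsplit = (Curmod - Basemod) + 1, using two slots from the SMAT -d output above.
CURMOD = 4001 ; BASEMOD = 2048
PRINT (CURMOD - BASEMOD) + 1             ;* 1954, the Nextsplit shown for slot 0
CURMOD = 213429 ; BASEMOD = 131072
PRINT (CURMOD - BASEMOD) + 1             ;* 82358, the Nextsplit shown for slot 5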


Notice also that the future "Nextsplit" group number is set, i.e., 
predictable.  Remember Brian & Rick (others?) saying that split/merge 
decisions are determined by the entire file load, not which individual 
group that might happen to be in heavy overload? They were right: it is 
methodical.


Chris,
Notice that every number in the Split, Merge, & Largerec columns is the 
default values.
Although I do have exceptions, any random grab of 9 files like this 
would likely show straight default values.   Generally, fine-tuning 
isn't worth the bother.  It's more bang for the IT buck to buy more 
memory and disk than to pay Brian or Rick to squeeze performance out of 
type-30 files.



cds
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Rick Nuckolls
This makes it sound as if you might need to search two groups for a record, 
which is not correct.  If the initial hash is based on the larger modulo, and 
the group exists, then the key will be in the higher number group.  If the 
result of the first hash is larger than the modulus of the of the table, then 
you rehash with the smaller modulus.

And the modulo used for hashing is always a power of two. 

So if the initial hash function on a key is f(x), then the key will either be 
in f(x) mod 2**n or, if that group has not been created, then in f(x) mod 
2**(n-1).  n increases each time that the modulus grows to equal 2**n+1. So

For a modulus of 3 or 4, n = 2; for 5,6,7,8, n =3.

For instance:

If your groups are numbered 0-5 (6 groups), and your hash value is 5, then you 
are in the last (6th) group because the 5 mod 8 is 5.  Likewise 6 mod 8 is 6, 
but this would be beyond the highest group we have (5).  6 mod 4 is 2, and that 
is the group where 6 should fall. Likewise 7 should fall into group 3.  After 
two more splits (of groups 2 & 3) the modulus will be 8, and no rehashing is 
necessary until we next split group 0 and add a 9th group, at which point we 
start with a mod 16, and use 8 if the first result is over 8 (8 would go into 
the 9th group, 9 would hash into the second group, #1 -- 9 mod 8 -> 1).
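
Written out as code, the rule reads something like the sketch below (purely 
illustrative, with groups numbered from 0; HV stands for the raw hash value and 
BASEMOD for the largest power of two not exceeding the current modulus, as in 
the SMAT -d listing elsewhere in this thread):

* Group lookup for a dynamic file as described above (0-based group numbers).
CURMOD = 6 ; BASEMOD = 4                 ;* 6 groups, so the hashing powers are 8 and 4
FOR HV = 5 TO 7
   GRP = MOD(HV, BASEMOD * 2)            ;* hash with the larger modulus first
   IF GRP >= CURMOD THEN GRP = MOD(HV, BASEMOD)   ;* that group doesn't exist yet: rehash
   PRINT 'hash value ':HV:' lands in group ':GRP
NEXT HV
* Prints groups 5, 2 and 3 - the same answers as the worked example above.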


Admittedly, this is probably at least as confusing as every other explanation 
of the process ;-)

-Rick

On Jul 4, 2012, at 3:26 AM, Brian Leach wrote:

>> All the other groups effectively get 1 added to their number
> 
> Not exactly.
> 
> Sorry to those who already know this, but maybe it's time to go over linear
> hashing in theory ..
> 
> Linear hashing was a system devised by Litwin and originally only for
> in-memory lists. In fact there's some good implementations in C# that
> provide better handling of Dictionary types. Applying it to a file system
> adds some complexity but it's basically the same theory.
> 
> Let's start with a file that has 100 groups initially defined (that's 0
> through 99). That is your minimum starting point and should ensure that it
> never shrinks below that, so it doesn't begin its life with loads of splits
> right from the start as you populate the file. You would size this similarly
> to the way you size a regular hashed file for your initial content: no point
> making work for yourself (or the database).
> 
> As data gets added, because the content is allocated unevenly, some of that
> load will be in primary and some in overflow: that's just the way of the
> world. No hashing is perfect. Unlike a static file, the overflow can't be
> added to the end of the file as a linked list (* why nobody has done managed
> overflow is beyond me), it has to sit in a separate file.
> 
> At some point the amount of data held in respect of the available space
> reaches a critical level and the file needs to reorganize. Rather than split
> the most heavily populated group - which would be the obvious thing - linear
> hashing works on the basis of a split pointer that moves incrementally
> through the file. So the first split breaks group 0 and adds group 100 to
> the end of the file, hopefully moving around half the content of group 0 to
> this new group. Of course, there is no guarantee that it will (depending on
> key structure) and also no guarantee that this will help anything, if group
> 0 isn't overflowed or populated anyway. So the next write may also cause a
> split, except now to split group 1 into a new group 101, and so forth.
> 
> Eventually the pointer will reach the end and all the initial 100 groups
> will have been split, and the whole process restarts with the split pointer
> moving back to zero. You now have 200 groups and by this time everything
> should in theory have levelled out, but in the meantime there is still
> overloading and stuff will still be in overflow. The next split will create
> group 200 and split half of group 0 into it, and the whole process repeats
> for ever.
> 
> Oversized records (> buffer size) also get moved out because they stuff up
> the block allocation.
> 
> So why this crazy system, rather than hitting the filled groups as they get
> overstuffed? Because it makes finding a record easy. Because linear hashing
> is based on a power of 2, the maths is simple - if the group is after the
> split point, the record MUST be in that group (or its overflow). If it is
> before the split point, it could be in the original group or the split
> group: so you can just rehash with double the modulus to check which one
> without even having to scan the groups.
> 
> What makes the implementation difficult is that Litwin et al were all
> assuming a single threaded access to an in-memory list. Concurrent access
> whilst maintaining the load factor, split pointer and splitting all add a
> lot more complexity, unless you lock the whole file for the duration of an
> IO operation and kill the performance.
> 
> And coming back to the manual, storing la

Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Charles Stevenson

Good explanation, Brian!
To anyone who skipped it because it looked long:  read it anyway.
cds

On 7/4/2012 5:26 AM, Brian Leach wrote:

Sorry to those who already know this, but maybe it's time to go over linear 
hashing in theory ..

Linear hashing was a system devised by Litwin and originally only for in-memory 
lists. In fact there's some good implementations in C# that provide better 
handling of Dictionary types. Applying it to a file system adds some complexity 
but it's basically the same theory.

Let's start with a file that has 100 groups initially defined (that's 0 through 
99). That is your minimum starting point and should ensure that it never 
shrinks below that, so it doesn't begin its life with loads of splits right 
from the start as you populate the file. You would size this similarly to the 
way you size a regular hashed file for your initial content: no point making 
work for yourself (or the database).

As data gets added, because the content is allocated unevenly, some of that 
load will be in primary and some in overflow: that's just the way of the world. 
No hashing is perfect. Unlike a static file, the overflow can't be added to the 
end of the file as a linked list (* why nobody has done managed overflow is 
beyond me), it has to sit in a separate file.

At some point the amount of data held in respect of the available space reaches 
a critical level and the file needs to reorganize. Rather than split the most 
heavily populated group - which would be the obvious thing - linear hashing 
works on the basis of a split pointer that moves incrementally through the 
file. So the first split breaks group 0 and adds group 100 to the end of the 
file, hopefully moving around half the content of group 0 to this new group. Of 
course, there is no guarantee that it will (depending on key structure) and also 
no guarantee that this will help anything, if group 0 isn't overflowed or 
populated anyway. So the next write may also cause a split, except now to split 
group 1 into a new group 101, and so forth.

Eventually the pointer will reach the end and all the initial 100 groups will 
have been split, and the whole process restarts with the split pointer moving 
back to zero. You now have 200 groups and by this time everything should in 
theory have levelled out, but in the meantime there is still overloading and 
stuff will still be in overflow. The next split will create group 200 and split 
half of group 0 into it, and the whole process repeats for ever.

Oversized records (> buffer size) also get moved out because they stuff up the 
block allocation.

So why this crazy system, rather than hitting the filled groups as they get 
overstuffed? Because it makes finding a record easy. Because linear hashing is 
based on a power of 2, the maths is simple - if the group is after the split 
point, the record MUST be in that group (or its overflow). If it is before the 
split point, it could be in the original group or the split group: so you can 
just rehash with double the modulus to check which one without even having to 
scan the groups.

What makes the implementation difficult is that Litwin et al were all assuming 
a single threaded access to an in-memory list. Concurrent access whilst 
maintaining the load factor, split pointer and splitting all add a lot more 
complexity, unless you lock the whole file for the duration of an IO operation 
and kill the performance.

And coming back to the manual, storing large numbers of data items - even large 
ones - in a type 19 file is a bad idea. Traversing directories is slow, 
especially in Windows, and locking is done against the whole directory.

Brian

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-04 Thread Brian Leach
> All the other groups effectively get 1 added to their number

Not exactly.

Sorry to those who already know this, but maybe it's time to go over linear
hashing in theory ..

Linear hashing was a system devised by Litwin and originally only for
in-memory lists. In fact there's some good implementations in C# that
provide better handling of Dictionary types. Applying it to a file system
adds some complexity but it's basically the same theory.

Let's start with a file that has 100 groups initially defined (that's 0
through 99). That is your minimum starting point and should ensure that it
never shrinks below that, so it doesn't begin its life with loads of splits
right from the start as you populate the file. You would size this similarly
to the way you size a regular hashed file for your initial content: no point
making work for yourself (or the database).

As data gets added, because the content is allocated unevenly, some of that
load will be in primary and some in overflow: that's just the way of the
world. No hashing is perfect. Unlike a static file, the overflow can't be
added to the end of the file as a linked list (* why nobody has done managed
overflow is beyond me), it has to sit in a separate file.

At some point the amount of data held in respect of the available space
reaches a critical level and the file needs to reorganize. Rather than split
the most heavily populated group - which would be the obvious thing - linear
hashing works on the basis of a split pointer that moves incrementally
through the file. So the first split breaks group 0 and adds group 100 to
the end of the file, hopefully moving around half the content of group 0 to
this new group. Of course, there is no guarantee that it will (depending on
key structure) and also no guarantee that this will help anything, if group
0 isn't overflowed or populated anyway. So the next write may also cause a
split, except now to split group 1 into a new group 101, and so forth.

Eventually the pointer will reach the end and all the initial 100 groups
will have been split, and the whole process restarts with the split pointer
moving back to zero. You now have 200 groups and by this time everything
should in theory have levelled out, but in the meantime there is still
overloading and stuff will still be in overflow. The next split will create
group 200 and split half of group 0 into it, and the whole process repeats
for ever.

Oversized records (> buffer size) also get moved out because they stuff up
the block allocation.

So why this crazy system, rather than hitting the filled groups as they get
overstuffed? Because it makes finding a record easy. Because linear hashing
is based on a power of 2, the maths is simple - if the group is after the
split point, the record MUST be in that group (or its overflow). If it is
before the split point, it could be in the original group or the split
group: so you can just rehash with double the modulus to check which one
without even having to scan the groups.
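
The same test can be phrased in terms of the split pointer; a minimal sketch 
(again only an illustration of the description above, not UV's actual internals; 
M is the modulus at the start of the current doubling cycle and SPLIT.PTR the 
group that will split next):

* Lookup phrased with the split pointer (0-based groups); current modulus = M + SPLIT.PTR.
M = 4 ; SPLIT.PTR = 2                    ;* groups 0 and 1 already split, so 6 groups exist
HV = 6                                   ;* raw hash value for some key
GRP = MOD(HV, M)
IF GRP < SPLIT.PTR THEN GRP = MOD(HV, M * 2)   ;* already-split group: rehash with double modulus
PRINT 'hash value ':HV:' lands in group ':GRP  ;* 2 in this example
* Equivalent arithmetic to the BASEMOD/CURMOD form shown earlier in the thread.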

What makes the implementation difficult is that Litwin et al were all
assuming a single threaded access to an in-memory list. Concurrent access
whilst maintaining the load factor, split pointer and splitting all add a
lot more complexity, unless you lock the whole file for the duration of an
IO operation and kill the performance.

And coming back to the manual, storing large numbers of data items - even
large ones - in a type 19 file is a bad idea. Traversing directories is
slow, especially in Windows, and locking is done against the whole
directory.

Brian




___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Keith Johnson [DATACOM]
Doug may have had a key bounce in his input

> Let's do the math:
>
> 258687736 (Record Size)
> 192283300 (Key Size)
> 

The key size is actually 19283300 in Chris' figures

Regarding 68,063 being less than the current modulus of 82,850.  I think the 
answer may lie in the splitting process.

As I understand it, the first time a split occurs group 1 is split and its 
contents are split between new group 1 and new group 2. All the other groups 
effectively get 1 added to their number. The next split is group 3 (which was 
2) into 3 and 4 and so forth. A pointer is kept to say where the next split 
will take place and also to help sort out how to adjust the algorithm to 
identify which group matches a given key.

Based on this, if you started with 1000 groups, by the time you have split the 
500th time you will have 1500 groups.  The first 1000 will be relatively empty, 
the last 500 will probably be overflowed, but not terribly badly.  By the time 
you get to the 1000th split, you will have 2000 groups and they will, one 
hopes, be quite reasonably spread with very little overflow.

So I expect the average access times would drift up and down in a cycle.  The 
cycle time would get longer as the file gets bigger but the worst time would be 
roughly the same each cycle.

Given the power of two introduced into the algorithm by the before/after the 
split thing, I wonder if there is such a need to start off with a prime?

Regards, Keith

PS I'm getting a bit Tony^H^H^H^Hverbose nowadays.

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Ross Ferris
I would suggest that the actual goal is to achieve maximum performance for 
your system, so knowing HOW the file is used on a daily basis can also 
influence decisions. Disk is a cheap commodity, so having some "wastage" in 
file utilization shouldn't factor. 


Ross Ferris
Stamina Software
Visage > Better by Design!


-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Wednesday, 4 July 2012 7:38 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


This is why I'm confused.. Is the goal here to reduce 'overflow' or to keep the 
'Total size' of the disk down? If the goal is to keep the total  disk size down 
then it would appear you would want your actual load % a lot higher than 37%.. 
and then ignore 'some' of the overflow..

Chris


> But the total size of your file is up 60%.  Reading in 60% more records in a 
> full select of the file is going to be much slower than a few more overflows.
> 
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris 
> Austin
> Sent: Tuesday, July 03, 2012 2:15 PM
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> Dan,
> 
> I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
> Actual Load has really gone down (as well as overflow). See below for the 
> results:
> 
> File name ..   GENACCTRN_POSTED
> Pathname ...   GENACCTRN_POSTED
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
> 3957 overflowed, 207 badly )
> Number of records ..   1290469
> Large record size ..   3267 bytes
> Number of large records    180
> Group size .   4096 bytes
> Load factors ...   90% (split), 50% (merge) and 37% (actual)
> Total size .   836235264 bytes
> Total size of record data ..   287394719 bytes
> Total size of record IDs ...   21508521 bytes
> Unused space ...   527323832 bytes
> Total space for records    836227072 bytes
> 
> My overflow is now @ 2%
> My Load is @ 37% (actual)
> 
> granted my empty groups are now up to almost 3% but I hope that won't be a 
> big factor. How does this look?
> 
> Chris

  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
From the System Description manual:

Important Considerations

Dynamic files are meant to make file management easier for users. The default
parameters are set so that most dynamic files work efficiently. If you decide 
to change
the parameters of a dynamic file, keep the following considerations in mind:

* Use the SEQ.NUM hashing algorithm only when your record IDs are
numeric, sequential, and consecutive. Nonconsecutive numbers should not
be hashed using the SEQ.NUM hashing algorithm.

* Use a group size of 2 only if you expect the average record size to be larger
than 1000 bytes. If your record size is larger than 2000 bytes, consider using
a nonhashed file—type 1 or 19.

* Large record size should generally not be changed. Storing the data of a
large record in the overflow buffer causes that data not to be included in the
split and merge calculations. Also, the extra data length does not slow access
to subsequent records. By choosing a large record size of 0%, all the records
are considered large. In this case, record IDs can be accessed extremely
quickly by commands such as SELECT, but access to the actual data is
much less efficient.

* A small split load causes less data to be stored in each group buffer, 
resulting
in faster access time and less overflow at the expense of requiring extra
memory. A large split load causes more data to be stored in each group
buffer, resulting in better use of memory at the expense of slower access
time and more overflow. A split load of 100% disables splits.

* The gap between merge load and split load should be large enough so that
splits and merges do not occur too frequently. The split and merge processes
take a significant amount of processing time. If you make the merge load too
small, memory usage can be very poor. Also, selection time is increased
when record IDs are distributed in more groups than are needed. A merge
load of 0% disables merges.

* Consider increasing the minimum modulo if you intend to add a lot of initial
data to the file. Much data-entry time can be saved by avoiding the initial
splits that can occur if you enter a lot of initial data. You may want to
readjust this value after
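As a rough illustration of what those percentages mean at the 4 KB group
size being discussed in this thread, here is a trivial Python sketch (my
arithmetic, not the manual's; remember the loads are really computed over
the whole file, not group by group):

# Average bytes of keys plus data per 4096-byte group at which splits and
# merges kick in, for the default 80/50 loads and the 90/50 loads tried here.
GROUP_SIZE = 4096
for split_pct, merge_pct in ((80, 50), (90, 50)):
    print(f"split {split_pct}%: splits begin once groups average "
          f"~{GROUP_SIZE * split_pct // 100} bytes")
    print(f"merge {merge_pct}%: merges begin once groups average "
          f"~{GROUP_SIZE * merge_pct // 100} bytes")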

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Charles Stevenson
Sent: Tuesday, July 03, 2012 3:34 PM
To: U2 Users List
Subject: Re: [U2] RESIZE - dynamic files

Chris,
Let's back way up.   I take it your original question is a general one,  
not specific to one poorly performing problematic file.  Is that right?

If so, generally speaking, you just don't get a lot out of fine-tuning 
dynamic files.
Tweaking the default parameters doesn't usually make a whole lot of 
difference.
Several people have said something similar in this thread.

Other than deciding which hashing algorithm,  I generally use the 
defaults and only tweak things once the file proves problematic, which 
usually means slow I/O.

When a problem erupts, look carefully at how that specific file is used, 
as Susan & others have said.   You might get hold of Fitzgerald&Long's 
paper on how dynamic files work.  If you understand the fundamentals, 
you'll understand how to attack your problem file, applying the ideas 
Rick & others have talked about here.

You may go several years without having to resort to that.

Chuck Stevenson


On 7/2/2012 2:22 PM, Chris Austin wrote:
> I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
> example I have a file called 'TEST_FILE'
> with the following:
>
> 01 ANALYZE.FILE TEST_FILE
> File name ..   TEST_FILE
> Pathname ...   TEST_FILE
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    83261 current ( minimum 31 )
> Large record size ..   3267 bytes
> Group size .   4096 bytes
> Load factors ...   80% (split), 50% (merge) and 80% (actual)
> Total size .   450613248 bytes
>
> How do you calculate what the modulus and separation should be? I can't use 
> HASH.HELP on a type 30 file to see the recommended settings
> so I was wondering how best you figure out the file RESIZE.
>
> Thanks,
>
> Chris
>

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Charles Stevenson

Chris,
Let's back way up.   I take it your original question is a general one,  
not specific to one poorly performing problematic file.  Is that right?


If so, generally speaking, you just don't get a lot out of fine-tuning 
dynamic files.
Tweaking the default parameters doesn't usually make a whole lot of 
difference.

Several people have said something similar in this thread.

Other than deciding which hashing algorithm,  I generally use the 
defaults and only tweak things once the file proves problematic, which 
usually means slow I/O.


When a problem erupts, look carefully at how that specific file is used, 
as Susan & others have said.   You might get hold of Fitzgerald&Long's 
paper on how dynamic files work.  If you understand the fundamentals, 
you'll understand how to attack your problem file, applying the ideas 
Rick & others have talked about here.


You may go several years without having to resort to that.

Chuck Stevenson


On 7/2/2012 2:22 PM, Chris Austin wrote:

I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
example I have a file called 'TEST_FILE'
with the following:

01 ANALYZE.FILE TEST_FILE
File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    83261 current ( minimum 31 )
Large record size ..   3267 bytes
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   450613248 bytes

How do you calculate what the modulus and separation should be? I can't use 
HASH.HELP on a type 30 file to see the recommended settings
so I was wondering how best you figure out the file RESIZE.

Thanks,

Chris



___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
Unless the minimum modulus is configured high enough to artificially lower the 
actual load, the actual load will rise to the designated split.load as the file 
grows. The split.load says nothing about the load of any specific group. If it 
is set to 90%, then on average each group will be 90% full, and adding a 
400-byte record to such a group will send it into overflow; and since 400 bytes 
is a trivial percentage of your overall file load, many groups will be 
overflowed before the total load factor exceeds 90%.  

Okay, there is a slight distortion with the numbers there, but the idea is that 
all buckets are not loaded equally, so if the average is "almost full" the 
reality is "many overflowed".
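A quick Python simulation (the record sizes and the "hashing" below are
entirely made up, purely to illustrate the point) shows how an average load
at the split.load still leaves a large fraction of individual groups in
overflow:

import random

random.seed(1)
GROUP_SIZE  = 4096
GROUPS      = 10_000        # pretend modulus
TARGET_LOAD = 0.90          # fill the file right up to the split.load
AVG_REC     = 400           # roughly the record size mentioned above

loads = [0] * GROUPS
total = 0
target = int(GROUPS * GROUP_SIZE * TARGET_LOAD)
while total < target:
    size = random.randint(AVG_REC // 2, AVG_REC * 2)   # uneven record sizes
    loads[random.randrange(GROUPS)] += size            # stand-in for hashing
    total += size

overflowed = sum(1 for b in loads if b > GROUP_SIZE)
print(f"average load: {total / (GROUPS * GROUP_SIZE):.0%}")
print(f"overflowed  : {overflowed}/{GROUPS} groups ({overflowed / GROUPS:.0%})")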

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 2:52 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


I set the split load based on what Dan suggested:

"I'd take the merge down a little, to maybe 30% or even less, and maybe knock 
the split up a bit - say, 90% - to cut down on the splitting."

I thought this would cut down on splitting. Is there a certain formula, or way 
to calculate the split.load? What should my SPLIT.LOAD be around,
and how do you come up with that %?

Chris

> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 14:45:28 -0700
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 37% is a very low load.  Reading disk records takes much longer than parsing 
> the records out of a disk record.  With variable record size and moderately 
> poor hashing, overflow is inevitable.  So, do you want 80,000 extra groups, 
> or 20,000 overflow buffers? I would go with the smaller number.  But for the 
> love of Knuth, do not set your split.load to 90% unless you have a perfectly 
> hashed file with uniformly sized records.
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: Tuesday, July 03, 2012 2:38 PM
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> This is why I'm confused.. Is the goal here to reduce 'overflow' or to 
> keep the 'Total size' of the disk down? If the goal is to keep the total
>  disk size down then it would appear
> you would want your actual load % a lot higher than 37%.. and then ignore 
> 'some' of the overflow..
> 
> Chris
> 
> 
> > But the total size of your file is up 60%.  Reading in 60% more records in 
> > a full select of the file is going to be much slower than a few more 
> > overflows.
> > 
> > 
> > -Original Message-
> > From: u2-users-boun...@listserver.u2ug.org 
> > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> > Sent: Tuesday, July 03, 2012 2:15 PM
> > To: u2-users@listserver.u2ug.org
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > 
> > Dan,
> > 
> > I changed the MINIMUM.MODULUS to the value of 23 as you suggested and 
> > my Actual Load has really gone down (as well as overflow). See below for 
> > the results:
> > 
> > File name ..   GENACCTRN_POSTED
> > Pathname ...   GENACCTRN_POSTED
> > File type ..   DYNAMIC
> > File style and revision    32BIT Revision 12
> > Hashing Algorithm ..   GENERAL
> > No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
> > 3957 overflowed, 207 badly )
> > Number of records ..   1290469
> > Large record size ..   3267 bytes
> > Number of large records    180
> > Group size .   4096 bytes
> > Load factors ...   90% (split), 50% (merge) and 37% (actual)
> > Total size .   836235264 bytes
> > Total size of record data ..   287394719 bytes
> > Total size of record IDs ...   21508521 bytes
> > Unused space ...   527323832 bytes
> > Total space for records    836227072 bytes
> > 
> > My overflow is now @ 2%
> > My Load is @ 37% (actual)
> > 
> > granted my empty groups are now up to almost 3% but I hope that won't be a 
> > big factor. How does this look?
> > 
> > Chris
> 
> 
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
> ___

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

I set the split load based on what Dan suggested:

"I'd take the merge down a little, to maybe 30% or even less, and maybe knock 
the split up a bit - say, 90% - to cut down on the splitting."

I thought this would cut down on splitting. Is there a certain formula, or way 
to calculate the split.load? What should my SPLIT.LOAD be around,
and how do you come up with that %?

Chris

> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 14:45:28 -0700
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 37% is a very low load.  Reading disk records takes much longer than parsing 
> the records out of a disk record.  With variable record size and moderately 
> poor hashing, overflow is inevitable.  So, do you want 80,000 extra groups, 
> or 20,000 overflow buffers? I would go with the smaller number.  But for the 
> love of Knuth, do not set your split.load to 90% unless you have a perfectly 
> hashed file with uniformly sized records.
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: Tuesday, July 03, 2012 2:38 PM
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> This is why I'm confused.. Is the goal here to reduce 'overflow' or to 
> keep the 'Total size' of the disk down? If the goal is to keep the total
>  disk size down then it would appear
> you would want your actual load % a lot higher than 37%.. and then ignore 
> 'some' of the overflow..
> 
> Chris
> 
> 
> > But the total size of your file is up 60%.  Reading in 60% more records in 
> > a full select of the file is going to be much slower than a few more 
> > overflows.
> > 
> > 
> > -Original Message-
> > From: u2-users-boun...@listserver.u2ug.org 
> > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> > Sent: Tuesday, July 03, 2012 2:15 PM
> > To: u2-users@listserver.u2ug.org
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > 
> > Dan,
> > 
> > I changed the MINIMUM.MODULUS to the value of 23 as you suggested and 
> > my Actual Load has really gone down (as well as overflow). See below for 
> > the results:
> > 
> > File name ..   GENACCTRN_POSTED
> > Pathname ...   GENACCTRN_POSTED
> > File type ..   DYNAMIC
> > File style and revision    32BIT Revision 12
> > Hashing Algorithm ..   GENERAL
> > No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
> > 3957 overflowed, 207 badly )
> > Number of records ..   1290469
> > Large record size ..   3267 bytes
> > Number of large records    180
> > Group size .   4096 bytes
> > Load factors ...   90% (split), 50% (merge) and 37% (actual)
> > Total size .   836235264 bytes
> > Total size of record data ..   287394719 bytes
> > Total size of record IDs ...   21508521 bytes
> > Unused space ...   527323832 bytes
> > Total space for records    836227072 bytes
> > 
> > My overflow is now @ 2%
> > My Load is @ 37% (actual)
> > 
> > granted my empty groups are now up to almost 3% but I hope that won't be a 
> > big factor. How does this look?
> > 
> > Chris
> 
> 
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Susan Lynch
Chris,

This is why file-sizing is something that requires careful thought.  As
some of the other responders have indicated, sometimes you want to keep
overflow to a minimum, because accessing individual records that are in
overflow takes extra disk reads, which slow down your system, and adding
new records to a group that is already in overflow will inevitably be
slower than adding a new record to a group which is not in overflow.  And
sometimes you don't, e.g. if you have a file that is primarily read in a
sequential fashion, where you do a Basic SELECT and then loop through the
file reading every single record.   Because most of the files that I have
supported in my career have been read and written primarily as
single-record reads, I have always chosen to minimize overflow as my
default criterion, and only sized things for sequential reads when the file
is rarely written and rarely read as anything but a 'read them all in no
particular order' fashion, which happens rarely in my experience.
However, as other responders have written, 'your mileage may vary'!

Look at how the file is used.  Look at what resources you have.  Then
decide...


Susan M. Lynch
F. W. Davison & Company, Inc.
-Original Message-
From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: 07/03/2012 5:38 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


This is why I'm confused.. Is the goal here to reduce 'overflow' or to
keep the 'Total size' of the disk down? If the goal is to keep the total
 disk size down then it would appear
you would want your actual load % a lot higher than 37%.. and then ignore
'some' of the overflow..

Chris


> But the total size of your file is up 60%.  Reading in 60% more records
in a full select of the file is going to be much slower than a few more
overflows.
>
>
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: Tuesday, July 03, 2012 2:15 PM
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
>
>
> Dan,
>
> I changed the MINIMUM.MODULUS to the value of 23 as you suggested
and my Actual Load has really gone down (as well as overflow). See below
for the results:
>
> File name ..   GENACCTRN_POSTED
> Pathname ...   GENACCTRN_POSTED
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    23 current ( minimum 23, 5263
empty,
> 3957 overflowed, 207 badly )
> Number of records ..   1290469
> Large record size ..   3267 bytes
> Number of large records    180
> Group size .   4096 bytes
> Load factors ...   90% (split), 50% (merge) and 37% (actual)
> Total size .   836235264 bytes
> Total size of record data ..   287394719 bytes
> Total size of record IDs ...   21508521 bytes
> Unused space ...   527323832 bytes
> Total space for records    836227072 bytes
>
> My overflow is now @ 2%
> My Load is @ 37% (actual)
>
> granted my empty groups are now up to almost 3% but I hope that won't be
a big factor. How does this look?
>
> Chris


___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
37% is a very low load.  Reading disk records takes much longer than parsing 
the records out of a disk record.  With variable record size and moderately 
poor hashing, overflow is inevitable.  So, do you want 80,000 extra groups, or 
20,000 overflow buffers? I would go with the smaller number.  But for the love 
of Knuth, do not set your split.load to 90% unless you have a perfectly hashed 
file with uniformly sized records.
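The arithmetic behind that trade-off, using the two ANALYZE.FILE totals
Chris posted (a Python sketch of simple division; a wholesale BASIC SELECT
has to read roughly the whole file, primary groups plus overflow buffers):

GROUP_SIZE = 4096
files = {
    "large MINIMUM.MODULUS (~2% of groups overflowed)": 836_235_264,
    "smaller modulus (~20% of groups overflowed)     ": 522_260_480,
}
for name, total_bytes in files.items():
    print(f"{name}: ~{total_bytes // GROUP_SIZE:,} buffers, "
          f"~{total_bytes / 2**20:,.0f} MB to scan")
# ~204,000 buffers versus ~128,000: the smaller file wins the full select
# even though far more of its groups are overflowed.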

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 2:38 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


This is why I'm confused.. Is the goal here to reduce 'overflow' or to 
keep the 'Total size' of the disk down? If the goal is to keep the total
 disk size down then it would appear
you would want your actual load % a lot higher than 37%.. and then ignore 
'some' of the overflow..

Chris


> But the total size of your file is up 60%.  Reading in 60% more records in a 
> full select of the file is going to be much slower than a few more overflows.
> 
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: Tuesday, July 03, 2012 2:15 PM
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> Dan,
> 
> I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
> Actual Load has really gone down (as well as overflow). See below for the 
> results:
> 
> File name ..   GENACCTRN_POSTED
> Pathname ...   GENACCTRN_POSTED
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
> 3957 overflowed, 207 badly )
> Number of records ..   1290469
> Large record size ..   3267 bytes
> Number of large records    180
> Group size .   4096 bytes
> Load factors ...   90% (split), 50% (merge) and 37% (actual)
> Total size .   836235264 bytes
> Total size of record data ..   287394719 bytes
> Total size of record IDs ...   21508521 bytes
> Unused space ...   527323832 bytes
> Total space for records    836227072 bytes
> 
> My overflow is now @ 2%
> My Load is @ 37% (actual)
> 
> granted my empty groups are now up to almost 3% but I hope that won't be a 
> big factor. How does this look?
> 
> Chris

  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Wjhonson

Disks get "bigger" much faster than the rate they get "faster".
So the overflow is the thing to minimize.



-Original Message-
From: Chris Austin 
To: u2-users 
Sent: Tue, Jul 3, 2012 2:38 pm
Subject: Re: [U2] RESIZE - dynamic files



This is why I'm confused.. Is the goal here to reduce 'overflow' or to 
keep the 'Total size' of the disk down? If the goal is to keep the total
 disk size down then it would appear
you would want your actual load % a lot higher than 37%.. and then ignore 'some' 
of the overflow..
Chris

 But the total size of your file is up 60%.  Reading in 60% more records in a 
 full select of the file is going to be much slower than a few more overflows.
 
 
 -Original Message-
 From: u2-users-boun...@listserver.u2ug.org 
 [mailto:u2-users-boun...@listserver.u2ug.org] 
 On Behalf Of Chris Austin
 Sent: Tuesday, July 03, 2012 2:15 PM
 To: u2-users@listserver.u2ug.org
 Subject: Re: [U2] RESIZE - dynamic files
 
 
 Dan,
 
 I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
 Actual Load has really gone down (as well as overflow). See below for the 
 results:
 
 File name ..   GENACCTRN_POSTED
 Pathname ...   GENACCTRN_POSTED
 File type ..   DYNAMIC
 File style and revision    32BIT Revision 12
 Hashing Algorithm ..   GENERAL
 No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
 3957 overflowed, 207 badly )
 Number of records ..   1290469
 Large record size ..   3267 bytes
 Number of large records    180
 Group size .   4096 bytes
 Load factors ...   90% (split), 50% (merge) and 37% (actual)
 Total size .   836235264 bytes
 Total size of record data ..   287394719 bytes
 Total size of record IDs ...   21508521 bytes
 Unused space ...   527323832 bytes
 Total space for records    836227072 bytes
 
 My overflow is now @ 2%
 My Load is @ 37% (actual)
 
 granted my empty groups are now up to almost 3% but I hope that won't be a big 
 factor. How does this look?
 
 Chris
  

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

This is why I'm confused.. Is the goal here to reduce 'overflow' or to 
keep the 'Total size' of the disk down? If the goal is to keep the total
 disk size down then it would appear
you would want your actual load % a lot higher than 37%.. and then ignore 
'some' of the overflow..

Chris


> But the total size of your file is up 60%.  Reading in 60% more records in a 
> full select of the file is going to be much slower than a few more overflows.
> 
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: Tuesday, July 03, 2012 2:15 PM
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> Dan,
> 
> I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
> Actual Load has really gone down (as well as overflow). See below for the 
> results:
> 
> File name ..   GENACCTRN_POSTED
> Pathname ...   GENACCTRN_POSTED
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
> 3957 overflowed, 207 badly )
> Number of records ..   1290469
> Large record size ..   3267 bytes
> Number of large records    180
> Group size .   4096 bytes
> Load factors ...   90% (split), 50% (merge) and 37% (actual)
> Total size .   836235264 bytes
> Total size of record data ..   287394719 bytes
> Total size of record IDs ...   21508521 bytes
> Unused space ...   527323832 bytes
> Total space for records    836227072 bytes
> 
> My overflow is now @ 2%
> My Load is @ 37% (actual)
> 
> granted my empty groups are now up to almost 3% but I hope that won't be a 
> big factor. How does this look?
> 
> Chris

  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
I should have said "60% more disk records", to be clear.

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Rick Nuckolls
Sent: Tuesday, July 03, 2012 2:24 PM
To: 'U2 Users List'
Subject: Re: [U2] RESIZE - dynamic files

But the total size of your file is up 60%.  Reading in 60% more records in a 
full select of the file is going to be much slower than a few more overflows.


-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 2:15 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


Dan,

I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
Actual Load has really gone down (as well as overflow). See below for the 
results:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
3957 overflowed, 207 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 37% (actual)
Total size .   836235264 bytes
Total size of record data ..   287394719 bytes
Total size of record IDs ...   21508521 bytes
Unused space ...   527323832 bytes
Total space for records    836227072 bytes

My overflow is now @ 2%
My Load is @ 37% (actual)

granted my empty groups are now up to almost 3% but I hope that won't be a big 
factor. How does this look?

Chris


> From: dangf...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 16:57:34 -0400
> Subject: Re: [U2] RESIZE - dynamic files
>
>
> One rule of thumb is to make sure that you have an average of 10 or less 
> items in each group. Going by that, you'd want a minimum mod of 130k or more. 
> I've also noticed that files approach the "sweet spot" for minimizing 
> overflow without having excessive empty groups when the total size is pretty 
> nearly twice the data size.
>
> The goal can vary according to your situation. I'm personally not all that 
> afraid of making the modulus a little too large, as overflow is a pretty bad 
> performance hit (overflow means at least two disk reads to retrieve your 
> data, "badly" means at least 2 extra disk reads, and I've seen files where 
> that was thousands (this file isn't that bad, but 20% of your data is forcing 
> at least one extra disk read). Empty groups contribute to overhead on a 
> sequential search, so you'd want to consider how often you do a sequential 
> search on a file - usually, that's a pretty inefficient way to retrieve data, 
> but, again, your mileage may vary.
>
> To me, 20% is too much overflow, and 114 empty groups is trivial; much less 
> than 0.2%. I'd be tempted to go to 23 as a minimum Mod, just to see what 
> it looks like there. That'll give you an average of 6 records per group, not 
> unreasonably shallow, and it's likely to be a while before you have to resize 
> again.
>
> > From: cjausti...@hotmail.com
> > To: u2-users@listserver.u2ug.org
> > Date: Tue, 3 Jul 2012 15:23:23 -0500
> > Subject: Re: [U2] RESIZE - dynamic files
> >
> >
> > I guess what I need to know is what's an acceptable % of overflow for a 
> > dynamic file? For example, when I change the SPLIT LOAD to 90% (while using 
> > the calculated min modulus)
> > I'm still left with ~ 20% of overflow (see below). Is 20% overflow 
> > considered acceptable on average or should I keep tinkering with it to 
> > reach a lower overflow %?
> >
> > Correct me if I'm wrong but it seems the goal here is to REDUCE the 
> > overflow % while not creating too many modulus (groups).
> >
> > Chris
> >
> >
> > File name ..   GENACCTRN_POSTED
> > Pathname ...   GENACCTRN_POSTED
> > File type ..   DYNAMIC
> > File style and revision    32BIT Revision 12
> > Hashing Algorithm ..   GENERAL
> > No. of groups (modulus)    105715 current ( minimum 103889, 114 empty,
> > 21092 overflowed, 1452 badly )
> > Number of records ..   1290469
> > Large record size ..   3267 bytes
> > Number of large records    180
> > Group size .

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
But the total size of your file is up 60%.  Reading in 60% more records in a 
full select of the file is going to be much slower than a few more overflows.


-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 2:15 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


Dan,

I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
Actual Load has really gone down (as well as overflow). See below for the 
results:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
3957 overflowed, 207 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 37% (actual)
Total size .   836235264 bytes
Total size of record data ..   287394719 bytes
Total size of record IDs ...   21508521 bytes
Unused space ...   527323832 bytes
Total space for records    836227072 bytes

My overflow is now @ 2%
My Load is @ 37% (actual)

granted my empty groups are now up to almost 3% but I hope that won't be a big 
factor. How does this look?

Chris


> From: dangf...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 16:57:34 -0400
> Subject: Re: [U2] RESIZE - dynamic files
>
>
> One rule of thumb is to make sure that you have an average of 10 or less 
> items in each group. Going by that, you'd want a minimum mod of 130k or more. 
> I've also noticed that files approach the "sweet spot" for minimizing 
> overflow without having excessive empty groups when the total size is pretty 
> nearly twice the data size.
>
> The goal can vary according to your situation. I'm personally not all that 
> afraid of making the modulus a little too large, as overflow is a pretty bad 
> performance hit (overflow means at least two disk reads to retrieve your 
> data, "badly" means at least 2 extra disk reads, and I've seen files where 
> that was thousands (this file isn't that bad, but 20% of your data is forcing 
> at least one extra disk read). Empty groups contribute to overhead on a 
> sequential search, so you'd want to consider how often you do a sequential 
> search on a file - usually, that's a pretty inefficient way to retrieve data, 
> but, again, your mileage may vary.
>
> To me, 20% is too much overflow, and 114 empty groups is trivial; much less 
> than 0.2%. I'd be tempted to go to 23 as a minimum Mod, just to see what 
> it looks like there. That'll give you an average of 6 records per group, not 
> unreasonably shallow, and it's likely to be a while before you have to resize 
> again.
>
> > From: cjausti...@hotmail.com
> > To: u2-users@listserver.u2ug.org
> > Date: Tue, 3 Jul 2012 15:23:23 -0500
> > Subject: Re: [U2] RESIZE - dynamic files
> >
> >
> > I guess what I need to know is what's an acceptable % of overflow for a 
> > dynamic file? For example, when I change the SPLIT LOAD to 90% (while using 
> > the calculated min modulus)
> > I'm still left with ~ 20% of overflow (see below). Is 20% overflow 
> > considered acceptable on average or should I keep tinkering with it to 
> > reach a lower overflow %?
> >
> > Correct me if I'm wrong but it seems the goal here is to REDUCE the 
> > overflow % while not creating too many modulus (groups).
> >
> > Chris
> >
> >
> > File name ..   GENACCTRN_POSTED
> > Pathname ...   GENACCTRN_POSTED
> > File type ..   DYNAMIC
> > File style and revision    32BIT Revision 12
> > Hashing Algorithm ..   GENERAL
> > No. of groups (modulus)    105715 current ( minimum 103889, 114 empty,
> > 21092 overflowed, 1452 badly )
> > Number of records ..   1290469
> > Large record size ..   3267 bytes
> > Number of large records    180
> > Group size .   4096 bytes
> > Load factors ...   90% (split), 50% (merge) and 70% (actual)
> > Total size .........   522260480 bytes
> > Total size of record data ..   287400239 bytes
> > Total size of record IDs ...   21508521 bytes
> > Unused space ...  

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

Dan,

I changed the MINIMUM.MODULUS to the value of 23 as you suggested and my 
Actual Load has really gone down (as well as overflow). See below for the 
results:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    23 current ( minimum 23, 5263 empty,
3957 overflowed, 207 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 37% (actual)
Total size .   836235264 bytes
Total size of record data ..   287394719 bytes
Total size of record IDs ...   21508521 bytes
Unused space ...   527323832 bytes
Total space for records    836227072 bytes

My overflow is now @ 2% 
My Load is @ 37% (actual)

granted my empty groups are now up to almost 3% but I hope that won't be a big 
factor. How does this look?

Chris


> From: dangf...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 16:57:34 -0400
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> One rule of thumb is to make sure that you have an average of 10 or less 
> items in each group. Going by that, you'd want a minimum mod of 130k or more. 
> I've also noticed that files approach the "sweet spot" for minimizing 
> overflow without having excessive empty groups when the total size is pretty 
> nearly twice the data size.
>  
> The goal can vary according to your situation. I'm personally not all that 
> afraid of making the modulus a little too large, as overflow is a pretty bad 
> performance hit (overflow means at least two disk reads to retrieve your 
> data, "badly" means at least 2 extra disk reads, and I've seen files where 
> that was thousands (this file isn't that bad, but 20% of your data is forcing 
> at least one extra disk read). Empty groups contribute to overhead on a 
> sequential search, so you'd want to consider how often you do a sequential 
> search on a file - usually, that's a pretty inefficient way to retrieve data, 
> but, again, your mileage may vary. 
>  
> To me, 20% is too much overflow, and 114 empty groups is trivial; much less 
> than 0.2%. I'd be tempted to go to 23 as a minimum Mod, just to see what 
> it looks like there. That'll give you an average of 6 records per group, not 
> unreasonably shallow, and it's likely to be a while before you have to resize 
> again.
>  
> > From: cjausti...@hotmail.com
> > To: u2-users@listserver.u2ug.org
> > Date: Tue, 3 Jul 2012 15:23:23 -0500
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > 
> > I guess what I need to know is what's an acceptable % of overflow for a 
> > dynamic file? For example, when I change the SPLIT LOAD to 90% (while using 
> > the calculated min modulus)
> > I'm still left with ~ 20% of overflow (see below). Is 20% overflow 
> > considered acceptable on average or should I keep tinkering with it to 
> > reach a lower overflow %?
> > 
> > Correct me if I'm wrong but it seems the goal here is to REDUCE the 
> > overflow % while not creating too many modulus (groups). 
> > 
> > Chris
> > 
> > 
> > File name ..   GENACCTRN_POSTED
> > Pathname ...   GENACCTRN_POSTED
> > File type ..   DYNAMIC
> > File style and revision    32BIT Revision 12
> > Hashing Algorithm ..   GENERAL
> > No. of groups (modulus)    105715 current ( minimum 103889, 114 empty,
> > 21092 overflowed, 1452 badly )
> > Number of records ..   1290469
> > Large record size ..   3267 bytes
> > Number of large records    180
> > Group size .   4096 bytes
> > Load factors ...   90% (split), 50% (merge) and 70% (actual)
> > Total size .....   522260480 bytes
> > Total size of record data ..   287400239 bytes
> > Total size of record IDs ...   21508521 bytes
> > Unused space ...   213343528 bytes
> > Total space for records    522252288 bytes
> > 
> > > From: r...@lynden.com
> > > To: u2-users@listserver.u2ug.org
> > > Date: Tue, 3 Jul 2012 13:10:43 -0700
> > > Subject: Re: [U2] RESIZE - dynamic files
> > > 
> > > The split load is not affecting anything here, since it is more than the 
> >

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Dan Fitzgerald

One rule of thumb is to make sure that you have an average of 10 or less items 
in each group. Going by that, you'd want a minimum mod of 130k or more. I've 
also noticed that files approach the "sweet spot" for minimizing overflow 
without having excessive empty groups when the total size is pretty nearly 
twice the data size.
 
The goal can vary according to your situation. I'm personally not all that 
afraid of making the modulus a little too large, as overflow is a pretty bad 
performance hit (overflow means at least two disk reads to retrieve your data, 
"badly" means at least 2 extra disk reads, and I've seen files where that was 
thousands (this file isn't that bad, but 20% of your data is forcing at least 
one extra disk read). Empty groups contribute to overhead on a sequential 
search, so you'd want to consider how often you do a sequential search on a 
file - usually, that's a pretty inefficient way to retrieve data, but, again, 
your mileage may vary. 
 
To me, 20% is too much overflow, and 114 empty groups is trivial; much less 
than 0.2%. I'd be tempted to go to 23 as a minimum Mod, just to see what it 
looks like there. That'll give you an average of 6 records per group, not 
unreasonably shallow, and it's likely to be a while before you have to resize 
again.
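Applying those two rules of thumb to the figures posted for GENACCTRN_POSTED
gives numbers in the same ballpark as the discussion (a Python sketch; the
group count derived from rule 2 assumes overflow is negligible, which it
will not quite be):

records    = 1_290_469
data_bytes = 287_400_239      # total size of record data
id_bytes   = 21_508_521       # total size of record IDs
GROUP_SIZE = 4096

# Rule 1: aim for an average of 10 or fewer records per group.
print(f"rule 1: minimum modulus of at least {records / 10:,.0f}")   # ~129,000

# Rule 2: a total size close to twice the data size is often near the sweet
# spot between overflow and empty groups.
target_total = 2 * (data_bytes + id_bytes)
print(f"rule 2: total size near {target_total / 2**20:,.0f} MB, "
      f"i.e. roughly {target_total // GROUP_SIZE:,} groups")         # ~151,000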
 
> From: cjausti...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 15:23:23 -0500
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> I guess what I need to know is what's an acceptable % of overflow for a 
> dynamic file? For example, when I change the SPLIT LOAD to 90% (while using 
> the calculated min modulus)
> I'm still left with ~ 20% of overflow (see below). Is 20% overflow considered 
> acceptable on average or should I keep tinkering with it to reach a lower 
> overflow %?
> 
> Correct me if I'm wrong but it seems the goal here is to REDUCE the overflow 
> % while not creating too many modulus (groups). 
> 
> Chris
> 
> 
> File name ..   GENACCTRN_POSTED
> Pathname ...   GENACCTRN_POSTED
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    105715 current ( minimum 103889, 114 empty,
> 21092 overflowed, 1452 badly )
> Number of records ..   1290469
> Large record size ..   3267 bytes
> Number of large records    180
> Group size .   4096 bytes
> Load factors ...   90% (split), 50% (merge) and 70% (actual)
> Total size .   522260480 bytes
> Total size of record data ..   287400239 bytes
> Total size of record IDs ...   21508521 bytes
> Unused space ...   213343528 bytes
> Total space for records    522252288 bytes
> 
> > From: r...@lynden.com
> > To: u2-users@listserver.u2ug.org
> > Date: Tue, 3 Jul 2012 13:10:43 -0700
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > The split load is not affecting anything here, since it is more than the 
> > actual load.  What your overflow suggests is that you lower the split.load 
> > value to 70% or below.  You could go ahead and set the merge.load to an 
> > arbitrarily low number ("1"), and it will probably never do a merge, which 
> > would be the same as specifying a minimum.modulus equal to "as large as it 
> > ever gets".  The exception to this is during file creation & clear.file,  
> > when the minimum.modulus value determines the initial disk allocation.  
> > Short of going to an arbitrarily large minimum.modulus, and a very low 
> > split.load, you are going to have some overflow (unless you have sequential 
> > keys & like sized records).
> > 
> > -Rick
> > 
> > -Original Message-
> > From: u2-users-boun...@listserver.u2ug.org 
> > [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> > Sent: Tuesday, July 03, 2012 12:54 PM
> > To: u2-users@listserver.u2ug.org
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > 
> > Using the formula below, and changing the split to 90% I get the following:
> > 
> > File name ..   GENACCTRN_POSTED
> > Pathname ...   GENACCTRN_POSTED
> > File type ..   DYNAMIC
> > File style and revision    32BIT Revision 12
> > Hashing Algorithm ..   GENERAL
> > No. of groups (modulus)    103889 current ( minimum 103889, 114 empty,
> > 22249 overflowed, 1764 badly )
> > Number of records ...

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
The actual load is 70% on your file. The split.load of 90 was set after the 
file was loaded. If you leave it at that value, and add another 100,000 
records, your modulus will not grow, but the number of overflowed groups will. 

Perhaps you need to look at it as "80% not overflowed".  Despite the output, I 
doubt that any of those overflows are that bad. 
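A back-of-the-envelope check of where that 70% comes from, using the numbers
in the quoted output below (a Python sketch; the real calculation also
counts per-record and per-group overhead, so it lands close to, not exactly
on, the reported figure):

GROUP_SIZE = 4096
modulus    = 105_715          # current number of groups
data_bytes = 287_400_239      # total size of record data
id_bytes   = 21_508_521       # total size of record IDs

actual_load = (data_bytes + id_bytes) / (modulus * GROUP_SIZE)
print(f"approximate actual load: {actual_load:.0%}")   # ~71%, reported as 70%

# With split.load at 90%, nothing splits until this figure passes 0.90, so
# adding records mostly deepens overflow rather than growing the modulus.
print("splits pending" if actual_load > 0.90 else "no splits yet")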




-Rick

On Jul 3, 2012, at 1:23 PM, "Chris Austin"  wrote:

> 
> I guess what I need to know is what's an acceptable % of overflow for a 
> dynamic file? For example, when I change the SPLIT LOAD to 90% (while using 
> the calculated min modulus)
> I'm still left with ~ 20% of overflow (see below). Is 20% overflow considered 
> acceptable on average or should I keep tinkering with it to reach a lower 
> overflow %?
> 
> Correct me if I'm wrong but it seems the goal here is to REDUCE the overflow 
> % while not creating too many modulus (groups).
> 
> Chris
> 
> 
> File name ..   GENACCTRN_POSTED
> Pathname ...   GENACCTRN_POSTED
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    105715 current ( minimum 103889, 114 empty,
>21092 overflowed, 1452 badly )
> Number of records ..   1290469
> Large record size ..   3267 bytes
> Number of large records    180
> Group size .   4096 bytes
> Load factors ...   90% (split), 50% (merge) and 70% (actual)
> Total size .   522260480 bytes
> Total size of record data ..   287400239 bytes
> Total size of record IDs ...   21508521 bytes
> Unused space ...   213343528 bytes
> Total space for records    522252288 bytes
> 
>> From: r...@lynden.com
>> To: u2-users@listserver.u2ug.org
>> Date: Tue, 3 Jul 2012 13:10:43 -0700
>> Subject: Re: [U2] RESIZE - dynamic files
>> 
>> The split load is not affecting anything here, since it is more than the 
>> actual load.  What your overflow suggests is that you lower the split.load 
>> value to 70% or below.  You could go ahead and set the merge.load to an 
>> arbitrarily low number ("1"), and it will probably never do a merge, which 
>> would be the same as specifying a minimum.modulus equal to "as large as it 
>> ever gets".  The exception to this is during file creation & clear.file,  
>> when the minimum.modulus value determines the initial disk allocation.  
>> Short of going to an arbitrarily large minimum.modulus, and a very low 
>> split.load, you are going to have some overflow (unless you have sequential 
>> keys & like sized records).
>> 
>> -Rick
>> 
>> -Original Message-
>> From: u2-users-boun...@listserver.u2ug.org 
>> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
>> Sent: Tuesday, July 03, 2012 12:54 PM
>> To: u2-users@listserver.u2ug.org
>> Subject: Re: [U2] RESIZE - dynamic files
>> 
>> 
>> Using the formula below, and changing the split to 90% I get the following:
>> 
>> File name ..   GENACCTRN_POSTED
>> Pathname ...   GENACCTRN_POSTED
>> File type ..   DYNAMIC
>> File style and revision    32BIT Revision 12
>> Hashing Algorithm ..   GENERAL
>> No. of groups (modulus)    103889 current ( minimum 103889, 114 empty,
>>22249 overflowed, 1764 badly )
>> Number of records ..   1290469
>> Large record size ..   3267 bytes
>> Number of large records    180
>> Group size .   4096 bytes
>> Load factors ...   90% (split), 50% (merge) and 72% (actual)
>> Total size .   519921664 bytes
>> Total size of record data ..   287400591 bytes
>> Total size of record IDs ...   21508497 bytes
>> Unused space ...   211004384 bytes
>> Total space for records    519913472 bytes
>> 
>> How does this look in terms of performance?
>> 
>> My Actual load went down 8% as well as some overflow but it looks like my 
>> load % is still high at 72% I'm wondering if I should raise the 
>> MINIMUM.MODULUS even more
>> since I still have a decent amount of overflow and not many large records.
>> 
>> Chris
>> 
>> 
>>> From: r...@lynden.com
>>> To: u2-users@listserver.u2ug.org
>>> Date: Tue, 3 Jul 2012 10:21:16 -0700
>>> Subject: Re: [U2] RESIZE - dynamic files
>>> 
>>> 

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

I guess what I need to know is what's an acceptable % of overflow for a dynamic 
file? For example, when I change the SPLIT LOAD to 90% (while using the 
calculated min modulus)
I'm still left with ~ 20% of overflow (see below). Is 20% overflow considered 
acceptable on average or should I keep tinkering with it to reach a lower 
overflow %?

Correct me if I'm wrong but it seems the goal here is to REDUCE the overflow % 
while not creating too many modulus (groups). 

Chris


File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    105715 current ( minimum 103889, 114 empty,
21092 overflowed, 1452 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 70% (actual)
Total size .   522260480 bytes
Total size of record data ..   287400239 bytes
Total size of record IDs ...   21508521 bytes
Unused space ...   213343528 bytes
Total space for records    522252288 bytes

> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 13:10:43 -0700
> Subject: Re: [U2] RESIZE - dynamic files
> 
> The split load is not affecting anything here, since it is more than the 
> actual load.  What your overflow suggests is that you lower the split.load 
> value to 70% or below.  You could go ahead and set the merge.load to an 
> arbitrarily low number ("1"), and it will probably never do a merge, which 
> would be the same as specifying a minimum.modulus equal to "as large as it 
> ever gets".  The exception to this is during file creation & clear.file,  
> when the minimum.modulus value determines the initial disk allocation.  Short 
> of going to an arbitrarily large minimum.modulus, and a very low split.load, 
> you are going to have some overflow (unless you have sequential keys & like 
> sized records).
> 
> -Rick
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: Tuesday, July 03, 2012 12:54 PM
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> Using the formula below, and changing the split to 90% I get the following:
> 
> File name ..   GENACCTRN_POSTED
> Pathname ...   GENACCTRN_POSTED
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    103889 current ( minimum 103889, 114 empty,
> 22249 overflowed, 1764 badly )
> Number of records ..   1290469
> Large record size ..   3267 bytes
> Number of large records    180
> Group size .   4096 bytes
> Load factors ...   90% (split), 50% (merge) and 72% (actual)
> Total size .   519921664 bytes
> Total size of record data ..   287400591 bytes
> Total size of record IDs ...   21508497 bytes
> Unused space ...   211004384 bytes
> Total space for records    519913472 bytes
> 
> How does this look in terms of performance? 
> 
> My Actual load went down 8% as well as some overflow but it looks like my 
> load % is still high at 72% I'm wondering if I should raise the 
> MINIMUM.MODULUS even more 
> since I still have a decent amount of overflow and not many large records. 
> 
> Chris
> 
> 
> > From: r...@lynden.com
> > To: u2-users@listserver.u2ug.org
> > Date: Tue, 3 Jul 2012 10:21:16 -0700
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > (record + id / 4096 or 2048)
> > 
> > You need to factor in overhead & the split factor:   (records + ids) * 1.1 
> > * 1.25 / 4096 (for 80%) 
> > 
> > If you use a 20% merge factor and a 80% split factor, the file will start 
> > merging unless you delete 60 percent of your records.  If you use 90% split 
> > factor, you will have more overflowed groups.  These numbers refer to the 
> > total amount of data in the file, not to any individual group.
> > 
> > For records of the size that you have, I do not see any advantage to using 
> > a larger, 4096, group size. You will end up with twice the number of 
> > records per group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access.
> > 
> > -Rick
> > 
> &

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
The split load is not affecting anything here, since it is more than the actual 
load.  What your overflow suggests is that you lower the split.load value to 
70% or below.  You could go ahead and set the merge.load to an arbitrarily low 
number ("1"), and it will probably never do a merge, which would be the same as 
specifying a minimum.modulus equal to "as large as it ever gets".  The 
exception to this is during file creation & clear.file,  when the 
minimum.modulus value determines the initial disk allocation.  Short of going 
to an arbitrarily large minimum.modulus, and a very low split.load, you are 
going to have some overflow (unless you have sequential keys & like sized 
records).

-Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 12:54 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


Using the formula below, and changing the split to 90% I get the following:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    103889 current ( minimum 103889, 114 empty,
22249 overflowed, 1764 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 72% (actual)
Total size .   519921664 bytes
Total size of record data ..   287400591 bytes
Total size of record IDs ...   21508497 bytes
Unused space ...   211004384 bytes
Total space for records    519913472 bytes

How does this look in terms of performance? 

My actual load went down 8%, as did some of the overflow, but it looks like my 
load % is still high at 72%. I'm wondering if I should raise the 
MINIMUM.MODULUS even more, 
since I still have a decent amount of overflow and not many large records. 

Chris
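Plugging the byte counts from the output above into the sizing formula Rick
gives in the quoted message below is a useful sanity check (a Python sketch;
the 1.1 overhead factor is Rick's, and the 90% line simply swaps the 1.25
for 1/0.90):

data_bytes = 287_400_591      # total size of record data
id_bytes   = 21_508_497       # total size of record IDs
GROUP_SIZE = 4096

base = (data_bytes + id_bytes) * 1.1          # data plus ~10% overhead
print(f"80% split load: ~{base * 1.25 / GROUP_SIZE:,.0f} groups")   # ~104,000
print(f"90% split load: ~{base / 0.90 / GROUP_SIZE:,.0f} groups")   # ~92,000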


> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 10:21:16 -0700
> Subject: Re: [U2] RESIZE - dynamic files
> 
> (record + id / 4096 or 2048)
> 
> You need to factor in overhead & the split factor:   (records + ids) * 1.1 * 
> 1.25 / 4096 (for 80%) 
> 
> If you use a 20% merge factor and a 80% split factor, the file will start 
> merging unless you delete 60 percent of your records.  If you use 90% split 
> factor, you will have more overflowed groups.  These numbers refer to the 
> total amount of data in the file, not to any individual group.
> 
> For records of the size that you have, I do not see any advantage to using a 
> larger, 4096, group size. You will end up with twice the number of records 
> per group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access.
> 
> -Rick
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: Tuesday, July 03, 2012 9:48 AM
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> File name ..   GENACCTRN_POSTED
> Pathname ...   GENACCTRN_POSTED
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    92776 current ( minimum 31, 89 empty,
> 28229 overflowed, 2485 badly )
> Number of records ..   1290469
> Large record size ..   3267 bytes
> Number of large records    180
> Group size .   4096 bytes
> Load factors ...   80% (split), 50% (merge) and 80% (actual)
> Total size .   500600832 bytes
> Total size of record data ..   287035391 bytes
> Total size of record IDs ...   21508449 bytes
> Unused space ...   192048800 bytes
> Total space for records    500592640 bytes
> 
> Using the record above, how would I calculate the following?
> 
> 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the 
> current number)?
> 2) SPLIT - would 90% seem about right?
> 3) MERGE - would 20% seem about right? 
> 4) Large Record Size - does 3276 seem right? 
> 5) Group Size - should I be using 4096?
> 
> I'm just a bit confused as to how to set these, I saw the formula to 
> calculate the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I 
> always get a lower number
> than my current modulus.. 
> 
> I also saw where it said to simply take your current modulus #

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Jeff Schasny
I would recommend that, if you intend to do resizing on a regular basis 
and you want to improve the performance of the file, you consider 
resizing the file to a static file type so that you can have more 
control over the hashing algorithm, separation and modulo.


Chris Austin wrote:

Using the formula below, and changing the split to 90% I get the following:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    103889 current ( minimum 103889, 114 empty,
22249 overflowed, 1764 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 72% (actual)
Total size .   519921664 bytes
Total size of record data ..   287400591 bytes
Total size of record IDs ...   21508497 bytes
Unused space ...   211004384 bytes
Total space for records    519913472 bytes

How does this look in terms of performance? 

My Actual load went down 8% as well as some overflow but it looks like my load % is still high at 72% I'm wondering if I should raise the MINIMUM.MODULUS even more 
since I still have a decent amount of overflow and not many large records. 


Chris


  

From: r...@lynden.com
To: u2-users@listserver.u2ug.org
Date: Tue, 3 Jul 2012 10:21:16 -0700
Subject: Re: [U2] RESIZE - dynamic files

(record + id / 4096 or 2048)

You need to factor in overhead & the split factor:   (records + ids) * 1.1 * 1.25 / 4096 (for 80%) 


If you use a 20% merge factor and a 80% split factor, the file will start 
merging unless you delete 60 percent of your records.  If you use 90% split 
factor, you will have more overflowed groups.  These numbers refer to the total 
amount of data in the file, not to any individual group.

For records of the size that you have, I do not see any advantage to using a 
larger, 4096, group size. You will end up with twice the number of records per 
group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access.

-Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 9:48 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    92776 current ( minimum 31, 89 empty,
28229 overflowed, 2485 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   500600832 bytes
Total size of record data ..   287035391 bytes
Total size of record IDs ...   21508449 bytes
Unused space ...   192048800 bytes
Total space for records    500592640 bytes

Using the record above, how would I calculate the following?

1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the 
current number)?
2) SPLIT - would 90% seem about right?
3) MERGE - would 20% seem about right? 
4) Large Record Size - does 3276 seem right? 
5) Group Size - should I be using 4096?


I'm just a bit confused as to how to set these, I saw the formula to calculate 
the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I always get a 
lower number
than my current modulus.. 


I also saw where it said to simply take your current modulus # and add 10-20% 
and set the MINIMUM.MODULUS based on that..

Based on the table above I'm just trying to get an idea of what these should be 
set at.

Thanks,

Chris




From: cjausti...@hotmail.com
To: u2-users@listserver.u2ug.org
Date: Tue, 3 Jul 2012 10:28:17 -0500
Subject: Re: [U2] RESIZE - dynamic files


Doug,

When I do the math I come up with a different # (see below):

File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
26225 overflowed, 1441 badly )
Number of records ..   1157122
Large record size ..   2036 bytes
Number of large records    576
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   44

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

Using the formula below, and changing the split to 90% I get the following:

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    103889 current ( minimum 103889, 114 empty,
22249 overflowed, 1764 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   90% (split), 50% (merge) and 72% (actual)
Total size .   519921664 bytes
Total size of record data ..   287400591 bytes
Total size of record IDs ...   21508497 bytes
Unused space ...   211004384 bytes
Total space for records    519913472 bytes

How does this look in terms of performance? 

My actual load went down 8%, as did some of the overflow, but it looks like my 
load % is still high at 72%. I'm wondering if I should raise the MINIMUM.MODULUS 
even more, 
since I still have a decent amount of overflow and not many large records. 

Chris


> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 10:21:16 -0700
> Subject: Re: [U2] RESIZE - dynamic files
> 
> (record + id / 4096 or 2048)
> 
> You need to factor in overhead & the split factor:   (records + ids) * 1.1 * 
> 1.25  / 4096(for 80%) 
> 
> If you use a 20% merge factor and an 80% split factor, the file will start 
> merging unless you delete 60 percent of your records.  If you use 90% split 
> factor, you will have more overflowed groups.  These numbers refer to the 
> total amount of data in the file, not to any individual group.
> 
> For records of the size that you have, I do not see any advantage to using a 
> larger, 4096, group size. You will end up with twice the number of records 
> per group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access.
> 
> -Rick
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: Tuesday, July 03, 2012 9:48 AM
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> File name ..   GENACCTRN_POSTED
> Pathname ...   GENACCTRN_POSTED
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    92776 current ( minimum 31, 89 empty,
> 28229 overflowed, 2485 badly )
> Number of records ..   1290469
> Large record size ..   3267 bytes
> Number of large records    180
> Group size .   4096 bytes
> Load factors ...   80% (split), 50% (merge) and 80% (actual)
> Total size .   500600832 bytes
> Total size of record data ..   287035391 bytes
> Total size of record IDs ...   21508449 bytes
> Unused space ...   192048800 bytes
> Total space for records    500592640 bytes
> 
> Using the record above, how would I calculate the following?
> 
> 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the 
> current number)?
> 2) SPLIT - would 90% seem about right?
> 3) MERGE - would 20% seem about right? 
> 4) Large Record Size - does 3276 seem right? 
> 5) Group Size - should I be using 4096?
> 
> I'm just a bit confused as to how to set these, I saw the formula to 
> calculate the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I 
> always get a lower number
> than my current modulus.. 
> 
> I also saw where it said to simply take your current modulus # and add 10-20% 
> and set the MINIMUM.MODULUS based on that..
> 
> Based on the table above I'm just trying to get an idea of what these should 
> be set at.
> 
> Thanks,
> 
> Chris
> 
> 
> > From: cjausti...@hotmail.com
> > To: u2-users@listserver.u2ug.org
> > Date: Tue, 3 Jul 2012 10:28:17 -0500
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > 
> > Doug,
> > 
> > When I do the math I come up with a different # (see below):
> > 
> > File name ..   TEST_FILE
> > Pathname ...   TEST_FILE
> > File type ..   DYNAMIC
> > File style and revision    32BIT Revision 12
> > Hashing Algorithm ..   GENERAL
> > No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
> > 26225 overflowed, 1441 badly )
> > Number of records .

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin


Doug,

The data is growing over time with this file. Does that mean I should ignore 
the formula? Or should I still use a lower MINIMUM.MODULO than the
actual modulo #..

Is the idea to reduce overflow by lowering the split? What is this 'overflow' 
referring to?

> > 2) SPLIT - would 90% seem about right?
> >
> Depends on the history of the file.  Is the data growing over time?  The
> way the file looks now the split should be reduced because you have 31% in
> overflow.

So basically don't spend much time worrying about large record size?

> 4) Large Record Size - does 3276 seem right?
> >
> Can be calculated with a lot of effort, but yields little gain.

This seems like a moot point as well, as long as the ratio in regards to the 
MINIMUM.MODULO is set proportionally?

> 5) Group Size - should I be using 4096?
> >
> You have two group sizes on dynamic files 2048 and 4096.  If you lower it
> you need to double your modulo, roughly.  If you keep it the same you need
> to increase your modulo because 31% of your file is in overflow.

Chris


  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

So for this example what would be a good SPLIT level and what would be a good 
MERGE level to use? It was my understanding that I wanted to lower my merge to 
something below 50% and
increase the split to reduce splitting.

Chris


> From: r...@lynden.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 10:21:16 -0700
> Subject: Re: [U2] RESIZE - dynamic files
> 
> (record + id / 4096 or 2048)
> 
> You need to factor in overhead & the split factor:   (records + ids) * 1.1 * 
> 1.25  / 4096(for 80%) 
> 
> If you use a 20% merge factor and an 80% split factor, the file will start 
> merging unless you delete 60 percent of your records.  If you use 90% split 
> factor, you will have more overflowed groups.  These numbers refer to the 
> total amount of data in the file, not to any individual group.
> 
> For records of the size that you have, I do not see any advantage to using a 
> larger, 4096, group size. You will end up with twice the number of records 
> per group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access.
> 
> -Rick
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: Tuesday, July 03, 2012 9:48 AM
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> File name ..   GENACCTRN_POSTED
> Pathname ...   GENACCTRN_POSTED
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    92776 current ( minimum 31, 89 empty,
> 28229 overflowed, 2485 badly )
> Number of records ..   1290469
> Large record size ..   3267 bytes
> Number of large records    180
> Group size .   4096 bytes
> Load factors ...   80% (split), 50% (merge) and 80% (actual)
> Total size .   500600832 bytes
> Total size of record data ..   287035391 bytes
> Total size of record IDs ...   21508449 bytes
> Unused space ...   192048800 bytes
> Total space for records    500592640 bytes
> 
> Using the record above, how would I calculate the following?
> 
> 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the 
> current number)?
> 2) SPLIT - would 90% seem about right?
> 3) MERGE - would 20% seem about right? 
> 4) Large Record Size - does 3276 seem right? 
> 5) Group Size - should I be using 4096?
> 
> I'm just a bit confused as to how to set these, I saw the formula to 
> calculate the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I 
> always get a lower number
> than my current modulus.. 
> 
> I also saw where it said to simply take your current modulus # and add 10-20% 
> and set the MINIMUM.MODULUS based on that..
> 
> Based on the table above I'm just trying to get an idea of what these should 
> be set at.
> 
> Thanks,
> 
> Chris
> 
> 
> > From: cjausti...@hotmail.com
> > To: u2-users@listserver.u2ug.org
> > Date: Tue, 3 Jul 2012 10:28:17 -0500
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > 
> > Doug,
> > 
> > When I do the math I come up with a different # (see below):
> > 
> > File name ..   TEST_FILE
> > Pathname ...   TEST_FILE
> > File type ..   DYNAMIC
> > File style and revision    32BIT Revision 12
> > Hashing Algorithm ..   GENERAL
> > No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
> > 26225 overflowed, 1441 badly )
> > Number of records ..   1157122
> > Large record size ..   2036 bytes
> > Number of large records    576
> > Group size .   4096 bytes
> > Load factors ...   80% (split), 50% (merge) and 80% (actual)
> > Total size .   449605632 bytes
> > Total size of record data ..   258687736 bytes
> > Total size of record IDs ...   19283300 bytes
> > Unused space ...   171626404 bytes
> > Total space for records    449597440 bytes
> > 
> > -- 
> > 258,687,736 bytes - Total size of record data
> > 19,283,300 bytes - Total size of record IDs
> > ===
> > 277,971,036 bytes (record + id's)
> > 
> > 277,971,036 / 4,084 = 68,063 bytes (minimum modulus)
> > -- 
> > 
> > 68,063 is less than the current modulus of 82,850. Something with this 
> > formula doesn't seem right because if I use that

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Rick Nuckolls
(record + id / 4096 or 2048)

You need to factor in overhead & the split factor:   (records + ids) * 1.1 * 
1.25  / 4096(for 80%) 

If you use a 20% merge factor and an 80% split factor, the file will start 
merging unless you delete 60 percent of your records.  If you use 90% split 
factor, you will have more overflowed groups.  These numbers refer to the total 
amount of data in the file, not to any individual group.

For records of the size that you have, I do not see any advantage to using a 
larger, 4096, group size. You will end up with twice the number of records per 
group vs 2048 (~ 13 vs ~ 7 ), and a little slower keyed access.
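
For anyone who wants to plug the numbers in, here is the same arithmetic as a 
minimal sketch in plain Python (not U2 code; the function name and the rounding 
are mine, the 1.1 overhead and the 1.25 split allowance come from the formula 
above, and the byte counts are from the ANALYZE.FILE output quoted below):

    data_bytes = 287_035_391   # Total size of record data (GENACCTRN_POSTED)
    id_bytes   = 21_508_449    # Total size of record IDs

    def min_modulus(data_bytes, id_bytes, group_size, split=0.80, overhead=1.10):
        # groups needed so the file sits at roughly the split load
        return int((data_bytes + id_bytes) * overhead / split / group_size) + 1

    print(min_modulus(data_bytes, id_bytes, 4096))   # roughly 103,600 groups at 4 KB
    print(min_modulus(data_bytes, id_bytes, 2048))   # roughly double that at 2 KB groups

Either way it is only an estimate; the real file settles wherever the splits 
leave it.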

-Rick

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Tuesday, July 03, 2012 9:48 AM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    92776 current ( minimum 31, 89 empty,
28229 overflowed, 2485 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   500600832 bytes
Total size of record data ..   287035391 bytes
Total size of record IDs ...   21508449 bytes
Unused space ...   192048800 bytes
Total space for records    500592640 bytes

Using the record above, how would I calculate the following?

1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the 
current number)?
2) SPLIT - would 90% seem about right?
3) MERGE - would 20% seem about right? 
4) Large Record Size - does 3276 seem right? 
5) Group Size - should I be using 4096?

I'm just a bit confused as to how to set these, I saw the formula to calculate 
the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I always get a 
lower number
than my current modulus.. 

I also saw where it said to simply take your current modulus # and add 10-20% 
and set the MINIMUM.MODULUS based on that..

Based on the table above I'm just trying to get an idea of what these should be 
set at.

Thanks,

Chris


> From: cjausti...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 10:28:17 -0500
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> Doug,
> 
> When I do the math I come up with a different # (see below):
> 
> File name ..   TEST_FILE
> Pathname ...   TEST_FILE
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
> 26225 overflowed, 1441 badly )
> Number of records ..   1157122
> Large record size ..   2036 bytes
> Number of large records    576
> Group size .   4096 bytes
> Load factors ...   80% (split), 50% (merge) and 80% (actual)
> Total size .   449605632 bytes
> Total size of record data ..   258687736 bytes
> Total size of record IDs ...   19283300 bytes
> Unused space ...   171626404 bytes
> Total space for records    449597440 bytes
> 
> -- 
> 258,687,736 bytes - Total size of record data
> 19,283,300 bytes - Total size of record IDs
> ===
> 277,971,036 bytes (record + id's)
> 
> 277,971,036 / 4,084 = 68,063 bytes (minimum modulus)
> -- 
> 
> 68,063 is less than the current modulus of 82,850. Something with this 
> formula doesn't seem right because if I use that formula I always calculate a 
> minimum modulus of less than the current modulus.
> 
> Thanks,
> 
> Chris
> 
> 
> 
> > Date: Mon, 2 Jul 2012 16:08:16 -0600
> > From: dave...@gmail.com
> > To: u2-users@listserver.u2ug.org
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > Hi Chris:
> > 
> > You cannot get away with not resizing dynamic files in my experience.  The
> > files do not split and merge like we are led to believe.  The separator is
> > not used on dynamic files.  Your Universe file is badly sized.  The math
> > below will get you a reasonable file size.
> > 
> > Let's do the math:
> > 
> > 258687736 (Record Size)
> > 192283300 (Key Size)
> > 
> > 450,971,036 (Data and Key Size)
> > 
> > 4096 (Group Size)
> > - 12   (32 Bit Overhe

Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Doug Averch
See comment interspersed...

Using the record above, how would I calculate the following?
>
> 1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the
> current number)?
>
Should be less than the current size, if you want the file to merge.


> 2) SPLIT - would 90% seem about right?
>
Depends on the history of the file.  Is the data growing over time?  The
way the file looks now the split should be reduced because you have 31% in
overflow.

3) MERGE - would 20% seem about right?
>
Won't be used on a growth file, so the file history is important.

4) Large Record Size - does 3276 seem right?
>
Can be calculated with a lot of effort, but yields little gain.

5) Group Size - should I be using 4096?
>
You have two group sizes on dynamic files 2048 and 4096.  If you lower it
you need to double your modulo, roughly.  If you keep it the same you need
to increase your modulo because 31% of your file is in overflow.
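
For reference, the 31% is presumably the overflowed-group fraction in the 
ANALYZE.FILE output earlier in the thread; a two-line check in plain Python:

    print(28229 / 92776)   # GENACCTRN_POSTED: about 0.30 of groups overflowed
    print(26225 / 82850)   # TEST_FILE:        about 0.32 of groups overflowed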

>
>
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

No worries Doug. I'm just wondering if the calculation makes sense (if we use 
the example below):

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    92776 current ( minimum 31, 89 empty,
28229 overflowed, 2485 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   500600832 bytes
Total size of record data ..   287035391 bytes
Total size of record IDs ...   21508449 bytes
Unused space ...   192048800 bytes
Total space for records    500592640 bytes

FORMULA -> (287,035,391+21,508,449) / (4,084) = 75,549  MINIMUM.MODULUS

The question I have is whether 75,549 makes sense for this record. I thought 
the MINIMUM.MODULUS was supposed to be bigger than the number of groups (92,776 
in this case)?
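
One rough way to see the gap (a sketch in plain Python; the 80% is the actual 
load reported above, and the comparison is only approximate, not an exact model 
of the split logic):

    data_plus_ids = 287_035_391 + 21_508_449   # 308,543,840 bytes
    print(data_plus_ids / 4084)                # ~75,549 groups if every group were 100% full
    print(data_plus_ids / (4096 * 0.80))       # ~94,000 groups at the observed 80% load
    print(data_plus_ids * 1.1 * 1.25 / 4096)   # ~103,600 with the overhead + split allowance

The plain formula assumes completely full groups, so it will always undershoot a 
file that is only allowed to fill to its split load; the 80%-load figure lands 
close to the current modulus of 92,776, and the overhead-plus-split figure lands 
above it.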

Chris


> Date: Tue, 3 Jul 2012 11:04:53 -0600
> From: dave...@gmail.com
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> Yep, I added an extra 2 in the key value.  Oh, the perils of cut and
> paste...
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Doug Averch
Yep, I added an extra 2 in the key value.  Oh, the perils of cut and
paste...
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

File name ..   GENACCTRN_POSTED
Pathname ...   GENACCTRN_POSTED
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    92776 current ( minimum 31, 89 empty,
28229 overflowed, 2485 badly )
Number of records ..   1290469
Large record size ..   3267 bytes
Number of large records    180
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   500600832 bytes
Total size of record data ..   287035391 bytes
Total size of record IDs ...   21508449 bytes
Unused space ...   192048800 bytes
Total space for records    500592640 bytes

Using the record above, how would I calculate the following?

1) MINIMUM.MODULUS (Is there a formula to use or should I add 20% to the 
current number)?
2) SPLIT - would 90% seem about right?
3) MERGE - would 20% seem about right? 
4) Large Record Size - does 3276 seem right? 
5) Group Size - should I be using 4096?

I'm just a bit confused as to how to set these, I saw the formula to calculate 
the MINIMUM.MODULUS which is (record + id / 4096 or 2048) but I always get a 
lower number
than my current modulus.. 

I also saw where it said to simply take your current modulus # and add 10-20% 
and set the MINIMUM.MODULUS based on that..

Based on the table above I'm just trying to get an idea of what these should be 
set at.

Thanks,

Chris


> From: cjausti...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Tue, 3 Jul 2012 10:28:17 -0500
> Subject: Re: [U2] RESIZE - dynamic files
> 
> 
> Doug,
> 
> When I do the math I come up with a different # (see below):
> 
> File name ..   TEST_FILE
> Pathname ...   TEST_FILE
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
> 26225 overflowed, 1441 badly )
> Number of records ..   1157122
> Large record size ..   2036 bytes
> Number of large records    576
> Group size .   4096 bytes
> Load factors ...   80% (split), 50% (merge) and 80% (actual)
> Total size .   449605632 bytes
> Total size of record data ..   258687736 bytes
> Total size of record IDs ...   19283300 bytes
> Unused space ...   171626404 bytes
> Total space for records    449597440 bytes
> 
> -- 
> 258,687,736 bytes - Total size of record data
> 19,283,300 bytes - Total size of record IDs
> ===
> 277,971,036 bytes (record + id's)
> 
> 277,971,036 / 4,084 = 68,063 bytes (minimum modulus)
> -- 
> 
> 68,063 is less than the current modulus of 82,850. Something with this 
> formula doesn't seem right because if I use that formula I always calculate a 
> minimum modulus of less than the current modulus.
> 
> Thanks,
> 
> Chris
> 
> 
> 
> > Date: Mon, 2 Jul 2012 16:08:16 -0600
> > From: dave...@gmail.com
> > To: u2-users@listserver.u2ug.org
> > Subject: Re: [U2] RESIZE - dynamic files
> > 
> > Hi Chris:
> > 
> > You cannot get away with not resizing dynamic files in my experience.  The
> > files do not split and merge like we are led to believe.  The separator is
> > not used on dynamic files.  Your Universe file is badly sized.  The math
> > below will get you a reasonable file size.
> > 
> > Let's do the math:
> > 
> > 258687736 (Record Size)
> > 192283300 (Key Size)
> > 
> > 450,971,036 (Data and Key Size)
> > 
> > 4096 (Group Size)
> > - 12   (32 Bit Overhead)
> > 
> > 4084 Usable Space
> > 
> > 450971036/4084 = Minimum Modulo 110424 (Prime is 110431)
> > 
> > 
> > [ad]
> > I hate doing this math all of the time.  I have a reasonably priced resize
> > program called XLr8Resizer for $99.00 to do this for me.
> > [/ad]
> > 
> > Regards,
> > Doug
> > www.u2logic.com/tools.html
> > "XLr8Resizer for the rest of us"
> > ___
> > U2-Users mailing list
> > U2-Users@listserver.u2ug.org
> > http://listserver.u2ug.org/mailman/listinfo/u2-users
> 
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Chris Austin

Doug,

When I do the math I come up with a different # (see below):

File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
26225 overflowed, 1441 badly )
Number of records ..   1157122
Large record size ..   2036 bytes
Number of large records    576
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   449605632 bytes
Total size of record data ..   258687736 bytes
Total size of record IDs ...   19283300 bytes
Unused space ...   171626404 bytes
Total space for records    449597440 bytes

-- 
258,687,736 bytes - Total size of record data
19,283,300 bytes - Total size of record IDs
===
277,971,036 bytes (record + id's)

277,971,036 / 4,084 = 68,063 bytes (minimum modulus)
-- 

68,063 is less than the current modulus of 82,850. Something with this formula 
doesn't seem right because if I use that formula I always calculate a 
minimum modulus of less than the current modulus.

Thanks,

Chris



> Date: Mon, 2 Jul 2012 16:08:16 -0600
> From: dave...@gmail.com
> To: u2-users@listserver.u2ug.org
> Subject: Re: [U2] RESIZE - dynamic files
> 
> Hi Chris:
> 
> You cannot get away with not resizing dynamic files in my experience.  The
> files do not split and merge like we are led to believe.  The separator is
> not used on dynamic files.  Your Universe file is badly sized.  The math
> below will get you a reasonable file size.
> 
> Let's do the math:
> 
> 258687736 (Record Size)
> 192283300 (Key Size)
> 
> 450,971,036 (Data and Key Size)
> 
> 4096 (Group Size)
> - 12   (32 Bit Overhead)
> 
> 4084 Usable Space
> 
> 450971036/4084 = Minimum Modulo 110424 (Prime is 110431)
> 
> 
> [ad]
> I hate doing this math all of the time.  I have a reasonably priced resize
> program called XLr8Resizer for $99.00 to do this for me.
> [/ad]
> 
> Regards,
> Doug
> www.u2logic.com/tools.html
> "XLr8Resizer for the rest of us"
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-03 Thread Brett Callacher
Almost.  Though the file will look after itself, it may not do so very well.  
Dynamic files, for best performance, do sometimes need periodic resizing.  
Having said that it is true that some never resize Dynamic files.

If the minimum modulo is much lower than the actual, this will cause 
constant splits to occur if the file is constantly growing; the 80% actual 
load is further indication of this.  What can be even worse is if the file then 
shrinks dramatically, as very intensive merges will take place - 
not desirable if you expect the file to grow again.

In this case I would choose a new modulo greater than the actual - how much 
bigger depends on the rate of growth expected.  That is with the current 
separation; the best separation can only be determined by examining the size 
of the records.

"Martin Phillips"  wrote in message 
news:<00f601cd588c$cd3d1310$67b73930$@ladybridge.com>...
> Hi Chris,
> 
> The whole point of dynamic files is that you don't do RESIZE. The file will 
> look after itself, automatically responding to
> variations in the volume of data.
> 
> There are "knobs to twiddle" but in most cases they can safely be left at 
> their defaults. A dynamic file will never perform as well
> as a perfectly tuned static file but they are a heck of a lot better than 
> typical static files that haven't been reconfigured for
> ages.
> 
> 
> Martin Phillips
> Ladybridge Systems Ltd
> 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
> +44 (0)1604-709200
> 
> 
> 
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: 02 July 2012 20:22
> To: u2-users@listserver.u2ug.org
> Subject: [U2] RESIZE - dynamic files
> 
> 
> I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
> example I have a file called 'TEST_FILE'
> with the following:
> 
> 01 ANALYZE.FILE TEST_FILE
> File name ..   TEST_FILE
> Pathname ...   TEST_FILE
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    83261 current ( minimum 31 )
> Large record size ..   3267 bytes
> Group size .   4096 bytes
> Load factors ...   80% (split), 50% (merge) and 80% (actual)
> Total size .   450613248 bytes
> 
> How do you calculate what the modulus and separation should be? I can't use 
> HASH.HELP on a type 30 file to see the recommended
> settings
> so I was wondering how best you figure out the file RESIZE.
> 
> Thanks,
> 
> Chris
> 
> 
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
> 
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
> 
This message contains information that may be privileged or confidential and is 
the property of GPM Development Ltd. It is intended only for the person to whom 
it is addressed. If you are not the intended recipient ,you are not authorized 
to read, print, retain, copy, disseminate, distribute, or use this message or 
any part thereof. If you receive this message in error, please notify the 
sender immediately and delete all copies of this message.

This e-mail was sent to you by GPM Development Ltd.  We are incorporated under 
the laws of England and Wales (company no. 2292156 and VAT registration no. 523 
5622 63).  Our registered office is 6th Floor, AMP House, Croydon, Surrey CR0 
2LX.
 

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-02 Thread Rick Nuckolls
Chris,

I second the thought that, because of the splitting and merging of groups, it 
can be a waste of effort to overwork the "sizing" of a dynamic file.

One problem with your TEST_FILE below is that the Large Record Size is spec'ed 
at less than 50% of the group size.  Each record that is larger than the "large 
record size" is given at least one full sized buffer in the overflow file, so a 
record of 2037 bytes, in your example, would occupy 4096 bytes of space.  The 
ID and pointer are left in the primary data group.  It appears that your records 
average 250 bytes, so this probably is not a large factor, but that would also 
suggest that you stick to a GROUP.SIZE of 1 (2048 bytes) rather than 2.  Btw, 
each of your 576 large records probably counts towards the "overflowed badly" 
column, though, from an access point of view, the group might be in optimal 
shape.

The effective modulo of a dynamic file is based on the space used by the 
not(large records), but the " Total size of record data" includes the full 
buffer size of the overflow records, I believe, and so should not be used to 
compute the total size of your data. For record sizes like you have, I would 
compute the total of the ids+records, add about 10% for overhead, divide by the 
group size (2048, if you use the default), multiply by 1.25 (allow for the 80% 
splitting factor), and then set the minimum modulus to the next larger prime 
number.
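
A minimal sketch of that recipe in plain Python (not U2 code; the prime search 
and the rounding are mine, the 10% overhead and the 1.25 factor are as described 
above, and the byte counts come from the TEST_FILE output quoted at the end of 
this message):

    import math

    def next_prime(n):
        # smallest prime >= n; simple trial division is fine at this scale
        def is_prime(k):
            return k >= 2 and all(k % d for d in range(2, math.isqrt(k) + 1))
        while not is_prime(n):
            n += 1
        return n

    data_bytes, id_bytes = 258_687_736, 19_283_300   # TEST_FILE record data and IDs
    group_size = 2048                                # the default GROUP.SIZE 1

    groups = (data_bytes + id_bytes) * 1.10 / group_size * 1.25   # about 186,600 before rounding
    print(next_prime(math.ceil(groups)))                          # candidate MINIMUM.MODULUS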

In the example below, you can see 50 large records in a single group of a 
dynamic file, but only the id's are in the primary buffer.  If you do the math, 
you will find that each 1001 record is using up a  4096 byte overflow buffer. 


File name ..   BIGD
Pathname ...   BIGD
File type ..   DYNAMIC
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    1 current ( minimum 1, 0 empty,
1 overflowed, 1 badly )
Number of records ..   50
Large record size ..   1000 bytes
Number of large records    50
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 30% (actual)
Total size .   217088 bytes
Total size of record data ..   205466 bytes
Total size of record IDs ...   534 bytes
Unused space ...   2896 bytes
Total space for records    208896 bytes

LIST BIGD TOTAL EVAL "LEN(@ID)" TOTAL EVAL "LEN(@RECORD)" DET.SUP 18:03:29  07-
02-12  PAGE 1
LEN(@ID)..LEN(@RECORD)

==
134   50050

50 records listed.

Note that if I stuck to the defaults and used sequential ids, I would have 
saved more than 1/2 of the disk space, but still have used 150% of the total 
id+record size.

File name ..   BIGD
Pathname ...   BIGD
File type ..   DYNAMIC
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    31 current ( minimum 1, 3 empty,
4 overflowed, 0 badly )
Number of records ..   50
Large record size ..   1628 bytes
Number of large records    0
Group size .   2048 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   79872 bytes
Total size of record data ..   50709 bytes
Total size of record IDs ...   91 bytes
Unused space ...   24976 bytes
Total space for records    75776 bytes
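
The arithmetic behind that, as a quick sketch (plain Python; the figures are 
taken from the two ANALYZE.FILE outputs above):

    large_records = 50
    record_bytes  = 1001     # 50050 total from LEN(@RECORD) / 50 records
    print(large_records * record_bytes)   # about 50 KB of actual record data
    print(large_records * 4096)           # 204,800 bytes of overflow buffers holding it

which is most of the 217,088-byte total in the 4096-byte-group layout, versus 
under 80 KB once the same records fall below the large record size in the 
2048-byte-group version.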

Rick Nuckolls
Lynden Inc


-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: Monday, July 02, 2012 2:07 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] RESIZE - dynamic files


The dynamic file I'm working with is below. What do 'overflowed' and 'badly' 
refer to under MODULUS? Is the goal of the RESIZE to eliminate that
overflow? Any ideas what I should change to achieve this?


File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
26225 overflowed, 1441 badly )
Number of records ..   1157122
Large record size ..   2036 bytes
Number of large records    576
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   449605632 bytes
Total size of record data ..   258687736 bytes
Total size of record IDs ...   19283300 bytes
Unused space ...   171626404 bytes
Total space for records    449597440 bytes

Thanks,

Chris


> From: cjausti...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Mon, 2 Jul 2012 14:

Re: [U2] RESIZE - dynamic files

2012-07-02 Thread Doug Averch
Hi Chris:

You cannot get away with not resizing dynamic files in my experience.  The
files do not split and merge like we are led to believe.  The separator is
not used on dynamic files.  Your Universe file is badly sized.  The math
below will get you a reasonable file size.

Let's do the math:

258687736 (Record Size)
192283300 (Key Size)

450,971,036 (Data and Key Size)

4096 (Group Size)
- 12   (32 Bit Overhead)

4084 Usable Space

450971036/4084 = Minimum Modulo 110424 (Prime is 110431)


[ad]
I hate doing this math all of the time.  I have a reasonably priced resize
program called XLr8Resizer for $99.00 to do this for me.
[/ad]

Regards,
Doug
www.u2logic.com/tools.html
"XLr8Resizer for the rest of us"
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-02 Thread Chris Austin

I guess my main question is regarding the 'overflow' and 'badly' #'s which you 
can see when you do an ANALYZE.FILE  STATISTICS. 
Is the goal not to have any overflow #? And what is 'badly'?

After playing around with RESIZE on this file, I was able to come up with the 
following:

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 24
  82850 current ( minimum 24, 104 empty, 26225 overflowed, 1441 badly )

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 1000
  82850 current ( minimum 1000, 104 empty, 26225 overflowed, 1441 badly )

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 99420
  99420 current ( minimum 99420, 182 empty, 18725 overflowed, 1054 badly )

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 119304
  119304 current ( minimum 119304, 247 empty, 9511 overflowed, 406 badly )

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 143165
  143165 current ( minimum 143165, 1328 empty, 4333 overflowed, 259 badly )

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 171799
  171799 current ( minimum 171799, 3814 empty, 3063 overflowed, 237 badly )

RESIZE TEST_FILE 30 GROUP.SIZE 2 MINIMUM.MODULUS 223339
  223339 current ( minimum 223339, 9215 empty, 1810 overflowed, 222 badly )

As you can see, as I increase my MINIMUM.MODULUS, my 'overflowed' and 'badly' 
#'s go down. Is this the goal when tuning a dynamic file?
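
Putting the same numbers side by side (a quick sketch in plain Python; the 
figures are just copied from the RESIZE output above):

    results = {82850: 26225, 99420: 18725, 119304: 9511,
               143165: 4333, 171799: 3063, 223339: 1810}
    for modulus, overflowed in results.items():
        print(modulus, f"{overflowed / modulus:.1%} of groups overflowed")
    # drops from roughly 32% of groups overflowed at 82,850 to under 1% at 223,339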

Chris


> From: martinphill...@ladybridge.com
> To: u2-users@listserver.u2ug.org
> Date: Mon, 2 Jul 2012 20:56:40 +0100
> Subject: Re: [U2] RESIZE - dynamic files
> 
> Hi Chris,
> 
> The whole point of dynamic files is that you don't do RESIZE. The file will 
> look after itself, automatically responding to
> variations in the volume of data.
> 
> There are "knobs to twiddle" but in most cases they can safely be left at 
> their defaults. A dynamic file will never perform as well
> as a perfectly tuned static file but they are a heck of a lot better than 
> typical static files that haven't been reconfigured for
> ages.
> 
> 
> Martin Phillips
> Ladybridge Systems Ltd
> 17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
> +44 (0)1604-709200
> 
> 
> 
> 
> -Original Message-
> From: u2-users-boun...@listserver.u2ug.org 
> [mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
> Sent: 02 July 2012 20:22
> To: u2-users@listserver.u2ug.org
> Subject: [U2] RESIZE - dynamic files
> 
> 
> I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
> example I have a file called 'TEST_FILE'
> with the following:
> 
> 01 ANALYZE.FILE TEST_FILE
> File name ..   TEST_FILE
> Pathname ...   TEST_FILE
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    83261 current ( minimum 31 )
> Large record size ..   3267 bytes
> Group size .   4096 bytes
> Load factors ...   80% (split), 50% (merge) and 80% (actual)
> Total size .   450613248 bytes
> 
> How do you calculate what the modulus and separation should be? I can't use 
> HASH.HELP on a type 30 file to see the recommended
> settings
> so I was wondering how best you figure out the file RESIZE.
> 
> Thanks,
> 
> Chris
> 
> 
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
> 
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-02 Thread Dan Fitzgerald

Group size appears adequate (although anytime anything hashes into the group(s) 
with the largest record [3267b], you'll split: 3267 is 79.8% of 4096, so if you 
have a lot of records up in the 3K range, you may want to increase group size 
and decrease min modulus accordingly), but the minimum modulus should be a 
prime north of the current modulus, with a padding factor based on growth 
expectations. The sweet spot is where you have enough data in each group to 
avoid merging (I'd argue that 50% is a bit high for the merge; but that's 
because I'm unafraid of unused space, while I'm averse to file maintenance 
overhead), but not so much that you do a lot of splitting. You should do a 
count on the number of records, too. It almost never makes sense to have the 
modulus exceed the number of records by a substantial percentage.
 
So, you should increase minimum modulus to 83267 or higher, unless you double 
the group size to 8K, in which case something around 50K as a modulus sounds 
good. I'd take the merge down a little, to maybe 30% or even less, and maybe 
knock the split up a bit - say, 90% - to cut down on the splitting.
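
Two quick checks of the numbers above (plain Python, illustrative only):

    print(3267 / 4096)            # ~0.798 - the largest record sits just under the 80% split load
    print(83261 * 4096 / 8192)    # ~41,630 groups if the group size were doubled to 8 KB

Doubling the group size roughly halves the number of groups needed for the same 
data, which is where a figure in the 50K range comes from once you add some 
padding for growth and round up to a prime.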
 
> From: cjausti...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Mon, 2 Jul 2012 14:55:21 -0500
> Subject: [U2] RESIZE - dynamic files
> 
> 
> I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
> example I have a file called 'TEST_FILE'
> with the following:
> 
> 01 ANALYZE.FILE TEST_FILE
> File name ..   TEST_FILE
> Pathname ...   TEST_FILE
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    83261 current ( minimum 31 )
> Large record size ..   3267 bytes
> Group size .   4096 bytes
> Load factors ...   80% (split), 50% (merge) and 80% (actual)
> Total size .   450613248 bytes
> 
> How
>  do you calculate what the modulus and separation should be? I can't use
>  HASH.HELP on a type 30 file to see the recommended settings
> so I was wondering how best you figure out the file RESIZE.
> 
> Thanks,
> 
> Chris   
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-02 Thread Chris Austin

The dynamic file I'm working with is below. What do 'overflowed' and 'badly' 
refer to under MODULUS? Is the goal of the RESIZE to eliminate that
overflow? Any ideas what I should change to achieve this?


File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    82850 current ( minimum 24, 104 empty,
26225 overflowed, 1441 badly )
Number of records ..   1157122
Large record size ..   2036 bytes
Number of large records    576
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   449605632 bytes
Total size of record data ..   258687736 bytes
Total size of record IDs ...   19283300 bytes
Unused space ...   171626404 bytes
Total space for records    449597440 bytes

Thanks,

Chris


> From: cjausti...@hotmail.com
> To: u2-users@listserver.u2ug.org
> Date: Mon, 2 Jul 2012 14:55:21 -0500
> Subject: [U2] RESIZE - dynamic files
> 
> 
> I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
> example I have a file called 'TEST_FILE'
> with the following:
> 
> 01 ANALYZE.FILE TEST_FILE
> File name ..   TEST_FILE
> Pathname ...   TEST_FILE
> File type ..   DYNAMIC
> File style and revision    32BIT Revision 12
> Hashing Algorithm ..   GENERAL
> No. of groups (modulus)    83261 current ( minimum 31 )
> Large record size ..   3267 bytes
> Group size .   4096 bytes
> Load factors ...   80% (split), 50% (merge) and 80% (actual)
> Total size .   450613248 bytes
> 
> How
>  do you calculate what the modulus and separation should be? I can't use
>  HASH.HELP on a type 30 file to see the recommended settings
> so I was wondering how best you figure out the file RESIZE.
> 
> Thanks,
> 
> Chris   
> ___
> U2-Users mailing list
> U2-Users@listserver.u2ug.org
> http://listserver.u2ug.org/mailman/listinfo/u2-users
  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE - dynamic files

2012-07-02 Thread Martin Phillips
Hi Chris,

The whole point of dynamic files is that you don't do RESIZE. The file will 
look after itself, automatically responding to
variations in the volume of data.

There are "knobs to twiddle" but in most cases they can safely be left at their 
defaults. A dynamic file will never perform as well
as a perfectly tuned static file but they are a heck of a lot better than 
typical static files that haven't been reconfigured for
ages.


Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton NN4 6DB, England
+44 (0)1604-709200




-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Chris Austin
Sent: 02 July 2012 20:22
To: u2-users@listserver.u2ug.org
Subject: [U2] RESIZE - dynamic files


I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
example I have a file called 'TEST_FILE'
with the following:

01 ANALYZE.FILE TEST_FILE
File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    83261 current ( minimum 31 )
Large record size ..   3267 bytes
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   450613248 bytes

How do you calculate what the modulus and separation should be? I can't use 
HASH.HELP on a type 30 file to see the recommended
settings
so I was wondering how best you figure out the file RESIZE.

Thanks,

Chris

  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


[U2] RESIZE - dynamic files

2012-07-02 Thread Chris Austin

I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
example I have a file called 'TEST_FILE'
with the following:

01 ANALYZE.FILE TEST_FILE
File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    83261 current ( minimum 31 )
Large record size ..   3267 bytes
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   450613248 bytes

How
 do you calculate what the modulus and separation should be? I can't use
 HASH.HELP on a type 30 file to see the recommended settings
so I was wondering how best you figure out the file RESIZE.

Thanks,

Chris 
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


[U2] RESIZE - dynamic files

2012-07-02 Thread Chris Austin

I was wondering if anyone had instructions on RESIZE with a dynamic file? For 
example I have a file called 'TEST_FILE'
with the following:

01 ANALYZE.FILE TEST_FILE
File name ..   TEST_FILE
Pathname ...   TEST_FILE
File type ..   DYNAMIC
File style and revision    32BIT Revision 12
Hashing Algorithm ..   GENERAL
No. of groups (modulus)    83261 current ( minimum 31 )
Large record size ..   3267 bytes
Group size .   4096 bytes
Load factors ...   80% (split), 50% (merge) and 80% (actual)
Total size .   450613248 bytes

How do you calculate what the modulus and separation should be? I can't use 
HASH.HELP on a type 30 file to see the recommended settings
so I was wondering how best you figure out the file RESIZE.

Thanks,

Chris

  
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] RESIZE DYNAMIC FILES

2006-06-12 Thread Timothy Snyder
[EMAIL PROTECTED] wrote on 06/12/2006 12:57:03 PM:

> What does the guide -r option do ?
> 
>   We have been using the -a option.

The -r option sends guide output to a hashed file. This makes it very easy 
to select for files that are undersized, or that have corruption.  So I'll 
often do a CREATE.FILE DATA UDT_GUIDE 101, then edit the VOC entry of 
UDT_GUIDE so attribute 3 points to @UDTHOME/sys/D_UDT_GUIDE.  Then I can 
do something like this from ECL:
  !guide /some_dir/some_file -na -ne -ns -r UDT_GUIDE

This will create a record in UDT_GUIDE keyed as /some_dir/some_file.  With 
that information for all of your files, you can do something like this:
  list UDT_GUIDE WITH STATUS LIKE ...2 (to find files with level 2 
overflow)
  list UDT_GUIDE WITH STATUS LIKE Err... (to find files with corruption)
  list UDT_GUIDE MAXSIZ AVGSIZ DEVSIZ COUNT AVGKEY (to get the info for 
the dynamic file sizing calculations)

It's SO much easier than writing code to parse through the text output of 
guide.

Tim Snyder
Consulting I/T Specialist , U2 Professional Services
North American Lab Services
DB2 Information Management, IBM Software Group
717-545-6403
[EMAIL PROTECTED]
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/


RE: [U2] RESIZE DYNAMIC FILES

2006-06-12 Thread Dave S
We have used the product here before.
   
  I think our license on it lapsed.
   
  I have been using the guide for several years instead of using fast.

"Hennessey, Mark F." <[EMAIL PROTECTED]> wrote:
  Should I put [AD] in the subject line for an "unsolicited testimonial"? :)

The best advice I can give you is to buy a product called FAST:

http://www.fitzlong.com/

A great tool for analyzing and resizing files, be they dynamic or standard 
hashed files. Excellent support from excellent people at a great price.

There might be more expensive utilities out there, but I can't imagine that 
there is anything better.

Mark Hennessey
(a FAST Customer since 2002)

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Dave S
Sent: Monday, June 12, 2006 10:25 AM
To: u2-users@listserver.u2ug.org
Subject: [U2] RESIZE DYNAMIC FILES


Does anyone have any tech tips on how to select parameters when resizing 
dynamic files ?
__
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/


 __
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/


Re: [U2] RESIZE DYNAMIC FILES

2006-06-12 Thread Dave S
What does the guide -r option do ?
   
  We have been using the -a option.

Timothy Snyder <[EMAIL PROTECTED]> wrote:
  [EMAIL PROTECTED] wrote on 06/12/2006 10:24:51 AM:

> Does anyone have any tech tips on how to select parameters when
> resizing dynamic files ?

The following is from a published tech tip. It provides guidelines, but
of course the nature of MV files makes it difficult to predict optimal
sizing. To get the appropriate input data, run guide with the -r option
to send the output to a hashed file. Point the dictionary of that file as
directed, and you'll have what you need. It's important to note that this
only applies to KEYONLY files.
===
Formula for determining base modulo, block size, SPLIT_LOAD,
and MERGE_LOAD for UniData KEYONLY Dynamic Files

Note that the variables used are the same as the DICT items in
$UDTHOME/sys/D_UDT_GUIDE.

Considerations:
a) The following does not take into account the Unix disk
record (frame) size so it is best to select a block
size based on the number of items you'd like in a group.
b) No one method will provide absolute results but these
calculations will minimize level one overflow caused
by a high SPLIT_LOAD value.
c) Type 0 works best for most Dynamic Files but it is best
to check a small sample via the GROUP.STAT command.

Step 1: Determine the blocksize. (Use 4096 unless the items
per group is larger than 35 or smaller than 2.)
A) If MAXSIZ < 1K, ITEMSIZE = 10 * MAXSIZ
B) If 1K < MAXSIZ < 3K, ITEMSIZE = 5 * MAXSIZ
C) If MAXSIZ > 3K, ITEMSIZE = 5 * (AVGSIZ + DEVSIZ)

Once you determine the item size, use it to determine the NEWBLOCKSIZE.
A) ITEMSIZE < 1024;           NEWBLOCKSIZE = 1024
B) 1024 <= ITEMSIZE < 2048;   NEWBLOCKSIZE = 2048
C) 2048 <= ITEMSIZE < 4096;   NEWBLOCKSIZE = 4096
D) 4096 <= ITEMSIZE < 8192;   NEWBLOCKSIZE = 8192
E) 8192 <= ITEMSIZE < 16384;  NEWBLOCKSIZE = 16384

Step 2: Determine the actual number of items per group.
ITEMS_PER_GROUP = (NEWBLOCKSIZE - 32) / AVGSIZ

Step 3: Determine the base modulo.
BASEMODULO = COUNT / ITEMS_PER_GROUP

Step 4: Determine SPLIT_LOAD.
SPLIT_LOAD = INT((((AVGKEY + 9) * ITEMS_PER_GROUP) / NEWBLOCKSIZE) * 100) + 1
If the SPLIT_LOAD is less than ten, then SPLIT_LOAD = 10

Step 5: Determine MERGE_LOAD
MERGE_LOAD = SPLIT_LOAD / 2 (Rounded up)


Tim Snyder
Consulting I/T Specialist , U2 Professional Services
North American Lab Services
DB2 Information Management, IBM Software Group
717-545-6403
[EMAIL PROTECTED]
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/


 __
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/


Re: [U2] RESIZE DYNAMIC FILES

2006-06-12 Thread Timothy Snyder
[EMAIL PROTECTED] wrote on 06/12/2006 10:24:51 AM:

> Does anyone have any tech tips on how to select parameters when
> resizing dynamic files ?

The following is from a published tech tip.  It provides guidelines, but
of course the nature of MV files makes it difficult to predict optimal
sizing.  To get the appropriate input data, run guide with the -r option
to send the output to a hashed file.  Point the dictionary of that file as
directed, and you'll have what you need.  It's important to note that this
only applies to KEYONLY files.
===
Formula for determining base modulo, block size, SPLIT_LOAD,
and MERGE_LOAD for UniData KEYONLY Dynamic Files

Note that the variables used are the same as the DICT items in
$UDTHOME/sys/D_UDT_GUIDE.

Considerations:
a) The following does not take into account the Unix disk
   record (frame) size so it is best to select a block
   size based on the number of items you'd like in a group.
b) No one method will provide absolute results but these
   calculations will minimize level one overflow caused
   by a high SPLIT_LOAD value.
c) Type 0 works best for most Dynamic Files but it is best
   to check a small sample via the GROUP.STAT command.

Step 1: Determine the blocksize.  (Use 4096 unless the items
per group is larger than 35 or smaller than 2.)
  A) If MAXSIZ < 1K, ITEMSIZE = 10 * MAXSIZ
  B) If 1K < MAXSIZ < 3K, ITEMSIZE = 5 * MAXSIZ
  C) If MAXSIZ > 3K, ITEMSIZE = 5 * (AVGSIZ + DEVSIZ)

Once you determine the item size, use it to determine the NEWBLOCKSIZE.
  A) ITEMSIZE < 1024;           NEWBLOCKSIZE = 1024
  B) 1024 <= ITEMSIZE < 2048;   NEWBLOCKSIZE = 2048
  C) 2048 <= ITEMSIZE < 4096;   NEWBLOCKSIZE = 4096
  D) 4096 <= ITEMSIZE < 8192;   NEWBLOCKSIZE = 8192
  E) 8192 <= ITEMSIZE < 16384;  NEWBLOCKSIZE = 16384

Step 2: Determine the actual number of items per group.
  ITEMS_PER_GROUP = (NEWBLOCKSIZE - 32) / AVGSIZ

Step 3: Determine the base modulo.
  BASEMODULO = COUNT / ITEMS_PER_GROUP

Step 4: Determine SPLIT_LOAD.
  SPLIT_LOAD = INT((((AVGKEY + 9) * ITEMS_PER_GROUP) / NEWBLOCKSIZE) * 100) + 1
  If the SPLIT_LOAD is less than ten, then SPLIT_LOAD = 10

Step 5: Determine MERGE_LOAD
  MERGE_LOAD = SPLIT_LOAD / 2 (Rounded up)
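
Those five steps, as a sketch in plain Python (not UniData code; the rounding 
choices and the fallback above 16 KB blocks are mine, the "use 4096 unless..." 
caveat in Step 1 is not modelled, and everything else follows the steps as 
written):

    import math

    def udt_keyonly_sizing(maxsiz, avgsiz, devsiz, avgkey, count):
        # Step 1: item size heuristic, then the matching block size
        if maxsiz < 1024:
            itemsize = 10 * maxsiz
        elif maxsiz < 3 * 1024:
            itemsize = 5 * maxsiz
        else:
            itemsize = 5 * (avgsiz + devsiz)
        newblocksize = 16384                      # fallback for very large item sizes
        for blk in (1024, 2048, 4096, 8192, 16384):
            if itemsize < blk:
                newblocksize = blk
                break
        # Step 2: items per group
        items_per_group = (newblocksize - 32) / avgsiz
        # Step 3: base modulo (rounded up here)
        base_modulo = math.ceil(count / items_per_group)
        # Step 4: split load, floored at 10
        split_load = max(int((((avgkey + 9) * items_per_group) / newblocksize) * 100) + 1, 10)
        # Step 5: merge load (rounded up)
        merge_load = math.ceil(split_load / 2)
        return newblocksize, base_modulo, split_load, merge_load

Feed it the guide columns (MAXSIZ, AVGSIZ, DEVSIZ, AVGKEY, COUNT) for a KEYONLY 
file and it returns a suggested block size, base modulo, SPLIT_LOAD and 
MERGE_LOAD.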


Tim Snyder
Consulting I/T Specialist , U2 Professional Services
North American Lab Services
DB2 Information Management, IBM Software Group
717-545-6403
[EMAIL PROTECTED]
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/


RE: [U2] RESIZE DYNAMIC FILES

2006-06-12 Thread Hennessey, Mark F.
Should I put [AD] in the subject line for an "unsolicited testimonial"?  :)

The best advice I can give you is to buy a product called FAST:

http://www.fitzlong.com/

A great tool for analyzing and resizing files, be they dynamic or standard 
hashed files. Excellent support from excellent people at a great price.

There might be more expensive utilities out there, but I can't imagine that 
there is anything better.

Mark Hennessey
(a FAST Customer since 2002)

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Dave S
Sent: Monday, June 12, 2006 10:25 AM
To: u2-users@listserver.u2ug.org
Subject: [U2] RESIZE DYNAMIC FILES


Does anyone have any tech tips on how to select parameters when resizing 
dynamic files ?
 __
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/


[U2] RESIZE DYNAMIC FILES

2006-06-12 Thread Dave S
Does anyone have any tech tips on how to select parameters when resizing 
dynamic files ?
 __
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
---
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/