Re: [U2] Learning about file sizing

2012-06-25 Thread Dave Laansma
Marc,

I need to understand the split/merge formula more clearly. Below
it states:

SPLIT = INT(RECORDS PER BLOCK * IDSIZE * 100 / BLOCKSIZE)

SPLIT = INT( 9 * 17 * 100 / 1024) = 14

How did they come up with 9 as the RECORDS PER BLOCK from the file
status outlined?

Sincerely,
David Laansma
IT Manager
Hubbard Supply Co.
Direct: 810-342-7143
Office: 810-234-8681
Fax: 810-234-6142
www.hubbardsupply.com
"Delivering Products, Services and Innovative Solutions"


-Original Message-
From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Rutherford,
Marc
Sent: Tuesday, June 05, 2012 3:50 PM
To: U2 Users List
Subject: Re: [U2] Learning about file sizing

Rod, excellent post!  I have a file I have been wanting to convert to
dynamic.  Since it's not something I do every day, I have been stalling for
a while now...

Thanks,

Marc Rutherford
Principal Programmer Analyst
Advanced Bionics LLC
(661) 362-1754



Re: [U2] Learning about file sizing

2012-06-06 Thread Dave Laansma
Unidata. I have gotten what I believe to be very helpful insight through
this thread.

Thanks!

Sincerely,
David Laansma
IT Manager
Hubbard Supply Co.
Direct: 810-342-7143
Office: 810-234-8681
Fax: 810-234-6142
www.hubbardsupply.com
"Delivering Products, Services and Innovative Solutions"


-Original Message-
From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wols Lists
Sent: Wednesday, June 06, 2012 2:57 PM
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] Learning about file sizing

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Learning about file sizing

2012-06-06 Thread Wols Lists
On 05/06/12 18:33, Dave Laansma wrote:
> Can anyone point me to a good document that will give me guidelines for
> 'proper' file sizing of dynamic files in particular?
> 
Which database? Please note that as regards their underlying
implementation UniVerse and UniData are *very* different.

At user level they're very similar but when you're doing admin stuff
like file-sizing I gather they are very different. I'm guessing you're
talking UniData? (About which I know nothing. :-)
>  
> 
> And when to use KEYONLY vs KEYDATA?
> 
I thought you were talking about UniData :-) afaik these keywords will
produce an error on UniVerse.

So be careful that you're getting advice for the right product, and let
others know what product you're talking about.

Cheers,
Wol
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Learning about file sizing

2012-06-05 Thread Tony Gravagno
> From: Dave Laansma 
> Can anyone point me to a good document that will give me guidelines
> for 'proper' file sizing of dynamic files in particular?

There was an excellent article series on file sizing in Spectrum
Magazine. I found part 6 in the Jan/Feb-2005 issue by searching
http://intl-spectrum.com - it specifically discusses dynamic files.
Part 1 was published in the Mar/Apr-2004 issue.
 
UNIVERSE DYNAMICALLY HASHED FILES: TUNING PARAMETERS, PART 6
The concept that dynamic files are maintenance free is a misconception
held by many database administrators and prompts the question, "If
dynamic files don't need any maintenance, why are there so many tuning
parameters?" This article discusses the tuning parameters for
dynamically hashed UniVerse files. BY JEFF FITZGERALD AND PEGGY LONG

HTH
T

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Learning about file sizing

2012-06-05 Thread Rutherford, Marc
Rod, excellent post!  I have a file I have been wanting to convert to dynamic.
Since it's not something I do every day, I have been stalling for a while now...

Thanks,

Marc Rutherford
Principal Programmer Analyst
Advanced Bionics LLC
(661) 362-1754


Re: [U2] Learning about file sizing

2012-06-05 Thread Baakkonen, Rodney A (Rod) 46K
Can't remember if this came from Wally a long time ago or not, but I use it to
figure out Split/Merge. I have a development box with a copy of production that
I can play with, so I do a lot of playing with mod and sep, and I depend on
GROUP.STAT to give me some idea of how the groups are being populated.


Sizing Dynamic Files

Technote (FAQ)

Problem
Sometimes, administrators would like some ideas and insights on how to
configure dynamic files to maximize file access speed and minimize the
physical size. This article describes one process for making this
determination.

Solution
To improve dynamic file performance an administrator can choose a new modulo
and/or block size. Other important factors, however, are the percent standard
deviation of the record size, the correct hash type, and the split load
percent.

The first step is to generate file statistics using the ECL command FILE.STAT 
(in ECLTYPE U mode). The percent standard deviation can be obtained by the 
following formula: "Standard deviation from average" divided by "Average number 
of bytes in a record". Ideally, this percent would be zero - all records are 
exactly the same size. Having all records the same size makes calculations more 
accurate for our file sizing purposes. A standard deviation percent under 
15-20% means the variation of record sizes is less than perfect but we can 
still predict well enough to be confident that the problem has a satisfactory 
solution.
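
For example, using the figures from the sample FILE.STAT report further down
in this note:

   percent standard deviation = 15.3 / 100.7 = 15.2%

which sits right at the edge of that range, so predictions for that file
should be treated as approximate.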

However, it is very common in the U2 world for a file design to have been left 
in service beyond what is reasonable for today's situations - i.e. what worked 
well in 1980 may not be a good solution for files much larger than originally
anticipated. So, if in the old days you had, say, 10 multivalues in
most records and today you have between 20 and 3000, then it is easy to see how 
the percent standard deviation for record size can creep up over the years 
without being noticed. Anyway, we'll slog forward on the assumption that the 
standard deviation percent is "good". 

Final point: a high standard deviation percent for record size usually leads to 
wasted space, either in the form of sparsely populated primary groups and/or 
excessive overflow. A high standard deviation percent can create a situation 
where there is no "good" answer. 

An important factor in correct file sizing is determining the better hashing
algorithm - either type 0 or 1. It is useful to keep an open mind on this,
because hash type is another thing that can be set correctly and then, over
time, the format of the ids changes and the other hash type becomes better.
First, you should always run ANALYZE.FILE filename and look at the "Keys"
column. If you see consistency in the numbers as you look down the column,
then the algorithm currently in place is likely correct; if the numbers vary
widely, further study is warranted. To do that analysis, select a sample of
10,000 record ids from the file. Then create two dynamic files (one a type 0,
the other a type 1) with a blocksize of 1024 and a modulo of 3. Then use
CONFIGURE.FILE to set the MERGE.LOAD to 5 and the SPLIT.LOAD to 10; this
configuration helps exaggerate the results of the test to make the decision a
little easier. Then populate each of the files using the sample list of ids
and the empty string for a record. Whichever file is smaller is usually the
better hash type.
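
As a rough sketch of the populate step, a UniBasic program along these lines
would do it. TRIAL0, TRIAL1 and the saved list SAMPLE.IDS are illustrative
names, and it assumes the two trial files were already created and configured
as described above - check the CREATE.FILE and CONFIGURE.FILE syntax for your
release:

   * Write an empty record for every sampled id into both trial
   * files, so key storage alone determines each file's size.
   OPEN 'TRIAL0' TO F0 ELSE STOP 'Cannot open TRIAL0'
   OPEN 'TRIAL1' TO F1 ELSE STOP 'Cannot open TRIAL1'
   * SAMPLE.IDS is an assumed saved select list of record ids
   GETLIST 'SAMPLE.IDS' TO IDS ELSE STOP 'No saved list SAMPLE.IDS'
   LOOP
      READNEXT ID FROM IDS ELSE EXIT
      WRITE '' ON F0, ID
      WRITE '' ON F1, ID
   REPEAT

When it finishes, compare the sizes of the two files (FILE.STAT, or ls -l at
the operating system level) and keep the smaller one's hash type.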

Determine Id Size and Record Size

Get two numbers from the FILE.STAT report: "Average number of bytes in a
record" (avg rec size), rounded up to the next whole number; and "Average
number of bytes in record ID" (id size), rounded up to the next whole number.

Follow these steps:
1. IDSIZE = id size from report above + 8
2. DATASIZE = avg rec size from report above - id size from report above
3. TOTAL = IDSIZE + DATASIZE

Example: 

File name(Dynamic File)   = DYN1 
Number of groups in file (modulo) = 115 
Dynamic hashing, hash type= 1 
Split/Merge type  = KEYONLY 
Block size= 1024 
File has 5 groups in level one overflow. 
Number of records = 575 
Total number of bytes = 25708
.
.
.
Average number of bytes in a record   = 100.7
Average number of bytes in record ID  = 8.2 
Standard deviation from average   = 15.3


Average number of bytes in a record = 100.7 -> 101 
Average number of bytes in record ID = 8.2 -> 9 

IDSIZE = 9 + 8 = 17 
DATASIZE = 101 - 9 = 92
TOTAL = 17 + 92 = 109


Determine Blocksize and Modulo

The first block in each group has 32 bytes of header information. So, for a
1024 byte block, 992 bytes are usable for keys and data. Of this, a minimum of
roughly 10 percent (124 bytes in this case) is reserved for key information.
Each key will use up 8 bytes of overhead plus the length of the key itself.
This is represented by IDSIZE above. The data portion of the record(s) begins
after the key area.
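
The formula for records per block is not shown in the text above, but
assuming it is INT(usable bytes / TOTAL), the figures quoted at the top of
this thread follow directly from this example:

   RECORDS PER BLOCK = INT(992 / 109) = 9

   SPLIT = INT(RECORDS PER BLOCK * IDSIZE * 100 / BLOCKSIZE)
         = INT(9 * 17 * 100 / 1024) = 14

which reproduces the 9 records per block and the SPLIT of 14 asked about in
the first message of this thread.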