Re: [U2] Really trying to understand dynamic file sizing

2012-06-29 Thread Dave Laansma
Hey Doug,

Thanks for confirming my understanding of Wally's adeptness in Unidata. I
always wait anxiously for his replies to these threads.

I believe the *100 simply converts the ratio to a whole-number percentage so
the INT() works.

I am indeed going to go to the 8K block size to reduce the % of 'wasted'
space to < 2%. That just seems like a reasonable compromise without going all
the way to 16K blocks.

I'm also going to increase the modulo by 20% over the number of records so it
will have a little extra room to grow over the next few months.

I'll probably configure the SPLIT for 10% and MERGE for 5% unless
someone discourages me before noon tomorrow. That's when I'm pulling the
trigger on this resize.
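
For the record, here is the scratch arithmetic behind that plan as a quick
Python sketch (the record count is the one from the ANALYZE.FILE output
quoted at the bottom of this thread; the 20% growth factor and the 10%/5%
thresholds are just the numbers I picked above, nothing Unidata works out for
you):

    # Scratch math for the planned resize: new modulo = record count + ~20% headroom.
    records = 1387389                  # record count from the ANALYZE.FILE output
    new_modulo = int(records * 1.2)    # room to grow over the next few months
    split_load, merge_load = 10, 5     # planned SPLIT/MERGE settings (percent)
    print(new_modulo, split_load, merge_load)   # 1664866 10 5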

Sincerely,
David Laansma
IT Manager
Hubbard Supply Co.
Direct: 810-342-7143
Office: 810-234-8681
Fax: 810-234-6142
www.hubbardsupply.com
"Delivering Products, Services and Innovative Solutions"


-Original Message-
From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Doug Averch
Sent: Friday, June 29, 2012 4:17 PM
To: U2 Users List
Subject: Re: [U2] Really trying to understand dynamic file sizing

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Really trying to understand dynamic file sizing

2012-06-29 Thread Doug Averch
Hi Dave and others:

Having worked for Wally many years ago at Unidata, I can tell you his
knowledge of Unidata is unparalleled.  So, given that I've been doing this
for a number of years, I'm not sure where the 100 number is coming from.  As
for the other numbers, I have reams of paper, CDs, and thumb drives backing
them all up.

I have tested many a file with different block sizes, with the same data,
either adding or subtracting records.  I found that splits and merges do not
happen when I would expect them to.  If you put more keys into a smaller
modulo, the percentage will change because you are using KEYONLY.

Your record size is quite large, which is why going to a 2K block size will
make things worse.  I would recommend going to an 8K block size and leaving
the current modulo, since you are near 100% overflow.  You can try it on a
temporary file and tell me whether I'm wrong or right.

Regards,
Doug
www.u2logic.com/tools.html
"
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Really trying to understand dynamic file sizing

2012-06-29 Thread Dave Laansma
Thank you, Wally. That's what I was looking for, a baseline. 1% or even 6%
seemed low.

Doesn't splitting basically happen when too many keys are hashed to the
same group for KEYONLY hashed files?

Sincerely,
David Laansma
IT Manager
Hubbard Supply Co.
Direct: 810-342-7143
Office: 810-234-8681
Fax: 810-234-6142
www.hubbardsupply.com
"Delivering Products, Services and Innovative Solutions"


-Original Message-
From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wally Terhune
Sent: Friday, June 29, 2012 3:06 PM
To: U2 Users List
Subject: Re: [U2] Really trying to understand dynamic file sizing

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Really trying to understand dynamic file sizing

2012-06-29 Thread Wally Terhune
I wouldn't go below 10%. You could end up with lots of splitting and very 
sparsely populated groups.

Wally Terhune
Technical Support Engineer
Rocket Software
4600 South Ulster Street, Suite 1100 **Denver, CO 80237 **USA
t: +1 720 475 8055 **e: wterh...@rocketsoftware.com **w: rocketsoftware.com/u2




-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Doug Averch
Sent: Friday, June 29, 2012 11:38 AM
To: U2 Users List
Subject: Re: [U2] Really trying to understand dynamic file sizing

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Really trying to understand dynamic file sizing

2012-06-29 Thread Dave Laansma
Greetings Doug,

This is where my confusion is. The formula that I'm using to compute the
split load does not change, regardless of the block size, and yet you're
saying that changing my block size to 2K will change it from 1% to 6%.

So, here's my formula, based on the very informative FAQ from Rocket:

SPLIT = INT(RECORDS PER BLOCK * IDSIZE * 100 / BLOCKSIZE)

SPLIT = INT(1 * 21 * 100 / 2048) = 1
SPLIT = INT(2 * 21 * 100 / 4096) = 1

Given this formula, the split will never change, regardless of the block
size, because the RECORDS PER BLOCK will generally increase proportionally
to the BLOCK SIZE.
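
Here is that formula as a quick Python sketch, which shows why I keep getting
1 no matter what block size I plug in (the 1537-byte average record size
comes from the ANALYZE.FILE output, the 21-byte ID figure is the one I used
above, and the 32-byte group overhead is the figure Doug quoted from the
documentation):

    # KEYONLY split load per the formula above, for a few block sizes.
    AVG_RECORD = 1537      # average record size from ANALYZE.FILE
    ID_SIZE = 21           # stored key size used in the formula above
    OVERHEAD = 32          # Unidata group overhead (figure quoted by Doug)

    for block_size in (2048, 4096, 8192):
        records_per_block = (block_size - OVERHEAD) // AVG_RECORD
        split = int(records_per_block * ID_SIZE * 100 / block_size)
        print(block_size, records_per_block, split)
    # 2048 -> 1, 4096 -> 1, 8192 -> 1: records per block grows with the block
    # size, so the two effects cancel and SPLIT stays pinned at 1.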

Sincerely,
David Laansma
IT Manager
Hubbard Supply Co.
Direct: 810-342-7143
Office: 810-234-8681
Fax: 810-234-6142
www.hubbardsupply.com
"Delivering Products, Services and Innovative Solutions"


-Original Message-
From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Doug Averch
Sent: Friday, June 29, 2012 1:38 PM
To: U2 Users List
Subject: Re: [U2] Really trying to understand dynamic file sizing

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Really trying to understand dynamic file sizing

2012-06-29 Thread Doug Averch
Hi Dave:

You cannot get any lower than one percent.  If you set your block size to
2K, that will fix the split problem, making it about 6%.  Your average
record size is 1537, which means you will get about one record per block, so
splitting by key will be worthless and you will be into Level 2 and/or
Level 1 overflow.
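
To show where "about one record per block" comes from, here is a rough
back-of-the-envelope sketch in Python (the 32-byte group overhead is the
documented figure I am working from; adjust it if your release differs):

    # Roughly how many 1537-byte records fit in a group at each block size,
    # after subtracting the 32 bytes of Unidata group overhead.
    AVG_RECORD = 1537
    OVERHEAD = 32

    for block_size in (2048, 4096, 8192):
        usable = block_size - OVERHEAD
        print(block_size, usable // AVG_RECORD)   # 2048 -> 1, 4096 -> 2, 8192 -> 5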

I have found that if you spend a lot of time trying to optimize one file,
another 100K worth of records get added to the file and your work no longer
matters.  What you want to do is get the file into the best shape you can,
considering growth and usability, then look at this file, or all your files,
in a week or a month and see what has happened.

Regards,
Doug
www.u2logic.com
"XLr8Resizer for fast resizing"
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Really trying to understand dynamic file sizing

2012-06-29 Thread Mecki Foerthmann
So if I understand this correctly, a dynamic file will only split when it
goes into level 2 overflow?
If that is so, then wouldn't decreasing the block size, as I suggested
earlier, make the file split much earlier than using a larger block size?
Why then don't you just double the modulo, find the next prime, and use a 2K
instead of a 4K block size?
If you have a development system with enough disk space, just give it a try,
find out whether it improves performance, and let us know how it went.
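
If you want to play with the numbers first, here is a tiny sketch of the
"double the modulo and find the next prime" arithmetic (plain Python, nothing
Unidata-specific; the starting modulo is simply the one from Dave's
ANALYZE.FILE output):

    # Double the current modulo and bump it up to the next prime.
    def next_prime(n):
        def is_prime(k):
            if k < 2:
                return False
            i = 2
            while i * i <= k:
                if k % i == 0:
                    return False
                i += 1
            return True
        while not is_prime(n):
            n += 1
        return n

    current_modulo = 235889                  # from Dave's ANALYZE.FILE output
    print(next_prime(current_modulo * 2))    # next prime at or above 471778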



On 29/06/2012 16:58, Dave Laansma wrote:

Doesn't 1% split load seem a little low?




___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Really trying to understand dynamic file sizing

2012-06-29 Thread Dave Laansma
Doesn't 1% split load seem a little low?

Sincerely,
David Laansma
IT Manager
Hubbard Supply Co.
Direct: 810-342-7143
Office: 810-234-8681
Fax: 810-234-6142
www.hubbardsupply.com
"Delivering Products, Services and Innovative Solutions"

-Original Message-
From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Doug Averch
Sent: Thursday, June 28, 2012 8:56 PM
To: U2 Users List
Subject: Re: [U2] Really trying to understand dynamic file sizing

___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Really trying to understand dynamic file sizing

2012-06-28 Thread Doug Averch
On Thu, Jun 28, 2012 at 6:50 PM, Doug Averch  wrote:


> 126/4064   (Correction, too late to be working on this)
>
1  (Correction average Load before split)

And that matches your numbers.

Regards,
Doug
www.u2logic.com
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Really trying to understand dynamic file sizing

2012-06-28 Thread Doug Averch
Here is the math from how I read the manual on KEYONLY Split/Merge.

 4096   (Group Size from Guide)
  -32   (Unidata Overhead from documentation)
 ----
 4064   (Usable Space)

    6   (Keys per Group from Guide)
 x 13   (Key Size from Guide)
 ----
   78   (Key Bytes on average)

    6   (Keys per Group from Guide)
  x 8   (Key Overhead from documentation)
 ----
   48
  +78   (Key Bytes from above)
 ----
  126   (Avg Key Bytes)

 4064 / 126 = 32   (Average Load before split)
              16   (Merge Load is half)

This math is based on averages, which does not always work out.  However, it
is the place to start.  From here I generally work the number down, not up,
when you are using the KEYONLY calculation.
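
If it helps, here is the same arithmetic as a small Python sketch, using only
the figures quoted above from the Guide and the documentation:

    # KEYONLY split/merge math from above.
    GROUP_SIZE = 4096       # group (block) size from the Guide
    OVERHEAD = 32           # Unidata overhead from the documentation
    KEYS_PER_GROUP = 6      # keys per group from the Guide
    KEY_SIZE = 13           # key size from the Guide
    KEY_OVERHEAD = 8        # per-key overhead from the documentation

    usable = GROUP_SIZE - OVERHEAD                               # 4064
    avg_key_bytes = KEYS_PER_GROUP * (KEY_SIZE + KEY_OVERHEAD)   # 6*13 + 6*8 = 126
    split_load = usable // avg_key_bytes                         # 32 keys before a split
    merge_load = split_load // 2                                 # 16, merge is half
    print(usable, avg_key_bytes, split_load, merge_load)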

Regards,
Doug
www.u2logic.com/tools.html
"XLr8Resizer for the rest of us"
___
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users


Re: [U2] Really trying to understand dynamic file sizing

2012-06-28 Thread Baakkonen, Rodney A (Rod) 46K
Sorry, I don't have time to really dive into this. A lot of the documentation
describes things that work most of the time, but sizing dynamic files is more
art than science sometimes.

Just looking at your stats, though, KEYDATA might be an option as well. You
have a pretty high ratio of bytes per record to a 12-character key, so will
you ever get enough keys into a group to cause a split? Maybe. But KEYDATA
uses both the data and the keys to figure out when to split.

The only problem I have had using KEYDATA is with meaningful keys that
contain separators like '#'. The hashing gets goofed up and I can't get good
distribution across the groups. I get splitting, but I end up with groups
that never get anything in them. So I will have one group with 11 records and
then two with zero bytes:

  4      0   0
  5    795   1  >
  6      0   0
  7   4888   9  >
  8      0   0
  9    669   1  >
 10      0   0
 11   5987  11  >>>
 12      0   0
 13      0   0
 14      0   0
 15   5795  11  >>>
 16      0   0
 17   1113   2  >>
 18      0   0
 19   9842  16

I have one file that has nice numeric keys and KEYDATA works great:

-rwxrwxr-x   1 root mcc  20 Jun 17 17:40 dat001
-rwxrwxr-x   1 root mcc  20 Jun 17 17:40 dat002
-rwxrwxr-x   1 root mcc  20 Jun 17 17:40 dat003
-rwxrwxr-x   1 root mcc  20 Jun 17 17:10 dat004
-rwxrwxr-x   1 root mcc  20 Jun 17 17:40 dat005
-rwxrwxr-x   1 5126 mcc  20 Jun 28 09:02 dat006
-rwxrwxr-x   1 30012mcc  20 Jun 17 17:46 dat007
-rwxrwxr-x   1 9421 mcc  20 Jun 17 17:46 dat008
-rwxrwxr-x   1 30334mcc  20 Jun 28 09:02 dat009
-rwxrwxr-x   1 30334mcc  20 Jun 17 16:16 dat010
-rw-rw-r--   1 9319 mcc  20 Jun 17 17:40 dat011
-rwxrwxr-x   1 30334mcc  20 Jun 17 17:40 dat012
-rwxrwxr-x   1 30334mcc  20 Jun 17 17:40 dat013
-rwxrwxr-x   1 30334mcc  20 Jun 17 17:40 dat014
-rwxrwxr-x   1 30334mcc  941259776 Jun 28 09:02 dat015
-rwxrwxr-x   1 root mcc  194880 Jun 25 12:57 idx001
-rwxrwxr-x   1 root mcc  194880 Jun 17 17:10 idx002
-rwxrwxr-x   1 root mcc  194880 Jun 17 17:40 idx003
-rwxrwxr-x   1 root mcc  194880 Jun 25 12:57 idx004
-rwxrwxr-x   1 root mcc  194880 Jun 17 17:30 idx005
-rwxrwxr-x   1 root mcc  194880 Jun 17 17:24 idx006
-rwxrwxr-x   1 c00655   mcc  194880 Jun 17 17:40 idx007
-rwxrwxr-x   1 8575 mcc  194880 Jun 17 16:16 idx008
-rw-rw-rw-   1 udtcron  mcc  194880 Jun 17 17:46 idx009
-rw-rw-rw-   1 30334mcc  194880 Jun 25 12:57 idx010
-rw-rw-rw-   1 30334mcc  194880 Jun 25 12:57 idx011
-rw-rw-rw-   1 udtcron  mcc  1765761024 Jun 26 15:17 idx012
-rwxrwxr-x   1 root mcc  954723328 Jun 28 09:02 over001

I haven't had to do anything to this file for a couple of years.

-Original Message-
From: u2-users-boun...@listserver.u2ug.org 
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Dave Laansma
Sent: Thursday, June 28, 2012 9:29 AM
To: U2-Users@listserver.u2ug.org
Subject: [U2] Really trying to understand dynamic file sizing

I've only got a handful of dynamic files but of course they're huge and
have a big impact on our daily and monthly processing. I'd REALLY like
to understand the tuning mechanisms for these files, specifically
SPLIT/MERGE.

 

The formulas that I got in previous responses just don't seem to make sense
for one particular file.

 

So here's a FILE.STAT and ANALYZE.FILE of a file that I believe is in
need of resizing and/or reconfiguring. I believe that if I can get some
input on this file, I'll be able to apply that knowledge to my other
files.

 

First, I understand quite clearly that the modulo of 235889 is about
half of what it should be, at least for a block size of 4096.
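
(Here is the rough arithmetic behind that statement, as a quick Python
sketch; it ignores per-group overhead and treats 100% load as the target, so
it is only a ballpark.)

    # Rough sanity check: how many 4K groups would this data fill at 100% load?
    total_bytes = 2132217978       # from the FILE.STAT below
    block_size = 4096
    current_modulo = 235889

    groups_needed = total_bytes // block_size + 1
    print(groups_needed)                    # about 520562
    print(current_modulo / groups_needed)   # about 0.45 -- roughly half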

 

Second, unless I'm doing something wrong, I computed my SPLIT LOAD to be
1, which just doesn't seem right.

 

I'd like to resize this file this weekend and I know that if I do one
thing incorrectly it could make my performance even worse.

 

Any input would be greatly appreciated.

 

File name (Dynamic File)              = OH
Number of groups in file (modulo)     = 235889
Dynamic hashing, hash type            = 0
Split/Merge type                      = KEYONLY
Block size                            = 4096
File has 234167 groups in level one overflow.
Number of records                     = 1387389
Total number of bytes                 = 2132217978

Average number of records per group   = 5.9
Standard deviation from average       = 1.6
Average number of bytes per group     = 9039.1
Standard deviation from average       = 9949.2

Average number of bytes in a record   = 1536.9
Average number of bytes in record ID  = 12.4
Standard deviation from average       = 4009.7

Min