Sorry, I don't have time to really dive into this. A lot of the documentation 
describes things that work most of the time, but sizing dynamic files is more 
art than science sometimes.

But just looking at your stats, KEYDATA might be an option as well. You have a 
pretty high ratio of data bytes per record to a 12-character key, so will you 
ever get enough keys into a group to cause a split? Maybe. But KEYDATA uses 
both the data and the keys to figure out when to split.
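To see why the key count alone may never trigger a split here, you can run the numbers from the FILE.STAT below. This is a simplified sketch, not UniData's exact load formula (the real calculation also counts per-key header overhead in the block), but the orders of magnitude tell the story:

```python
# Rough sketch of why a KEYONLY split may never trigger on this file.
# Figures come from the FILE.STAT in the quoted message; the exact
# UniData load formula also adds per-key header bytes, so treat this
# as an approximation only.

BLOCK_SIZE = 4096
AVG_RECORDS_PER_GROUP = 5.9
AVG_KEY_BYTES = 12.4
AVG_RECORD_BYTES = 1536.9
SPLIT_LOAD = 10  # percent, from the ANALYZE.FILE output

# KEYONLY: load percentage is driven by key bytes only
key_load_pct = (AVG_RECORDS_PER_GROUP * AVG_KEY_BYTES / BLOCK_SIZE) * 100

# KEYDATA: load percentage counts keys plus record data
data_load_pct = (AVG_RECORDS_PER_GROUP * (AVG_KEY_BYTES + AVG_RECORD_BYTES)
                 / BLOCK_SIZE) * 100

print(f"KEYONLY load:  {key_load_pct:.1f}%")
print(f"KEYDATA load:  {data_load_pct:.1f}%")
```

The key-only load comes out around 2%, far below the split load of 10, so splits never fire even though the data load is over 200% and almost every group sits in level-one overflow. That mismatch is exactly what KEYDATA is meant to fix.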

The only problem I have had using KEYDATA is with meaningful keys that contain 
separators like '#'. The hashing gets goofed up and I can't get good 
distribution across groups. I get splitting, but I end up with groups that 
never receive anything. So I will have one group with 11 records and then two 
with zero bytes:

  4     0     0
  5   795     1>
  6     0     0
  7  4888     9>>>>>>>>>
  8     0     0
  9   669     1>
 10     0     0
 11  5987    11>>>>>>>>>>>
 12     0     0
 13     0     0
 14     0     0
 15  5795    11>>>>>>>>>>>
 16     0     0
 17  1113     2>>
 18     0     0
 19  9842    16>>>>>>>>>>>>>>>>
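UniData's internal hash is not public, so as a toy illustration only (this is NOT UniData's algorithm, and the key template is made up), here is how structured keys with a fixed separator-and-suffix shape can collapse into a couple of groups under a naive hash, leaving the rest empty:

```python
# Toy illustration only -- NOT UniData's actual hash algorithm.
# Keys stamped from one template with '#' separators can cluster
# badly under a naive hash, leaving many groups permanently empty.

def naive_hash(key: str, modulo: int) -> int:
    # Simplistic shift-and-add hash; with a power-of-two modulo,
    # only the last few characters of the key actually matter.
    h = 0
    for c in key:
        h = (h * 2 + ord(c)) % modulo
    return h

MODULO = 16
# Hypothetical key shape: fixed prefix, counter, fixed '#US' suffix
keys = [f"INV{n:05d}#US" for n in range(1000)]

groups = [0] * MODULO
for k in keys:
    groups[naive_hash(k, MODULO)] += 1

print(groups)  # most groups stay at zero; a few take everything
```

Because the constant "#US" tail dominates the low bits, 1000 keys land in just two of the sixteen groups, which looks a lot like the lopsided distribution above.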

I have one file that has nice numeric keys and KEYDATA works great:

-rwxrwxr-x   1 root     mcc      2000000000 Jun 17 17:40 dat001
-rwxrwxr-x   1 root     mcc      2000000000 Jun 17 17:40 dat002
-rwxrwxr-x   1 root     mcc      2000000000 Jun 17 17:40 dat003
-rwxrwxr-x   1 root     mcc      2000000000 Jun 17 17:10 dat004
-rwxrwxr-x   1 root     mcc      2000000000 Jun 17 17:40 dat005
-rwxrwxr-x   1 5126     mcc      2000000000 Jun 28 09:02 dat006
-rwxrwxr-x   1 30012    mcc      2000000000 Jun 17 17:46 dat007
-rwxrwxr-x   1 9421     mcc      2000000000 Jun 17 17:46 dat008
-rwxrwxr-x   1 30334    mcc      2000000000 Jun 28 09:02 dat009
-rwxrwxr-x   1 30334    mcc      2000000000 Jun 17 16:16 dat010
-rw-rw-r--   1 9319     mcc      2000000000 Jun 17 17:40 dat011
-rwxrwxr-x   1 30334    mcc      2000000000 Jun 17 17:40 dat012
-rwxrwxr-x   1 30334    mcc      2000000000 Jun 17 17:40 dat013
-rwxrwxr-x   1 30334    mcc      2000000000 Jun 17 17:40 dat014
-rwxrwxr-x   1 30334    mcc      941259776 Jun 28 09:02 dat015
-rwxrwxr-x   1 root     mcc      1999994880 Jun 25 12:57 idx001
-rwxrwxr-x   1 root     mcc      1999994880 Jun 17 17:10 idx002
-rwxrwxr-x   1 root     mcc      1999994880 Jun 17 17:40 idx003
-rwxrwxr-x   1 root     mcc      1999994880 Jun 25 12:57 idx004
-rwxrwxr-x   1 root     mcc      1999994880 Jun 17 17:30 idx005
-rwxrwxr-x   1 root     mcc      1999994880 Jun 17 17:24 idx006
-rwxrwxr-x   1 c00655   mcc      1999994880 Jun 17 17:40 idx007
-rwxrwxr-x   1 8575     mcc      1999994880 Jun 17 16:16 idx008
-rw-rw-rw-   1 udtcron  mcc      1999994880 Jun 17 17:46 idx009
-rw-rw-rw-   1 30334    mcc      1999994880 Jun 25 12:57 idx010
-rw-rw-rw-   1 30334    mcc      1999994880 Jun 25 12:57 idx011
-rw-rw-rw-   1 udtcron  mcc      1765761024 Jun 26 15:17 idx012
-rwxrwxr-x   1 root     mcc      954723328 Jun 28 09:02 over001

I haven't had to do anything to this file for a couple of years.
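On your modulo question: a back-of-the-envelope check (not the official UniData sizing formula, and the 80% target load is my own rule of thumb) agrees that 235889 is roughly half of what the data volume calls for:

```python
import math

# Back-of-the-envelope modulo estimate from the FILE.STAT numbers.
# Not the official UniData sizing formula -- just total data volume
# divided by usable block space. The 80% target load is an assumption.

TOTAL_BYTES = 2132217978
BLOCK_SIZE = 4096
CURRENT_MODULO = 235889
TARGET_LOAD = 0.80  # fill blocks ~80% to leave headroom for growth

groups_at_100pct = math.ceil(TOTAL_BYTES / BLOCK_SIZE)
suggested_modulo = math.ceil(TOTAL_BYTES / (BLOCK_SIZE * TARGET_LOAD))

print(f"Groups needed at 100% fill: {groups_at_100pct}")
print(f"Suggested modulo at 80% load: {suggested_modulo}")
print(f"Current modulo: {CURRENT_MODULO}")
```

Even at a theoretical 100% fill you need about 520,000 groups, so the current modulo of 235,889 guarantees heavy overflow regardless of the split/merge settings.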

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Dave Laansma
Sent: Thursday, June 28, 2012 9:29 AM
To: [email protected]
Subject: [U2] Really trying to understand dynamic file sizing

I've only got a handful of dynamic files but of course they're huge and
have a big impact on our daily and monthly processing. I'd REALLY like
to understand the tuning mechanisms for these files, specifically
SPLIT/MERGE.

 

The formulas that I got on previous responses just don't seem to make
sense on one particular file.

 

So here's a FILE.STAT and ANALYZE.FILE of a file that I believe is in
need of resizing and/or reconfiguring. I believe that if I can get some
input on this file, I'll be able to apply that knowledge to my other
files.

 

First, I understand quite clearly that the modulo of 235889 is about
half of what it should be, at least for a block size of 4096.

 

Second, unless I'm doing something wrong, I computed my SPLIT LOAD to be
1, which just doesn't seem right.

 

I'd like to resize this file this weekend and I know that if I do one
thing incorrectly it could make my performance even worse.

 

Any input would be greatly appreciated.

 

File name(Dynamic File)               = OH
Number of groups in file (modulo)     = 235889
Dynamic hashing, hash type            = 0
Split/Merge type                      = KEYONLY
Block size                            = 4096
File has 234167 groups in level one overflow.
Number of records                     = 1387389
Total number of bytes                 = 2132217978

Average number of records per group   = 5.9
Standard deviation from average       = 1.6
Average number of bytes per group     = 9039.1
Standard deviation from average       = 9949.2

Average number of bytes in a record   = 1536.9
Average number of bytes in record ID  = 12.4
Standard deviation from average       = 4009.7
Minimum number of bytes in a record   = 659
Maximum number of bytes in a record   = 2205579

Minimum number of fields in a record  = 237
Maximum number of fields in a record  = 414
Average number of fields per record   = 328.3
Standard deviation from average       = 32.0

Dynamic File name                     = OH
Number of groups in file (modulo)     = 235889
Minimum groups of file                = 235889
Hash type = 0, blocksize = 4096
Split load = 10, Merge load = 5
Split/Merge type = KEYONLY

 

 

Sincerely,

David Laansma
IT Manager
Hubbard Supply Co.
Direct: 810-342-7143
Office: 810-234-8681
Fax: 810-234-6142
www.hubbardsupply.com
"Delivering Products, Services and Innovative Solutions"

_______________________________________________
U2-Users mailing list
[email protected]
http://listserver.u2ug.org/mailman/listinfo/u2-users

