I have had some good experiences with KEYDATA. But I think KEYDATA will use 
slightly more disk space.

The only time I had problems with KEYDATA was when the keys to the file did not 
hash well (meaningful keys with a '#' separator followed by a 2-digit number). 
If you have sequential keys, this should not be a problem. I think KEYDATA 
works better for dynamic files that have a small key and a large record. I 
don't see your key size info here, but I don't think your current configuration 
will result in splitting: as I understand it, with KEYONLY only the key bytes 
count toward the split load, so with short keys you probably never get enough 
keys into a group to cause a split, and the data just ends up in overflow.

I like to look at the UNIX listing of a dynamic file to see the relation of the 
datxxx sub-files to the overxxx sub-files. I have one KEYDATA dynamic file (our 
largest) that has 22 datxxx files and one over001 file. The over001 file is 746 
meg, so 99% of the data is in the dat portion of the file where the hashing 
should be able to get to it quickly.
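
You can check the same thing on your file straight from the UNIX prompt; the 
path below is just a placeholder for wherever your account lives, since a 
dynamic file is simply a directory of sub-files at the OS level:

    cd /path/to/your/account       (placeholder - use your account directory)
    ls -l H08.CR.FF.TEXT           (compare sizes of the datNNN vs. overNNN sub-files)

If the overNNN files dwarf the datNNN files, that is your data sitting in 
overflow.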

When I first converted this file to KEYDATA, the result of the rebuild had 19 
datxxx files and one over001, so splitting has since added three more dats: 
dat020, dat021 and dat022. Pre-KEYDATA the file was 8 dats and 8 overs. 
Performance was not good prior to the KEYDATA conversion; it is very good now, 
as almost none of the file is in overflow.

That said, KEYDATA can have its pitfalls. The bad file I had would split 
continually and was sucking up major disk space, and when I looked at the 
GROUP.STAT output I would see chunks of groups with zero records in them. The 
meaningful keys would never hash to the new groups that splitting added to the 
file. I tried hash type 1, but that was even worse, so I went back to KEYONLY 
for those files.
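
GROUP.STAT is run from ECL against the file name (the exact output layout 
varies a bit by release):

    GROUP.STAT H08.CR.FF.TEXT

What you do not want to see is long runs of newly added groups with zero keys 
in them - that was the symptom on my bad file, and it means the splits are just 
burning disk.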

One other thing I have found with dynamic files (I guess this would apply to 
large static files as well): indexes that grow a lot need to be rebuilt from 
scratch periodically. The file from above with 22 dats was just rebuilt. It 
started out with 7 idx segments; after the rebuild of the indexes there were 
only 5 idx files. So the indexes compressed down and saved me close to 4 gig of 
space.
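
The rebuild itself was just the usual drop-and-recreate sequence at ECL, run 
off-hours; FILENAME and SOME.FIELD below are placeholders, and you would repeat 
CREATE.INDEX for each indexed field (check the DELETE.INDEX / CREATE.INDEX / 
BUILD.INDEX syntax on your release):

    DELETE.INDEX FILENAME ALL           (drop the bloated idx sub-files)
    CREATE.INDEX FILENAME SOME.FIELD    (recreate each index definition)
    BUILD.INDEX FILENAME ALL            (repopulate the indexes from the data)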

I would experiment with KEYDATA on the file you sent info on. It looks like a 
good candidate, especially if the file has a lot of data in the overxxx part of 
the file vs. the datxxx. The trade-off is performance vs. more disk space used.
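
The conversion itself is done with CONFIGURE.FILE at ECL. I am going from 
memory here, so double-check the syntax in your manual, but it is along the 
lines of:

    CONFIGURE.FILE H08.CR.FF.TEXT KEYDATA

Run a fresh FILE.STAT (and look at the UNIX listing again) before and after so 
you can see how much data moved out of overflow and how much extra disk the dat 
portion picked up.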

This probably has gone on long enough. I could branch into a discussion about 
why I don't use memresize on dynamic files. But I think I covered that a while 
ago. -Rod

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Jeffrey Butera
Sent: Monday, October 23, 2006 8:10 AM
To: u2-users@listserver.u2ug.org
Subject: [U2] Unidata dynamic file tuning


I'm seeking advice on tuning a dynamic file in Unidata. In short, the file 
has a numeric (sequential) key and contains only about 10 small fields (date, 
time, user, etc.) and a single 'large' field: a block of text. The text can be 
anywhere from one sentence to pages in length, so I know this is going to 
be a lumpy file, as the records vary wildly in size. Records will never 
be deleted, and we add about 4000 new records/month. I converted this from 
static to dynamic knowing I'd hit the 2 gig limit in a couple of years.

The file is behaving fine; I'm just trying to see if I can find some better 
parameters for it, and dynamic files are my weakest area in Unidata. Here 
are the current parameters:

File name(Dynamic File)               = H08.CR.FF.TEXT
Number of groups in file (modulo)     = 15511
Dynamic hashing, hash type            = 0
Split/Merge type                      = KEYONLY
Block size                            = 4096
File has 15294 groups in level one overflow.
Number of records                     = 133578
Total number of bytes                 = 180532137

Average number of records per group   = 8.6
Standard deviation from average       = 1.1
Average number of bytes per group     = 11639.0
Standard deviation from average       = 6392.6

Average number of bytes in a record   = 1351.5
Average number of bytes in record ID  = 6.4
Standard deviation from average       = 2080.9
Minimum number of bytes in a record   = 51
Maximum number of bytes in a record   = 105864

Minimum number of fields in a record  = 14
Maximum number of fields in a record  = 14
Average number of fields per record   = 14.0

In particular, I'm curious about better choices for split/merge loads:

Dynamic File name                     = H08.CR.FF.TEXT
Number of groups in file (modulo)     = 15511
Minimum groups of file                = 15511
Hash type = 0, blocksize = 4096
Split load = 20, Merge load = 10
Split/Merge type = KEYONLY

Any insight or comments appreciated.

-- 
Jeff Butera, Ph.D.
Administrative Systems
Hampshire College
[EMAIL PROTECTED]
413-559-5556

"...our behavior matters more than the beliefs that we profess."
                                Elizabeth Deutsch Earle
-------
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/
-------