I believe PiOpen used a directory with two files in it, ‘&$0’ and ‘&$1’, 
corresponding to DATA.30 and OVER.30.  If the numbers went up from there, I 
think they corresponded to alternate keys, i.e. ‘&$2’ and ‘&$3’ represented 
DATA.30 and OVER.30 for the first alternate key.

I do not think that PiOpen supported statically hashed files.  (Pr1me 
Information did.)

All of that was a few years ago.

Unidata uses dat001 and over001, with the number increasing to allow for very 
large files (I think).

-Rick

On Jul 4, 2012, at 10:51 AM, Wols Lists wrote:

> On 04/07/12 11:26, Brian Leach wrote:
>>> All the other groups effectively get 1 added to their number
>> Not exactly.
>> 
>> Sorry to those who already know this, but maybe it's time to go over linear
>> hashing in theory ..
>> 
>> Linear hashing was a scheme devised by Litwin, originally only for
>> in-memory lists. In fact there are some good implementations in C# that
>> provide better handling of Dictionary types. Applying it to a file system
>> adds some complexity, but it's basically the same theory.
>> 
>> Let's start with a file that has 100 groups initially defined (that's 0
>> through 99). That is your minimum starting point and should ensure that it
>> never shrinks below that, so it doesn't begin its life with loads of splits
>> right from the start as you populate the file. You would size this similarly
>> to the way you size a regular hashed file for your initial content: no point
>> making work for yourself (or the database).
>> 
>> As data gets added, because the content is allocated unevenly, some of that
>> load will be in primary and some in overflow: that's just the way of the
>> world. No hashing is perfect. Unlike a static file, the overflow can't be
>> added to the end of the file as a linked list (* why nobody has done managed
>> overflow is beyond me), it has to sit in a separate file.
> 
> I don't know what the definition of "badly overflowed" is, but assuming
> that a badly overflowed group has two blocks of overflow, then those
> file stats seem perfectly okay. As Brian has explained, the distribution
> of records is "lumpy" and as a percentage of the file, there aren't many
> badly overflowed groups.
> 
> You've got roughly 1/3 of groups overflowed - with an 80% split that
> doesn't seem at all out of order. On average each group is 80% full, so
> 1/3rd of groups being more than 100% full is fine.
> 
> You've got (in thousands) one and a half groups badly overflowed out of
> eighty-three. That's less than two percent. That's nothing.
> 
> As for why no-one has done managed overflow, I think there are various
> reasons. The first successful implementation (Prime INFORMATION) didn't
> need it. It used a peculiar type of file called a "Segmented Directory"
> and while I don't know for certain what PI did, I strongly suspect each
> group had its own normal file so if a group overflowed, it just created
> a new block at the end of the file. Same with large records, it
> allocated a bunch of overflow blocks. This file structure was far more
> evident with PI-Open - at the OS level a dynamic file was an OS directory
> with lots of numbered files in it.
> 
> The UV implementation of "one file for data, one file for overflow" may
> be unique to UV. I don't know. What little I know of UD tells me it's
> different, and others like QM could well be different again. I wouldn't
> actually be surprised if QM is like PI.
> 
> Cheers,
> Wol

_______________________________________________
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
