On Wed, 09 Aug 2000 12:03:40 -0400, Dan Sugalski wrote:
>> >I hope this RFC will be "Arrays should be sparse when possible, and
>> >compact" and just about nothing else. :)
>>
>>You mean, something like hashes?
>
>Nope.
>
>>Faster hashes, maybe, with a hash function optimized for numerical
>>integer keys.
>
>I was thinking we might keep a bitmap for used/unused cells (with unused
>doubling as undef) and a two-level array pointing to chunks of real
>elements/element pointers.
Ouch. If you can have any index that first in a 32-bit integer, your
bitmap list may use up 2^24 bytes, or 16Mb. For one array!
>So, for example, if you did a:
>
> $foo[$elem];
>
>perl would first check bit $elem to see if that element is in use. If so,
>it'd do a ($elem / chunk_size) to get the index to the chunk pointer in the
>array structure, then access element ($elem % chunk_size) to get a pointer
>to the ultimate thing. (Or the thing itself, if it were an int, say)
Hmm.. that may turn out to be a sparsely filled table in itself.
>Other methods are possible, of course, and which would be a win depends on
>your element distribution. (A linked list would be a win for very sparse
>arrays--$foo[time])
A linked list? How that? It sounds like a far slower approach even than
a hash.
I myself would have though of something remotely like B-trees. Imagine
that it will be traversed based upon the groups of bits in the array
index. Say, with 32 bit indices, subdivided into 4 bytes. You can start
with the lower byte, which can give you one of 256 pointers to the next
table -- or null (zero), if that segment is completely empty and the
next table nonexistent. Repeat with the second byte, get another table
pointer, or null. Repeat and follow the pointer, at most 4 times in
total. In the last pointer table, a nonzero value points to the thing
itself.
In case of one item, you'd have 4 tables of 1k each, and one slot in the
fourth table that points to the value.
How's that for an idea from the top of my head?
--
Bart.