On Thu, 10 Aug 2000 05:03:38 +0200, Bart Lateur wrote:

[description of a mechanism for storing sparse arrays:]

>Imagine
>that it will be traversed based upon the groups of bits in the array
>index. Say, with 32 bit indices, subdivided into 4 bytes. You can start
>with the lower byte, which can give you one of 256 pointers to the next
>table -- or null (zero), if that segment is completely empty and the
>next table nonexistent. Repeat with the second byte, get another table
>pointer, or null. Repeat and follow the pointer, at most 4 times in
>total. In the last pointer table, a nonzero value points to the thing
>itself.

Strike that. You can start with the *higher* byte instead of the lower,
and this mechanism could even be used to store ordinary one-dimensional
arrays, if you must.

(I am *not* familiar with the current system behind arrays; reading the
source to extract the idea doesn't sound smart. If somebody could point
me to an explanation on how it works, I would be much obliged.)

I thought that starting with the lower byte would nicely split all
indices up into groups. It will; but there's absolutely no advantage in
it. You'll always get a lot of tables. The number of required steps is
always the same anyway. The only advantage would be ease of
implementation: do a bitwise AND to get the lower byte, and do a shift
to the right to get at the next chunk.
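In C, that low-byte-first split really is just an AND and a shift per step; a small sketch (the function name is my own, for illustration):

```c
/* Split a 32-bit index into 4 byte-sized chunks, lowest byte first:
   a bitwise AND picks out the current chunk, a right shift exposes
   the next one. */
static void split_low_first(unsigned long index, unsigned chunk[4])
{
    int step;

    for (step = 0; step < 4; step++) {
        chunk[step] = index & 0xFF;   /* mask off the low byte */
        index >>= 8;                  /* shift the next byte down */
    }
}
```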

I think that the common case would be that occupied array indices come
in contiguous chunks; for example, indices 10000 through 10100 are in use, and
the rest is empty. If you start with the higher byte, there will be a
lot more NULL pointers, i.e. no 1k pointer block attached. For more
randomly distributed indices, the pointer blocks would still be largely
empty. The situation would be much the same, as far as I can gather.

But: the number of steps need not be a fixed number! Say that the array
descriptor holds a field with the number of steps, for example,
3 if the largest index is above 65k but below 16 million. (It is the
number of bytes that this largest index fits in.) You can traverse
the tree with a simple loop.
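A sketch of that loop in C -- the Block/SparseArray layout and all the names here are my own assumptions for illustration, not anything from an actual implementation:

```c
#include <stddef.h>

#define SLOTS 256   /* 256 pointers = one 1k block, with 32-bit pointers */

typedef struct Block {
    void *slot[SLOTS];   /* child tables; at the leaves, the values themselves */
} Block;

typedef struct {
    int    steps;   /* tree depth: number of bytes the largest index fits in */
    Block *root;
} SparseArray;

/* Fetch the value stored at `index`, or NULL if that slot is empty. */
static void *sparse_get(const SparseArray *a, unsigned long index)
{
    void *p = a->root;
    int step;

    for (step = a->steps - 1; step >= 0 && p != NULL; step--) {
        /* peel off byte groups, starting with the *higher* byte */
        p = ((Block *)p)->slot[(index >> (8 * step)) & 0xFF];
    }
    return p;   /* in the last table, a nonzero entry is the thing itself */
}
```

With a steps field of 1 the loop degenerates to a single indirection.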

Say that you start with an ordinary array of 250 items. This fits into
one byte, so you just need one step, and one 1k block to hold the
pointers to the array items. So it's almost as fast as it can be, just
one indirection and you've got your pointer to the scalar value.

Now it turns out that you need an array index of 300, beyond the 255
limit. What do you do? You allocate another 1k block, clear it, and put
a pointer to the original block in slot #0. Increment the steps field to
2, and set the tree root pointer to the address of this new block. Tada!

You can now simply allocate and clear a new 1k block for the second
page, array indices 256 to 511, and add the pointer to slot #1 of the
root block.
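That promotion step, sketched in C -- same caveat as before: the block layout and names are invented for illustration:

```c
#include <stdlib.h>

#define SLOTS 256   /* 256 pointers = one 1k block, with 32-bit pointers */

typedef struct Block {
    void *slot[SLOTS];
} Block;

typedef struct {
    int    steps;   /* current tree depth */
    Block *root;
} SparseArray;

/* Allocate and clear a fresh 1k pointer block. */
static Block *new_block(void)
{
    return calloc(1, sizeof(Block));
}

/* Add one level: the old tree becomes child #0 of a fresh root,
   and the steps field goes up by one. */
static void grow(SparseArray *a)
{
    Block *root = new_block();

    root->slot[0] = a->root;
    a->root = root;
    a->steps++;
}
```

After grow(), a fresh block calloc'd into root->slot[1] covers indices 256 to 511.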

All in all, you need just 3 1k blocks to hold the whole tree, and
extending it is a simple algorithm. That's why I said it can be used for
ordinary arrays as well.

Problems? Yes: first of all, doing a splice isn't as simple as it might
have been without this tree structure. You'll need lots of copying of
chunks of pointers between the 1k blocks, a few per block.

Secondly, the subdivision of the index into bit groups isn't as easy (or
as fast) as before. A simple "mask and shift" won't do. A "roll and
mask" would, but that's an assembler trick: C has no rotate operator
(shift left, with the top byte of the register wrapping around into the
lower byte), AFAIK.
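For what it's worth, the roll can be composed in portable C from two shifts and an OR (and many compilers recognize the idiom); a sketch:

```c
/* 32-bit left rotate built from two shifts and an OR -- C has no
   rotate operator, but this portable idiom does the same job. */
static unsigned long rotl32(unsigned long x, unsigned int n)
{
    x &= 0xFFFFFFFFUL;   /* confine to 32 bits even if long is wider */
    n &= 31;
    if (n == 0)
        return x;        /* avoid the undefined full-width shift */
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFFUL;
}
```

Rotating the index left by 8 bits each step brings the highest remaining byte into the low 8 bits, where a plain mask picks it out.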

-- 
        Bart.