Hi,
On Tue, May 23, 2023 at 7:17 PM John Naylor
<[email protected]> wrote:
>
> I wrote:
> > the current insert/delete paths are quite complex. Using bitmap heap scan
> > as a motivating use case, I hope to refocus complexity to where it's most
> > needed, and aggressively simplify where possible.
>
> Sometime in the not-too-distant future, I will start a new thread focusing on
> bitmap heap scan, but for now, I just want to share some progress on making
> the radix tree usable not only for that, but hopefully a wider range of
> applications, while making the code simpler and the binary smaller. The
> attached patches are incomplete (e.g. no iteration) and quite a bit messy, so
> tar'd and gzip'd for the curious (should apply on top of v32 0001-03 +
> 0007-09 ).
>
Thank you for making progress on this. I agree with these directions
overall. I have some comments and questions:
> - With the latter in mind, searching the child within a node now returns the
> address of the slot. This allows the same interface whether the slot contains
> a child pointer or a value.
Probably we can apply similar changes to the iteration as well.
> * Other nodes will follow in due time, but only after I figure out how to do
> it nicely (ideas welcome!) -- currently node32's two size classes work fine
> for growing, but the code should be simplified before extending to other
> cases.)
Within the size class, we just alloc a new node of lower size class
and do memcpy(). I guess it will be almost same as what we do for
growing. It might be a good idea to support node shrinking within the
size class for node32 (and node125 if we support). I don't think
shrinking class-3 to class-1 makes sense.
>
> Limited support for "combined pointer-value slots". At compile-time, choose
> either that or "single-value leaves" based on the size of the value type
> template parameter. Values that are pointer-sized or less can fit in the
> last-level child slots of nominal "inner nodes" without duplicated leaf-node
> code. Node256 now must act like the previous 'node256 leaf', since zero is a
> valid value. Aside from that, this was a small change.
Yes, but it also means that we use pointer-sized value anyway even if
the value size is less than that, which wastes the memory, no?
>
> What I've shared here could work (in principal, since it uses uint64 values)
> for tidstore, possibly faster (untested) because of better code density, but
> as mentioned I want to shoot for higher. For tidbitmap.c, I want to extend
> this idea and branch at run-time on a per-value basis, so that a page-table
> entry that fits in a pointer can go there, and if not, it'll be a full leaf.
> (This technique enables more flexibility in lossifying pages as well.)
> Run-time info will require e.g. an additional bit per slot. Since the node
> header is now 3 bytes, we can spare one more byte in the node3 case. In
> addition, we can and should also bump it back up to node4, still keeping the
> metadata within 8 bytes (no struct padding).
Sounds good.
> I've started in this patchset to refer to the node kinds as "4/16/48/256",
> regardless of their actual fanout. This is for readability (by matching the
> language in the paper) and maintainability (should *not* ever change again).
> The size classes (including multiple classes per kind) could be determined by
> macros and #ifdef's. For example, in non-SIMD architectures, it's likely slow
> to search an array of 32 key chunks, so in that case the compiler should
> choose size classes similar to these four nominal kinds.
If we want to use the node kinds used in the paper, I think we should
change the number in RT_NODE_KIND_X too. Otherwise, it would be
confusing when reading the code without referring to the paper.
Particularly, this part is very confusing:
case RT_NODE_KIND_3:
RT_ADD_CHILD_4(tree, ref, node, chunk, child);
break;
case RT_NODE_KIND_32:
RT_ADD_CHILD_16(tree, ref, node, chunk, child);
break;
case RT_NODE_KIND_125:
RT_ADD_CHILD_48(tree, ref, node, chunk, child);
break;
case RT_NODE_KIND_256:
RT_ADD_CHILD_256(tree, ref, node, chunk, child);
break;
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com