Re: [PoC] Improve dead tuple storage for lazy vacuum

Masahiko Sawada Mon, 05 Jun 2023 03:32:36 -0700

Hi,

On Tue, May 23, 2023 at 7:17 PM John Naylor
<[email protected]> wrote:
>
> I wrote:
> > the current insert/delete paths are quite complex. Using bitmap heap scan 
> > as a motivating use case, I hope to refocus complexity to where it's most 
> > needed, and aggressively simplify where possible.
>
> Sometime in the not-too-distant future, I will start a new thread focusing on 
> bitmap heap scan, but for now, I just want to share some progress on making 
> the radix tree usable not only for that, but hopefully a wider range of 
> applications, while making the code simpler and the binary smaller. The 
> attached patches are incomplete (e.g. no iteration) and quite a bit messy, so 
> tar'd and gzip'd for the curious (should apply on top of v32 0001-03 + 
> 0007-09 ).
>


Thank you for making progress on this. I agree with these directions
overall. I have some comments and questions:

> - With the latter in mind, searching the child within a node now returns the 
> address of the slot. This allows the same interface whether the slot contains 
> a child pointer or a value.

Probably we can apply similar changes to the iteration as well.

> * Other nodes will follow in due time, but only after I figure out how to do 
> it nicely (ideas welcome!) -- currently node32's two size classes work fine 
> for growing, but the code should be simplified before extending to other 
> cases.)

Within the size class, we just alloc a new node of lower size class
and do memcpy(). I guess it will be almost same as what we do for
growing. It might be a good idea to support node shrinking within the
size class for node32 (and node125 if we support). I don't think
shrinking class-3 to class-1 makes sense.

>
> Limited support for "combined pointer-value slots". At compile-time, choose 
> either that or "single-value leaves" based on the size of the value type 
> template parameter. Values that are pointer-sized or less can fit in the 
> last-level child slots of nominal "inner nodes" without duplicated leaf-node 
> code. Node256 now must act like the previous 'node256 leaf', since zero is a 
> valid value. Aside from that, this was a small change.

Yes, but it also means that we use pointer-sized value anyway even if
the value size is less than that, which wastes the memory, no?

>
> What I've shared here could work (in principal, since it uses uint64 values) 
> for tidstore, possibly faster (untested) because of better code density, but 
> as mentioned I want to shoot for higher. For tidbitmap.c, I want to extend 
> this idea and branch at run-time on a per-value basis, so that a page-table 
> entry that fits in a pointer can go there, and if not, it'll be a full leaf. 
> (This technique enables more flexibility in lossifying pages as well.) 
> Run-time info will require e.g. an additional bit per slot. Since the node 
> header is now 3 bytes, we can spare one more byte in the node3 case. In 
> addition, we can and should also bump it back up to node4, still keeping the 
> metadata within 8 bytes (no struct padding).

Sounds good.

> I've started in this patchset to refer to the node kinds as "4/16/48/256", 
> regardless of their actual fanout. This is for readability (by matching the 
> language in the paper) and maintainability (should *not* ever change again). 
> The size classes (including multiple classes per kind) could be determined by 
> macros and #ifdef's. For example, in non-SIMD architectures, it's likely slow 
> to search an array of 32 key chunks, so in that case the compiler should 
> choose size classes similar to these four nominal kinds.

If we want to use the node kinds used in the paper, I think we should
change the number in RT_NODE_KIND_X too. Otherwise, it would be
confusing when reading the code without referring to the paper.
Particularly, this part is very confusing:

        case RT_NODE_KIND_3:
            RT_ADD_CHILD_4(tree, ref, node, chunk, child);
            break;
        case RT_NODE_KIND_32:
            RT_ADD_CHILD_16(tree, ref, node, chunk, child);
            break;
        case RT_NODE_KIND_125:
            RT_ADD_CHILD_48(tree, ref, node, chunk, child);
            break;
        case RT_NODE_KIND_256:
            RT_ADD_CHILD_256(tree, ref, node, chunk, child);
            break;

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: [PoC] Improve dead tuple storage for lazy vacuum

Reply via email to