On Tue, Jun 6, 2023 at 2:13 PM John Naylor <john.nay...@enterprisedb.com> wrote: > > On Mon, Jun 5, 2023 at 5:32 PM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > > > > Sometime in the not-too-distant future, I will start a new thread > > > focusing on bitmap heap scan, but for now, I just want to share some > > > progress on making the radix tree usable not only for that, but hopefully > > > a wider range of applications, while making the code simpler and the > > > binary smaller. The attached patches are incomplete (e.g. no iteration) > > > and quite a bit messy, so tar'd and gzip'd for the curious (should apply > > > on top of v32 0001-03 + 0007-09 ). > > > > > > > Thank you for making progress on this. I agree with these directions > > overall. I have some comments and questions: > > Glad to hear it and thanks for looking! > > > > * Other nodes will follow in due time, but only after I figure out how to > > > do it nicely (ideas welcome!) -- currently node32's two size classes work > > > fine for growing, but the code should be simplified before extending to > > > other cases.) > > > > Within the size class, we just alloc a new node of lower size class > > and do memcpy(). I guess it will be almost same as what we do for > > growing. > > Oh, the memcpy part is great, very simple. I mean the (compile-time) "class > info" table lookups are a bit awkward. I'm thinking the hard-coded numbers > like this: > > .fanout = 3, > .inner_size = sizeof(RT_NODE_INNER_3) + 3 * sizeof(RT_PTR_ALLOC), > > ...may be better with a #defined symbol that can also be used elsewhere.
FWIW, exposing these definitions would be good in terms of testing too since we can use them in regression tests. > > > I don't think > > shrinking class-3 to class-1 makes sense. > > Agreed. The smallest kind should just be freed when empty. > > > > Limited support for "combined pointer-value slots". At compile-time, > > > choose either that or "single-value leaves" based on the size of the > > > value type template parameter. Values that are pointer-sized or less can > > > fit in the last-level child slots of nominal "inner nodes" without > > > duplicated leaf-node code. Node256 now must act like the previous > > > 'node256 leaf', since zero is a valid value. Aside from that, this was a > > > small change. > > > > Yes, but it also means that we use pointer-sized value anyway even if > > the value size is less than that, which wastes the memory, no? > > At a low-level, that makes sense, but I've found an interesting global effect > showing the opposite: _less_ memory, which may compensate: > > psql -c "select * from bench_search_random_nodes(1*1000*1000)" > num_keys = 992660 > > (using a low enough number that the experimental change n125->n63 doesn't > affect anything) > height = 4, n3 = 375258, n15 = 137490, n32 = 0, n63 = 0, n256 = 1025 > > v31: > mem_allocated | load_ms | search_ms > ---------------+---------+----------- > 47800768 | 253 | 134 > > (unreleased code "similar" to v33, but among other things restores the > separate "extend down" function) > mem_allocated | load_ms | search_ms > ---------------+---------+----------- > 42926048 | 221 | 127 > > I'd need to make sure, but apparently just going from 6 non-empty memory > contexts to 3 (remember all values are embedded here) reduces memory > fragmentation significantly in this test. (That should also serve as a > demonstration that additional size classes have both runtime costs as well as > benefits. We need to have a balance.) Interesting. The result would probably vary if we change the slab block sizes. I'd like to experiment if the code is available. > > So, I'm inclined to think the only reason to prefer "multi-value leaves" is > if 1) the value type is _bigger_ than a pointer 2) there is no convenient > abbreviation (like tid bitmaps have) and 3) the use case really needs to > avoid another memory access. Under those circumstances, though, the new code > plus lazy expansion etc might suit and be easier to maintain. Indeed. > > > > What I've shared here could work (in principal, since it uses uint64 > > > values) for tidstore, possibly faster (untested) because of better code > > > density, but as mentioned I want to shoot for higher. For tidbitmap.c, I > > > want to extend this idea and branch at run-time on a per-value basis, so > > > that a page-table entry that fits in a pointer can go there, and if not, > > > it'll be a full leaf. (This technique enables more flexibility in > > > lossifying pages as well.) Run-time info will require e.g. an additional > > > bit per slot. Since the node header is now 3 bytes, we can spare one more > > > byte in the node3 case. In addition, we can and should also bump it back > > > up to node4, still keeping the metadata within 8 bytes (no struct > > > padding). > > > > Sounds good. > > The additional bit per slot would require per-node logic and additional > branches, which is not great. I'm now thinking a much easier way to get there > is to give up (at least for now) on promising that "run-time embeddable > values" can use the full pointer-size (unlike value types found embeddable at > compile-time). Reserving the lowest pointer bit for a tag "value or > pointer-to-leaf" would have a much smaller code footprint. Do you mean we can make sure that the value doesn't set the lowest bit? Or is it an optimization for TIDStore? > In addition, without a new bitmap, the smallest node can actually be up to a > node5 with no struct padding, with a node2 as a subclass. (Those numbers > coincidentally were also one scenario in the paper, when calculating > worst-case memory usage). That's worth considering. Agreed. FWIW please let me know if there are some experiments I can help with. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com