On Fri, Sep 23, 2022 at 12:11 AM John Naylor
<[email protected]> wrote:
>
>
> On Thu, Sep 22, 2022 at 11:46 AM John Naylor <[email protected]>
> wrote:
> > One thing I want to try soon is storing fewer than 16/32 etc entries, so
> > that the whole node fits comfortably inside a power-of-two allocation. That
> > would allow us to use aset without wasting space for the smaller nodes,
> > which would be faster and possibly would solve the fragmentation problem
> > Andres referred to in
>
> > https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de
>
> While calculating node sizes that fit within a power-of-two size, I noticed
> the current base node is a bit wasteful, taking up 8 bytes. The node kind
> only has a small number of values, so it doesn't really make sense to use an
> enum here in the struct (in fact, Andres' prototype used a uint8 for
> node_kind). We could use a bitfield for the count and kind:
>
> uint16 -- kind and count bitfield
> uint8 shift;
> uint8 chunk;
>
> That's only 4 bytes. Plus, if the kind is ever encoded in a pointer tag, the
> bitfield can just go back to being count only.
Good point, agreed.
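Something like the following, I imagine (just a sketch with field names I
made up, not code from the patch; with five node kinds we would need 3
bits for the kind, leaving 13 bits for the count):

typedef struct radix_tree_node
{
    uint16      kind:3,         /* which of the five node kinds */
                count:13;       /* number of children */
    uint8       shift;          /* key bits consumed at this level */
    uint8       chunk;          /* key byte stored in the parent node */
} radix_tree_node;              /* 4 bytes total */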
>
> Here are the v6 node kinds:
>
> node4: 8 + 4 +(4) + 4*8 = 48 bytes
> node16: 8 + 16 + 16*8 = 152
> node32: 8 + 32 + 32*8 = 296
> node128: 8 + 256 + 128/8 + 128*8 = 1304
> node256: 8 + 256/8 + 256*8 = 2088
>
> And here are the possible ways we could optimize nodes for space using aset
> allocation. Parentheses are padding bytes. Even if my math has mistakes, the
> numbers shouldn't be too far off:
>
> node3: 4 + 3 +(1) + 3*8 = 32 bytes
> node6: 4 + 6 +(6) + 6*8 = 64
> node13: 4 + 13 +(7) + 13*8 = 128
> node28: 4 + 28 + 28*8 = 256
> node31: 4 + 256 + 32/8 + 31*8 = 512 (XXX not good)
> node94: 4 + 256 + 96/8 + 94*8 = 1024
> node220: 4 + 256 + 224/8 + 220*8 = 2048
> node256: = 4096
>
> The main disadvantage is that node256 would balloon in size.
Yeah, node31 and node256 are bloated. We could probably use slab for
node256 independently. It's worth benchmarking to see how that affects
performance and the tree size.
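If I'm reading the math right, the proposed node3 would look something
like this (again only a sketch on top of the 4-byte base node above, with
made-up names):

typedef struct radix_tree_node_3
{
    radix_tree_node base;       /* 4 bytes */
    uint8       chunks[3];      /* 3 bytes, then 1 byte of padding */
    void       *children[3];    /* 8-byte aligned, 24 bytes */
} radix_tree_node_3;            /* 32 bytes, fits a power-of-two chunk */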
BTW, we need to consider not only aset/slab but also DSA, since in the
parallel vacuum case we allocate the dead tuple TIDs in DSM. FYI, DSA
uses the following size classes:
static const uint16 dsa_size_classes[] = {
    sizeof(dsa_area_span), 0,       /* special size classes */
    8, 16, 24, 32, 40, 48, 56, 64,  /* 8 classes separated by 8 bytes */
    80, 96, 112, 128,               /* 4 classes separated by 16 bytes */
    160, 192, 224, 256,             /* 4 classes separated by 32 bytes */
    320, 384, 448, 512,             /* 4 classes separated by 64 bytes */
    640, 768, 896, 1024,            /* 4 classes separated by 128 bytes */
    1280, 1560, 1816, 2048,         /* 4 classes separated by ~256 bytes */
    2616, 3120, 3640, 4096,         /* 4 classes separated by ~512 bytes */
    5456, 6552, 7280, 8192          /* 4 classes separated by ~1024 bytes */
};
The current 2088-byte node256 will be classed as 2616 bytes, which is
still not good.
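Just to double-check the rounding, here is a quick standalone calculation
(my own throwaway code; it only reimplements the class lookup using the
values from dsa.c and doesn't call into DSA itself):

#include <stdio.h>
#include <stdint.h>

/* the fixed size classes from dsa.c, minus the two special entries */
static const uint16_t size_classes[] = {
    8, 16, 24, 32, 40, 48, 56, 64,
    80, 96, 112, 128,
    160, 192, 224, 256,
    320, 384, 448, 512,
    640, 768, 896, 1024,
    1280, 1560, 1816, 2048,
    2616, 3120, 3640, 4096,
    5456, 6552, 7280, 8192
};

/* smallest class that can hold the request; 0 means larger than any class */
static unsigned
class_for(size_t size)
{
    for (size_t i = 0; i < sizeof(size_classes) / sizeof(size_classes[0]); i++)
    {
        if (size <= size_classes[i])
            return size_classes[i];
    }
    return 0;
}

int
main(void)
{
    printf("current node256 (2088 bytes) -> %u\n", class_for(2088));
    printf("power-of-two node256 (4096 bytes) -> %u\n", class_for(4096));
    return 0;
}

which prints 2616 and 4096 respectively, i.e. roughly 25% overhead for
the current node256 but an exact fit for the 4096-byte variant.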
Anyway, I'll implement DSA support for the radix tree.
Regards,
--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com