On Fri, Dec 9, 2022 at 5:53 PM John Naylor <john.nay...@enterprisedb.com> wrote:
>
>
> On Fri, Dec 9, 2022 at 8:20 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> > In the meanwhile, I've been working on vacuum integration. There are
> > two things I'd like to discuss some time:
> >
> > The first is the minimum of maintenance_work_mem, 1 MB. Since the
> > initial DSA segment size is 1MB (DSA_INITIAL_SEGMENT_SIZE), parallel
> > vacuum with radix tree cannot work with the minimum
> > maintenance_work_mem. It will need to increase it to 4MB or so. Maybe
> > we can start a new thread for that.
>
> I don't think that'd be very controversial, but I'm also not sure why we'd
> need 4MB -- can you explain in more detail what exactly we'd need so that the
> feature would work? (The minimum doesn't have to work *well* IIUC, just do
> some useful work and not fail).
The minimum requirement is 2MB. In the PoC patch, TIDStore checks how
big the radix tree is using dsa_get_total_size(). If the size returned
by dsa_get_total_size() (plus some memory used for TIDStore meta
information) exceeds maintenance_work_mem, lazy vacuum starts index
vacuuming and heap vacuuming. However, when allocating DSA memory for
radix_tree_control at creation time, we allocate 1MB
(DSA_INITIAL_SEGMENT_SIZE) of DSM memory and take the memory required
for radix_tree_control from it. So dsa_get_total_size() returns 1MB
even before any TIDs have been collected.

>
> > The second is how to limit the size of the radix tree to
> > maintenance_work_mem. I think that it's tricky to estimate the maximum
> > number of keys in the radix tree that fit in maintenance_work_mem. The
> > radix tree size varies depending on the key distribution. The next
> > idea I considered was how to limit the size when inserting a key. In
> > order to strictly limit the radix tree size, probably we have to
> > change the rt_set so that it breaks off and returns false if the radix
> > tree size is about to exceed the memory limit when we allocate a new
> > node or grow a node kind/class.
>
> That seems complex, fragile, and wrong scope.
>
> > Ideally, I'd like to control the size
> > outside of radix tree (e.g. TIDStore) since it could introduce
> > overhead to rt_set() but probably we need to add such logic in radix
> > tree.
>
> Does the TIDStore have the ability to ask the DSA (or slab context) to see
> how big it is?

Yes, TIDStore can check it using dsa_get_total_size().

> If a new segment has been allocated that brings us to the limit, we can stop
> when we discover that fact. In the local case with slab blocks, it won't be
> on nice neat boundaries, but we could check if we're within the largest block
> size (~64kB) of overflow.
>
> Remember when we discussed how we might approach parallel pruning? I
> envisioned a local array of a few dozen kilobytes to reduce contention on the
> tidstore. We could use such an array even for a single worker (always doing
> the same thing is simpler anyway). When the array fills up enough so that the
> next heap page *could* overflow it: Stop, insert into the store, and check
> the store's memory usage before continuing.

Right, I think it's no problem in the slab case. In the DSA case, the
new segment size follows a geometric series that approximately doubles
the total storage each time we create a new segment. This behavior
comes from the fact that the underlying DSM system isn't designed for
large numbers of segments.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
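
To make the check being discussed a bit more concrete, here is a rough
sketch, not the actual patch code: TidStoreSketch and the
tidstore_sketch_* names and fields are made up for illustration, and
only dsa_get_total_size() and maintenance_work_mem are existing
PostgreSQL symbols. The idea is that the store reports the DSA's total
segment size plus its own metadata, and lazy vacuum compares that
against maintenance_work_mem after flushing its local array for each
heap page.

/*
 * Illustrative sketch only; TidStoreSketch and tidstore_sketch_* are
 * hypothetical names, not the patch's API.
 */
#include "postgres.h"
#include "miscadmin.h"          /* maintenance_work_mem (in kB) */
#include "utils/dsa.h"

typedef struct TidStoreSketch
{
    dsa_area   *area;           /* DSA backing the shared radix tree */
    Size        meta_size;      /* memory used outside the DSA */
} TidStoreSketch;

/* Current memory usage: total size of all DSA segments plus metadata. */
static Size
tidstore_sketch_memory_usage(TidStoreSketch *ts)
{
    return dsa_get_total_size(ts->area) + ts->meta_size;
}

/* Have we gone past the maintenance_work_mem budget? */
static bool
tidstore_sketch_is_full(TidStoreSketch *ts)
{
    return tidstore_sketch_memory_usage(ts) > (Size) maintenance_work_mem * 1024;
}

Lazy vacuum would call something like tidstore_sketch_is_full() right
after inserting the local array of dead offsets for a heap page into
the store, and start a round of index and heap vacuuming when it
returns true, which matches the "stop, insert into the store, and
check" flow quoted above. Since dsa_get_total_size() only grows when a
new segment is created, the overshoot is bounded by the size of the
most recently added segment (or, in the local case, by the largest slab
block).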