On Fri, Dec 9, 2022 at 5:53 PM John Naylor <john.nay...@enterprisedb.com> wrote:
>
>
> On Fri, Dec 9, 2022 at 8:20 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> > In the meanwhile, I've been working on vacuum integration. There are
> > two things I'd like to discuss some time:
> >
> > The first is the minimum of maintenance_work_mem, 1 MB. Since the
> > initial DSA segment size is 1MB (DSA_INITIAL_SEGMENT_SIZE), parallel
> > vacuum with radix tree cannot work with the minimum
> > maintenance_work_mem. It will need to increase it to 4MB or so. Maybe
> > we can start a new thread for that.
>
> I don't think that'd be very controversial, but I'm also not sure why we'd
> need 4MB -- can you explain in more detail what exactly we'd need so that the
> feature would work? (The minimum doesn't have to work *well* IIUC, just do
> some useful work and not fail).
The minimum requirement is 2MB. In the PoC patch, TIDStore checks how
big the radix tree is using dsa_get_total_size(). If the size returned
by dsa_get_total_size() (plus some memory used for TIDStore meta
information) exceeds maintenance_work_mem, lazy vacuum starts index
vacuuming and heap vacuuming. However, when allocating DSA memory for
radix_tree_control at creation time, we allocate 1MB
(DSA_INITIAL_SEGMENT_SIZE) of DSM memory and take the memory required
for radix_tree_control from it. So dsa_get_total_size() returns 1MB
even before any TIDs have been collected.

>
> > The second is how to limit the size of the radix tree to
> > maintenance_work_mem. I think that it's tricky to estimate the maximum
> > number of keys in the radix tree that fit in maintenance_work_mem. The
> > radix tree size varies depending on the key distribution. The next
> > idea I considered was how to limit the size when inserting a key. In
> > order to strictly limit the radix tree size, probably we have to
> > change the rt_set so that it breaks off and returns false if the radix
> > tree size is about to exceed the memory limit when we allocate a new
> > node or grow a node kind/class.
>
> That seems complex, fragile, and wrong scope.
>
> > Ideally, I'd like to control the size
> > outside of radix tree (e.g. TIDStore) since it could introduce
> > overhead to rt_set() but probably we need to add such logic in radix
> > tree.
>
> Does the TIDStore have the ability to ask the DSA (or slab context) to see
> how big it is?

Yes, TIDStore can check it using dsa_get_total_size().

> If a new segment has been allocated that brings us to the limit, we can stop
> when we discover that fact. In the local case with slab blocks, it won't be
> on nice neat boundaries, but we could check if we're within the largest block
> size (~64kB) of overflow.
>
> Remember when we discussed how we might approach parallel pruning? I
> envisioned a local array of a few dozen kilobytes to reduce contention on the
> tidstore. We could use such an array even for a single worker (always doing
> the same thing is simpler anyway). When the array fills up enough so that the
> next heap page *could* overflow it: Stop, insert into the store, and check
> the store's memory usage before continuing.

Right, I think it's no problem in the slab case. In the DSA case, the
new segment size follows a geometric series that approximately doubles
the total storage each time we create a new segment. This behavior
comes from the fact that the underlying DSM system isn't designed for
large numbers of segments.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
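
To make the check being discussed a bit more concrete, here is a rough
sketch, not the actual patch code: TidStoreSketch and the
tidstore_sketch_* names and fields are made up for illustration, and
only dsa_get_total_size() and maintenance_work_mem are existing
PostgreSQL symbols. The idea is that the store reports the DSA's total
segment size plus its own metadata, and lazy vacuum compares that
against maintenance_work_mem after flushing its local array for each
heap page.

/*
 * Illustrative sketch only; TidStoreSketch and tidstore_sketch_* are
 * hypothetical names, not the patch's API.
 */
#include "postgres.h"
#include "miscadmin.h"          /* maintenance_work_mem (in kB) */
#include "utils/dsa.h"

typedef struct TidStoreSketch
{
    dsa_area   *area;           /* DSA backing the shared radix tree */
    Size        meta_size;      /* memory used outside the DSA */
} TidStoreSketch;

/* Current memory usage: total size of all DSA segments plus metadata. */
static Size
tidstore_sketch_memory_usage(TidStoreSketch *ts)
{
    return dsa_get_total_size(ts->area) + ts->meta_size;
}

/* Have we gone past the maintenance_work_mem budget? */
static bool
tidstore_sketch_is_full(TidStoreSketch *ts)
{
    return tidstore_sketch_memory_usage(ts) > (Size) maintenance_work_mem * 1024;
}

Lazy vacuum would call something like tidstore_sketch_is_full() right
after inserting the local array of dead offsets for a heap page into
the store, and start a round of index and heap vacuuming when it
returns true, which matches the "stop, insert into the store, and
check" flow quoted above. Since dsa_get_total_size() only grows when a
new segment is created, the overshoot is bounded by the size of the
most recently added segment (or, in the local case, by the largest slab
block).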