On Sat, Nov 5, 2022 at 6:23 PM John Naylor <john.nay...@enterprisedb.com> wrote:
>
> On Fri, Nov 4, 2022 at 10:25 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > For parallel heap pruning, multiple workers will insert key-value
> > pairs into the radix tree concurrently. The simplest solution would
> > be a single lock to protect writes, but the performance would not be
> > good. Another solution would be to divide the table into multiple
> > ranges so that keys derived from TIDs do not conflict with each
> > other, and have parallel workers process one or more ranges. That
> > way, parallel vacuum workers can build *sub-trees* and the leader
> > process can merge them. In the lazy vacuum use case, since the write
> > phase and the read phase are separated, the readers don't need to
> > worry about concurrent updates.
>
> It's a good idea to use ranges for a different reason -- readahead. See
> commit 56788d2156fc3, which aimed to improve readahead for sequential scans.
> It might work to use that as a model: Each worker prunes a range of 64 pages,
> keeping the dead TIDs in a local array. At the end of the range: lock the TID
> store, enter the TIDs into the store, unlock, free the local array, and get
> the next range from the leader. It's possible contention won't be too bad,
> and I suspect using small local arrays as we go would be faster and use less
> memory than merging multiple sub-trees at the end.

That seems like a promising idea. I think it might work well even with
the current parallel vacuum design (i.e., a single writer). That is, we
can protect the shared radix tree with a single LWLock in the first
version, and if the overhead of acquiring the LWLock per key-value
insertion turns out not to be negligible, we can try this idea.
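To make sure we're on the same page, the per-worker flow under your
scheme would look roughly like this (just a sketch, not patch code:
ParallelPruneShared, get_next_block_range(), prune_page_collect(), and
TidStoreInsert() are hypothetical placeholders, while LWLockAcquire()
and LWLockRelease() are the existing APIs):

/*
 * Sketch of a worker's loop: prune a 64-page range, collecting dead
 * TIDs in a local array, then enter them into the shared store with
 * one lock acquisition per range rather than one per TID.
 */
#define PRUNE_RANGE_SIZE 64     /* pages per chunk, as in 56788d2156fc3 */

void
parallel_prune_worker(ParallelPruneShared *shared)
{
    BlockNumber start,
                end;
    ItemPointer local_tids;

    local_tids = palloc(sizeof(ItemPointerData) *
                        MaxHeapTuplesPerPage * PRUNE_RANGE_SIZE);

    /* get the next 64-page range from the leader */
    while (get_next_block_range(shared, &start, &end))
    {
        int         ndead = 0;

        /* collect dead TIDs for this range into the local array */
        for (BlockNumber blk = start; blk < end; blk++)
            ndead += prune_page_collect(blk, &local_tids[ndead]);

        /* flush the local array into the shared store under the lock */
        LWLockAcquire(&shared->dead_items_lock, LW_EXCLUSIVE);
        for (int i = 0; i < ndead; i++)
            TidStoreInsert(shared->dead_items, &local_tids[i]);
        LWLockRelease(&shared->dead_items_lock);
    }

    pfree(local_tids);
}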

Apart from that, I'm going to incorporate the comments on the 0004
patch and try pointer tagging.
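By pointer tagging I mean something like the following (again, just an
illustration with hypothetical names, not the actual patch code). Since
the radix tree nodes are allocated with at least 8-byte alignment, the
low 3 bits of a node pointer are always zero, so we can store the node
kind there and check it without dereferencing the node:

typedef uintptr_t rt_pointer;

#define RT_KIND_MASK        ((uintptr_t) 0x07)  /* low 3 bits */

/* hypothetical node kinds */
#define RT_NODE_KIND_4      ((uintptr_t) 0x00)
#define RT_NODE_KIND_32     ((uintptr_t) 0x01)
#define RT_NODE_KIND_128    ((uintptr_t) 0x02)
#define RT_NODE_KIND_256    ((uintptr_t) 0x03)

static inline rt_pointer
rt_pointer_encode(void *node, uintptr_t kind)
{
    Assert(((uintptr_t) node & RT_KIND_MASK) == 0); /* needs alignment */
    return (uintptr_t) node | kind;
}

static inline uintptr_t
rt_pointer_kind(rt_pointer p)
{
    return p & RT_KIND_MASK;
}

static inline void *
rt_pointer_decode(rt_pointer p)
{
    return (void *) (p & ~RT_KIND_MASK);
}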

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

