Hi,

On 2019-08-20 17:11:38 +0530, Dilip Kumar wrote:
> On Wed, Aug 14, 2019 at 10:35 PM Andres Freund <and...@anarazel.de> wrote:
> > On 2019-08-14 14:48:07 +0530, Dilip Kumar wrote:
> > > On Wed, Aug 14, 2019 at 12:27 PM Andres Freund <and...@anarazel.de> wrote:
> >
> > I don't think we can normally pin the undo buffers properly at that
> > stage.  Without knowing the correct contents of the table page - which
> > we can't know without holding some form of lock preventing
> > modifications - we can't know how big our undo records are going to
> > be.  And we can't just have buffers that don't exist on disk in shared
> > memory, and we don't want to allocate undo that we then don't need.
> > So I think what we'd have to do at that stage is to "pre-allocate"
> > buffers for the maximum amount of UNDO needed, but mark the associated
> > bufferdesc as not yet valid.  These buffers would have a pincount > 0,
> > but BM_TAG_VALID would not be set.
> >
> > So at the start of a function that will need to insert undo we'd need
> > to pre-reserve the maximum number of buffers we could potentially
> > need.  That reservation stage would
> >
> > a) pin the page with the current end of the undo
> > b) if needed, pin the page of older undo that we need to update (e.g.
> >    to update the next pointer)
> > c) perform a clock sweep etc. to acquire (find or create) enough clean
> >    buffers to hold the maximum amount of undo needed.  These buffers
> >    would be marked as !BM_TAG_VALID | BUF_REFCOUNT_ONE.
> >
> > I assume that we'd make a) cheap by keeping the page pinned for undo
> > logs that a backend is actively attached to.  b) should only be needed
> > once in a transaction, so it's not too bad.  For c) we'd probably need
> > to amortize across multiple undo insertions, by keeping the unused
> > buffers pinned until the end of the transaction.
> >
> > I assume that having the infrastructure for c) might also make some
> > code already in postgres easier.  There are obviously some issues
> > around guaranteeing that the maximum number of such buffers isn't
> > high.
>
> I have analyzed this further, and I think there is a problem if the
> record(s) will not fit into the current undo log and we have to switch
> logs.  Before knowing the actual record length we are not sure whether
> the undo log will switch or not, nor which undo log we will get.  And
> without knowing the logno (rnode), how are we going to pin the buffers?
> Am I missing something?
That's precisely why I was suggesting (at the start of the quoted block
above) to not associate the buffers with pages at that point.  Instead,
just have clean, pinned, *unassociated* buffers, which can then be
re-associated without any IO.

> Apart from this, while analyzing the other code I have noticed that in
> the current PG code there are a few places where we try to read a
> buffer while already holding a buffer lock.
>
> 1. In gistplacetopage:
>
>     {
>         ...
>         for (; ptr; ptr = ptr->next)
>         {
>             /* Allocate new page */
>             ptr->buffer = gistNewBuffer(rel);
>             GISTInitBuffer(ptr->buffer, (is_leaf) ? F_LEAF : 0);
>             ptr->page = BufferGetPage(ptr->buffer);
>             ptr->block.blkno = BufferGetBlockNumber(ptr->buffer);
>         }
>
> 2. During a page split we find a new buffer while holding the lock on
>    the current buffer.
>
> That doesn't mean that we can't do better, but I am just referring to
> existing code where we already have such issues.

Those are pretty clearly edge cases, whereas the undo case at hand is a
very common path.  Note again that heapam.c goes to considerable trouble
to never do this for common cases.

Greetings,

Andres Freund
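
To make the buffer-reservation idea discussed above a bit more concrete,
here is a small standalone C sketch.  It is not PostgreSQL code and does
not use the real buffer manager APIs; all names (ToyBufferDesc,
reserve_buffers, assign_tag, TAG_VALID, ...) are invented for
illustration.  It only models the scheme: reserve clean, pinned buffers
that carry no valid tag (so they imply no on-disk block and need no I/O),
then attach them to a concrete undo page later, once the record size and
therefore the undo log / block numbers are known.

    /*
     * Toy sketch only: models reserving clean, pinned, unassociated
     * buffers up front and re-associating them with a page later
     * without any I/O.  Not PostgreSQL code.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define POOL_SIZE 8
    #define TAG_VALID 0x01      /* stand-in for BM_TAG_VALID */

    typedef struct ToyBufferDesc
    {
        uint32_t    flags;      /* TAG_VALID once a page is attached */
        uint32_t    refcount;   /* pin count */
        int         logno;      /* undo log number, -1 while unassociated */
        int         blkno;      /* block number, -1 while unassociated */
    } ToyBufferDesc;

    static ToyBufferDesc pool[POOL_SIZE];

    /*
     * Reserve up to 'n' clean buffers for the worst-case amount of undo a
     * later insertion might need.  They end up pinned (refcount 1) but
     * carry no valid tag, so no disk block is implied and no I/O happens
     * here.  Returns how many buffers could be reserved and stores their
     * indexes in 'reserved'.
     */
    static int
    reserve_buffers(int n, int *reserved)
    {
        int         found = 0;

        for (int i = 0; i < POOL_SIZE && found < n; i++)
        {
            if (pool[i].refcount == 0 && (pool[i].flags & TAG_VALID) == 0)
            {
                pool[i].refcount = 1;
                pool[i].logno = -1;
                pool[i].blkno = -1;
                reserved[found++] = i;
            }
        }
        return found;
    }

    /*
     * Attach a previously reserved buffer to a concrete undo page.  Pure
     * bookkeeping: the page is about to be written for the first time,
     * so nothing has to be read from disk.
     */
    static void
    assign_tag(int buf, int logno, int blkno)
    {
        pool[buf].logno = logno;
        pool[buf].blkno = blkno;
        pool[buf].flags |= TAG_VALID;
    }

    int
    main(void)
    {
        int         reserved[4];
        int         n = reserve_buffers(4, reserved);

        printf("reserved %d clean, pinned, unassociated buffers\n", n);

        /* later, once record sizes are known under the content lock: */
        assign_tag(reserved[0], 3 /* logno */ , 17 /* blkno */ );
        printf("buffer %d now tagged as log %d block %d\n",
               reserved[0], pool[reserved[0]].logno,
               pool[reserved[0]].blkno);
        return 0;
    }

In the real system the reservation step would presumably go through the
buffer manager's clock sweep and the re-association would set the buffer
tag under the usual locking rules; the sketch only shows why no read is
needed at the moment the tag is finally assigned.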