On 21 December 2015 at 21:28, Alvaro Herrera <alvhe...@2ndquadrant.com>

> Simon Riggs wrote:
> > During VACUUM of btrees, we need to pin all pages, even those where
> tuples
> > are not removed, which I am calling here the "pin scan". This is
> especially
> > a problem during redo, where removing one tuple from a 100GB btree can
> take
> > a minute or more. That causes replication lags. Bad Thing.
> Agreed on there being a problem here.
> As a reminder, this "pin scan" is implemented in the
> HotStandbyActiveInReplay() block in btree_xlog_vacuum; this routine is
> in charge of replaying an action recorded by _bt_delitems_vacuum.  That
> replay works by acquiring cleanup lock on all btree blocks from
> lastBlockVacuumed to "this one" (i.e. the one in which elements are
> being removed).  In essence, this means pinning *all* the blocks in the
> index, which is bad.
> The new code sets lastBlockVacuumed to Invalid; a new test in the replay
> code makes that value not pin anything.  This is supposed to optimize
> the case in question.

Nice summary, thanks.

> One thing that this patch should update is the comment above struct
> xl_btree_vacuum, which currently state that EnsureBlockUnpinned() is one
> of the options to apply to each block, but that routine doesn't exist.


> One possible naive optimization is to avoid locking pages that aren't
> leaf pages, but that would still require fetching the block from disk,
> so it wouldn't actually optimize anything other than the cleanup lock
> acquisition.  (And probably not too useful, since leaf pages are said to
> be 95% - 99% of index pages anyway.)

Agreed, not worth it.

> Also of interest is the comment above the call to _bt_delitems_vacuum in
> btvacuumpage(); that needs an update too.


> It appears to me that your patch changes the call in btvacuumscan() but
> not the one in btvacuumpage().  I assume you should be changing both.

Yes, that was correct. Updated.

> I think the new comment that talks about Toast Index should explain
> *why* we can skip the pinning in all cases except that one, instead of
> just saying we can do it.

Greatly expanded comments.

Thanks for the review.

New patch attached.

Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: avoid_standby_pin_scan.v2.patch
Description: Binary data

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to