On Thu, Jul 21, 2011 at 12:17 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Thu, Jul 14, 2011 at 12:43 PM, Heikki Linnakangas
> > I think you can sidestep that
> > if you check that the page's vacuum LSN <= vacuum LSN in pg_class,
> > instead of equality.
>
> I don't think that works, because the point of storing the LSN in
> pg_class is to verify that the vacuum completed the index cleanup
> without error.  The fact that a newer vacuum accomplished that goal
> does not mean that all older ones did.
>
>
Because we force the subsequent vacuum to also look at the pages scanned and
pruned by the previous failed vacuum, all pages that have dead-vacuum line
pointers will carry the new stamp once that vacuum finishes successfully, and
pg_class will have the same stamp.


> > Ignoring the issue stated in the previous paragraph, I think you wouldn't
> > actually need a 64-bit LSN. A smaller counter is enough, as wrap-around
> > doesn't matter. In fact, a single bit would be enough. After a successful
> > vacuum, the counter on each heap page (with dead line pointers) is N, and
> > the value in pg_class is N. There are no other values on the heap, because
> > vacuum will have cleaned them up. When you begin the next vacuum, it will
> > stamp pages with N+1. So at any stage, there is only one of two values on
> > any page, so a single bit is enough. (But as I said, that doesn't hold if
> > vacuum skips some pages thanks to the visibility map.)
>
> If this can be made to work, it's a very appealing idea.


I thought more about it and for a moment believed that we could do this with
just a bit, since we rescan the pages with dead and dead-vacuum line
pointers after an aborted vacuum. But I concluded that a bit or a small
counter is not good enough: other backends might be running with a
stale value and could be fooled into believing that they can collect the
dead-vacuum line pointers before the index pointers are actually removed. We
can still use a 32-bit counter, though, since its wrap-around interval is so
large that no backend could realistically still be running with such a stale
counter (you would need more than 1 billion vacuums on the same table in
between to hit this).


> The patch as
> submitted uses lp_off to store a single bit, to distinguish between
> dead and dead-vacuumed, but we could actually have (for greater
> safety and debuggability) a 15-byte counter that just wraps around
> from 32,767 to 1.  (Maybe it would be wise to reserve a few counter
> values, or a few bits, or both, for future projects.)  That would
> eliminate the need to touch PageRepairFragmentation() or use the
> special space, since all the information would be in the line pointer
> itself.  Not having to rearrange the page to reclaim dead line
> pointers is appealing, too.
>
>
I'm not sure I follow you here. We need a mechanism to distinguish between dead
and dead-vacuum line pointers. How would the counter (I assume you mean 15
bits, not bytes) help solve that? Or are you just suggesting replacing the
LSN in the page header with the counter?


> > Is there something in place to make sure that pruning uses an up-to-date
> > relindxvacxlogid/off value? I guess it doesn't matter if it's out-of-date,
> > you'll just miss the opportunity to remove some dead tuples.
>
> This seems like a tricky problem, because it could cause us to
> repeatedly fail to remove the same dead line pointers, which would be
> poor.  We could do something like this: after updating pg_class,
> vacuum sends an interrupt to any backend which holds RowExclusiveLock
> or higher on that relation.  The interrupt handler just sets a flag.
> If that backend does heap_page_prune() and sees the flag set, it knows
> that it needs to recheck pg_class.  This is a bit grotty and doesn't
> completely close the race condition (the signal might not arrive in
> time), but it ought to make it narrow enough not to matter in
> practice.
>
>
I am not too excited about adding that complexity to the code. Even if a
backend does not have an up-to-date value, the worst case is that it fails to
collect the dead-vacuum pointers, and soon either it will catch up, some other
backend will remove them, or the next vacuum will take care of them.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
