On Thu, Jul 21, 2011 at 12:17 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Jul 14, 2011 at 12:43 PM, Heikki Linnakangas wrote:
>> I think you can sidestep that if you check that the page's vacuum
>> LSN <= vacuum LSN in pg_class, instead of equality.
>
> I don't think that works, because the point of storing the LSN in
> pg_class is to verify that the vacuum completed the index cleanup
> without error. The fact that a newer vacuum accomplished that goal
> does not mean that all older ones did.

Since we force the subsequent vacuum to also look at the pages scanned
and pruned by a previously failed vacuum, all the pages that have
dead-vacuum line pointers will carry a new stamp once that vacuum
finishes successfully, and pg_class will have the same stamp.

>> Ignoring the issue stated in the previous paragraph, I think you
>> wouldn't actually need a 64-bit LSN. A smaller counter is enough, as
>> wrap-around doesn't matter. In fact, a single bit would be enough.
>> After a successful vacuum, the counter on each heap page (with dead
>> line pointers) is N, and the value in pg_class is N. There are no
>> other values on the heap, because vacuum will have cleaned them up.
>> When you begin the next vacuum, it will stamp pages with N+1. So at
>> any stage, there is only one of two values on any page, so a single
>> bit is enough. (But as I said, that doesn't hold if vacuum skips
>> some pages thanks to the visibility map)
>
> If this can be made to work, it's a very appealing idea.

I thought more about it and for a moment believed that we could do
this with just a bit, since we rescan the pages with dead and
dead-vacuum line pointers after an aborted vacuum. But I concluded
that a bit or a small counter is not good enough: other backends might
be running with a stale value and would be fooled into believing that
they can collect the dead-vacuum line pointers before the index
pointers are actually removed.
We can still use a 32-bit counter though, since its wrap-around period
is practically far too large for any backend to still be running with
such a stale counter (you would need more than a billion vacuums on
the same table in between to hit this).

> The patch as submitted uses lp_off to store a single bit, to
> distinguish between vacuum and dead-vacuumed, but we could actually
> have (for greater safety and debuggability) a 15-byte counter that
> just wraps around from 32,767 to 1. (Maybe it would be wise to
> reserve a few counter values, or a few bits, or both, for future
> projects.) That would eliminate the need to touch
> PageRepairFragmentation() or use the special space, since all the
> information would be in the line pointer itself. Not having to
> rearrange the page to reclaim dead line pointers is appealing, too.

Not sure if I get you here. We need a mechanism to distinguish between
dead and dead-vacuum line pointers. How would the counter (which I
assume you mean 15-bit and not byte) help solve that? Or are you just
suggesting replacing the LSN with the counter in the page header?

>> Is there something in place to make sure that pruning uses an
>> up-to-date relindxvacxlogid/off value? I guess it doesn't matter if
>> it's out-of-date, you'll just miss the opportunity to remove some
>> dead tuples.
>
> This seems like a tricky problem, because it could cause us to
> repeatedly fail to remove the same dead line pointers, which would be
> poor. We could do something like this: after updating pg_class,
> vacuum sends an interrupt to any backend which holds RowExclusiveLock
> or higher on that relation. The interrupt handler just sets a flag.
> If that backend does heap_page_prune() and sees the flag set, it
> knows that it needs to recheck pg_class. This is a bit grotty and
> doesn't completely close the race condition (the signal might not
> arrive in time), but it ought to make it narrow enough not to matter
> in practice.
I am not too excited about adding that complexity to the code. Even if
a backend does not have an up-to-date value, it will merely fail to
collect the dead-vacuum pointers; soon either it will catch up, some
other backend will remove them, or the next vacuum will take care of
it.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com