On 2013-11-27 14:53:27 -0500, Noah Misch wrote:
> How would you characterize the chances of this happening with default
> *vacuum_freeze_*_age settings?  Offhand, it seems you would need to encounter
> this bug during each of ~10 generations of autovacuum_freeze_max_age before
> the old rows actually become invisible.

I think realistically, to actually trigger the bug, it needs to happen
quite a bit more often. But in some workloads it's pretty darn easy to
hit. E.g. if significant parts of the table are regularly deleted, lots,
if not most, of your vacuums will spuriously increase relfrozenxid above
the actual value. Each time only by a small amount, but due to that
small increase there never will be an actual full table vacuum since
freeze_table_age will never even remotely be reached.

The client that made me look into the issue noticed problems on
pg_attribute - presumably because of temporary table usage primarily
affecting the tail end of pg_attribute.

> On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:
> > With regard to fixing things up, ISTM the best bet is heap_prune_chain()
> > so far. That's executed b vacuum and by opportunistic pruning and we
> > know we have the appropriate locks there. Looks relatively easy to fix
> > up things there. Not sure if there are any possible routes to WAL log
> > this but using log_newpage()?
> > I am really not sure what the best course of action is :(
> 
> Maximizing detection is valuable, and the prognosis for automated repair is
> poor.  I would want a way to extract tuples having xmin outside the range of
> CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible
> page.

I think the likelihood of the problem affecting !all-visible pages is
close to zero. Each vacuum will try to clean those, so they surely will
get vacuumed at some point. I think the only way that could happen is if
the ConditionalLockBufferForCleanup() fails in each vacuum. And that
seems a bit unlikely.

> At first, I supposed we could offer a tool to blindly freeze such tuples.
> However, there's no guarantee that they are in harmony with recent changes to
> the database; transactions that wrongly considered those tuples invisible may
> have made decisions incompatible with their existence.  For example, reviving
> such a tuple could violate a UNIQUE constraint if the user had already
> replaced the missing row manually.

Good point, although since they are all on all-visible pages sequential
scans will currently already find those. It's primarily index scans that
won't. So it's not really reviving them...
The primary reason why I think it might be a good idea to "revive"
automatically is, that an eventual full-table/freeze vacuum will
currently delete them which seems bad.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to