Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Ugh.  Is there another solution to this?  Say, sync the buffer so that
> the hint bits are written to disk?

Yeah.  The original design for all this is explained by the notes for

 * When this is called, we know that the database logically contains no
 * reference to transaction IDs older than oldestXact.  However, we must
 * not truncate the CLOG until we have performed a checkpoint, to ensure
 * that no such references remain on disk either; else a crash just after
 * the truncation might leave us with a problem.

The pre-8.2 coding is actually perfectly safe within a single database,
because TruncateCLOG is only called at the end of a database-wide
vacuum, and so the checkpoint is guaranteed to have flushed valid hint
bits for all tuples to disk.  There is a risk in other databases though.
I think that in the 8.2 structure the equivalent notion must be that
VACUUM has to flush and fsync a table before it can advance the table's

That still leaves us with the problem of hint bits not being updated
during WAL replay.  I think the best solution for this is for WAL replay
to force relvacuumxid to equal relminxid (btw, these field names seem
poorly chosen, and the comment in catalogs.sgml isn't self-explanatory...)
rather than adopting the value shown in the WAL record.  This probably
is best done by abandoning the generic "overwrite tuple" WAL record type
in favor of something specific to minxid updates.  The effect would then
be that a PITR slave would not truncate its clog beyond the freeze
horizon until it had performed a vacuum of its own.

The point about aborted xmax being a risk factor is a good one.  I don't
think the risk is material for ordinary crash recovery scenarios,
because ordinarily we'd have many opportunities to set the hint bit
before anything really breaks, but it's definitely an issue for
long-term PITR replay scenarios.

I'll work on this as soon as I get done with the btree-index issue I'm
messing with now.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?


Reply via email to