Simon Riggs wrote:
> On Wed, 2007-10-17 at 17:36 +0100, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> On Wed, 2007-10-17 at 15:02 +0100, Heikki Linnakangas wrote:
>>>> Simon Riggs wrote:
>>>>> If you've got a better problem statement it would be good to get that
>>>>> right first before we discuss solutions.
>>>> Reusing a relfilenode of a deleted relation, before next checkpoint
>>>> following the commit of the deleting transaction, for an operation that
>>>> doesn't WAL log the contents of the new relation, leads to data loss on
>>>> recovery.
>>> OK, thanks.
>>> I wasn't aware we reused relfilenode ids. The code in GetNewOid() doesn't
>>> look deterministic to me, or at least isn't meant to be.
>>> GetNewObjectId() should be cycling around, so although the oid index
>>> scan using SnapshotDirty won't see committed deleted rows that shouldn't
>>> matter for 2^32 oids. So what gives?
>> I don't think you quite understand what's happening yet.
> Clearly. It's not a problem to admit that.
>> GetNewOid() is not interesting here, look at GetNewRelFileNode() instead. And
>> neither are snapshots or MVCC visibility rules.
> Which calls GetNewOid() in all cases, AFAICS.
> How does the reuse you say is happening come about? Seems like the bug
> is in the reuse, not in how we cope with potential reuse.
Once a table has been dropped, the dropping transaction has committed,
and the relation file has been deleted, there's nothing preventing reuse
of its relfilenode. There's no trace of that relfilenode in the system
(except in the WAL, which we never look into except on WAL replay).
There's a dead row
in pg_class with that relfilenode, but even that could be vacuumed away
(not that it matters because we don't examine that).
Now the problem is that there's a record in the WAL to delete a relation
file with that relfilenode. If that relfilenode was reused, we delete
the contents of the new relation file when we replay that WAL record.
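
To make the sequence concrete, here is a rough toy model of it in Python.
This is only a sketch under stated assumptions, not PostgreSQL code: a dict
stands in for the data directory, a list for the WAL since the last
checkpoint, and the names replay(), run_scenario() and the relfilenode
value 16384 are all made up for illustration.

```python
# Toy model only: a dict stands in for the data directory and a list for
# the WAL written since the last checkpoint. None of these names are
# PostgreSQL APIs; they are hypothetical and exist just for illustration.

def replay(wal, disk):
    """Re-apply every WAL record since the last checkpoint, in order."""
    for record in wal:
        if record[0] == "drop":
            # A drop record removes whatever file currently carries that
            # relfilenode, whether it is the old relation or a new one.
            disk.pop(record[1], None)
        elif record[0] == "insert":
            _, relfilenode, row = record
            disk.setdefault(relfilenode, []).append(row)
    return disk


def run_scenario(wal_logged):
    disk, wal = {}, []
    relfilenode = 16384  # arbitrary example value

    # The old table lives in this relfilenode.
    disk[relfilenode] = ["old row"]

    # DROP TABLE commits: a drop record is logged and the file is deleted.
    wal.append(("drop", relfilenode))
    del disk[relfilenode]

    # Before the next checkpoint, a new relation reuses the relfilenode.
    disk[relfilenode] = []
    for row in ["new row 1", "new row 2"]:
        disk[relfilenode].append(row)
        if wal_logged:
            wal.append(("insert", relfilenode, row))

    # Crash, then recovery: the data files survive and the WAL is replayed.
    return replay(wal, dict(disk))


if __name__ == "__main__":
    print("contents not WAL-logged:", run_scenario(wal_logged=False))
    print("contents WAL-logged:    ", run_scenario(wal_logged=True))
```

In this model the non-WAL-logged case ends with the new relation's file
gone, because the replayed drop record removes it and nothing later in the
WAL puts the rows back. In the WAL-logged case the later insert records
recreate the contents, which is why only operations that skip WAL logging
are affected.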