[HACKERS] Why xlog stuff is done after the filetruncate op in smgrtruncate?

2007-04-16 Thread Jacky Leng
Shouldn't we write the xlog record before we do the physical operation?

A test case:
1. Set full_page_writes to off.
2. Start the database; create a table; insert 10 rows into it; shut down
the database.
3. Start the database again; delete all rows from this table.
4. Vacuum this table, which will reach smgrtruncate; kill the postmaster
before smgrtruncate does its xlog work (set a breakpoint before the xlog
call).
5. Start the database a third time; during recovery, the database will
crash with:
PANIC:  WAL contains references to invalid pages






Re: [HACKERS] Why xlog stuff is done after the filetruncate op in smgrtruncate?

2007-04-16 Thread Tom Lane
Jacky Leng [EMAIL PROTECTED] writes:
> Shouldn't we write the xlog record before we do the physical operation?

The reasoning for not doing it that way was that we can't be sure
beforehand that the filesystem operation will succeed.  If we xlog
the truncate first, the truncate fails, and then we crash, we're in
deep trouble because WAL replay will try to do the truncate and likewise
fail, preventing the system from restarting.  Other non-rollbackable
filesystem ops (I think just CREATE/DROP DATABASE/TABLESPACE) are done
the same way.  CREATE DATABASE would be particularly nasty to reverse
the order for, since there are obvious cases like out-of-disk-space
that will make it fail.
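
To make the ordering argument concrete, here is a minimal toy model in C.
It is only a sketch under stated assumptions, not PostgreSQL code: ToyRel,
ToyWal, and every toy_* and truncate_* function below are hypothetical names
invented for illustration, and the "filesystem failure" is just a flag.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_WAL_MAX 8

/* Hypothetical stand-in for a relation file. */
typedef struct {
    int  nblocks;    /* current length of the "file" in blocks */
    bool fs_broken;  /* when true, every truncate attempt fails */
} ToyRel;

/* Hypothetical stand-in for the WAL: a list of logged truncate targets. */
typedef struct {
    int truncate_to[TOY_WAL_MAX];
    int nrecords;
} ToyWal;

/* Pretend filesystem truncate; returns false on failure. */
bool toy_truncate(ToyRel *rel, int nblocks)
{
    if (rel->fs_broken)
        return false;
    if (nblocks < rel->nblocks)
        rel->nblocks = nblocks;
    return true;
}

/* Append a truncate record to the toy WAL. */
void toy_log_truncate(ToyWal *wal, int nblocks)
{
    assert(wal->nrecords < TOY_WAL_MAX);
    wal->truncate_to[wal->nrecords++] = nblocks;
}

/* Order A: log first, then do the filesystem operation. */
bool truncate_log_first(ToyRel *rel, ToyWal *wal, int nblocks)
{
    toy_log_truncate(wal, nblocks);
    return toy_truncate(rel, nblocks);
}

/* Order B: do the filesystem operation first, log only on success. */
bool truncate_op_first(ToyRel *rel, ToyWal *wal, int nblocks)
{
    if (!toy_truncate(rel, nblocks))
        return false;
    toy_log_truncate(wal, nblocks);
    return true;
}

/* Replay every logged truncate; false means recovery cannot finish. */
bool toy_replay(ToyRel *rel, const ToyWal *wal)
{
    for (int i = 0; i < wal->nrecords; i++)
        if (!toy_truncate(rel, wal->truncate_to[i]))
            return false;
    return true;
}
```

In this model, if the truncate fails under order A, the record is already in
the log and toy_replay fails the same way on every restart.  Under order B a
failed truncate leaves nothing in the log, so replay has nothing to trip over.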

> A test case:
> 1. Set full_page_writes to off.
> 2. Start the database; create a table; insert 10 rows into it; shut down
> the database.
> 3. Start the database again; delete all rows from this table.
> 4. Vacuum this table, which will reach smgrtruncate; kill the postmaster
> before smgrtruncate does its xlog work (set a breakpoint before the xlog
> call).
> 5. Start the database a third time; during recovery, the database will
> crash with:
> PANIC:  WAL contains references to invalid pages

Hmm.  Maybe we need something like: xlog a tentative truncate, do the
truncate, then xlog the real truncate?  The tentative truncate would
merely tell replay not to be surprised if those blocks aren't there
anymore.  Seems a bit grotty though.
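
A rough sketch of how replay might treat such a tentative record, again as
a toy model in C rather than actual PostgreSQL code.  The record kinds, the
Rec struct, and toy_replay_ok are all hypothetical, and the rule that a later
truncate record "explains" earlier references to missing blocks is an
assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record kinds for the toy WAL. */
typedef enum {
    REC_PAGE_UPDATE,         /* touches one block, e.g. a row delete */
    REC_TRUNCATE_TENTATIVE,  /* "blocks >= N may already be gone" */
    REC_TRUNCATE             /* truncate to N blocks really happened */
} RecKind;

typedef struct {
    RecKind kind;
    int     block;  /* touched block, or new length N for truncates */
} Rec;

/*
 * Replay recs against a relation that has nblocks_on_disk blocks after the
 * crash.  Returns true if recovery can finish, false if some reference to
 * a missing block is never explained by a later truncate record.
 */
bool toy_replay_ok(const Rec *recs, int nrecs, int nblocks_on_disk)
{
    int lowest_missing = -1;  /* lowest unexplained missing block, -1 = none */

    assert(recs != NULL || nrecs == 0);
    for (int i = 0; i < nrecs; i++) {
        const Rec *r = &recs[i];

        switch (r->kind) {
        case REC_PAGE_UPDATE:
            /* Remember references to blocks that are not on disk. */
            if (r->block >= nblocks_on_disk &&
                (lowest_missing < 0 || r->block < lowest_missing))
                lowest_missing = r->block;
            break;
        case REC_TRUNCATE_TENTATIVE:
        case REC_TRUNCATE:
            /* A truncate to N explains every missing block >= N. */
            if (lowest_missing >= 0 && lowest_missing >= r->block)
                lowest_missing = -1;
            /* Only the real record redoes the truncate itself. */
            if (r->kind == REC_TRUNCATE && r->block < nblocks_on_disk)
                nblocks_on_disk = r->block;
            break;
        }
    }
    return lowest_missing < 0;
}
```

With only a page update for block 0 in the log and the file already
truncated to zero blocks, toy_replay_ok returns false, which corresponds to
the reported PANIC.  Adding a tentative truncate record after the update
makes the same replay succeed, whether or not the physical truncate happened
before the crash.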

regards, tom lane
