Tom Lane wrote:
> I managed to crash the executor in the tablespace.sql test while working
> on a 9.1 patch, and discovered that the postmaster fails to recover
> from that. The end of postmaster.log looks like
>
> LOG: all server processes terminated; reinitializing
> LOG: database system was interrupted; last known up at 2010-07-11 19:30:07
> EDT
> LOG: database system was not properly shut down; automatic recovery in
> progress
> LOG: consistent recovery state reached at 0/EE26F30
> LOG: redo starts at 0/EE26F30
> FATAL: directory
> "/home/postgres/pgsql/src/test/regress/testtablespace/PG_9.1_201004261"
> already in use as a tablespace
> CONTEXT: xlog redo create ts: 127158
> "/home/postgres/pgsql/src/test/regress/testtablespace"
> LOG: startup process (PID 13914) exited with exit code 1
> LOG: aborting startup due to startup process failure
>
> It looks to me like those well-intentioned recent changes in this area
> broke the crash-recovery case. Not good.

Sorry for the delay. I didn't realize this was my code that was broken
until Heikki told me via IM.

The bug is that redo can't replay mkdir()/symlink() and assume those
calls will always succeed: after a crash, the directory or symlink may
already exist. I looked at the createdb redo code, and it basically
drops the directory before creating it.

The tablespace directory/symlink setup is more complex, so I just wrote
the attached patch, which runs the tablespace drop redo logic before
replaying a create tablespace record.

Ignoring mkdir/symlink creation failures is not an option, because an
existing symlink might point to the wrong location.
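
To illustrate the remove-before-create idea outside the backend, here is a
minimal standalone sketch in plain POSIX C. It is not the patch and not
PostgreSQL code: the function names (remove_leftovers, create_dirs,
redo_create_tablespace) and their parameters are hypothetical, and it
assumes the leftover version directory is empty, which the real drop path
does not require.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove whatever an earlier, interrupted attempt
 * left behind.  A missing directory or symlink is not an error.
 * Assumption: the version directory is empty (rmdir fails otherwise).
 */
int
remove_leftovers(const char *version_dir, const char *linkpath)
{
    if (rmdir(version_dir) != 0 && errno != ENOENT)
        return -1;
    if (unlink(linkpath) != 0 && errno != ENOENT)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: create the version directory and the symlink that
 * points at the tablespace location.  Fails if either already exists,
 * which is the situation described in the report above.
 */
int
create_dirs(const char *location, const char *version_dir,
            const char *linkpath)
{
    if (mkdir(version_dir, 0700) != 0)
        return -1;
    if (symlink(location, linkpath) != 0)
        return -1;
    return 0;
}

/*
 * Replay a "create" so that it can be repeated safely: clean up first,
 * then create.  The symlink is always recreated, so a stale link that
 * points somewhere else cannot survive the replay.
 */
int
redo_create_tablespace(const char *location, const char *version_dir,
                       const char *linkpath)
{
    assert(location != NULL && version_dir != NULL && linkpath != NULL);

    if (remove_leftovers(version_dir, linkpath) != 0)
    {
        perror("remove_leftovers");
        return -1;
    }
    return create_dirs(location, version_dir, linkpath);
}
```

Calling redo_create_tablespace() twice with the same arguments succeeds
both times, whereas calling create_dirs() directly a second time fails
because the directory already exists.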
--
Bruce Momjian <[email protected]> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ None of us is going to be here forever. +
Index: src/backend/commands/tablespace.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/tablespace.c,v
retrieving revision 1.77
diff -c -c -r1.77 tablespace.c
*** src/backend/commands/tablespace.c 18 Jul 2010 04:47:46 -0000 1.77
--- src/backend/commands/tablespace.c 18 Jul 2010 05:17:23 -0000
***************
*** 1355,1368 ****
/* Backup blocks are not used in tblspc records */
Assert(!(record->xl_info & XLR_BKP_BLOCK_MASK));
! if (info == XLOG_TBLSPC_CREATE)
! {
! xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) XLogRecGetData(record);
! char *location = xlrec->ts_path;
!
! create_tablespace_directories(location, xlrec->ts_id);
! }
! else if (info == XLOG_TBLSPC_DROP)
{
xl_tblspc_drop_rec *xlrec = (xl_tblspc_drop_rec *) XLogRecGetData(record);
--- 1355,1365 ----
/* Backup blocks are not used in tblspc records */
Assert(!(record->xl_info & XLR_BKP_BLOCK_MASK));
! /*
! * If we are creating a tablespace during recovery, it is unclear
! * what state it is in, so potentially remove it before creating it.
! */
! if (info == XLOG_TBLSPC_DROP || info == XLOG_TBLSPC_CREATE)
{
xl_tblspc_drop_rec *xlrec = (xl_tblspc_drop_rec *) XLogRecGetData(record);
***************
*** 1395,1400 ****
--- 1392,1407 ----
}
else
elog(PANIC, "tblspc_redo: unknown op code %u", info);
+
+ /* Now create the tablespace we perhaps just removed. */
+ if (info == XLOG_TBLSPC_CREATE)
+ {
+ xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) XLogRecGetData(record);
+ char *location = xlrec->ts_path;
+
+ create_tablespace_directories(location, xlrec->ts_id);
+ }
+
}
void