Re: [HACKERS] crash-recovery test vs windows

2017-09-28 Thread Andrew Dunstan


On 09/28/2017 12:29 PM, Andrew Dunstan wrote:
> The new crash-recovery check is failing pretty reliably on jacana. It's
> already excluded on MSVC machines, so I'm inclined to exclude it on Msys
> as well.
>
>

Sorry, I meant crash-restart

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] crash-recovery test vs windows

2017-09-28 Thread Andrew Dunstan

The new crash-recovery check is failing pretty reliably on jacana. It's
already excluded on MSVC machines, so I'm inclined to exclude it on Msys
as well.


Thoughts?


cheers


andrew


-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Crash recovery

2013-11-05 Thread Soroosh Sardari
Hi

When PG crashes or the computer turned down unexpectedly, next time
postmaster
starts up, it does the crash recovery, actually redo xlog records, vacuum,
etc.

What module is responsible for crash recovery?

Regards,
Soroosh Sardari


Re: [HACKERS] Crash recovery

2013-11-05 Thread Heikki Linnakangas

On 05.11.2013 13:21, Soroosh Sardari wrote:

When PG crashes or the computer turned down unexpectedly, next time
postmaster
starts up, it does the crash recovery, actually redo xlog records, vacuum,
etc.

What module is responsible for crash recovery?


See src/backend/access/transam/xlog.c. The code to read the WAL is 
there. It calls redo-routines in other modules, see e.g heap_redo for 
replaying heap records, btree_redo for B-tree index records and so forth.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-recovery replay of CREATE TABLESPACE is broken in HEAD

2010-07-20 Thread Bruce Momjian
Bruce Momjian wrote:
 Bruce Momjian wrote:
   The attached patch does as suggested.  I added the recovery code to the
   create tablespace function so I didn't have to duplicate all the code
   that computes the path names.
   
   Attached.
  
  Uh, another question.  Looking at the createdb recovery, I see:
  
  /*
   * Our theory for replaying a CREATE is to forcibly drop the target
   * subdirectory if present, then re-copy the source data. This may 
  be
   * more work than needed, but it is simple to implement.
   */
  if (stat(dst_path, st) == 0  S_ISDIR(st.st_mode))
  {
  if (!rmtree(dst_path, true))
  ereport(WARNING,
  (errmsg(some useless files may be left behind in 
  old database directory \%s\,
  dst_path)));
  }
  
  Should I be using rmtree() on the mkdir target?
  
  Also, the original tablespace recovery code did not drop the symlink
  first.  I assume that was not a bug only because we don't support moving
  tablespaces:
 
 For consistency with CREATE DATABASE recovery and for reliablity, I
 coded the rmtree() call instead.  Patch attached.

Attached patch applied to HEAD and 9.0.   9.0 open item moved to
completed.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +
Index: src/backend/commands/dbcommands.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/dbcommands.c,v
retrieving revision 1.235
diff -c -c -r1.235 dbcommands.c
*** src/backend/commands/dbcommands.c	26 Feb 2010 02:00:38 -	1.235
--- src/backend/commands/dbcommands.c	20 Jul 2010 18:11:06 -
***
*** 1908,1913 
--- 1908,1914 
  		if (stat(dst_path, st) == 0  S_ISDIR(st.st_mode))
  		{
  			if (!rmtree(dst_path, true))
+ /* If this failed, copydir() below is going to error. */
  ereport(WARNING,
  		(errmsg(some useless files may be left behind in old database directory \%s\,
  dst_path)));
Index: src/backend/commands/tablespace.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/tablespace.c,v
retrieving revision 1.77
diff -c -c -r1.77 tablespace.c
*** src/backend/commands/tablespace.c	18 Jul 2010 04:47:46 -	1.77
--- src/backend/commands/tablespace.c	20 Jul 2010 18:11:07 -
***
*** 562,567 
--- 562,586 
  		 location)));
  	}
  
+ 	if (InRecovery)
+ 	{
+ 		struct stat st;
+ 
+ 		/*
+ 		 * Our theory for replaying a CREATE is to forcibly drop the target
+ 		 * subdirectory if present, and then recreate it. This may be
+ 		 * more work than needed, but it is simple to implement.
+ 		 */
+ 		if (stat(location_with_version_dir, st) == 0  S_ISDIR(st.st_mode))
+ 		{
+ 			if (!rmtree(location_with_version_dir, true))
+ /* If this failed, mkdir() below is going to error. */
+ ereport(WARNING,
+ 		(errmsg(some useless files may be left behind in old database directory \%s\,
+ location_with_version_dir)));
+ 		}
+ 	}
+ 
  	/*
  	 * The creation of the version directory prevents more than one tablespace
  	 * in a single location.
***
*** 580,585 
--- 599,614 
  			location_with_version_dir)));
  	}
  
+ 	/* Remove old symlink in recovery, in case it points to the wrong place */
+ 	if (InRecovery)
+ 	{
+ 		if (unlink(linkloc)  0  errno != ENOENT)
+ 			ereport(ERROR,
+ 	(errcode_for_file_access(),
+ 	 errmsg(could not remove symbolic link \%s\: %m,
+ 			linkloc)));
+ 	}
+ 	
  	/*
  	 * Create the symlink under PGDATA
  	 */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-recovery replay of CREATE TABLESPACE is broken in HEAD

2010-07-18 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Maybe you should check that it points to the right location? Or drop and 
 recreate the symlink, and ignore failure at mkdir.

More specifically, ignore EEXIST failure when replaying mkdir.  Anything
else is still a problem.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-recovery replay of CREATE TABLESPACE is broken in HEAD

2010-07-18 Thread Bruce Momjian
Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  Maybe you should check that it points to the right location? Or drop and 
  recreate the symlink, and ignore failure at mkdir.
 
 More specifically, ignore EEXIST failure when replaying mkdir.  Anything
 else is still a problem.

Thanks for the help.  I tried to find somewhere else in our recovery
code that was similar but didn't find anything.

The attached patch does as suggested.  I added the recovery code to the
create tablespace function so I didn't have to duplicate all the code
that computes the path names.

Attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +
Index: src/backend/commands/tablespace.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/tablespace.c,v
retrieving revision 1.77
diff -c -c -r1.77 tablespace.c
*** src/backend/commands/tablespace.c	18 Jul 2010 04:47:46 -	1.77
--- src/backend/commands/tablespace.c	18 Jul 2010 15:53:34 -
***
*** 568,578 
  	 */
  	if (mkdir(location_with_version_dir, S_IRWXU)  0)
  	{
  		if (errno == EEXIST)
! 			ereport(ERROR,
! 	(errcode(ERRCODE_OBJECT_IN_USE),
! 	 errmsg(directory \%s\ already in use as a tablespace,
! 			location_with_version_dir)));
  		else
  			ereport(ERROR,
  	(errcode_for_file_access(),
--- 568,582 
  	 */
  	if (mkdir(location_with_version_dir, S_IRWXU)  0)
  	{
+ 		/* In recovery, directory might already exists, which is OK */
  		if (errno == EEXIST)
! 		{
! 			if (!InRecovery)
! ereport(ERROR,
! 		(errcode(ERRCODE_OBJECT_IN_USE),
! 		 errmsg(directory \%s\ already in use as a tablespace,
! location_with_version_dir)));
! 		}
  		else
  			ereport(ERROR,
  	(errcode_for_file_access(),
***
*** 580,585 
--- 584,599 
  			location_with_version_dir)));
  	}
  
+ 	/* Remove old symlink in recovery, in case it points to the wrong place */
+ 	if (InRecovery)
+ 	{
+ 		if (unlink(linkloc)  0  errno != ENOENT)
+ 			ereport(ERROR,
+ 	(errcode_for_file_access(),
+ 	 errmsg(could not remove symbolic link \%s\: %m,
+ 			linkloc)));
+ 	}
+ 	
  	/*
  	 * Create the symlink under PGDATA
  	 */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-recovery replay of CREATE TABLESPACE is broken in HEAD

2010-07-18 Thread Bruce Momjian
Bruce Momjian wrote:
 Tom Lane wrote:
  Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
   Maybe you should check that it points to the right location? Or drop and 
   recreate the symlink, and ignore failure at mkdir.
  
  More specifically, ignore EEXIST failure when replaying mkdir.  Anything
  else is still a problem.
 
 Thanks for the help.  I tried to find somewhere else in our recovery
 code that was similar but didn't find anything.
 
 The attached patch does as suggested.  I added the recovery code to the
 create tablespace function so I didn't have to duplicate all the code
 that computes the path names.
 
 Attached.

Uh, another question.  Looking at the createdb recovery, I see:

/*
 * Our theory for replaying a CREATE is to forcibly drop the target
 * subdirectory if present, then re-copy the source data. This may be
 * more work than needed, but it is simple to implement.
 */
if (stat(dst_path, st) == 0  S_ISDIR(st.st_mode))
{
if (!rmtree(dst_path, true))
ereport(WARNING,
(errmsg(some useless files may be left behind in old 
database directory \%s\,
dst_path)));
}

Should I be using rmtree() on the mkdir target?

Also, the original tablespace recovery code did not drop the symlink
first.  I assume that was not a bug only because we don't support moving
tablespaces:

-   /* Create the symlink if not already present */
-   linkloc = (char *) palloc(OIDCHARS + OIDCHARS + 1);
-   sprintf(linkloc, pg_tblspc/%u, xlrec-ts_id);
-
-   if (symlink(location, linkloc)  0)
-   {
-   if (errno != EEXIST)
-   ereport(ERROR,
-   (errcode_for_file_access(),
-errmsg(could not create 
symbolic link \%s\: %m,
-   linkloc)));
-   }

Still, it seems logical to unlink it before creating it.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-recovery replay of CREATE TABLESPACE is broken in HEAD

2010-07-18 Thread Bruce Momjian
Bruce Momjian wrote:
  The attached patch does as suggested.  I added the recovery code to the
  create tablespace function so I didn't have to duplicate all the code
  that computes the path names.
  
  Attached.
 
 Uh, another question.  Looking at the createdb recovery, I see:
 
 /*
  * Our theory for replaying a CREATE is to forcibly drop the target
  * subdirectory if present, then re-copy the source data. This may be
  * more work than needed, but it is simple to implement.
  */
 if (stat(dst_path, st) == 0  S_ISDIR(st.st_mode))
 {
 if (!rmtree(dst_path, true))
 ereport(WARNING,
 (errmsg(some useless files may be left behind in old 
 database directory \%s\,
 dst_path)));
 }
 
 Should I be using rmtree() on the mkdir target?
 
 Also, the original tablespace recovery code did not drop the symlink
 first.  I assume that was not a bug only because we don't support moving
 tablespaces:

For consistency with CREATE DATABASE recovery and for reliablity, I
coded the rmtree() call instead.  Patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +
Index: src/backend/commands/tablespace.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/tablespace.c,v
retrieving revision 1.77
diff -c -c -r1.77 tablespace.c
*** src/backend/commands/tablespace.c	18 Jul 2010 04:47:46 -	1.77
--- src/backend/commands/tablespace.c	19 Jul 2010 04:59:03 -
***
*** 538,543 
--- 538,544 
  	char	   *linkloc = palloc(OIDCHARS + OIDCHARS + 1);
  	char	   *location_with_version_dir = palloc(strlen(location) + 1 +
     strlen(TABLESPACE_VERSION_DIRECTORY) + 1);
+ 	struct stat st;
  
  	sprintf(linkloc, pg_tblspc/%u, tablespaceoid);
  	sprintf(location_with_version_dir, %s/%s, location,
***
*** 562,567 
--- 563,584 
  		 location)));
  	}
  
+ 	if (InRecovery)
+ 	{
+ 		/*
+ 		 * Our theory for replaying a CREATE is to forcibly drop the target
+ 		 * subdirectory if present, and then recreate it. This may be
+ 		 * more work than needed, but it is simple to implement.
+ 		 */
+ 		if (stat(location_with_version_dir, st) == 0  S_ISDIR(st.st_mode))
+ 		{
+ 			if (!rmtree(location_with_version_dir, true))
+ ereport(WARNING,
+ 		(errmsg(some useless files may be left behind in old database directory \%s\,
+ location_with_version_dir)));
+ 		}
+ 	}
+ 
  	/*
  	 * The creation of the version directory prevents more than one tablespace
  	 * in a single location.
***
*** 580,585 
--- 597,612 
  			location_with_version_dir)));
  	}
  
+ 	/* Remove old symlink in recovery, in case it points to the wrong place */
+ 	if (InRecovery)
+ 	{
+ 		if (unlink(linkloc)  0  errno != ENOENT)
+ 			ereport(ERROR,
+ 	(errcode_for_file_access(),
+ 	 errmsg(could not remove symbolic link \%s\: %m,
+ 			linkloc)));
+ 	}
+ 	
  	/*
  	 * Create the symlink under PGDATA
  	 */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-recovery replay of CREATE TABLESPACE is broken in HEAD

2010-07-17 Thread Bruce Momjian
Tom Lane wrote:
 I managed to crash the executor in the tablespace.sql test while working
 on a 9.1 patch, and discovered that the postmaster fails to recover
 from that.  The end of postmaster.log looks like
 
 LOG:  all server processes terminated; reinitializing
 LOG:  database system was interrupted; last known up at 2010-07-11 19:30:07 
 EDT
 LOG:  database system was not properly shut down; automatic recovery in 
 progress
 LOG:  consistent recovery state reached at 0/EE26F30
 LOG:  redo starts at 0/EE26F30
 FATAL:  directory 
 /home/postgres/pgsql/src/test/regress/testtablespace/PG_9.1_201004261 
 already in use as a tablespace
 CONTEXT:  xlog redo create ts: 127158 
 /home/postgres/pgsql/src/test/regress/testtablespace
 LOG:  startup process (PID 13914) exited with exit code 1
 LOG:  aborting startup due to startup process failure
 
 It looks to me like those well-intentioned recent changes in this area
 broke the crash-recovery case.  Not good.

Sorry for the delay.  I didn't realize this was my code that was broken
until Heikki told me via IM.

The bug is that we can't replay mkdir()/symlink() and assume those will
always succeed.  I looked at the createdb redo code and it basically
drops the directory before creating it.

The tablespace directory/symlink setup is more complex, so I just wrote
the attached patch to trigger a redo-'delete' tablespace operation
before the create tablespace redo operation.

Ignoring mkdir/symlink creation failure is not an option because the
symlink might point to some wrong location or something.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +
Index: src/backend/commands/tablespace.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/tablespace.c,v
retrieving revision 1.77
diff -c -c -r1.77 tablespace.c
*** src/backend/commands/tablespace.c	18 Jul 2010 04:47:46 -	1.77
--- src/backend/commands/tablespace.c	18 Jul 2010 05:17:23 -
***
*** 1355,1368 
  	/* Backup blocks are not used in tblspc records */
  	Assert(!(record-xl_info  XLR_BKP_BLOCK_MASK));
  
! 	if (info == XLOG_TBLSPC_CREATE)
! 	{
! 		xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) XLogRecGetData(record);
! 		char	   *location = xlrec-ts_path;
! 
! 		create_tablespace_directories(location, xlrec-ts_id);
! 	}
! 	else if (info == XLOG_TBLSPC_DROP)
  	{
  		xl_tblspc_drop_rec *xlrec = (xl_tblspc_drop_rec *) XLogRecGetData(record);
  
--- 1355,1365 
  	/* Backup blocks are not used in tblspc records */
  	Assert(!(record-xl_info  XLR_BKP_BLOCK_MASK));
  
! 	/*
! 	 *	If we are creating a tablespace during recovery, it is unclear
! 	 *	what state it is in, so potentially remove it before creating it.
! 	 */
! 	if (info == XLOG_TBLSPC_DROP || info == XLOG_TBLSPC_CREATE)
  	{
  		xl_tblspc_drop_rec *xlrec = (xl_tblspc_drop_rec *) XLogRecGetData(record);
  
***
*** 1395,1400 
--- 1392,1407 
  	}
  	else
  		elog(PANIC, tblspc_redo: unknown op code %u, info);
+ 
+ 	/* Now create the tablespace we perhaps just removed. */
+ 	if (info == XLOG_TBLSPC_CREATE)
+ 	{
+ 		xl_tblspc_create_rec *xlrec = (xl_tblspc_create_rec *) XLogRecGetData(record);
+ 		char	   *location = xlrec-ts_path;
+ 
+ 		create_tablespace_directories(location, xlrec-ts_id);
+ 	}
+ 
  }
  
  void

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-recovery replay of CREATE TABLESPACE is broken in HEAD

2010-07-17 Thread Heikki Linnakangas

On 18/07/10 08:22, Bruce Momjian wrote:

The bug is that we can't replay mkdir()/symlink() and assume those will
always succeed.  I looked at the createdb redo code and it basically
drops the directory before creating it.

The tablespace directory/symlink setup is more complex, so I just wrote
the attached patch to trigger a redo-'delete' tablespace operation
before the create tablespace redo operation.


Redoing a drop talespace assumes the tablespace directory is empty, 
which it necessarily isn't when redoing a create tablespace command:


postgres=# CREATE TABLESPACE t LOCATION '/tmp/t';
CREATE TABLESPACE
postgres=#  CREATE TABLE tfoo (id int4) TABLESPACE t;
CREATE TABLE
postgres=# \q
$ killall -9 postmaster
$ bin/postmaster -D data
LOG:  database system was interrupted; last known up at 2010-07-18 
08:48:32 EEST
LOG:  database system was not properly shut down; automatic recovery in 
progress

LOG:  consistent recovery state reached at 0/5E889C
LOG:  redo starts at 0/5E889C
FATAL:  tablespace 16402 is not empty
CONTEXT:  xlog redo create ts: 16402 /tmp/t
LOG:  startup process (PID 5987) exited with exit code 1
LOG:  aborting startup due to startup process failure

Also, casting the xl_tblspc_create_rec as xl_tblspc_drop_rec is a bit 
questionable. It works because both structs begin with the tablespace 
Oid, but it doesn't look right, and can break in hard-to-notice ways in 
the future if the structure of those structs change in the future.



Ignoring mkdir/symlink creation failure is not an option because the
symlink might point to some wrong location or something.


Maybe you should check that it points to the right location? Or drop and 
recreate the symlink, and ignore failure at mkdir.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] crash-recovery replay of CREATE TABLESPACE is broken in HEAD

2010-07-11 Thread Tom Lane
I managed to crash the executor in the tablespace.sql test while working
on a 9.1 patch, and discovered that the postmaster fails to recover
from that.  The end of postmaster.log looks like

LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2010-07-11 19:30:07 EDT
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  consistent recovery state reached at 0/EE26F30
LOG:  redo starts at 0/EE26F30
FATAL:  directory 
/home/postgres/pgsql/src/test/regress/testtablespace/PG_9.1_201004261 already 
in use as a tablespace
CONTEXT:  xlog redo create ts: 127158 
/home/postgres/pgsql/src/test/regress/testtablespace
LOG:  startup process (PID 13914) exited with exit code 1
LOG:  aborting startup due to startup process failure

It looks to me like those well-intentioned recent changes in this area
broke the crash-recovery case.  Not good.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers