Re: [HACKERS] Assertion failure twophase.c (2) (testing HS/SR)

2010-03-12 Simon Riggs
On Thu, 2010-03-11 at 11:29 +0200, Heikki Linnakangas wrote:

> I'm still not any wiser on what's causing that, but I've fixed the bug
> in KnownAssignedXidsMany() now.

Yeh, I've been mulling this over for a few days now and I can't see a
way that could have happened either.

I agree with your fix and the stronger placement of the Assertion.
Thanks.

I will be doing some further investigation in that area as well over
the next week or so.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assertion failure twophase.c (2) (testing HS/SR)

2010-03-11 Heikki Linnakangas
Erik Rijkers wrote:
> in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30
>
> With three patches:
>
>   new_smart_shutdown_20100201.patch
>   extend_format_of_recovery_info_funcs_v4.20100303.patch

Got a link to these two patches? I couldn't find them with a quick search.

>   fix-KnownAssignedXidsRemoveMany-1.patch
>
>   pg_dump -d $db8.4.2 | psql -d $db9.0devel-primary
>
> FailedAssertion, File: twophase.c, Line: 1201.
>
> The standby was restarted and seems to catch up OK again.
> ...
> see also:
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg02221.php

I'm still not any wiser on what's causing that, but I've fixed the bug
in KnownAssignedXidsMany() now.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Assertion failure twophase.c (2) (testing HS/SR)

2010-03-11 Fujii Masao
On Thu, Mar 11, 2010 at 6:29 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> Erik Rijkers wrote:
>> in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30
>>
>> With three patches:
>>
>>   new_smart_shutdown_20100201.patch

http://archives.postgresql.org/pgsql-hackers/2010-01/msg03116.php

>>   extend_format_of_recovery_info_funcs_v4.20100303.patch

http://archives.postgresql.org/pgsql-hackers/2010-03/msg00175.php

> Got a link to these two patches? I couldn't find them with a quick search.

For your convenience, I've attached those patches to this post.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 278,283 **** typedef enum
--- 278,284 ----
  	PM_RECOVERY_CONSISTENT,		/* consistent recovery mode */
  	PM_RUN,		/* normal database is alive state */
  	PM_WAIT_BACKUP,			/* waiting for online backup mode to end */
+ 	PM_WAIT_READONLY,			/* waiting for read only backends to exit */
  	PM_WAIT_BACKENDS,			/* waiting for live backends to exit */
  	PM_SHUTDOWN,				/* waiting for bgwriter to do shutdown ckpt */
  	PM_SHUTDOWN_2,				/* waiting for archiver and walsenders to finish */
***************
*** 2165,2171 **** pmdie(SIGNAL_ARGS)
  /* and the walwriter too */
  if (WalWriterPID != 0)
  	signal_child(WalWriterPID, SIGTERM);
! pmState = PM_WAIT_BACKUP;
  			}
  
  			/*
--- 2166,2173 ----
  /* and the walwriter too */
  if (WalWriterPID != 0)
  	signal_child(WalWriterPID, SIGTERM);
! /* online backup mode can be active only during normal processing */
! pmState = (pmState == PM_RUN) ? PM_WAIT_BACKUP : PM_WAIT_READONLY;
  			}
  
  			/*
***************
*** 2840,2845 **** PostmasterStateMachine(void)
--- 2842,2870 ----
  	}
  
  	/*
+ 	 * If we are in a state-machine state that implies waiting for read only
+ 	 * backends to exit, see if they're all gone, and change state if so.
+ 	 */
+ 	if (pmState == PM_WAIT_READONLY)
+ 	{
+ 		/*
+ 		 * PM_WAIT_READONLY state ends when we have no regular backends that
+ 		 * have been started during recovery. Since those backends might be
+ 		 * waiting for the WAL record that conflicts with their queries to be
+ 		 * replayed, recovery and replication need to remain until all read
+ 		 * only backends have gone away.
+ 		 */
+ 		if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
+ 		{
+ 			if (StartupPID != 0)
+ 				signal_child(StartupPID, SIGTERM);
+ 			if (WalReceiverPID != 0)
+ 				signal_child(WalReceiverPID, SIGTERM);
+ 			pmState = PM_WAIT_BACKENDS;
+ 		}
+ 	}
+ 
+ 	/*
  	 * If we are in a state-machine state that implies waiting for backends to
  	 * exit, see if they're all gone, and change state if so.
  	 */
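The state-machine change in the new_smart_shutdown patch can be modeled in isolation. The following is a minimal C sketch, not the patch itself: `SketchPostmaster`, `sketch_smart_shutdown()` and `sketch_state_machine()` are hypothetical names, and plain counters stand in for `pmState`, `CountChildren(BACKEND_TYPE_NORMAL)`, `StartupPID`, `WalReceiverPID` and `signal_child()`, so no real processes are signalled.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the postmaster states that the patch touches. */
typedef enum
{
    PM_RECOVERY_CONSISTENT,     /* standby accepting read-only queries */
    PM_RUN,                     /* normal running */
    PM_WAIT_BACKUP,             /* waiting for online backup mode to end */
    PM_WAIT_READONLY,           /* new: waiting for read-only backends */
    PM_WAIT_BACKENDS            /* waiting for remaining backends */
} SketchPMState;

/* Hypothetical stand-in for the postmaster's bookkeeping. */
typedef struct
{
    SketchPMState state;        /* plays the role of pmState */
    int     normal_backends;    /* plays CountChildren(BACKEND_TYPE_NORMAL) */
    bool    startup_running;    /* plays StartupPID != 0 */
    bool    walreceiver_running; /* plays WalReceiverPID != 0 */
    int     sigterms_sent;      /* counts simulated signal_child(..., SIGTERM) */
} SketchPostmaster;

/*
 * Smart shutdown request, as in the pmdie() hunk: online backup mode can
 * only be active in PM_RUN, so every other state (for example a standby
 * in recovery) goes to PM_WAIT_READONLY instead of PM_WAIT_BACKUP.
 */
static void
sketch_smart_shutdown(SketchPostmaster *pm)
{
    pm->state = (pm->state == PM_RUN) ? PM_WAIT_BACKUP : PM_WAIT_READONLY;
}

/*
 * One pass of the new PM_WAIT_READONLY branch of PostmasterStateMachine():
 * recovery keeps running until the last read-only backend has exited.
 */
static void
sketch_state_machine(SketchPostmaster *pm)
{
    if (pm->state != PM_WAIT_READONLY || pm->normal_backends != 0)
        return;

    if (pm->startup_running)
        pm->sigterms_sent++;    /* would be signal_child(StartupPID, SIGTERM) */
    if (pm->walreceiver_running)
        pm->sigterms_sent++;    /* would be signal_child(WalReceiverPID, SIGTERM) */
    pm->state = PM_WAIT_BACKENDS;
}
```

For example, starting from `PM_RECOVERY_CONSISTENT` with two read-only sessions open, a smart shutdown moves to `PM_WAIT_READONLY` and stays there until `normal_backends` drops to zero; only then are the startup process and WAL receiver told to exit. The ordering matters because, per the patch comment, those sessions may be waiting for WAL replay to resolve a conflict with their queries.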
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***************
*** 13199,13204 **** postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
--- 13199,13208 ----
  This is usually the desired behavior for managing transaction log archiving
  behavior, since the preceding file is the last one that currently
  needs to be archived.
+ These functions also accept a string consisting of a timeline and a location,
+ separated by a slash. In that case the transaction log file name is computed
+ using the given timeline; if no timeline is supplied, the current timeline is
+ used.
  </para>
  
  <para>
***************
*** 13245,13257 **** postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
  <literal><function>pg_last_xlog_receive_location</function>()</literal>
  </entry>
  <entry><type>text</type></entry>
! <entry>Get last transaction log location received and synced to disk during
! streaming recovery. If streaming recovery is still in progress
  this will increase monotonically. If streaming recovery has completed
  then this value will remain static at the value of the last WAL record
  received and synced to disk during that recovery. When the server has
  been started without a streaming recovery then the return value will be
! InvalidXLogRecPtr (0/0).
  </entry>
  </row>
  <row>
--- 13249,13263 ----
  <literal><function>pg_last_xlog_receive_location</function>()</literal>
  </entry>
  <entry><type>text</type></entry>
! <entry>Get the timeline and the last transaction log location received and synced
! to disk during streaming recovery. The returned string consists of the timeline
! followed by the location, separated by a slash.
! If streaming recovery is still in progress
  this will increase monotonically. If streaming recovery has completed
  then this value will remain static at the value of the last WAL record
  received and synced to disk during that recovery. When the server has
  been 
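The func.sgml hunks describe accepting a location with an optional timeline prefix and, per the surrounding text, resolving it to the preceding transaction log file. A minimal C sketch of that mapping follows, under stated assumptions: the input is either `XLOGID/XRECOFF` or `TLI/XLOGID/XRECOFF` with every field in hexadecimal (the patch's exact syntax is not shown in this post), WAL segments are the default 16 MB, and the "preceding file" rule follows the 9.0-era `XLByteToPrevSeg()` macro. `sketch_xlogfile_name()` is a hypothetical helper, not the function the patch adds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: default 16 MB WAL segments. */
#define SKETCH_XLOG_SEG_SIZE      ((uint32_t) (16 * 1024 * 1024))
#define SKETCH_XLOG_SEGS_PER_FILE (UINT32_C(0xFFFFFFFF) / SKETCH_XLOG_SEG_SIZE)

/*
 * Write the name of the WAL file containing the byte just before the given
 * location into out. Hypothetical input formats: "XLOGID/XRECOFF" (uses
 * current_tli) or "TLI/XLOGID/XRECOFF", all fields hexadecimal.
 * Returns 0 on success, -1 if the input cannot be parsed.
 */
static int
sketch_xlogfile_name(const char *input, uint32_t current_tli,
                     char *out, size_t outlen)
{
    unsigned int tli = current_tli;
    unsigned int xlogid, xrecoff, log, seg;
    int         slashes = 0;
    const char *p;

    for (p = input; *p; p++)
        if (*p == '/')
            slashes++;

    if (slashes == 2)
    {
        if (sscanf(input, "%X/%X/%X", &tli, &xlogid, &xrecoff) != 3)
            return -1;
    }
    else if (slashes == 1)
    {
        if (sscanf(input, "%X/%X", &xlogid, &xrecoff) != 2)
            return -1;
    }
    else
        return -1;

    /* Step back one byte, so an exact segment boundary maps to the
     * previous segment (the XLByteToPrevSeg() rule). */
    if (xrecoff == 0)
    {
        if (xlogid == 0)
            return -1;          /* no preceding segment exists */
        log = xlogid - 1;
        seg = SKETCH_XLOG_SEGS_PER_FILE - 1;
    }
    else
    {
        log = xlogid;
        seg = (xrecoff - 1) / SKETCH_XLOG_SEG_SIZE;
    }

    /* Same layout as XLogFileName(): timeline, log id, segment. */
    snprintf(out, outlen, "%08X%08X%08X", tli, log, seg);
    return 0;
}
```

With a current timeline of 1, `"0/2000000"` yields `000000010000000000000001`: the location sits exactly on a segment boundary, so it is attributed to the preceding segment. `"2/0/2000001"` yields `000000020000000000000002`, using the supplied timeline instead of the current one.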

[HACKERS] Assertion failure twophase.c (2) (testing HS/SR)

2010-03-04 Erik Rijkers
in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30

With three patches:

  new_smart_shutdown_20100201.patch
  extend_format_of_recovery_info_funcs_v4.20100303.patch
  fix-KnownAssignedXidsRemoveMany-1.patch

  pg_dump -d $db8.4.2 | psql -d $db9.0devel-primary

FailedAssertion, File: twophase.c, Line: 1201.

The standby was restarted and seems to catch up OK again.


LOG:  database system was interrupted; last known up at 2010-03-04 01:35:23 CET
cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/00010001': No such file or directory
FATAL:  the database system is starting up
LOG:  entering standby mode
LOG:  redo starts at 0/120
LOG:  consistent recovery state reached at 0/200
LOG:  database system is ready to accept read only connections
ERROR:  cannot execute CREATE TABLE in a read-only transaction
STATEMENT:  create table t (c text);
ERROR:  cannot execute SELECT INTO in a read-only transaction
STATEMENT:  create table t as select 1;
ERROR:  cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT:  truncate wal;
ERROR:  cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT:  truncate wal;
ERROR:  cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT:  truncate wal;
ERROR:  cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT:  truncate wal;
ERROR:  cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT:  truncate wal;
ERROR:  cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT:  truncate wal;
ERROR:  cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT:  truncate wal;
FATAL:  database "ms" does not exist
TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "twophase.c", Line: 1201)
LOG:  startup process (PID 18107) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
LOG:  database system was interrupted while in recovery at log time 2010-03-04 05:00:24 CET
HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/0001001C007F': No such file or directory
LOG:  entering standby mode
LOG:  redo starts at 1C/7800F8E0
LOG:  consistent recovery state reached at 1C/ADB9C178
LOG:  database system is ready to accept read only connections

The ERRORs (and the FATAL) came from commands I issued by accident; I can't tell whether they
caused the assertion. (Database 'ms' was indeed not present on this instance.)
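The TRAP line in the log is the assertion condition after macro expansion and stringification, which is why it shows `((TransactionId) 0)` rather than a macro name; the triple parentheses are consistent with an `Assert(TransactionIdIsValid(xid))` at twophase.c line 1201. The minimal C sketch below imitates that macro chain, modeled on PostgreSQL's `Assert()`, `Trap()`, `CppAsString()` and `TransactionIdIsValid()`. The `Sketch*`/`sketch_*` names are hypothetical, the `assert_enabled` check is omitted, and instead of writing to stderr and calling `abort()` (the "signal 6: Aborted" in the log) the sketch only records the message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Same shape as the transam.h macros. */
#define InvalidTransactionId      ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/* Stringify an already macro-expanded token sequence. */
#define CppAsString(identifier) #identifier

/* Last TRAP message, recorded instead of printed so it can be inspected. */
static char sketch_last_trap[256];

/* Hypothetical stand-in for ExceptionalCondition(); does not abort. */
static void
SketchExceptionalCondition(const char *conditionName, const char *errorType,
                           const char *fileName, int lineNumber)
{
    snprintf(sketch_last_trap, sizeof sketch_last_trap,
             "TRAP: %s(\"%s\", File: \"%s\", Line: %d)",
             errorType, conditionName, fileName, lineNumber);
}

/*
 * condition is not an operand of # in this macro, so it is fully
 * macro-expanded before CppAsString() turns it into a string.
 */
#define SketchTrap(condition, errorType) \
    do { \
        if (condition) \
            SketchExceptionalCondition(CppAsString(condition), (errorType), \
                                       __FILE__, __LINE__); \
    } while (0)

#define SketchAssert(condition) SketchTrap(!(condition), "FailedAssertion")

/* Fires the sketch assertion when xid is InvalidTransactionId. */
static void
sketch_check_xid(TransactionId xid)
{
    SketchAssert(TransactionIdIsValid(xid));
}
```

Calling `sketch_check_xid(InvalidTransactionId)` records a message beginning `TRAP: FailedAssertion("!(` whose condition text mentions `(TransactionId)` but not `TransactionIdIsValid`, matching the shape of the line in the log; a valid xid records nothing.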

see also:
http://archives.postgresql.org/pgsql-hackers/2010-02/msg02221.php


thanks,

Erik Rijkers


