Re: [HACKERS] Assertion failure twophase.c (2) (testing HS/SR)
On Thu, 2010-03-11 at 11:29 +0200, Heikki Linnakangas wrote:
> I'm still not any wiser on what's causing that, but I've fixed the bug
> in KnownAssignedXidsMany() now.

Yeh, I've been mulling this over for a few days now and I can't see a way that could have happened either.

I agree with your fix and the stronger placement of the Assertion. Thanks.

I will be doing some further investigation in that area as well, over next week or so.

--
Simon Riggs www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertion failure twophase.c (2) (testing HS/SR)
On Thu, Mar 11, 2010 at 6:29 PM, Heikki Linnakangas wrote:
> Erik Rijkers wrote:
>> in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30
>>
>> With three patches:
>>
>> new_smart_shutdown_20100201.patch
http://archives.postgresql.org/pgsql-hackers/2010-01/msg03116.php

>> extend_format_of_recovery_info_funcs_v4.20100303.patch
http://archives.postgresql.org/pgsql-hackers/2010-03/msg00175.php

> Got a link to these two patches? I couldn't find them with a quick search.

For your convenience, I attached those patches in this post.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***
*** 278,283 typedef enum
--- 278,284
      PM_RECOVERY_CONSISTENT, /* consistent recovery mode */
      PM_RUN,                 /* normal "database is alive" state */
      PM_WAIT_BACKUP,         /* waiting for online backup mode to end */
+     PM_WAIT_READONLY,       /* waiting for read only backends to exit */
      PM_WAIT_BACKENDS,       /* waiting for live backends to exit */
      PM_SHUTDOWN,            /* waiting for bgwriter to do shutdown ckpt */
      PM_SHUTDOWN_2,          /* waiting for archiver and walsenders to finish */
***
*** 2165,2171 pmdie(SIGNAL_ARGS)
                  /* and the walwriter too */
                  if (WalWriterPID != 0)
                      signal_child(WalWriterPID, SIGTERM);
!                 pmState = PM_WAIT_BACKUP;
              }

              /*
--- 2166,2173
                  /* and the walwriter too */
                  if (WalWriterPID != 0)
                      signal_child(WalWriterPID, SIGTERM);
!                 /* online backup mode is active only when normal processing */
!                 pmState = (pmState == PM_RUN) ? PM_WAIT_BACKUP : PM_WAIT_READONLY;
              }

              /*
***
*** 2840,2845 PostmasterStateMachine(void)
--- 2842,2870
      }

      /*
+      * If we are in a state-machine state that implies waiting for read only
+      * backends to exit, see if they're all gone, and change state if so.
+      */
+     if (pmState == PM_WAIT_READONLY)
+     {
+         /*
+          * PM_WAIT_READONLY state ends when we have no regular backends that
+          * have been started during recovery. Since those backends might be
+          * waiting for the WAL record that conflicts with their queries to be
+          * replayed, recovery and replication need to remain until all read
+          * only backends have been gone away.
+          */
+         if (CountChildren(BACKEND_TYPE_NORMAL) == 0)
+         {
+             if (StartupPID != 0)
+                 signal_child(StartupPID, SIGTERM);
+             if (WalReceiverPID != 0)
+                 signal_child(WalReceiverPID, SIGTERM);
+             pmState = PM_WAIT_BACKENDS;
+         }
+     }
+
+     /*
       * If we are in a state-machine state that implies waiting for backends to
       * exit, see if they're all gone, and change state if so.
       */
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 13199,13204 postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
--- 13199,13208
      This is usually the desired behavior for managing transaction log archiving
      behavior, since the preceding file is the last one that currently
      needs to be archived.
+     These functions also accept as a parameter the string that consists of timeline and
+     location, separated by a slash. In this case a transaction log file name is computed
+     by using the given timeline. On the other hand, if timeline is not supplied, the
+     current timeline is used for the computation.
***
*** 13245,13257 postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
      pg_last_xlog_receive_location()
      text
!     Get last transaction log location received and synced to disk during
!     streaming recovery. If streaming recovery is still in progress
      this will increase monotonically. If streaming recovery has completed
      then this value will remain static at the value of the last WAL record
      received and synced to disk during that recovery. When the server has
      been started without a streaming recovery then the return value will be
!     InvalidXLogRecPtr (0/0).
--- 13249,13263
      pg_last_xlog_receive_location()
      text
!     Get timeline and location of last transaction log received and synced
!     to disk during streaming recovery. The return string is separated by a slash,
!     the first value indicates the timeline and the other the location.
!     If streaming recovery is still in progress
      this will increase monotonically. If streaming recovery has completed
      then this value will remain static at the value of the last WAL record
      received and synced to disk during that recovery. When the server has
      been started without a streaming recovery then the return value will be
!     0/0/0.
***
*** 13259,13270 postgres=# SELECT *
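
The postmaster.c hunks in the patch above boil down to one extra state in the smart-shutdown path. The following is a minimal, standalone sketch of that transition, not the postmaster code itself: the `SketchPostmaster` struct, the `sketch_*` function names, and the boolean "signalled" flags are hypothetical stand-ins for the real globals, `CountChildren(BACKEND_TYPE_NORMAL)`, and `signal_child()`. It also assumes the startup process and walreceiver are both running, whereas the patch checks each PID first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Subset of the states named in the patch, in the patched enum's order. */
typedef enum
{
    PM_RECOVERY_CONSISTENT, /* consistent recovery mode */
    PM_RUN,                 /* normal "database is alive" state */
    PM_WAIT_BACKUP,         /* waiting for online backup mode to end */
    PM_WAIT_READONLY,       /* waiting for read only backends to exit */
    PM_WAIT_BACKENDS        /* waiting for live backends to exit */
} PMState;

/* Hypothetical stand-in for the postmaster's global bookkeeping. */
typedef struct
{
    PMState state;
    int     normal_backends;       /* models CountChildren(BACKEND_TYPE_NORMAL) */
    bool    startup_signalled;     /* models SIGTERM sent to the startup process */
    bool    walreceiver_signalled; /* models SIGTERM sent to the walreceiver */
} SketchPostmaster;

/* Smart shutdown request: mirrors the patched assignment in pmdie(). */
static void
sketch_smart_shutdown(SketchPostmaster *pm)
{
    assert(pm != NULL);
    /* Online backup can only be active during normal processing. */
    pm->state = (pm->state == PM_RUN) ? PM_WAIT_BACKUP : PM_WAIT_READONLY;
}

/* One pass over the new PM_WAIT_READONLY branch of the state machine. */
static void
sketch_state_machine(SketchPostmaster *pm)
{
    assert(pm != NULL);
    if (pm->state == PM_WAIT_READONLY && pm->normal_backends == 0)
    {
        /* Recovery and replication are stopped only after the last
         * read-only backend has exited. */
        pm->startup_signalled = true;
        pm->walreceiver_signalled = true;
        pm->state = PM_WAIT_BACKENDS;
    }
}
```

In this sketch, a smart shutdown requested during recovery parks the state at PM_WAIT_READONLY, and repeated calls to `sketch_state_machine()` do nothing until `normal_backends` drops to zero. That matches the ordering the patch comment argues for: read-only queries may be waiting on WAL replay, so replay has to outlive them.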
Re: [HACKERS] Assertion failure twophase.c (2) (testing HS/SR)
Erik Rijkers wrote:
> in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30
>
> With three patches:
>
> new_smart_shutdown_20100201.patch
> extend_format_of_recovery_info_funcs_v4.20100303.patch

Got a link to these two patches? I couldn't find them with a quick search.

> fix-KnownAssignedXidsRemoveMany-1.patch
>
> pg_dump -d $db8.4.2 | psql -d $db9.0devel-primary
>
> FailedAssertion, File: "twophase.c", Line: 1201.
>
> The standby was restarted and seems to catch up OK again.
> ...
> see also:
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg02221.php

I'm still not any wiser on what's causing that, but I've fixed the bug in KnownAssignedXidsMany() now.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
[HACKERS] Assertion failure twophase.c (2) (testing HS/SR)
in a 9.0devel, primary+standby, cvs from 2010.03.04 01:30

With three patches:

new_smart_shutdown_20100201.patch
extend_format_of_recovery_info_funcs_v4.20100303.patch
fix-KnownAssignedXidsRemoveMany-1.patch

pg_dump -d $db8.4.2 | psql -d $db9.0devel-primary

FailedAssertion, File: "twophase.c", Line: 1201.

The standby was restarted and seems to catch up OK again.

LOG: database system was interrupted; last known up at 2010-03-04 01:35:23 CET
cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/00010001': No such file or directory
FATAL: the database system is starting up
LOG: entering standby mode
LOG: redo starts at 0/120
LOG: consistent recovery state reached at 0/200
LOG: database system is ready to accept read only connections
ERROR: cannot execute CREATE TABLE in a read-only transaction
STATEMENT: create table t (c text);
ERROR: cannot execute SELECT INTO in a read-only transaction
STATEMENT: create table t as select 1;
ERROR: cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT: truncate wal;
ERROR: cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT: truncate wal;
ERROR: cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT: truncate wal;
ERROR: cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT: truncate wal;
ERROR: cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT: truncate wal;
ERROR: cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT: truncate wal;
ERROR: cannot execute TRUNCATE TABLE in a read-only transaction
STATEMENT: truncate wal;
FATAL: database "ms" does not exist
TRAP: FailedAssertion("!(((xid) != ((TransactionId) 0)))", File: "twophase.c", Line: 1201)
LOG: startup process (PID 18107) was terminated by signal 6: Aborted
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
LOG: database system was interrupted while in recovery at log time 2010-03-04 05:00:24 CET
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/0001001C007F': No such file or directory
LOG: entering standby mode
LOG: redo starts at 1C/7800F8E0
LOG: consistent recovery state reached at 1C/ADB9C178
LOG: database system is ready to accept read only connections

The ERRORs (and FATAL) were accidentally issued commands; I can't tell if they were causing the assertion. (database 'ms' indeed was not present on this instance)

see also:
http://archives.postgresql.org/pgsql-hackers/2010-02/msg02221.php

thanks,

Erik Rijkers
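
A note on reading the TRAP line in the log above: the quoted condition looks like the macro-expanded, stringified negation of a check of the form `TransactionIdIsValid(xid)`, i.e. the startup process saw a zero (invalid) xid. The sketch below reproduces that text in isolation. It assumes the usual definitions of `InvalidTransactionId` and `TransactionIdIsValid()`; the `Sketch*` macros and `sketch_*` functions are hypothetical illustrations rather than PostgreSQL's own `Assert()` machinery, and nothing here claims to show the actual statement at twophase.c line 1201.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int TransactionId;

/* Assumed to match the usual PostgreSQL definitions. */
#define InvalidTransactionId      ((TransactionId) 0)
#define TransactionIdIsValid(xid) ((xid) != InvalidTransactionId)

/* Hypothetical helpers: expand the condition first, then stringify
 * its negation, which is how the text in a TRAP message comes about. */
#define SketchAsString(x)           #x
#define SketchAssertText(condition) SketchAsString(!(condition))

/* Text an assertion on TransactionIdIsValid(xid) would report. */
static const char *
sketch_trap_text(void)
{
    return SketchAssertText(TransactionIdIsValid(xid));
}

/* Compare against the condition quoted in the log. */
static int
sketch_matches_log(void)
{
    return strcmp(sketch_trap_text(),
                  "!(((xid) != ((TransactionId) 0)))") == 0;
}

/* The same check written with the standard assert(): aborts on xid 0. */
static void
sketch_require_valid_xid(TransactionId xid)
{
    assert(TransactionIdIsValid(xid));
}
```

Calling `sketch_require_valid_xid(0)` aborts the process with SIGABRT, which corresponds to the "terminated by signal 6: Aborted" line that follows the TRAP in the log.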