Just a note to say that the Hot Standby patch is now in the git repository
git://git.postgresql.org/git/users/simon/postgres
Branch name: hot_standby
The complete contents of that repository are BSD-licensed contributions to the PostgreSQL project. Any further changes to it will be made by agreement here on hackers. From now on, I will submit each individual change as a patch-on-patch, so that people can see and discuss them and confirm them as open source contributions. I ask anybody else who is interested to do the same, so that we can work together. All contributions welcome.

My record of agreed changes is here:
http://wiki.postgresql.org/wiki/Hot_Standby#Remaining_Work_Items

You'll notice that I've already completed 8 changes (10 commits); those are all fairly minor, so they are submitted here as a combined patch. There are 9 pending changes so far, none of which appear to be major obstacles to resolve. Many thanks to Heikki for a thorough review, which has identified nearly all of those change requests.

I estimate that making the remaining changes noted on the Wiki and fully testing them will take at least 2 weeks. Gabriele Bartolini is assisting in this area, though neither of us is able to work full time on this. We still have ample time to complete the project in this release.

Many thanks to Magnus and Aidan for helping me resolve my git-wrestling contest, and apologies for the delay while that bout happened.

-- 
 Simon Riggs           www.2ndQuadrant.com
*** a/doc/src/sgml/backup.sgml --- b/doc/src/sgml/backup.sgml *************** *** 1934,1941 **** if (!triggered) </para> <para> ! Read-only here means "no writes to the permanent database tables". So ! there are no problems with queries that make use of temporary sort and work files will be used. Temporary tables cannot be created and therefore cannot be used at all in recovery mode. </para> --- 1934,1941 ---- </para> <para> ! Read-only here means "no writes to the permanent database tables". ! There are no problems with queries that make use of temporary sort and work files will be used. Temporary tables cannot be created and therefore cannot be used at all in recovery mode. </para> *************** *** 1983,1989 **** if (!triggered) </listitem> <listitem> <para> ! LOCK, with restrictions, see later </para> </listitem> <listitem> --- 1983,1989 ---- </listitem> <listitem> <para> ! LOCK TABLE, though only when explicitly IN ACCESS SHARE MODE </para> </listitem> <listitem> *************** *** 2000,2014 **** if (!triggered) </para> <para> ! These actions will produce error messages <itemizedlist> <listitem> <para> ! DML - Insert, Update, Delete, COPY FROM, Truncate which all write data. ! Any RULE which generates DML will throw error messages as a result. ! Note that there is no action possible that can result in a trigger ! being executed. </para> </listitem> <listitem> --- 2000,2013 ---- </para> <para> ! These actions produce error messages <itemizedlist> <listitem> <para> ! DML - Insert, Update, Delete, COPY FROM, Truncate. ! Note that there are no actions that result in a trigger ! being executed during recovery. </para> </listitem> <listitem> *************** *** 2024,2029 **** if (!triggered) --- 2023,2041 ---- </listitem> <listitem> <para> + RULEs on SELECT statements that generate DML commands. RULEs on DML + commands that produce only SELECT statements are already disallowed + during read-only transactions. 
+ </para> + </listitem> + <listitem> + <para> + LOCK TABLE, in short default form, since it requests ACCESS EXCLUSIVE MODE. + LOCK TABLE that explicitly requests a lock other than ACCESS SHARE MODE. + </para> + </listitem> + <listitem> + <para> Transaction management commands that explicitly set non-read only state <itemizedlist> <listitem> *************** *** 2069,2077 **** if (!triggered) <para> Note that current behaviour of read only transactions when not in ! recovery is to allow the last two actions, so there is a small and ! subtle difference in behaviour between standby read-only transactions ! and read only transactions during normal running. It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and temporary tables may be lifted in a future release, if their internal implementation is altered to make this possible. --- 2081,2089 ---- <para> Note that current behaviour of read only transactions when not in ! recovery is to allow the last two actions, so there are small and ! subtle differences in behaviour between read-only transactions ! run on standby and during normal running. It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and temporary tables may be lifted in a future release, if their internal implementation is altered to make this possible. *************** *** 2082,2088 **** if (!triggered) processing mode. Sessions will remain connected while the server changes mode. Current transactions will continue, though will remain read-only. After this, it will be possible to initiate read-write ! transactions, though users must *manually* reset their default_transaction_read_only setting first, if they want that behaviour. </para> --- 2094,2100 ---- processing mode. Sessions will remain connected while the server changes mode. Current transactions will continue, though will remain read-only. After this, it will be possible to initiate read-write ! 
transactions, though users must explicitly reset their default_transaction_read_only setting first, if they want that behaviour. </para> *************** *** 2098,2107 **** if (!triggered) </para> <para> ! In recovery, transactions will not be permitted to take any lock higher ! other than AccessShareLock or AccessExclusiveLock. In addition, ! transactions may never assign a TransactionId and may never write WAL. ! The LOCK TABLE command by default applies an AccessExclusiveLock. Any LOCK TABLE command that runs on the standby and requests a specific lock type other than AccessShareLock will be rejected. </para> --- 2110,2118 ---- </para> <para> ! In recovery, transactions will not be permitted to take any table lock ! higher than AccessShareLock. In addition, transactions may never assign ! a TransactionId and may never write WAL. Any LOCK TABLE command that runs on the standby and requests a specific lock type other than AccessShareLock will be rejected. </para> *************** *** 2168,2175 **** if (!triggered) <para> An example of the above would be an Administrator on Primary server ! runs a DROP TABLE command that refers to a table currently in use by ! a User query on the standby server. </para> <para> --- 2179,2186 ---- <para> An example of the above would be an Administrator on Primary server ! runs a DROP TABLE command on a table that's currently being queried ! in the standby server. </para> <para> *************** *** 2198,2206 **** if (!triggered) <para> We have a number of choices for resolving query conflicts. The default is that we wait and hope the query completes. If the recovery is not paused, ! then the server will wait automatically until the server the lag between primary and standby is at most max_standby_delay seconds. Once that grace ! period expires, we then take one of the following actions: <itemizedlist> <listitem> --- 2209,2217 ---- <para> We have a number of choices for resolving query conflicts. 
The default is that we wait and hope the query completes. If the recovery is not paused, ! then the server will wait automatically until the lag between primary and standby is at most max_standby_delay seconds. Once that grace ! period expires, we take one of the following actions: <itemizedlist> <listitem> *************** *** 2213,2219 **** if (!triggered) <para> If the conflict is caused by cleanup records we tell the standby query that a conflict has occurred and that it must cancel itself to avoid the ! risk that it attempts to silently fails to read relevant data because that data has been removed. (This is very similar to the much feared error message "snapshot too old"). </para> --- 2224,2230 ---- <para> If the conflict is caused by cleanup records we tell the standby query that a conflict has occurred and that it must cancel itself to avoid the ! risk that it silently fails to read relevant data because that data has been removed. (This is very similar to the much feared error message "snapshot too old"). </para> *************** *** 2222,2228 **** if (!triggered) Note also that this means that idle-in-transaction sessions are never canceled except by locks. Users should be clear that tables that are regularly and heavily updated on primary server will quickly cause ! cancellation of any longer running queries made against those tables. </para> <para> --- 2233,2239 ---- Note also that this means that idle-in-transaction sessions are never canceled except by locks. Users should be clear that tables that are regularly and heavily updated on primary server will quickly cause ! cancellation of any longer running queries in the standby. </para> <para> *************** *** 2235,2241 **** if (!triggered) </para> <para> ! Other remdial actions exist if the number of cancelations is unacceptable. The first option is to connect to primary server and keep a query active for as long as we need to run queries on the standby. 
This guarantees that a WAL cleanup record is never generated and we don't ever get query --- 2246,2252 ---- </para> <para> ! Other remedial actions exist if the number of cancelations is unacceptable. The first option is to connect to primary server and keep a query active for as long as we need to run queries on the standby. This guarantees that a WAL cleanup record is never generated and we don't ever get query *************** *** 2283,2289 **** if (!triggered) <title>Administrator's Overview</title> <para> ! If there is a recovery.conf file present then the will start in Hot Standby mode by default, though this can be disabled by setting "recovery_connections = off" in recovery.conf. The server may take some time to enable recovery connections since the server must first complete --- 2294,2300 ---- <title>Administrator's Overview</title> <para> ! If there is a recovery.conf file present the server will start in Hot Standby mode by default, though this can be disabled by setting "recovery_connections = off" in recovery.conf. The server may take some time to enable recovery connections since the server must first complete *************** *** 2308,2314 **** LOG: database system is ready to accept read only connections The setting of max_connections on the standby should be equal to or greater than the setting of max_connections on the primary. This is to ensure that standby has sufficient resources to manage incoming ! transactions. </para> <para> --- 2319,2325 ---- The setting of max_connections on the standby should be equal to or greater than the setting of max_connections on the primary. This is to ensure that standby has sufficient resources to manage incoming ! transactions. max_prepared_transactions already has this restriction. 
</para> <para> *************** *** 2329,2335 **** LOG: database system is ready to accept read only connections A set of functions allow superusers to control the flow of recovery are described in <xref linkend="functions-recovery-control-table">. These functions allow you to pause and continue recovery, as well ! as dynamically set new recovery targets wile recovery progresses. Note that when a server is paused the apparent delay between primary and standby will continue to increase. </para> --- 2340,2346 ---- A set of functions allow superusers to control the flow of recovery are described in <xref linkend="functions-recovery-control-table">. These functions allow you to pause and continue recovery, as well ! as dynamically set new recovery targets while recovery progresses. Note that when a server is paused the apparent delay between primary and standby will continue to increase. </para> *************** *** 2342,2348 **** LOG: database system is ready to accept read only connections themselves. Users will be able to write large sort temp files and re-generate relcache info files, so there is no part of the database that is truly read-only during hot standby mode. There is no restriction ! on use of set returning functions, or other users of tuplestore/tuplesort code. Note also that writes to remote databases will still be possible, even though the transaction is read-only locally. </para> --- 2353,2359 ---- themselves. Users will be able to write large sort temp files and re-generate relcache info files, so there is no part of the database that is truly read-only during hot standby mode. There is no restriction ! on the use of set returning functions, or other users of tuplestore/tuplesort code. Note also that writes to remote databases will still be possible, even though the transaction is read-only locally. </para> *************** *** 2354,2360 **** LOG: database system is ready to accept read only connections </para> <para> ! 
The following types of administrator command will not be accepted during recovery mode <itemizedlist> --- 2365,2371 ---- </para> <para> ! The following types of administrator command are not accepted during recovery mode <itemizedlist> *************** *** 2558,2563 **** LOG: database system is ready to accept read only connections --- 2569,2583 ---- available for use when running queries during recovery. </para> </listitem> + <listitem> + <para> + Full knowledge of running transactions is required before snapshots + may be taken. Transactions that use large numbers of subtransactions + (currently greater than 64) will delay the start of read only + connections until the completion of the longest running write transaction. + If this situation occurs, explanatory messages will be sent to the server log. + </para> + </listitem> </itemizedlist> </para> *** a/src/backend/access/gin/ginxlog.c --- b/src/backend/access/gin/ginxlog.c *************** *** 622,628 **** gin_redo(XLogRecPtr lsn, XLogRecord *record) uint8 info = record->xl_info & ~XLR_INFO_MASK; /* ! * GIN indexes do not require any conflict processing. XXX really? */ if (InHotStandby) RecordKnownAssignedTransactionIds(record->xl_xid); --- 622,630 ---- uint8 info = record->xl_info & ~XLR_INFO_MASK; /* ! * GIN indexes do not require any conflict processing. The GIN ! * posting tree is scanned in logical order during VACUUM and ! * no additional processing is required. */ if (InHotStandby) RecordKnownAssignedTransactionIds(record->xl_xid); *** a/src/backend/access/gist/gistxlog.c --- b/src/backend/access/gist/gistxlog.c *************** *** 397,403 **** gist_redo(XLogRecPtr lsn, XLogRecord *record) MemoryContext oldCxt; /* ! * GIST indexes do not require any conflict processing. This is !
* because GIST does not remove killed tuples when it performs ! * page splits in the same way b-trees do. Also VACUUMs of ! * GIST indexes occur in logical not physical order. */ if (InHotStandby) RecordKnownAssignedTransactionIds(record->xl_xid); *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *************** *** 947,952 **** begin:; --- 947,971 ---- FIN_CRC32(rdata_crc); record->xl_crc = rdata_crc; + #ifdef WAL_DEBUG + if (XLOG_DEBUG) + { + StringInfoData buf; + + initStringInfo(&buf); + appendStringInfo(&buf, "INSERT @ %X/%X: ", + RecPtr.xlogid, RecPtr.xrecoff); + xlog_outrec(&buf, record); + if (rdata->data != NULL) + { + appendStringInfo(&buf, " - "); + RmgrTable[record->xl_rmid].rm_desc(&buf, record->xl_info, rdata->data); + } + elog(LOG, "%s", buf.data); + pfree(buf.data); + } + #endif + /* Record begin of record in appropriate places */ ProcLastRecPtr = RecPtr; Insert->PrevRecord = RecPtr; *** a/src/backend/commands/lockcmds.c --- b/src/backend/commands/lockcmds.c *************** *** 49,61 **** LockTableCommand(LockStmt *lockstmt) /* * During recovery we only accept these variations: ! * ! * LOCK TABLE foo -- implicitly, AccessExclusiveLock ! * LOCK TABLE foo IN ACCESS SHARE MODE ! * LOCK TABLE foo IN ACCESS EXCLUSIVE MODE */ ! if (lockstmt->mode != AccessShareLock ! && lockstmt->mode != AccessExclusiveLock) PreventCommandDuringRecovery(); LockTableRecurse(reloid, relation, --- 49,57 ---- /* * During recovery we only accept these variations: ! * LOCK TABLE foo IN ACCESS SHARE MODE which is effectively a no-op */ ! if (lockstmt->mode != AccessShareLock) PreventCommandDuringRecovery(); LockTableRecurse(reloid, relation, *** a/src/backend/storage/ipc/procarray.c --- b/src/backend/storage/ipc/procarray.c *************** *** 502,509 **** ProcArrayApplyRecoveryInfo(XLogRecPtr lsn, xl_xact_running_xacts *xlrec) if (!xlrec->subxid_overflow) recoverySnapshotValid = true; else ! elog(trace_recovery(DEBUG2), ! 
"running xact data has incomplete subtransaction data"); xids = palloc(sizeof(TransactionId) * (xlrec->xcnt + xlrec->subxcnt)); nxids = 0; --- 502,509 ---- if (!xlrec->subxid_overflow) recoverySnapshotValid = true; else ! ereport(LOG, ! (errmsg("consistent state delayed because recovery snapshot incomplete"))); xids = palloc(sizeof(TransactionId) * (xlrec->xcnt + xlrec->subxcnt)); nxids = 0; *************** *** 1502,1508 **** HaveTransactionsInCommit(TransactionId *xids, int nxids) /* * BackendPidGetProc -- get a backend's PGPROC given its PID ! * * Returns NULL if not found. Note that it is up to the caller to be * sure that the question remains meaningful for long enough for the * answer to be used ... --- 1502,1508 ---- /* * BackendPidGetProc -- get a backend's PGPROC given its PID ! * * Returns NULL if not found. Note that it is up to the caller to be * sure that the question remains meaningful for long enough for the * answer to be used ... *************** *** 1536,1576 **** BackendPidGetProc(int pid) } /* - * BackendXidGetProc -- get a backend's PGPROC given its XID - * - * Returns NULL if not found. Note that it is up to the caller to be - * sure that the question remains meaningful for long enough for the - * answer to be used ... - */ - PGPROC * - BackendXidGetProc(TransactionId xid) - { - PGPROC *result = NULL; - ProcArrayStruct *arrayP = procArray; - int index; - - if (xid == InvalidTransactionId) /* never match invalid xid */ - return 0; - - LWLockAcquire(ProcArrayLock, LW_SHARED); - - for (index = 0; index < arrayP->numProcs; index++) - { - PGPROC *proc = arrayP->procs[index]; - - if (proc->xid == xid) - { - result = proc; - break; - } - } - - LWLockRelease(ProcArrayLock); - - return result; - } - - /* * BackendXidGetPid -- get a backend's pid given its XID * * Returns 0 if not found or it's a prepared transaction. 
Note that --- 1536,1541 ---- *** a/src/backend/tcop/postgres.c --- b/src/backend/tcop/postgres.c *************** *** 2695,2705 **** ProcessInterrupts(void) * idle-in-transaction session, so make it FATAL instead. */ case CONFLICT_MODE_ERROR: ! cancelMode = CONFLICT_MODE_FATAL; break; case CONFLICT_MODE_ERROR_IF_NOT_IDLE: ! cancelMode = CONFLICT_MODE_NOT_SET; break; default: --- 2695,2713 ---- * idle-in-transaction session, so make it FATAL instead. */ case CONFLICT_MODE_ERROR: ! cancelMode = CONFLICT_MODE_FATAL; break; case CONFLICT_MODE_ERROR_IF_NOT_IDLE: ! /* ! * If we still have a snapshot then we must ! * cancel, else we are free to go. ! * XXXHS: As above, cancel means FATAL, for now. ! */ ! if (MyProc->xmin == 0) ! cancelMode = CONFLICT_MODE_NOT_SET; ! else ! cancelMode = CONFLICT_MODE_FATAL; break; default: *** a/src/backend/utils/time/tqual.c --- b/src/backend/utils/time/tqual.c *************** *** 1259,1265 **** XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot) /* * Data lives in different places depending upon when snapshot taken */ ! if (snapshot->takenDuringRecovery) { /* * If the snapshot contains full subxact data, the fastest way to check --- 1259,1265 ---- /* * Data lives in different places depending upon when snapshot taken */ ! 
if (!snapshot->takenDuringRecovery) { /* * If the snapshot contains full subxact data, the fastest way to check *** a/src/include/access/nbtree.h --- b/src/include/access/nbtree.h *************** *** 536,545 **** typedef BTScanOpaqueData *BTScanOpaque; #define SK_BT_DESC (INDOPTION_DESC << SK_BT_INDOPTION_SHIFT) #define SK_BT_NULLS_FIRST (INDOPTION_NULLS_FIRST << SK_BT_INDOPTION_SHIFT) - /* XXX probably needs new RMgr call to do this cleanly */ - extern bool btree_is_cleanup_record(uint8 info); - extern bool btree_needs_cleanup_lock(uint8 info); - /* * prototypes for functions in nbtree.c (external entry points for btree) */ --- 536,541 ---- *** a/src/include/storage/procarray.h --- b/src/include/storage/procarray.h *************** *** 54,60 **** extern int GetTransactionsInCommit(TransactionId **xids_p); extern bool HaveTransactionsInCommit(TransactionId *xids, int nxids); extern PGPROC *BackendPidGetProc(int pid); - extern PGPROC *BackendXidGetProc(TransactionId xid); extern int BackendXidGetPid(TransactionId xid); extern bool IsBackendPid(int pid); --- 54,59 ----
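To make the new LOCK TABLE rule concrete, here is a minimal standalone C sketch of the check that the lockcmds.c hunk now performs. It is not backend code: the LockMode enum is a hypothetical stand-in whose numbering mirrors PostgreSQL's table lock modes, and lock_allowed_in_recovery() is an illustrative name that does not exist in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in mirroring the numbering of PostgreSQL's table lock modes. */
typedef enum
{
    NoLock = 0,
    AccessShareLock,            /* LOCK TABLE foo IN ACCESS SHARE MODE */
    RowShareLock,
    RowExclusiveLock,
    ShareUpdateExclusiveLock,
    ShareLock,
    ShareRowExclusiveLock,
    ExclusiveLock,
    AccessExclusiveLock         /* what a plain LOCK TABLE foo requests */
} LockMode;

/*
 * Sketch of the rule in the patched LockTableCommand(): during recovery
 * only ACCESS SHARE MODE is accepted; every other mode, including the
 * default ACCESS EXCLUSIVE, would be rejected via
 * PreventCommandDuringRecovery().
 */
bool
lock_allowed_in_recovery(LockMode mode, bool in_recovery)
{
    if (!in_recovery)
        return true;            /* this particular check only applies in recovery */
    return mode == AccessShareLock;
}
```

For example, lock_allowed_in_recovery(AccessShareLock, true) returns true, while lock_allowed_in_recovery(AccessExclusiveLock, true), which is what a plain LOCK TABLE foo asks for, returns false. Testing for a single accepted mode, rather than the previous pair of modes, is what lets the documentation describe the allowed form as effectively a no-op.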
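The postgres.c hunk changes how a recovery conflict is resolved for an idle-in-transaction session. The sketch below restates that decision as a pure C function. The enum values, the TransactionId typedef and resolve_idle_conflict() are hypothetical stand-ins; only the CONFLICT_MODE_* names and the xmin test come from the patch. The pass-through default branch is an assumption, since the hunk does not show the real default case.

```c
#include <stdint.h>

typedef uint32_t TransactionId;     /* stand-in for the backend typedef */

/* Hypothetical values for the CONFLICT_MODE_* names used in postgres.c. */
typedef enum
{
    CONFLICT_MODE_NOT_SET = 0,
    CONFLICT_MODE_ERROR,
    CONFLICT_MODE_ERROR_IF_NOT_IDLE,
    CONFLICT_MODE_FATAL
} ConflictMode;

/*
 * Sketch of the decision for an idle-in-transaction session:
 * an ERROR cannot be delivered while idle, so it becomes FATAL;
 * ERROR_IF_NOT_IDLE becomes FATAL only while the backend still
 * holds a snapshot (xmin != 0), otherwise no action is taken.
 */
ConflictMode
resolve_idle_conflict(ConflictMode requested, TransactionId my_xmin)
{
    switch (requested)
    {
        case CONFLICT_MODE_ERROR:
            return CONFLICT_MODE_FATAL;
        case CONFLICT_MODE_ERROR_IF_NOT_IDLE:
            return (my_xmin == 0) ? CONFLICT_MODE_NOT_SET : CONFLICT_MODE_FATAL;
        default:
            return requested;   /* assumption: other modes pass through */
    }
}
```

An idle session whose xmin is 0 holds no snapshot, so data removed by the conflicting cleanup record cannot affect it, and it is left connected. A session that still has an xmin is terminated, matching the XXXHS note that "cancel" means FATAL for now.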
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)