On Mon, 2007-02-26 at 23:07 -0500, Tom Lane wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > On Mon, 2007-02-26 at 18:14 -0500, Tom Lane wrote:
> >> What does this accomplish other than adding syntactic sugar over a
> >> feature that really doesn't work well anyway?
> >
> > This patch doesn't intend to implement group commit. I've changed the
> > meaning of commit_delay, sorry if that confuses.
>
> Ah. The patch was pretty much unintelligible without the discussion
> (which got here considerably later :-(). I've still got misgivings
> about how safe it really is, but at least this is better than what
> commit_delay wishes it could do.

Latest WIP version of patch now ready for performance testing. Applies
cleanly to CVS HEAD, with two additional files:

  src/backend/postmaster/walwriter.c
  src/include/postmaster/walwriter.h

Patch passes make installcheck in these cases
- no options set
- wal_writer_delay = 100000
- wal_writer_delay = 100000 and transaction_guarantee = off for all
  transactions by default in postgresql.conf.

Normal checkpoints and restarts work without problem after these runs.

What this patch does
--------------------
Implements unguaranteed transactions, which skip the XLogFlush step when
they commit. The flush point is updated in shared memory so that a
separate WAL writer process will perform the flush each time it cycles.

These parameters control this behaviour

  transaction_guarantee = on (default) | off        USERSET
  wal_writer_delay = 0 (default, ==off)             SIGHUP
  log_transaction_guarantee = on (default) | off    SIGHUP
    (the default for this would be off in later production version)

WAL writer will start/stop when wal_writer_delay is non-zero/zero.

Unguaranteed transactions are only allowed for
- Execute message
- Fastpath message
- Sync message
- simple query implicit-commit-at-end and explicit COMMITs

All other transaction commits will always use guaranteed commit path.
These include things like VACUUM, various DDL and about a dozen other
places that execute commits. The abort path is never fast in any case.

In addition, any transaction that is deleting files follows guaranteed
commit path, however it was requested.

The interlock between commits and checkpoints is maintained. After the
CheckpointStartLock has been gained by bgwriter, all unguaranteed
transactions are flushed.

(In addition the fsync GUC has been removed from postgresql.conf.sample,
but not actually removed. If this patch goes ahead, I suggest we
deprecate it for one release then remove it next...)

What this patch doesn't do yet
------------------------------
Crash recovery does not yet work, but can be made to do so with TODO
items (1) and (2) below.

1. The interlock between buffer manager and WAL is maintained, but not
sufficiently to avoid problems in all cases. Specifically, commit hint
bits must not be written to disk ahead of a transaction commit. Two
approaches are possible
  1. avoid setting the hint bits for unguaranteed transactions
  2. set the hint bits *and* update the LSN of the page to be the LSN of
     the unguaranteed transaction for which we are setting the hint bits.

Either way, we need to maintain a list of unguaranteed transactions in
shared memory that can be accessed when hint bits are set. The list
would need to contain the Xid and the LSN of each unguaranteed
transaction. This would necessitate keeping the list of unguaranteed
transactions fairly small, so some care is required to ensure this. That
can be achieved by keeping wal_writer_delay small or putting in a
trigger point at which a wannabe unguaranteed transaction is forced to
flush WAL instead. Some testing has shown that committing every 8
transactions gives a considerable leap in performance in many cases.

2. As originally discussed, during crash recovery any in-flight
transactions would need to be explicitly aborted in clog, to override
the possibility that an unguaranteed transaction would have been marked
committed.

An alternative would be to flush all unguaranteed transactions prior to
flushing dirty clog and multitrans pages. That could be achieved by
keeping the LSN of the last write to those pages and performing
XLogFlush up to that LSN when we write dirty pages.

I'm leaning towards the new alternative version now, since it's cleaner
and it fits better with the way the rest of the server works.

3. WAL Writer could be used for various additional tasks, such as doing
the WAL cache-half-filled check.
Those options have been ignored until now, to avoid complicating
discussion and review.

4. We probably need more padding in XLogCtlData to ensure that data
protected by WALInsertLock, WALWriteLock and info_lck are in separate
cache lines to avoid CPU false sharing. That should be done whether or
not this patch goes ahead.

Tests, reviews and comments please?

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

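On TODO item 4 above, a minimal sketch of what such padding could look
like, assuming a C11 compiler and a 64-byte cache line (the real line
size varies by CPU). PaddedCtl and separate_lines are hypothetical names
for illustration only; this is not the layout of XLogCtlData, just the
general technique of aligning each lock-protected group of fields to its
own cache line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64		/* assumption; depends on the CPU */

/* Hypothetical miniature control struct: one field group per lock. */
typedef struct
{
	alignas(CACHE_LINE_SIZE) long insert_state;	/* cf. WALInsertLock */
	alignas(CACHE_LINE_SIZE) long write_state;	/* cf. WALWriteLock */
	alignas(CACHE_LINE_SIZE) long info_state;	/* cf. info_lck */
} PaddedCtl;

/*
 * True when two member offsets fall in different cache lines.  This only
 * holds for the object in memory if its base address is itself aligned
 * to CACHE_LINE_SIZE, which alignas ensures for static and automatic
 * objects but not necessarily for arbitrary heap or shared memory.
 */
static int
separate_lines(size_t a, size_t b)
{
	return a / CACHE_LINE_SIZE != b / CACHE_LINE_SIZE;
}

/* Each group occupies exactly one line, so three groups take three. */
static_assert(sizeof(PaddedCtl) == 3 * CACHE_LINE_SIZE,
			  "each field group should occupy its own cache line");
```

The cost is memory: each group is rounded up to a full line, which is a
small price for a structure that exists once in shared memory.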
Index: src/backend/access/transam/xact.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.234
diff -c -r1.234 xact.c
*** src/backend/access/transam/xact.c	9 Feb 2007 03:35:33 -0000	1.234
--- src/backend/access/transam/xact.c	11 Mar 2007 22:37:09 -0000
***************
*** 58,63 ****
--- 58,66 ----
  int			CommitDelay = 0;	/* precommit delay in microseconds */
  int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
  
+ bool		DefaultXactCommitGuarantee = true; /* USERSET GUC: what user wants */
+ static bool XactCommitGuarantee = true;		/* the guarantee for this Xid? */
+ bool		log_transaction_guarantee = true;
  
  /*
   *	transaction states - transaction state from server perspective
***************
*** 710,715 ****
--- 713,719 ----
  	TransactionId xid = GetCurrentTransactionId();
  	bool		madeTCentries;
  	XLogRecPtr	recptr;
+ 	bool		unsafe = false;
  
  	/* Tell bufmgr and smgr to prepare for commit */
  	BufmgrCommit();
***************
*** 792,812 ****
  		if (MyXactMadeXLogEntry)
  		{
  			/*
! 			 * Sleep before flush! So we can flush more than one commit
! 			 * records per single fsync.  (The idea is some other backend may
! 			 * do the XLogFlush while we're sleeping.  This needs work still,
! 			 * because on most Unixen, the minimum select() delay is 10msec or
! 			 * more, which is way too long.)
! 			 *
! 			 * We do not sleep if enableFsync is not turned on, nor if there
! 			 * are fewer than CommitSiblings other backends with active
! 			 * transactions.
! 			 */
! 			if (CommitDelay > 0 && enableFsync &&
! 				CountActiveBackends() >= CommitSiblings)
! 				pg_usleep(CommitDelay);
  
! 			XLogFlush(recptr);
  		}
  
  		/*
--- 796,830 ----
  		if (MyXactMadeXLogEntry)
  		{
  			/*
! 			 * If we have chosen to use unguaranteed transactions and we're
! 			 * not doing cleanup of any rels, then we can defer fsync.
! 			 * The WAL writer acts to minimise the window of data loss,
! 			 * and we rely on it to flush WAL soon, but not precisely now.
! 			 */
! 			if (XactCommitGuarantee || nrels > 0)
! 			{
! 				/*
! 				 * Sleep before flush! So we can flush more than one commit
! 				 * records per single fsync.  (The idea is some other backend may
! 				 * do the XLogFlush while we're sleeping.  This needs work still,
! 				 * because on most Unixen, the minimum select() delay is 10msec or
! 				 * more, which is way too long.)
! 				 *
! 				 * We do not sleep if enableFsync is not turned on, nor if there
! 				 * are fewer than CommitSiblings other backends with active
! 				 * transactions.
! 				 */
! 				if (CommitDelay > 0 && enableFsync &&
! 					CountActiveBackends() >= CommitSiblings)
! 					pg_usleep(CommitDelay);
  
! 				XLogFlush(recptr);
! 			}
! 			else
! 			{
! 				unsafe = true;
! 				XLogDeferredFlush(recptr);
! 			}
  		}
  
  		/*
***************
*** 830,835 ****
--- 848,858 ----
  			LWLockRelease(CheckpointStartLock);
  
  		END_CRIT_SECTION();
+ 
+ 		if (log_transaction_guarantee && madeTCentries && WALWriterActive())
+ 			elog(LOG,"COMMIT %s insert %X/%X",
+ 					(XactCommitGuarantee ? " safe" : "unsafe"),
+ 					recptr.xlogid, recptr.xrecoff);
  	}
  
  	/* Break the chain of back-links in the XLOG records I output */
***************
*** 1388,1393 ****
--- 1411,1417 ----
  	FreeXactSnapshot();
  	XactIsoLevel = DefaultXactIsoLevel;
  	XactReadOnly = DefaultXactReadOnly;
+ 	SetXactCommitGuarantee(true);
  
  	/*
  	 * reinitialize within-transaction counters
***************
*** 4092,4097 ****
--- 4116,4127 ----
  	return "UNRECOGNIZED";
  }
  
+ void
+ SetXactCommitGuarantee(bool RequestedXactCommitGuarantee)
+ {
+ 	XactCommitGuarantee = RequestedXactCommitGuarantee;
+ }
+ 
  /*
   * xactGetCommittedChildren
   *
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.265
diff -c -r1.265 xlog.c
*** src/backend/access/transam/xlog.c	3 Mar 2007 20:02:26 -0000	1.265
--- src/backend/access/transam/xlog.c	11 Mar 2007 22:37:15 -0000
***************
*** 301,306 ****
--- 301,309 ----
  	/* Protected by WALWriteLock: */
  	XLogCtlWrite Write;
  
+ 	/* Protected by commit_lck: */
+ 	XLogwrtRqst CommitLogwrtRqst;
+ 
  	/*
  	 * These values do not change after startup, although the pointed-to pages
  	 * and xlblocks values certainly do.  Permission to read/write the pages
***************
*** 313,318 ****
--- 316,322 ----
  	TimeLineID	ThisTimeLineID;
  
  	slock_t		info_lck;		/* locks shared variables shown above */
+ 	slock_t		commit_lck;		/* deferred commit lock */
  } XLogCtlData;
  
  static XLogCtlData *XLogCtl = NULL;
***************
*** 1787,1792 ****
--- 1791,1851 ----
  }
  
  /*
+  * XLogDeferredFlush
+  *
+  * Keep track of deferred flush requests by unguaranteed transaction commits
+  */
+ void
+ XLogDeferredFlush(XLogRecPtr RecPtr)
+ {
+ 	/*
+ 	 * Update the deferred commit request pointer, if required, then
+ 	 * return quickly so we can do some other useful work
+ 	 */
+ 	{
+ 		/* use volatile pointer to prevent code rearrangement */
+ 		volatile XLogCtlData *xlogctl = XLogCtl;
+ 
+ 		SpinLockAcquire(&xlogctl->commit_lck);
+ 		/* only ever advance the request pointer, never move it back */
+ 		if (XLByteLT(xlogctl->CommitLogwrtRqst.Write, RecPtr))
+ 			xlogctl->CommitLogwrtRqst.Write = RecPtr;
+ 		SpinLockRelease(&xlogctl->commit_lck);
+ 	}
+ 
+ 	/* Note that there is *no* XLogFlush() here, by design */
+ }
+ 
+ /*
+  * XLogBackgroundFlush
+  *
+  * Flush as far as the deferred commit request pointer, so that all
+  * unguaranteed commits are known flushed after this returns.
+  *
+  * If it hasn't changed or a normal commit has flushed past our pointer
+  * we will exit quickly from XLogFlush(), so no extra code here
+  */
+ void
+ XLogBackgroundFlush(void)
+ {
+ 	XLogRecPtr RecPtr;
+ 
+ 	/*
+ 	 * Get the current deferred commit request pointer,
+ 	 * don't worry about keeping local state information
+ 	 */
+ 	{
+ 		/* use volatile pointer to prevent code rearrangement */
+ 		volatile XLogCtlData *xlogctl = XLogCtl;
+ 
+ 		SpinLockAcquire(&xlogctl->commit_lck);
+ 		RecPtr = xlogctl->CommitLogwrtRqst.Write;
+ 		SpinLockRelease(&xlogctl->commit_lck);
+ 	}
+ 
+ 	XLogFlush(RecPtr);
+ }
+ 
+ /*
+  * Create a new XLOG file segment, or open a pre-existing one.
   *
   * log, seg: identify segment to be created/opened.
***************
*** 3985,3990 ****
--- 4044,4050 ----
  	XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
  	XLogCtl->Insert.currpage = (XLogPageHeader) (XLogCtl->pages);
  	SpinLockInit(&XLogCtl->info_lck);
+ 	SpinLockInit(&XLogCtl->commit_lck);
  
  	/*
  	 * If we are not in bootstrap mode, pg_control should already exist. Read
***************
*** 4998,5003 ****
--- 5058,5065 ----
  
  	XLogCtl->LogwrtRqst.Write = EndOfLog;
  	XLogCtl->LogwrtRqst.Flush = EndOfLog;
+ 	XLogCtl->CommitLogwrtRqst.Write = EndOfLog;
+ 	XLogCtl->CommitLogwrtRqst.Flush = EndOfLog;
  
  	freespace = INSERT_FREESPACE(Insert);
  	if (freespace > 0)
***************
*** 5389,5394 ****
--- 5451,5463 ----
  	 */
  	LWLockAcquire(CheckpointStartLock, LW_EXCLUSIVE);
  
+ 	/*
+ 	 * Now confirm that all unguaranteed transactions are written to WAL
+ 	 * before we proceed further. This may require WALWriteLock and possibly
+ 	 * WALInsertLock if we need to flush.
+ 	 */
+ 	XLogBackgroundFlush();
+ 
  	/* And we need WALInsertLock too */
  	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
  
Index: src/backend/postmaster/Makefile
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/Makefile,v
retrieving revision 1.22
diff -c -r1.22 Makefile
*** src/backend/postmaster/Makefile	20 Jan 2007 17:16:12 -0000	1.22
--- src/backend/postmaster/Makefile	11 Mar 2007 22:37:17 -0000
***************
*** 12,18 ****
  top_builddir = ../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = bgwriter.o autovacuum.o pgarch.o pgstat.o postmaster.o syslogger.o \
  	fork_process.o
  
  all: SUBSYS.o
--- 12,18 ----
  top_builddir = ../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = bgwriter.o walwriter.o autovacuum.o pgarch.o pgstat.o postmaster.o syslogger.o \
  	fork_process.o
  
  all: SUBSYS.o
Index: src/backend/postmaster/postmaster.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/postmaster/postmaster.c,v
retrieving revision 1.526
diff -c -r1.526 postmaster.c
*** src/backend/postmaster/postmaster.c	7 Mar 2007 13:35:02 -0000	1.526
--- src/backend/postmaster/postmaster.c	11 Mar 2007 22:37:21 -0000
***************
*** 107,112 ****
--- 107,113 ----
  #include "postmaster/pgarch.h"
  #include "postmaster/postmaster.h"
  #include "postmaster/syslogger.h"
+ #include "postmaster/walwriter.h"
  #include "storage/fd.h"
  #include "storage/ipc.h"
  #include "storage/pg_shmem.h"
***************
*** 201,206 ****
--- 202,208 ----
  /* PIDs of special child processes; 0 when not running */
  static pid_t StartupPID = 0,
  			BgWriterPID = 0,
+ 			WALWriterPID = 0,
  			AutoVacPID = 0,
  			PgArchPID = 0,
  			PgStatPID = 0;
***************
*** 907,913 ****
  	 * CAUTION: when changing this list, check for side-effects on the signal
  	 * handling setup of child processes.  See tcop/postgres.c,
  	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/autovacuum.c,
! 	 * postmaster/pgarch.c, postmaster/pgstat.c, and postmaster/syslogger.c.
  	 */
  	pqinitmask();
  	PG_SETMASK(&BlockSig);
--- 909,916 ----
  	 * CAUTION: when changing this list, check for side-effects on the signal
  	 * handling setup of child processes.  See tcop/postgres.c,
  	 * bootstrap/bootstrap.c, postmaster/bgwriter.c, postmaster/autovacuum.c,
! 	 * postmaster/pgarch.c, postmaster/pgstat.c, postmaster/syslogger.c
! 	 * and postmaster/walwriter.c
  	 */
  	pqinitmask();
  	PG_SETMASK(&BlockSig);
***************
*** 1250,1255 ****
--- 1253,1263 ----
  		start_autovac_launcher = false;	/* signal successfully processed */
  	}
  
+ 	/* If we have lost the WAL writer, try to start a new one */
+ 	if (WALWriterActive() && WALWriterPID == 0 &&
+ 		StartupPID == 0 && !FatalError && Shutdown == NoShutdown)
+ 		WALWriterPID = StartWALWriter();
+ 
  	/* If we have lost the archiver, try to start a new one */
  	if (XLogArchivingActive() && PgArchPID == 0 &&
  		StartupPID == 0 && !FatalError && Shutdown == NoShutdown)
***************
*** 1822,1827 ****
--- 1830,1837 ----
  			signal_child(BgWriterPID, SIGHUP);
  		if (AutoVacPID != 0)
  			signal_child(AutoVacPID, SIGHUP);
+ 		if (WALWriterPID != 0)
+ 			signal_child(WALWriterPID, SIGHUP);
  		if (PgArchPID != 0)
  			signal_child(PgArchPID, SIGHUP);
  		if (SysLoggerPID != 0)
***************
*** 1891,1896 ****
--- 1901,1909 ----
  			/* And tell it to shut down */
  			if (BgWriterPID != 0)
  				signal_child(BgWriterPID, SIGUSR2);
+ 			/* Tell WALWriter to shut down too; nothing left for it to do */
+ 			if (WALWriterPID != 0)
+ 				signal_child(WALWriterPID, SIGQUIT);
  			/* Tell pgarch to shut down too; nothing left for it to do */
  			if (PgArchPID != 0)
  				signal_child(PgArchPID, SIGQUIT);
***************
*** 1947,1952 ****
--- 1960,1968 ----
  			/* And tell it to shut down */
  			if (BgWriterPID != 0)
  				signal_child(BgWriterPID, SIGUSR2);
+ 			/* Tell WALWriter to shut down too; nothing left for it to do */
+ 			if (WALWriterPID != 0)
+ 				signal_child(WALWriterPID, SIGQUIT);
  			/* Tell pgarch to shut down too; nothing left for it to do */
  			if (PgArchPID != 0)
  				signal_child(PgArchPID, SIGQUIT);
***************
*** 1972,1977 ****
--- 1988,1995 ----
  				signal_child(StartupPID, SIGQUIT);
  			if (BgWriterPID != 0)
  				signal_child(BgWriterPID, SIGQUIT);
+ 			if (WALWriterPID != 0)
+ 				signal_child(WALWriterPID, SIGQUIT);
  			if (AutoVacPID != 0)
  				signal_child(AutoVacPID, SIGQUIT);
  			if (PgArchPID != 0)
***************
*** 2070,2077 ****
  
  			/*
  			 * Go to shutdown mode if a shutdown request was pending.
! 			 * Otherwise, try to start the archiver, stats collector and
! 			 * autovacuum launcher.
  			 */
  			if (Shutdown > NoShutdown && BgWriterPID != 0)
  				signal_child(BgWriterPID, SIGUSR2);
--- 2088,2095 ----
  
  			/*
  			 * Go to shutdown mode if a shutdown request was pending.
! 			 * Otherwise, try to start the archiver, stats collector,
! 			 * autovacuum launcher and WALWriter.
  			 */
  			if (Shutdown > NoShutdown && BgWriterPID != 0)
  				signal_child(BgWriterPID, SIGUSR2);
***************
*** 2081,2086 ****
--- 2099,2106 ----
  					PgArchPID = pgarch_start();
  				if (PgStatPID == 0)
  					PgStatPID = pgstat_start();
+ 				if (WALWriterPID == 0)
+ 					WALWriterPID = StartWALWriter();
  				if (AutoVacuumingActive() && AutoVacPID == 0)
  					AutoVacPID = StartAutoVacLauncher();
  
***************
*** 2141,2146 ****
--- 2161,2180 ----
  		}
  
  		/*
+ 		 * Was it the WALWriter?  Normal exit can be ignored; we'll
+ 		 * start a new one at the next iteration of the postmaster's main loop,
+ 		 * if necessary.  Any other exit condition is treated as a crash.
+ 		 */
+ 		if (WALWriterPID != 0 && pid == WALWriterPID)
+ 		{
+ 			WALWriterPID = 0;
+ 			if (!EXIT_STATUS_0(exitstatus))
+ 				HandleChildCrash(pid, exitstatus,
+ 								 _("WALWriter process"));
+ 			continue;
+ 		}
+ 
+ 		/*
  		 * Was it the autovacuum launcher?  Normal exit can be ignored; we'll
  		 * start a new one at the next iteration of the postmaster's main loop,
  		 * if necessary.  Any other exit condition is treated as a crash.
***************
*** 2236,2241 ****
--- 2270,2278 ----
  		/* And tell it to shut down */
  		if (BgWriterPID != 0)
  			signal_child(BgWriterPID, SIGUSR2);
+ 		/* Tell WALWriter to shut down too; nothing left for it to do */
+ 		if (WALWriterPID != 0)
+ 			signal_child(WALWriterPID, SIGQUIT);
  		/* Tell pgarch to shut down too; nothing left for it to do */
  		if (PgArchPID != 0)
  			signal_child(PgArchPID, SIGQUIT);
***************
*** 2384,2389 ****
--- 2421,2437 ----
  		signal_child(AutoVacPID, (SendStop ? SIGSTOP : SIGQUIT));
  	}
  
+ 	/* Force a power-cycle of the WALWriter process too */
+ 	/* (Shouldn't be necessary, but just for luck) */
+ 	if (WALWriterPID != 0 && !FatalError)
+ 	{
+ 		ereport(DEBUG2,
+ 				(errmsg_internal("sending %s to process %d",
+ 								 "SIGQUIT",
+ 								 (int) WALWriterPID)));
+ 		signal_child(WALWriterPID, SIGQUIT);
+ 	}
+ 
  	/* Force a power-cycle of the pgarch process too */
  	/* (Shouldn't be necessary, but just for luck) */
  	if (PgArchPID != 0 && !FatalError)
***************
*** 3475,3480 ****
--- 3523,3545 ----
  		AutoVacWorkerMain(argc - 2, argv + 2);
  		proc_exit(0);
  	}
+ 	if (strcmp(argv[1], "--forkwalwriter") == 0)
+ 	{
+ 		/* Close the postmaster's sockets */
+ 		ClosePostmasterPorts(false);
+ 
+ 		/* Restore basic shared memory pointers */
+ 		InitShmemAccess(UsedShmemSegAddr);
+ 
+ 		/* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
+ 		InitProcess();
+ 
+ 		/* Attach process to shared data structures */
+ 		CreateSharedMemoryAndSemaphores(false, 0);
+ 
+ 		WALWriterMain(argc, argv);
+ 		proc_exit(0);
+ 	}
  	if (strcmp(argv[1], "--forkarch") == 0)
  	{
  		/* Close the postmaster's sockets */
Index: src/backend/tcop/postgres.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/tcop/postgres.c,v
retrieving revision 1.527
diff -c -r1.527 postgres.c
*** src/backend/tcop/postgres.c	3 Mar 2007 19:32:54 -0000	1.527
--- src/backend/tcop/postgres.c	11 Mar 2007 22:37:24 -0000
***************
*** 2224,2229 ****
--- 2224,2231 ----
  	ereport(DEBUG3,
  			(errmsg_internal("CommitTransactionCommand")));
  
+ 	SetXactCommitGuarantee(DefaultXactCommitGuarantee);
+ 
  	CommitTransactionCommand();
  
  #ifdef MEMORY_CONTEXT_CHECKING
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.379
diff -c -r1.379 guc.c
*** src/backend/utils/misc/guc.c	6 Mar 2007 02:06:14 -0000	1.379
--- src/backend/utils/misc/guc.c	11 Mar 2007 22:37:31 -0000
***************
*** 52,57 ****
--- 52,58 ----
  #include "postmaster/bgwriter.h"
  #include "postmaster/postmaster.h"
  #include "postmaster/syslogger.h"
+ #include "postmaster/walwriter.h"
  #include "storage/fd.h"
  #include "storage/freespace.h"
  #include "tcop/tcopprot.h"
***************
*** 100,105 ****
--- 101,107 ----
  extern int	CommitSiblings;
  extern char *default_tablespace;
  extern bool fullPageWrites;
+ extern bool log_transaction_guarantee;
  
  #ifdef TRACE_SORT
  extern bool trace_sort;
***************
*** 145,150 ****
--- 147,153 ----
  static bool assign_stage_log_stats(bool newval, bool doit, GucSource source);
  static bool assign_log_stats(bool newval, bool doit, GucSource source);
  static bool assign_transaction_read_only(bool newval, bool doit, GucSource source);
+ static bool assign_transaction_guarantee(bool newval, bool doit, GucSource source);
  static const char *assign_canonical_path(const char *newval, bool doit, GucSource source);
  static const char *assign_backslash_quote(const char *newval, bool doit, GucSource source);
  static const char *assign_timezone_abbreviations(const char *newval, bool doit, GucSource source);
***************
*** 312,317 ****
--- 315,322 ----
  	gettext_noop("Write-Ahead Log"),
  	/* WAL_SETTINGS */
  	gettext_noop("Write-Ahead Log / Settings"),
+ 	/* WAL_COMMITS */
+ 	gettext_noop("Write-Ahead Log / Commit Behavior"),
  	/* WAL_CHECKPOINTS */
  	gettext_noop("Write-Ahead Log / Checkpoints"),
  	/* QUERY_TUNING */
***************
*** 568,573 ****
--- 573,586 ----
  		false, NULL, NULL
  	},
  	{
+ 		{"log_transaction_guarantee", PGC_SIGHUP, WAL_COMMITS,
+ 			gettext_noop("Logs form of guarantee used at transaction commit."),
+ 			NULL
+ 		},
+ 		&log_transaction_guarantee,
+ 		true, NULL, NULL
+ 	},
+ 	{
  		{"log_connections", PGC_BACKEND, LOGGING_WHAT,
  			gettext_noop("Logs each successful connection."),
  			NULL
***************
*** 878,883 ****
--- 891,904 ----
  		true, assign_phony_autocommit, NULL
  	},
  	{
+ 		{"transaction_guarantee", PGC_USERSET, WAL_COMMITS,
+ 			gettext_noop("Sets the default of wait-for-commit."),
+ 			NULL
+ 		},
+ 		&DefaultXactCommitGuarantee,
+ 		true, assign_transaction_guarantee, NULL
+ 	},
+ 	{
  		{"default_transaction_read_only", PGC_USERSET, CLIENT_CONN_STATEMENT,
  			gettext_noop("Sets the default read-only status of new transactions."),
  			NULL
***************
*** 1452,1458 ****
  	},
  
  	{
! 		{"commit_delay", PGC_USERSET, WAL_CHECKPOINTS,
  			gettext_noop("Sets the delay in microseconds between transaction commit and "
  						 "flushing WAL to disk."),
  			NULL
--- 1473,1479 ----
  	},
  
  	{
! 		{"commit_delay", PGC_USERSET, WAL_COMMITS,
  			gettext_noop("Sets the delay in microseconds between transaction commit and "
  						 "flushing WAL to disk."),
  			NULL
***************
*** 1462,1468 ****
  	},
  
  	{
! 		{"commit_siblings", PGC_USERSET, WAL_CHECKPOINTS,
  			gettext_noop("Sets the minimum concurrent open transactions before performing "
  						 "commit_delay."),
  			NULL
--- 1483,1489 ----
  	},
  
  	{
! 		{"commit_siblings", PGC_USERSET, WAL_COMMITS,
  			gettext_noop("Sets the minimum concurrent open transactions before performing "
  						 "commit_delay."),
  			NULL
***************
*** 1472,1477 ****
--- 1493,1507 ----
  	},
  
  	{
+ 		{"wal_writer_delay", PGC_SIGHUP, WAL_COMMITS,
+ 			gettext_noop("Sets the delay in microseconds between regular flushing of WAL "
+ 						 "to disk by the WALWriter."),
+ 			NULL
+ 		},
+ 		&WALWriterDelay,
+ 		0, 0, 10000000, NULL, NULL
+ 	},
+ 	{
  		{"extra_float_digits", PGC_USERSET, CLIENT_CONN_LOCALE,
  			gettext_noop("Sets the number of digits displayed for floating-point values."),
  			gettext_noop("This affects real, double precision, and geometric data types. "
***************
*** 6430,6435 ****
--- 6460,6484 ----
  	return true;
  }
  
+ static bool
+ assign_transaction_guarantee(bool newval, bool doit, GucSource source)
+ {
+ 	/*
+ 	 * Transaction guarantee can only be disabled if the
+ 	 * WALWriter has been activated, allowing us to place
+ 	 * a sensible time limit on the extent of the data loss window
+ 	 * for UnGuaranteed Transactions
+ 	 */
+ 	if (newval == false && !WALWriterActive())
+ 	{
+ 		if (source >= PGC_S_INTERACTIVE)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("cannot set transaction guarantee when server wal_writer_delay = 0")));
+ 	}
+ 	return true;
+ }
+ 
  static const char *
  assign_canonical_path(const char *newval, bool doit, GucSource source)
  {
Index: src/backend/utils/misc/postgresql.conf.sample
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/misc/postgresql.conf.sample,v
retrieving revision 1.212
diff -c -r1.212 postgresql.conf.sample
*** src/backend/utils/misc/postgresql.conf.sample	6 Mar 2007 02:06:14 -0000	1.212
--- src/backend/utils/misc/postgresql.conf.sample	11 Mar 2007 22:37:31 -0000
***************
*** 150,156 ****
  
  # - Settings -
  
! #fsync = on				# turns forced synchronization on or off
  #wal_sync_method = fsync		# the default is the first option
  					# supported by the operating system:
  					#   open_datasync
--- 150,156 ----
  
  # - Settings -
  
! #wal_writer_delay = 0			# range 0-10000000, in microseconds
  #wal_sync_method = fsync		# the default is the first option
  					# supported by the operating system:
  					#   open_datasync
***************
*** 161,169 ****
--- 161,172 ----
  #full_page_writes = on			# recover from partial page writes
  #wal_buffers = 64kB			# min 32kB
  					# (change requires restart)
+ 
  #commit_delay = 0			# range 0-100000, in microseconds
  #commit_siblings = 5			# range 1-1000
  
+ #transaction_guarantee = on		# default: immediate fsync at commit
+ 
  # - Checkpoints -
  
  #checkpoint_segments = 3		# in logfile segments, min 1, 16MB each
Index: src/include/access/xact.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/access/xact.h,v
retrieving revision 1.84
diff -c -r1.84 xact.h
*** src/include/access/xact.h	5 Jan 2007 22:19:51 -0000	1.84
--- src/include/access/xact.h	11 Mar 2007 22:37:33 -0000
***************
*** 16,21 ****
--- 16,22 ----
  
  #include "access/xlog.h"
  #include "nodes/pg_list.h"
+ #include "postmaster/walwriter.h"
  #include "storage/relfilenode.h"
  #include "utils/timestamp.h"
  
***************
*** 41,46 ****
--- 42,50 ----
  extern bool DefaultXactReadOnly;
  extern bool XactReadOnly;
  
+ /* Deferred Fsync */
+ extern bool DefaultXactCommitGuarantee;
+ extern void SetXactCommitGuarantee(bool RequestedXactCommitGuarantee);
  /*
   *	start- and end-of-transaction callbacks for dynamically loaded modules
   */
Index: src/include/access/xlog.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/access/xlog.h,v
retrieving revision 1.76
diff -c -r1.76 xlog.h
*** src/include/access/xlog.h	5 Jan 2007 22:19:51 -0000	1.76
--- src/include/access/xlog.h	11 Mar 2007 22:37:33 -0000
***************
*** 151,156 ****
--- 151,158 ----
  
  extern XLogRecPtr XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata);
  extern void XLogFlush(XLogRecPtr RecPtr);
+ extern void XLogDeferredFlush(XLogRecPtr RecPtr);
+ extern void XLogBackgroundFlush(void);
  
  extern void xlog_redo(XLogRecPtr lsn, XLogRecord *record);
  extern void xlog_desc(StringInfo buf, uint8 xl_info, char *rec);
Index: src/include/utils/guc_tables.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/utils/guc_tables.h,v
retrieving revision 1.30
diff -c -r1.30 guc_tables.h
*** src/include/utils/guc_tables.h	5 Jan 2007 22:19:59 -0000	1.30
--- src/include/utils/guc_tables.h	11 Mar 2007 22:37:34 -0000
***************
*** 51,56 ****
--- 51,57 ----
  	RESOURCES_KERNEL,
  	WAL,
  	WAL_SETTINGS,
+ 	WAL_COMMITS,
  	WAL_CHECKPOINTS,
  	QUERY_TUNING,
  	QUERY_TUNING_METHOD,
/*------------------------------------------------------------------------- * * walwriter.c * * PostgreSQL WAL Writer * * Initial author: Simon Riggs [EMAIL PROTECTED] * * Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California * * * IDENTIFICATION * $PostgreSQL: pgsql/src/backend/postmaster/WALWriter.c,v 1.29 2007/02/10 14:58:54 petere Exp $ * *------------------------------------------------------------------------- */ #include "postgres.h" #include <fcntl.h> #include <signal.h> #include <time.h> #include <sys/time.h> #include <sys/wait.h> #include <unistd.h> #include "access/xact.h" #include "access/xlog.h" #include "libpq/pqsignal.h" #include "miscadmin.h" #include "postmaster/fork_process.h" #include "postmaster/postmaster.h" #include "postmaster/walwriter.h" #include "storage/fd.h" #include "storage/ipc.h" #include "storage/pg_shmem.h" #include "storage/pmsignal.h" #include "storage/proc.h" #include "storage/procarray.h" #include "storage/sinval.h" #include "utils/guc.h" #include "utils/memutils.h" #include "utils/ps_status.h" /* ---------- * Timer definitions. * ---------- */ #define WALWRITER_RESTART_INTERVAL 2 /* How often to attempt to restart a * failed WALWriter; in seconds. */ /* ---------- * Local data * ---------- */ static time_t last_WALWriter_start_time; /* Memory context for long-lived data */ static MemoryContext walwriter_cxt; /* * Flags set by interrupt handlers for later service in the main loop. 
*/ static volatile sig_atomic_t got_SIGHUP = false; static volatile sig_atomic_t wakened = false; /* ---------- * Local function forward declarations * ---------- */ #ifdef EXEC_BACKEND static pid_t WALWriter_forkexec(void); #endif int WALWriterDelay = 0; NON_EXEC_STATIC void WALWriterMain(int argc, char *argv[]); static void WALWriter_exit(SIGNAL_ARGS); static void WALWriterSigHupHandler(SIGNAL_ARGS); static void WALWriter_waken(SIGNAL_ARGS); static void WALWriter_MainLoop(void); /* ------------------------------------------------------------ * Public functions called from postmaster follow * ------------------------------------------------------------ */ /* * WALWriter_start * * Called from postmaster at startup or after an existing WALWriter * died. Attempt to fire up a fresh WALWriter process. * * Returns PID of child process, or 0 if fail. * * Note: if fail, we will be called again from the postmaster main loop. */ int StartWALWriter(void) { time_t curtime; pid_t WALWriterPid; /* * Do nothing if no WALWriter needed */ if (!WALWriterActive()) return 0; /* * Do nothing if too soon since last WALWriter start. This is a safety * valve to protect against continuous respawn attempts if the WALWriter is * dying immediately at launch. Note that since we will be re-called from * the postmaster main loop, we will get another chance later. */ curtime = time(NULL); if ((unsigned int) (curtime - last_WALWriter_start_time) < (unsigned int) WALWRITER_RESTART_INTERVAL) return 0; last_WALWriter_start_time = curtime; #ifdef EXEC_BACKEND switch ((WALWriterPid = WALWriter_forkexec())) #else switch ((WALWriterPid = fork_process())) #endif { case -1: ereport(LOG, (errmsg("could not fork WALWriter: %m"))); return 0; #ifndef EXEC_BACKEND case 0: /* in postmaster child ... 
*/ /* Close the postmaster's sockets */ ClosePostmasterPorts(false); /* Lose the postmaster's on-exit routines */ on_exit_reset(); WALWriterMain(0, NULL); break; #endif default: return (int) WALWriterPid; } /* shouldn't get here */ return 0; } /* ------------------------------------------------------------ * Local functions called by WALWriter follow * ------------------------------------------------------------ */ #ifdef EXEC_BACKEND /* * WALWriter_forkexec() - * * Format up the arglist for, then fork and exec, WALWriter process */ static pid_t WALWriter_forkexec(void) { char *av[10]; int ac = 0; av[ac++] = "postgres"; av[ac++] = "--forkwalwriter"; av[ac++] = NULL; /* filled in by postmaster_forkexec */ av[ac] = NULL; Assert(ac < lengthof(av)); return postmaster_forkexec(ac, av); } #endif /* EXEC_BACKEND */ /* * WALWriterMain * * The argc/argv parameters are valid only in EXEC_BACKEND case. However, * since we don't use 'em, it hardly matters... */ NON_EXEC_STATIC void WALWriterMain(int argc, char *argv[]) { sigjmp_buf local_sigjmp_buf; IsUnderPostmaster = true; /* we are a postmaster subprocess now */ MyProcPid = getpid(); /* reset MyProcPid */ /* * If possible, make this process a group leader, so that the postmaster * can signal any child processes too. */ #ifdef HAVE_SETSID if (setsid() < 0) elog(FATAL, "setsid() failed: %m"); #endif /* * Ignore all signals usually bound to some action in the postmaster, * except for SIGHUP, SIGUSR1 and SIGQUIT. 
     */
    pqsignal(SIGHUP, WALWriterSigHupHandler);
    pqsignal(SIGINT, SIG_IGN);
    pqsignal(SIGTERM, SIG_IGN);     /* Not executing transactions */
    pqsignal(SIGQUIT, WALWriter_exit);
    pqsignal(SIGALRM, SIG_IGN);
    pqsignal(SIGPIPE, SIG_IGN);
    pqsignal(SIGUSR1, WALWriter_waken);     /* XXX: May want this later */
    pqsignal(SIGUSR2, SIG_IGN);
    pqsignal(SIGCHLD, SIG_DFL);
    pqsignal(SIGTTIN, SIG_DFL);
    pqsignal(SIGTTOU, SIG_DFL);
    pqsignal(SIGCONT, SIG_DFL);
    pqsignal(SIGWINCH, SIG_DFL);

    /*
     * Identify myself via ps
     */
    init_ps_display("WAL writer process", "", "", "");

    SetProcessingMode(InitProcessing);

    /* Early initialization */
    BaseInit();

    /*
     * Create a per-backend PGPROC struct in shared memory, except in the
     * EXEC_BACKEND case where this was done in SubPostmasterMain. We must do
     * this before we can use LWLocks (and in the EXEC_BACKEND case we already
     * had to do some stuff with LWLocks).
     */
#ifndef EXEC_BACKEND
    InitAuxiliaryProcess();
#endif

    /*
     * Create a memory context that we will do all our work in.  We do this so
     * that we can reset the context during error recovery and thereby avoid
     * possible memory leaks.
     */
    walwriter_cxt = AllocSetContextCreate(TopMemoryContext,
                                          "WAL Writer",
                                          ALLOCSET_DEFAULT_MINSIZE,
                                          ALLOCSET_DEFAULT_INITSIZE,
                                          ALLOCSET_DEFAULT_MAXSIZE);
    MemoryContextSwitchTo(walwriter_cxt);

    /*
     * If an exception is encountered, processing resumes here.
     *
     * This code is heavily based on bgwriter.c, q.v.
     */
    if (sigsetjmp(local_sigjmp_buf, 1) != 0)
    {
        /* since not using PG_TRY, must reset error stack by hand */
        error_context_stack = NULL;

        /* Prevents interrupts while cleaning up */
        HOLD_INTERRUPTS();

        /* Report the error to the server log */
        EmitErrorReport();

        /*
         * These operations are really just a minimal subset of
         * AbortTransaction().  We don't have very many resources to worry
         * about, but we do have LWLocks.
         */
        LWLockReleaseAll();

        /*
         * Now return to normal top-level context and clear ErrorContext for
         * next time.
         */
        MemoryContextSwitchTo(walwriter_cxt);
        FlushErrorState();

        /* Flush any leaked data in the top-level context */
        MemoryContextResetAndDeleteChildren(walwriter_cxt);

        /* XXX: pgstat cleanup would go here; nothing is done yet */

        /* Now we can allow interrupts again */
        RESUME_INTERRUPTS();

        /*
         * Sleep at least 1 second after any error.  We don't want to be
         * filling the error logs as fast as we can.
         */
        pg_usleep(1000000L);
    }

    /* We can now handle ereport(ERROR) */
    PG_exception_stack = &local_sigjmp_buf;

    ereport(LOG,
            (errmsg("WAL writer started")));

    PG_SETMASK(&UnBlockSig);

    WALWriter_MainLoop();

    ereport(LOG,
            (errmsg("WAL writer shutting down")));

    exit(0);
}

/* SIGQUIT signal handler for WALWriter process */
static void
WALWriter_exit(SIGNAL_ARGS)
{
    /*
     * For now, we just nail the doors shut and get out of town.
     */
    exit(0);
}

/* SIGHUP: set flag to re-read config file at next convenient time */
static void
WALWriterSigHupHandler(SIGNAL_ARGS)
{
    got_SIGHUP = true;
}

/* SIGUSR1 signal handler for WALWriter process */
static void
WALWriter_waken(SIGNAL_ARGS)
{
    wakened = true;
}

/*
 * WALWriter_MainLoop
 */
static void
WALWriter_MainLoop(void)
{
    time_t      last_cycle_time;
    long        udelay;

    wakened = false;

    do
    {
        last_cycle_time = time(NULL);

        /* Check for config update */
        if (got_SIGHUP)
        {
            got_SIGHUP = false;
            ProcessConfigFile(PGC_SIGHUP);
            if (!WALWriterActive())
                break;          /* user wants us to shut down */
        }

        /*
         * Do what we're here for.  Note that in some circumstances it may
         * take two cycles to get all transactions to disk.
         */
        XLogBackgroundFlush();

        /*
         * Lock contention may have delayed our work, so check what the time
         * is and work out the delay needed, if any.  We don't want to
         * systematically exceed our requested delay because that widens the
         * window of potential data loss.
         *
         * WALWriterDelay is handed to pg_usleep() and so is taken to be in
         * microseconds, while time() counts seconds; convert before
         * subtracting.
         */
        udelay = (long) WALWriterDelay -
            (long) (time(NULL) - last_cycle_time) * 1000000L;

        if (!got_SIGHUP && udelay > 0)
            pg_usleep(udelay);

    } while (PostmasterIsAlive(true));
}
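
The main loop subtracts the time already spent in a cycle from the requested delay before sleeping. Since time() only has one-second resolution, that correction is coarse for sub-second delays. A minimal sketch of the same calculation done entirely in microseconds follows, assuming wal_writer_delay is expressed in microseconds (as its use with pg_usleep suggests). The helpers elapsed_usec and remaining_delay_usec are hypothetical names for illustration and are not part of the patch.

```c
#include <assert.h>
#include <sys/time.h>

/*
 * Hypothetical helper: microseconds elapsed between two timestamps,
 * e.g. two results of gettimeofday().
 */
static long
elapsed_usec(const struct timeval *start, const struct timeval *now)
{
    return (long) (now->tv_sec - start->tv_sec) * 1000000L +
        (long) (now->tv_usec - start->tv_usec);
}

/*
 * Hypothetical helper: how long the loop should still sleep so that one
 * cycle takes about delay_usec in total.  Returns 0 if the work already
 * used up (or exceeded) the requested delay.
 */
static long
remaining_delay_usec(long delay_usec,
                     const struct timeval *start,
                     const struct timeval *now)
{
    long        remaining;

    assert(delay_usec >= 0);

    remaining = delay_usec - elapsed_usec(start, now);
    return (remaining > 0) ? remaining : 0;
}
```

In a loop like WALWriter_MainLoop, one would call gettimeofday() at the top of the cycle and again after the flush, then sleep for the returned value only if it is non-zero.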
/*-------------------------------------------------------------------------
 *
 * walwriter.h
 *    WALWriter definitions
 *
 * Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
 *
 * IDENTIFICATION
 *    $PostgreSQL$
 *
 *-------------------------------------------------------------------------
 */
#ifndef _WALWRITER_H
#define _WALWRITER_H

extern int  WALWriterDelay;

#define WALWriterActive()   (WALWriterDelay > 0)

extern int  StartWALWriter(void);

#endif   /* _WALWRITER_H */
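
StartWALWriter, declared above, refuses to fork a new WAL writer if the previous start was too recent, so a process that dies at launch is not respawned in a tight loop. A minimal standalone sketch of that throttle follows. RESTART_INTERVAL_SECS and restart_allowed are hypothetical names, and the value 10 is an assumption for illustration; the patch's own WALWRITER_RESTART_INTERVAL value is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical minimum gap between start attempts, in seconds. */
#define RESTART_INTERVAL_SECS 10

/* Time of the last start attempt that was let through. */
static time_t last_start_time = 0;

/*
 * Return true if a new start attempt may proceed at time "now", and
 * record it.  The difference is compared as unsigned, so if the system
 * clock is set backwards the negative difference wraps to a large value
 * and the attempt is allowed rather than blocked until the clock catches
 * up again.
 */
static bool
restart_allowed(time_t now)
{
    assert(now != (time_t) -1);     /* time() reports failure as -1 */

    if ((unsigned int) (now - last_start_time) <
        (unsigned int) RESTART_INTERVAL_SECS)
        return false;

    last_start_time = now;
    return true;
}
```

A caller in the postmaster's position would simply skip the fork when restart_allowed(time(NULL)) returns false and try again on its next pass through the main loop.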