On Fri, 2007-07-13 at 15:01 +0200, Reviewer wrote: > I just read your patch in order to understand a little bit what will > happen in the next release.
Thanks for the review. v2 attached. > About the guc variable wal_writer_delay, you say at the end: > "This parameter can only > + be set in the postgresql.conf file or on the server command line." > > Well, I don't really understand if this parameter is set only at server > startup or if it is possible to modify the postgresql.conf file and then > reload the conf-files. A common confusion. I'm not personally in favour of using that phrase in the docs, though it is the standard description of when to set this type of parameter, so I have followed the convention. > Also, in the part about synchronous commit and fsync=off, there's just a > missing verb, I think you should insist between the two possibilities : > system or database server crash, maybe by using some bold. I've reworded that part. Thanks for spotting this. > From a user point of view, I think the documentation is clear enough, > but I admit beeing a bit lost in the real end of the explanation, the > part about the transaction status hint bits but I think it's not that > important. I think I understood synchronous and asynchronous commits. I've added to and reworked that a bit also. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Index: doc/src/sgml/config.sgml =================================================================== RCS file: /projects/cvsroot/pgsql/doc/src/sgml/config.sgml,v retrieving revision 1.130 diff -r1.130 config.sgml 1414a1415,1454 > > <varlistentry id="guc-wal-writer-delay" xreflabel="wal_writer_delay"> > <term><varname>wal_writer_delay</varname> (<type>integer</type>)</term> > <indexterm> > <primary><varname>wal_writer_delay</> configuration parameter</primary> > </indexterm> > <listitem> > <para> > Specifies the delay between activity rounds for the WAL Writer. In each > round the writer will flush WAL to disk. It then sleeps for > wal_writer_delay milliseconds, and repeats. The default value is 200 > milliseconds (200ms). Note that on many systems, the effective > resolution of sleep delays is 10 milliseconds; setting wal_writer_delay > to a value that is not a multiple of 10 might have the same results as > setting it to the next higher multiple of 10. This parameter can only > be set in the postgresql.conf file or on the server command line. > </para> > </listitem> > </varlistentry> > > <varlistentry id="guc-synchronous-commit" xreflabel="synchronous_commit"> > <term><varname>synchronous_commit</varname> (<type>boolean</type>)</term> > <indexterm> > <primary><varname>synchronous_commit</> configuration parameter</primary> > </indexterm> > <listitem> > <para> > Specifies whether explicit or implicit commit commands will wait > for WAL to be written to disk before the command returns success. > Turning this parameter off will give you asynchronous commits, > which will increase performance for some workloads, though > introduces risk of data loss, see <xref linkend="wal-asynch-commit">. > </para> > <para> > This parameter can be set for individual transactions or sessions, > so it is advisable to use asynchronous commits only for those > transactions for which risk of data loss is acceptable. > </para> > </listitem> > </varlistentry> Index: doc/src/sgml/wal.sgml =================================================================== RCS file: /projects/cvsroot/pgsql/doc/src/sgml/wal.sgml,v retrieving revision 1.44 diff -r1.44 wal.sgml 26c26,30 < transactions will remain intact. --- > transactions will remain intact. We refer to this mode of operation > as synchronous commit, since the user issuing the commit waits for > the writing of the <xref linkend="wal-intro"> to permanent storage. > Asynchronous commit is also possible as a performance option, > described in <xref linkend="wal-asynch-commit"> 396a401,594 > > <sect1 id="wal-asynch-commit"> > <title>Asynchronous Commit</title> > > <indexterm> > <primary>synchronous commit</primary> > </indexterm> > > <indexterm> > <primary>asynchronous commit</primary> > </indexterm> > > <para> > Asynchronous Commit allows a commit command to complete faster, at the > cost that the most recent transactions will be lost if the database > should crash. This feature is particularly useful for real-time > sensor data collection applications, such as RFID tags or other > monitoring applications. > </para> > > <para> > Normal commits, or as we now refer to them, synchronous commits, wait > for the writing of the <acronym>WAL</acronym> to permanent storage > before returning control to the user. An asynchronous commit will > return control to the user <emphasis>before</> <acronym>WAL</acronym> > data has been written and flushed to disk, which gives a significant > performance boost if there has been few other disk accesses over the course > of this transaction. From an SQL perspective, this makes asynchronous > commits particularly useful for shorter, mainly INSERT transactions or > writes on tables small enough to reside mainly in memory. > </para> > > <para> > Asynchronous commits introduce the risk of data loss. There > is a time window between the time that a commit has returned > successfully to the user and the time of the WAL write during which > data will certainly be lost <emphasis>if</> the server crashes. > The data loss is deterministic: if the server stays up then the > commits will be durable, while if the server crashes there will be > certain data loss of the transactions that have most recently > committed. > </para> > > <para> > The user can select the commit mode of their transactions, so that > it is possible to have both synchronous and asynchronous commit > transactions concurrently. The proof that this is safe is presented > later in this section. The commit mode is > controlled by the user settable parameter > <xref linkend="guc-synchronous-commit">. > <varname>synchronous_commit</varname> may be set in > <filename>postgresql.conf</filename>, or for a specific session using > <programlisting> > SET synchronous_commit = off > </programlisting> > which will provide faster, asynchronous commits. > <varname>synchronous_commit</varname> can be set at any point right > up to the final COMMIT statement. This parameter may also be set for > just one individual transaction using > <programlisting> > SET LOCAL synchronous_commit = off > </programlisting> > </para> > > <para> > Whatever the setting of <varname>synchronous_commit</varname>, > commits will always be synchronous for utility commands, such > as VACUUM, as well as for any transaction that created or removed > files. > </para> > > <sect2 id="wal-asynch-commit-risk"> > <title>Risk of Data Loss</title> > > <para> > If the database crashes during the risk window between the > asynchronous commit and the writing of the <acronym>WAL</acronym> > then data written during that transaction <emphasis>will</> be lost. > Only data written by asynchronous transactions will be lost, just as > if the transaction had never actually completed. Transactions are > atomic, so this data loss does not propagate and any already > written data is completely safe. > </para> > > <para> > The window of data loss is limited because the WAL Writer process > regularly writes <acronym>WAL</acronym> to disk every > <xref linkend="guc-wal-writer-delay"> milliseconds. > The actual maximum duration of the risk window is twice the > <varname>wal_writer_delay</varname> because the writes are optimised > to favor writing whole pages at a time during busy periods. > </para> > > <para> > The extent of data loss for asynchronous commit transactions will be > limited to the number of transactions that normally complete in the > duration of the maximum time window. You can calculate this for your > specific application/system, though typically this would be in the > ballpark of 1000 transactions. This aspect means that asynch commits > are a useful technique for applications such as sensor measurements or > web statistics where a steady stream of data must be processed, yet > individual measurements are not particularly important. Data loss > would not be acceptable when the data processed is financial > transactions or other valuable customer commitments. > </para> > > <para> > Asynchronous commit provides different behaviour to setting > <varname>fsync</varname> = off, since that is a server-wide > setting that will alter the behaviour of all transactions, > overriding the setting of <varname>synchronous_commit</varname>, > as well as risking much wider data loss. With <varname>fsync</varname> > = off the WAL written but not fsynced, so data is lost only in case > of a system crash. With asynchronous commit the WAL is not written > to disk at all by the user, so data is lost if there is a database > server crash, as well as when the system crashes. > </para> > > <para> > <varname>commit_delay</varname> also sounds very similar to > asynchronous commit, but it is actually a synchronous commit > with an additional wait that allows a technique known as group > commit. <varname>commit_delay</varname> is also a server-wide setting. > </para> > > </sect2> > > <sect2 id="wal-asynch-commit-proof"> > <title>Proof of safety for concurrent use</title> > > <para> > It is useful that we can run both synchronous and asynchronous > commit transactions concurrently, though there must be a clear proof > to ensure we minimise the possibility of technical error. > </para> > > <para> > We have two transactions, T1 and T2. The > <firstterm>Log Sequence Number</firstterm> (<acronym>LSN</acronym>) > is the point in the WAL sequence where a transaction commit is > recorded, so LSN1 and LSN2 are the commit records of those transactions. > If T2 can see changes made by T1 then when T2 commits it > must be true that LSN2 follows LSN1. Thus when T2 commits it is certain > that all of the changes made by T1 are also now recorded in the WAL. > This situation is true whether or not T1 was asynchronous or > synchronous. As a result, it is safe for asynchronous commits and > synchronous commits to work concurrently without endangering data > written by synchronous commits. Sub-transactions are not important > here since the final write to disk only occurs at the commit of > the top level transaction. > </para> > > <para> > Changes to data blocks cannot reach disk unless <acronym>WAL</acronym> > is flushed up to the point of the <acronym>LSN</acronym> of the data > blocks. Any attempt to write unsafe data to disk will trigger a write > which ensures the safety of all data written by that and prior > transactions. Data blocks and clog pages are both protected by > <acronym>LSN</acronym>s. > </para> > > <para> > Changes to a temp table are also safe. Those changes could reach disk > in advance of T2's commit, but we don't care since temp table contents > don't survive crashes anyway. > </para> > > <para> > Database writes made any of the paths we have introduced > to avoid <acronym>WAL</acronym> overhead for bulk updates are also safe. > In these cases it's entirely possible for the data to reach disk before > T1's commit, because T2 will fsync it down to disk without any sort of > interlock, as soon as it finishes the bulk update. However, all these > paths are designed to write data that no other transaction can see until > after T2 commits. LSN2 follows LSN1, so until it has reached disk > the contents of the bulk-updated file are unimportant after a crash. > </para> > > <para> > Each row version contains the transaction numbers of the transactions > that inserted or deleted them, to allow <acronym>MVCC</acronym> > visibility checks. To reduce accesses to the Commit Log we normally > set transaction status hint bits are normally set on each row on the > first visibility check after the commit or abort of that transaction. > When we attempt to set a commit hint bit for a normal transaction, we > check the <acronym>LSN</acronym> for a transaction and if this has already > been written then we will set the status hint bit. If we haven't yet > flushed <acronym>WAL</acronym> up to that <acronym>LSN</acronym> then > we will defer writing the status hint bit for that row. A later > visibility test will be able to set it correctly within a short time, > as already described for the WAL writer process. > </para> > </sect2> > </sect1>
---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster