Doc patch only...

To aid with the understanding of the synchronous commit technique, I've
completed the docs prior to the completion of the final version of the
patch.

Any review comments, final doubts etc.., please say them now...

Applies cleanly to CVS HEAD, sgml make OK.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.130
diff -c -r1.130 config.sgml
*** doc/src/sgml/config.sgml	30 Jun 2007 19:12:01 -0000	1.130
--- doc/src/sgml/config.sgml	13 Jul 2007 11:17:36 -0000
***************
*** 1412,1417 ****
--- 1412,1457 ----
         </para>
        </listitem>
       </varlistentry>
+ 
+      <varlistentry id="guc-wal-writer-delay" xreflabel="wal_writer_delay">
+       <term><varname>wal_writer_delay</varname> (<type>integer</type>)</term>
+       <indexterm>
+        <primary><varname>wal_writer_delay</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+        <para>
+ 		Specifies the delay between activity rounds for the WAL Writer. In each
+ 		round the writer will flush WAL to disk. It then sleeps for 
+ 		wal_writer_delay milliseconds, and repeats. The default value is 200 
+ 		milliseconds (200ms). Note that on many systems, the effective 
+ 		resolution of sleep delays is 10 milliseconds; setting wal_writer_delay 
+ 		to a value that is not a multiple of 10 might have the same results as 
+ 		setting it to the next higher multiple of 10. This parameter can only 
+ 		be set in the postgresql.conf file or on the server command line.
+        </para>
+       </listitem>
+      </varlistentry>
+      
+      <varlistentry id="guc-synchronous-commit" xreflabel="synchronous_commit">
+       <term><varname>synchronous_commit</varname> (<type>boolean</type>)</term>
+       <indexterm>
+        <primary><varname>synchronous_commit</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+        <para>
+ 		Specifies whether explicit or implicit commit commands will wait 
+ 		for WAL to be written to disk before the command returns success.
+ 		Turning this parameter off will give you asynchronous commits, 
+ 		which will increase performance for some workloads, though 
+ 		introduces risk of data loss, see <xref linkend="wal-asynch-commit">.
+        </para>
+        <para>
+ 		This parameter can be set for individual transactions or sessions,
+ 		so it is advisable to use asynchronous commits only for those
+ 		transactions for which risk of data loss is acceptable.
+        </para>
+       </listitem>
+      </varlistentry>
       
       <varlistentry id="guc-wal-buffers" xreflabel="wal_buffers">
        <term><varname>wal_buffers</varname> (<type>integer</type>)</term>
Index: doc/src/sgml/wal.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/wal.sgml,v
retrieving revision 1.44
diff -c -r1.44 wal.sgml
*** doc/src/sgml/wal.sgml	28 Jun 2007 00:02:37 -0000	1.44
--- doc/src/sgml/wal.sgml	13 Jul 2007 11:17:38 -0000
***************
*** 23,29 ****
     ordinarily meets this requirement.  In fact, even if a computer is
     fatally damaged, if the disk drives survive they can be moved to
     another computer with similar hardware and all committed
!    transactions will remain intact.
    </para>
  
    <para>
--- 23,33 ----
     ordinarily meets this requirement.  In fact, even if a computer is
     fatally damaged, if the disk drives survive they can be moved to
     another computer with similar hardware and all committed
!    transactions will remain intact. We refer to this mode of operation 
!    as synchronous commit, since the user issuing the commit waits for 
!    the writing of the <acronym>WAL</acronym> to permanent storage. 
!    Asynchronous commit is also possible as a performance option, 
!    described in <xref linkend="wal-asynch-commit">
    </para>
  
    <para>
***************
*** 394,397 ****
--- 398,586 ----
     seem to be a problem in practice.
    </para>
   </sect1>
+ 
+  <sect1 id="wal-asynch-commit">
+   <title>Asynchronous Commit</title>
+ 
+    <indexterm>
+     <primary>synchronous commit</primary>
+    </indexterm>
+ 
+    <indexterm>
+     <primary>asynchronous commit</primary>
+    </indexterm>
+ 
+   <para>
+    Asynchronous Commit allows a commit command to complete faster, at the 
+    cost that the most recent transactions will be lost if the database 
+    should crash. This feature is particularly useful for real-time
+    sensor data collection applications, such as RFID tags or other
+    monitoring applications.
+   </para>
+ 
+   <para>
+    Normal commits, or as we now refer to them, synchronous commits, wait
+    for the writing of the <acronym>WAL</acronym> to permanent storage
+    before returning control to the user.  An asynchronous commit will
+    return control to the user <emphasis>before</> <acronym>WAL</acronym> 
+    data has been written and flushed to disk, which gives a significant 
+    performance boost if there has been few other disk accesses over the course 
+    of this transaction.  From an SQL perspective, this makes asynchronous 
+    commits particularly useful for shorter, mainly INSERT transactions or 
+    writes on tables small enough to reside mainly in memory.
+   </para>
+ 
+   <para>
+    Asynchronous commits introduce the risk of data loss. There
+    is a time window between the time that a commit has returned
+    successfully to the user and the time of the WAL write during which
+    data will certainly be lost <emphasis>if</> the server crashes.  
+    The data loss is deterministic: if the server stays up then the 
+    commits will be durable, while if the server crashes there will be 
+    certain data loss of the transactions that have most recently
+    committed.
+   </para>
+ 
+   <para>
+    The user can select the commit mode of their transactions, so that 
+    it is possible to have both synchronous and asynchronous commit 
+    transactions concurrently.  The proof that this is safe is presented 
+    later in this section.  The commit mode is 
+    controlled by the user settable parameter 
+    <xref linkend="guc-synchronous-commit">. 
+    <varname>synchronous_commit</varname> may be set in 
+    <filename>postgresql.conf</filename>, or for a specific session using 
+ <programlisting>
+ SET synchronous_commit = off
+ </programlisting>
+    which will provide faster, asynchronous commits. 
+    <varname>synchronous_commit</varname> can be set at any point right 
+    up to the final COMMIT statement.  This parameter may also be set for
+    just one individual transaction using 
+ <programlisting>
+ SET LOCAL synchronous_commit = off
+ </programlisting>
+   </para>
+ 
+   <para>
+    Whatever the setting of <varname>synchronous_commit</varname>, 
+    commits will always be synchronous for utility commands, such 
+    as VACUUM, as well as for any transaction that created or removed 
+    files.
+   </para>
+ 
+   <sect2 id="wal-asynch-commit-risk">
+    <title>Risk of Data Loss</title>
+ 
+    <para>
+ 	If the database crashes during the risk window between the 
+ 	asynchronous commit and the writing of the <acronym>WAL</acronym>
+     then data written during that transaction <emphasis>will</> be lost.
+ 	Only data written by asynchronous transactions will be lost, just as 
+ 	if the transaction had never actually completed.  Transactions are
+ 	atomic, so this data loss does not propagate and any already 
+ 	written data is completely safe.
+    </para>
+ 
+    <para>
+ 	The window of data loss is limited because the WAL Writer process 
+ 	regularly writes <acronym>WAL</acronym> to disk every 
+ 	<xref linkend="guc-wal-writer-delay"> milliseconds.
+ 	The actual maximum duration of the risk window is twice the 
+ 	<varname>wal_writer_delay</varname> because the writes are optimised
+ 	to favor writing whole pages at a time during busy periods.
+    </para>
+ 
+    <para>
+ 	The extent of data loss for asynchronous commit transactions will be
+ 	limited to the number of transactions that normally complete in the
+ 	duration of the maximum time window.  You can calculate this for your 
+ 	specific application/system, though typically this would be in the 
+ 	ballpark of 1000 transactions.  This aspect means that asynch commits
+ 	are a useful technique for applications such as sensor measurements or
+ 	web statistics where a steady stream of data must be processed, yet
+ 	individual measurements are not particularly important.  Data loss 
+ 	would not be acceptable when the data processed is financial
+ 	transactions or other valuable customer commitments.
+    </para>
+ 
+    <para>
+     Asynchronous commit provides different behaviour to setting 
+     <varname>fsync</varname> = off, since that is a server-wide
+     setting that will alter the behaviour of all transactions,
+     overriding the setting of <varname>synchronous_commit</varname>,
+     as well as risking much wider data loss.  With <varname>fsync</varname> 
+ 	= off the WAL written but not fsynced, so data is lost only in case
+ 	of a system crash. With asynchronous commit the WAL is not written
+ 	at all by the user, so data is lost if there is a database server crash.
+    </para>
+ 
+    <para>
+     <varname>commit_delay</varname> also sounds very similar to 
+ 	asynchronous commit, but it is actually a synchronous commit 
+ 	with an additional wait that allows a technique known as group 
+ 	commit. <varname>commit_delay</varname> is also a server-wide setting.
+    </para>
+ 
+   </sect2>
+ 
+   <sect2 id="wal-asynch-commit-proof">
+    <title>Proof of safety for concurrent use</title>
+ 
+    <para>
+ 	It is useful that we can run both synchronous and asynchronous 
+ 	commit transactions concurrently, though there must be a clear proof
+ 	to ensure we minimise the possibility of technical error.
+    </para>
+ 
+    <para>
+ 	We have two transactions, T1 and T2.  The Log Sequence Number (LSN)
+ 	is the point in the WAL sequence where a transaction commit is  
+ 	recorded, so LSN1 and LSN2 are the commit records of those transactions. 
+ 	If T2 can see changes made by T1 then when T2 commits it
+ 	must be true that LSN2 > LSN1.  Thus when T2 commits it is certain
+ 	that all of the changes made by T1 are also now recorded in the WAL.
+ 	This situation is true whether or not T1 was asynchronous or
+ 	synchronous.  As a result, it is safe for asynchronous commits and 
+ 	synchronous commits to work concurrently without endangering data 
+ 	written by synchronous commits.  Sub-transactions are not important 
+ 	here since the final write to disk only occurs at the commit of 
+ 	the top level transaction.
+    </para>
+ 
+    <para>
+ 	Changes to data blocks cannot reach disk unless WAL is flushed up 
+ 	to the point of the LSN of the data blocks.  Any attempt to write
+ 	unsafe data to disk will trigger a write which ensures the safety
+ 	of all data written by that and prior transactions. Data blocks 
+ 	and clog pages are both protected by LSNs.
+    </para>
+ 
+    <para>
+ 	Non-WAL-logged changes to a temp table are also safe.  Those changes
+ 	could reach disk in advance of T2's commit, but we don't care since 
+ 	temp table contents don't survive crashes anyway.
+    </para>
+ 
+    <para>
+ 	Non-WAL-logged change made via one of the paths we have introduced
+ 	to avoid WAL overhead for bulk updates.  In these cases it's entirely
+ 	possible for the data to reach disk before T1's commit, because T2 will
+ 	fsync it down to disk without any sort of interlock, as soon as it
+ 	finishes the bulk update.  However, all these paths are designed to 
+ 	write data that no other transaction can see until after T2 commits. 
+ 	That commit must follow T1's in the WAL log, so until it has reached 
+ 	disk, the contents of the bulk-updated file are unimportant after a
+ 	crash.
+    </para>
+ 
+    <para>
+ 	Transaction status hint bits are normally set for data rows changed 
+ 	by a transaction. When we attempt to set a hint bit, we check the
+ 	LSN for a transaction and if this has already been written then we
+ 	will set the status hint bit.  If we haven't yet flushed WAL up to
+ 	that LSN then we will defer writing the status hint bit for that row.
+    </para>
+   </sect2>
+  </sect1>
  </chapter>
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to