[PATCHES] Fast CLUSTER

Simon Riggs Wed, 07 Feb 2007 00:31:02 -0800

I've used the heap_sync() API call to improve performance of CLUSTER by
avoiding WAL when archive_command is not set.


I've refactored the heap_sync() call very slightly, to hide some of the
complexity that was previously exposed to COPY. It now syncs a TOAST
relation automatically if one exists for the table. That also fixes a
latent bug in CREATE TABLE AS SELECT which would have meant that the
TOAST table wasn't properly synced (my bug), effectively caused by the
original API design.

Patch applies cleanly, passes make check.

No user interface changes. Performance docs updated also to reflect this
change and slightly improve directly related wording.

-- 
  Simon Riggs             
  EnterpriseDB   http://www.enterprisedb.com

Index: doc/src/sgml/perform.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/perform.sgml,v
retrieving revision 1.63
diff -c -r1.63 perform.sgml
*** doc/src/sgml/perform.sgml	1 Feb 2007 19:10:24 -0000	1.63
--- doc/src/sgml/perform.sgml	7 Feb 2007 08:21:59 -0000
***************
*** 908,919 ****
      will perform more slowly when <varname>archive_command</varname>
      is set, as a result of their needing to write large amounts of WAL.
      This applies to the following commands: 
!     <command>CREATE TABLE AS SELECT</command>, 
!     <command>CREATE INDEX</command> and also <command>COPY</command>, when
!     it is executed in the same transaction as a prior 
!     <command>CREATE TABLE</command> or <command>TRUNCATE</command> command.
!    </para>
! 
    </sect2>
  
    <sect2 id="populate-analyze">
--- 908,925 ----
      will perform more slowly when <varname>archive_command</varname>
      is set, as a result of their needing to write large amounts of WAL.
      This applies to the following commands: 
!     <itemizedlist>
!      <listitem><para><command>CREATE TABLE AS SELECT</command></para></listitem>
!      <listitem><para><command>CREATE INDEX</command></para></listitem>
!      <listitem><para><command>ALTER TABLE SET TABLESPACE</command></para></listitem>
!      <listitem><para><command>CLUSTER</command></para></listitem>
!      <listitem><para><command>COPY</command>, when it is executed after one of
! 	  these commands, yet in the same transaction:
! 	  <itemizedlist>
! 	   <listitem><para><command>CREATE TABLE</command></para></listitem>
! 	   <listitem><para><command>TRUNCATE</command></para></listitem>
! 	  </itemizedlist>
!   </itemizedlist>
    </sect2>
  
    <sect2 id="populate-analyze">
***************
*** 950,964 ****
      By default, <application>pg_dump</> uses <command>COPY</>, and when
      it is generating a complete schema-and-data dump, it is careful to
      load data before creating indexes and foreign keys.  So in this case
!     the first several guidelines are handled automatically.  What is left
!     for you to do is to set appropriate (i.e., larger than normal) values
!     for <varname>maintenance_work_mem</varname> and
!     <varname>checkpoint_segments</varname>, as well as unsetting 
!     <varname>archive_command</varname> before loading the dump script,
!     and then to run <command>ANALYZE</> afterwards and resetting
!     <varname>archive_command</varname> if required. All of the 
!     parameters can be reset once the load has completed without needing
!     to restart the server, as described in <xref linkend="config-setting">.
     </para>
  
     <para>
--- 956,989 ----
      By default, <application>pg_dump</> uses <command>COPY</>, and when
      it is generating a complete schema-and-data dump, it is careful to
      load data before creating indexes and foreign keys.  So in this case
!     several guidelines are handled automatically.  What is left
!     for you to do is to:
!     <itemizedlist>
!      <listitem><para>Set appropriate (i.e., larger than normal) values
!       for <varname>maintenance_work_mem</varname> and
!       <varname>checkpoint_segments</varname>.</para></listitem>
!      <listitem><para>Unset <varname>archive_command</varname> before
! 	  loading the dump script, and then reset 
! 	  <varname>archive_command</varname> afterwards, if required.</para>
! 	 </listitem>
!      <listitem>
! 	  <para>
!        Consider whether the whole dump can be restored as a single
! 	   transaction, so that COPY commands can be optimized. Remember that
! 	   this will mean that all restore commands will either fully completed
! 	   or fully rolled back. This mode can be specified by passing the 
! 	   <option>-1</> or <option>--single-transaction</> command-line options
! 	   to <application>psql</> or <application>pg_restore</>. When using this
! 	   mode, be aware that even the smallest of errors can rollback a
!        restore that has already run for many hours. However, that might
!        still be preferable to manually cleaning up a complex database
!        after a partially restored dump.
! 	  </para>
! 	 </listitem>
!      <listitem><para>Run <command>ANALYZE</> afterwards.</para></listitem>
! 	All of the above mentioned parameters can be reset once the load has
! 	completed without needing to restart the server, as described in 
! 	<xref linkend="config-setting">.
     </para>
  
     <para>
Index: src/backend/access/heap/heapam.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.227
diff -c -r1.227 heapam.c
*** src/backend/access/heap/heapam.c	5 Feb 2007 04:22:18 -0000	1.227
--- src/backend/access/heap/heapam.c	7 Feb 2007 08:22:03 -0000
***************
*** 60,65 ****
--- 60,66 ----
  
  static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
  		   ItemPointerData from, Buffer newbuf, HeapTuple newtup, bool move);
+ static void heap_sync_relation(Relation rel);
  
  
  /* ----------------------------------------------------------------
***************
*** 3990,3995 ****
--- 3991,4023 ----
  {
  	if (!rel->rd_istemp)
  	{
+ 		/* main heap */
+ 		heap_sync_relation(rel);
+ 
+ 		/* main heap indexes, if any */
+ 		/* we always use WAL for index inserts, so no need to sync */
+ 
+ 		/* toast heap, if any */
+ 		if (OidIsValid(rel->rd_rel->reltoastrelid))
+ 		{
+ 			 Relation		toastrel;
+ 
+ 			 toastrel = heap_open(rel->rd_rel->reltoastrelid,
+ 								  AccessShareLock);
+ 			 heap_sync_relation(toastrel);
+ 			 heap_close(toastrel, AccessShareLock);
+ 		}
+ 
+ 		/* toast index, if toast heap */
+ 		/* we always use WAL for index inserts, so no need to sync */
+ 	}
+ }
+ 
+ static void
+ heap_sync_relation(Relation rel)
+ {
+ 	if (!rel->rd_istemp)
+ 	{
  		/*
  		 * If we skipped using WAL, and it's not a temp relation,
  		 * we must force the relation down to disk before it's
Index: src/backend/commands/cluster.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/commands/cluster.c,v
retrieving revision 1.156
diff -c -r1.156 cluster.c
*** src/backend/commands/cluster.c	1 Feb 2007 19:10:25 -0000	1.156
--- src/backend/commands/cluster.c	7 Feb 2007 08:22:04 -0000
***************
*** 653,658 ****
--- 653,659 ----
  	char	   *nulls;
  	IndexScanDesc scan;
  	HeapTuple	tuple;
+ 	bool		use_wal;
  
  	/*
  	 * Open the relations we need.
***************
*** 676,681 ****
--- 677,688 ----
  	memset(nulls, 'n', natts * sizeof(char));
  
  	/*
+ 	 * We need to log the copied data in WAL iff WAL archiving is enabled AND
+ 	 * it's not a temp rel.
+ 	 */
+ 	use_wal = XLogArchivingActive() && !NewHeap->rd_istemp;
+ 
+ 	/*
  	 * Scan through the OldHeap on the OldIndex and copy each tuple into the
  	 * NewHeap.
  	 */
***************
*** 722,728 ****
  		if (NewHeap->rd_rel->relhasoids)
  			HeapTupleSetOid(copiedTuple, HeapTupleGetOid(tuple));
  
! 		simple_heap_insert(NewHeap, copiedTuple);
  
  		heap_freetuple(copiedTuple);
  
--- 729,735 ----
  		if (NewHeap->rd_rel->relhasoids)
  			HeapTupleSetOid(copiedTuple, HeapTupleGetOid(tuple));
  
! 		fast_heap_insert(NewHeap, copiedTuple, use_wal);
  
  		heap_freetuple(copiedTuple);
  
***************
*** 736,741 ****
--- 743,752 ----
  
  	index_close(OldIndex, NoLock);
  	heap_close(OldHeap, NoLock);
+ 
+ 	if (!use_wal)
+ 		heap_sync(NewHeap);
+ 
  	heap_close(NewHeap, NoLock);
  }
  
Index: src/backend/commands/copy.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.275
diff -c -r1.275 copy.c
*** src/backend/commands/copy.c	25 Jan 2007 02:17:26 -0000	1.275
--- src/backend/commands/copy.c	7 Feb 2007 08:22:07 -0000
***************
*** 1856,1862 ****
  	 * do not need to worry about inconsistent states.
  	 * As mentioned in comments in utils/rel.h, the in-same-transaction test is
  	 * not completely reliable, since rd_createSubId can be reset to zero in
! 	 * certain cases before the end of the creating transaction. 
  	 * We are doing this for performance only, so we only need to know: 
  	 * if rd_createSubid != InvalidSubTransactionId then it is *always* just 
  	 * created. If we have PITR enabled, then we *must* use_wal
--- 1856,1864 ----
  	 * do not need to worry about inconsistent states.
  	 * As mentioned in comments in utils/rel.h, the in-same-transaction test is
  	 * not completely reliable, since rd_createSubId can be reset to zero in
! 	 * certain cases before the end of the creating transaction.
! 	 * CLUSTER creates a new relfilenode but it removes the old one immediately
! 	 * for other reasons, so we cannot optimise COPY following CLUSTER.
  	 * We are doing this for performance only, so we only need to know: 
  	 * if rd_createSubid != InvalidSubTransactionId then it is *always* just 
  	 * created. If we have PITR enabled, then we *must* use_wal
***************
*** 2120,2147 ****
  	 * If we skipped writing WAL for heaps, then we need to sync
  	 */
  	if (!use_wal)
- 	{
- 		/* main heap */
  		heap_sync(cstate->rel);
  
- 		/* main heap indexes, if any */
- 		/* we always use WAL for index inserts, so no need to sync */
- 
- 		/* toast heap, if any */
- 		if (OidIsValid(cstate->rel->rd_rel->reltoastrelid))
- 		{
- 			 Relation		toastrel;
- 
- 			 toastrel = heap_open(cstate->rel->rd_rel->reltoastrelid,
- 								  AccessShareLock);
- 			 heap_sync(toastrel);
- 			 heap_close(toastrel, AccessShareLock);
- 		}
- 
- 		/* toast index, if toast heap */
- 		/* we always use WAL for index inserts, so no need to sync */
- 	}
- 
  	/* Done, clean up */
  	error_context_stack = errcontext.previous;
  
--- 2122,2129 ----

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly

[PATCHES] Fast CLUSTER

Reply via email to