On Sun, 2003-10-05 at 19:58, Tom Lane wrote:
> That's a fairly useless place to put it, though, since someone would
> only think to look at sort_mem if they already had a clue.  It should
> be mentioned under bulk data load (in performance tips chapter)

Attached is a doc patch that does this. The way I've worded it may not
be the best, though.

>  and perhaps also in dump/restore procedures.

It's already mentioned there.

Should we also suggest turning off fsync when doing restores?

(BTW, is there a reason the docs consistently call them "B-tree
indexes", not "B+-tree indexes"?)

-Neil

Index: doc/src/sgml/perform.sgml
===================================================================
RCS file: /var/lib/cvs/pgsql-server/doc/src/sgml/perform.sgml,v
retrieving revision 1.33
diff -c -r1.33 perform.sgml
*** doc/src/sgml/perform.sgml	11 Sep 2003 18:30:38 -0000	1.33
--- doc/src/sgml/perform.sgml	6 Oct 2003 00:21:48 -0000
***************
*** 751,761 ****
  
     <para>
      Use <command>COPY FROM STDIN</command> to load all the rows in one
!     command, instead of using
!     a series of <command>INSERT</command> commands.  This reduces parsing,
!     planning, etc.
!     overhead a great deal. If you do this then it is not necessary to turn
!     off autocommit, since it is only one command anyway.
     </para>
    </sect2>
  
--- 751,760 ----
  
     <para>
      Use <command>COPY FROM STDIN</command> to load all the rows in one
!     command, instead of using a series of <command>INSERT</command>
!     commands.  This reduces parsing, planning, etc.  overhead a great
!     deal. If you do this then it is not necessary to turn off
!     autocommit, since it is only one command anyway.
     </para>
    </sect2>
  
***************
*** 764,772 ****
  
     <para>
      If you are loading a freshly created table, the fastest way is to
!     create the table, bulk-load with <command>COPY</command>, then create any
!     indexes needed 
!     for the table.  Creating an index on pre-existing data is quicker than
      updating it incrementally as each row is loaded.
     </para>
  
--- 763,771 ----
  
     <para>
      If you are loading a freshly created table, the fastest way is to
!     create the table, bulk load the table's data using
!     <command>COPY</command>, then create any indexes needed for the
!     table.  Creating an index on pre-existing data is quicker than
      updating it incrementally as each row is loaded.
     </para>
  
***************
*** 780,785 ****
--- 779,797 ----
     </para>
    </sect2>
  
+   <sect2 id="populate-sort-mem">
+    <title>Increase <varname>sort_mem</varname></title>
+ 
+    <para>
+     Temporarily increasing the <varname>sort_mem</varname>
+     configuration variable when restoring large amounts of data can
+     lead to improved performance. This is because when a B-tree index
+     is created from scratch, the existing content of the table needs
+     to be sorted. Allowing the merge sort to use more buffer pages
+     means that fewer merge passes will be required.
+    </para>
+   </sect2>
+ 
    <sect2 id="populate-analyze">
     <title>Run <command>ANALYZE</command> Afterwards</title>
  
Index: doc/src/sgml/runtime.sgml
===================================================================
RCS file: /var/lib/cvs/pgsql-server/doc/src/sgml/runtime.sgml,v
retrieving revision 1.210
diff -c -r1.210 runtime.sgml
*** doc/src/sgml/runtime.sgml	3 Oct 2003 19:26:49 -0000	1.210
--- doc/src/sgml/runtime.sgml	6 Oct 2003 00:06:45 -0000
***************
*** 928,935 ****
          by <literal>ORDER BY</>, merge joins, and <command>CREATE INDEX</>.
          Hash tables are used in hash joins, hash-based aggregation, and
          hash-based processing of <literal>IN</> subqueries.  Because 
!         <command>CREATE INDEX</> is used when restoring a database, it might
!         be good to temporarily increase this value during a restore.
         </para>
        </listitem>
       </varlistentry>
--- 928,936 ----
          by <literal>ORDER BY</>, merge joins, and <command>CREATE INDEX</>.
          Hash tables are used in hash joins, hash-based aggregation, and
          hash-based processing of <literal>IN</> subqueries.  Because 
!         <command>CREATE INDEX</> is used when restoring a database,
!         increasing <varname>sort_mem</varname> before doing a large
!         restore operation can improve performance.
         </para>
        </listitem>
       </varlistentry>
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Reply via email to