On Sun, 2003-10-05 at 19:58, Tom Lane wrote: > That's a fairly useless place to put it, though, since someone would > only think to look at sort_mem if they already had a clue. It should > be mentioned under bulk data load (in performance tips chapter)
Attached is a doc patch that does this. The way I've worded it may not be the best, though. > and perhaps also in dump/restore procedures. It's already mentioned there. Should we also suggest turning off fsync when doing restores? (BTW, is there a reason the docs consistently call them "B-tree indexes", not "B+-tree indexes"?) -Neil
Index: doc/src/sgml/perform.sgml =================================================================== RCS file: /var/lib/cvs/pgsql-server/doc/src/sgml/perform.sgml,v retrieving revision 1.33 diff -c -r1.33 perform.sgml *** doc/src/sgml/perform.sgml 11 Sep 2003 18:30:38 -0000 1.33 --- doc/src/sgml/perform.sgml 6 Oct 2003 00:21:48 -0000 *************** *** 751,761 **** <para> Use <command>COPY FROM STDIN</command> to load all the rows in one ! command, instead of using ! a series of <command>INSERT</command> commands. This reduces parsing, ! planning, etc. ! overhead a great deal. If you do this then it is not necessary to turn ! off autocommit, since it is only one command anyway. </para> </sect2> --- 751,760 ---- <para> Use <command>COPY FROM STDIN</command> to load all the rows in one ! command, instead of using a series of <command>INSERT</command> ! commands. This reduces parsing, planning, etc. overhead a great ! deal. If you do this then it is not necessary to turn off ! autocommit, since it is only one command anyway. </para> </sect2> *************** *** 764,772 **** <para> If you are loading a freshly created table, the fastest way is to ! create the table, bulk-load with <command>COPY</command>, then create any ! indexes needed ! for the table. Creating an index on pre-existing data is quicker than updating it incrementally as each row is loaded. </para> --- 763,771 ---- <para> If you are loading a freshly created table, the fastest way is to ! create the table, bulk load the table's data using ! <command>COPY</command>, then create any indexes needed for the ! table. Creating an index on pre-existing data is quicker than updating it incrementally as each row is loaded. </para> *************** *** 780,785 **** --- 779,797 ---- </para> </sect2> + <sect2 id="populate-sort-mem"> + <title>Increase <varname>sort_mem</varname></title> + + <para> + Temporarily increasing the <varname>sort_mem</varname> + configuration variable when restoring large amounts of data can + lead to improved performance. This is because when a B-tree index + is created from scratch, the existing content of the table needs + to be sorted. Allowing the merge sort to use more buffer pages + means that fewer merge passes will be required. + </para> + </sect2> + <sect2 id="populate-analyze"> <title>Run <command>ANALYZE</command> Afterwards</title> Index: doc/src/sgml/runtime.sgml =================================================================== RCS file: /var/lib/cvs/pgsql-server/doc/src/sgml/runtime.sgml,v retrieving revision 1.210 diff -c -r1.210 runtime.sgml *** doc/src/sgml/runtime.sgml 3 Oct 2003 19:26:49 -0000 1.210 --- doc/src/sgml/runtime.sgml 6 Oct 2003 00:06:45 -0000 *************** *** 928,935 **** by <literal>ORDER BY</>, merge joins, and <command>CREATE INDEX</>. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of <literal>IN</> subqueries. Because ! <command>CREATE INDEX</> is used when restoring a database, it might ! be good to temporarily increase this value during a restore. </para> </listitem> </varlistentry> --- 928,936 ---- by <literal>ORDER BY</>, merge joins, and <command>CREATE INDEX</>. Hash tables are used in hash joins, hash-based aggregation, and hash-based processing of <literal>IN</> subqueries. Because ! <command>CREATE INDEX</> is used when restoring a database, ! increasing <varname>sort_mem</varname> before doing a large ! restore operation can improve performance. </para> </listitem> </varlistentry>
---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match