On Sat, Jul 26, 2008 at 01:56:14PM -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > I want to dump tables separately for performance reasons. There are
> > documented tests showing 100% gains using this method. There is no gain
> > adding this to pg_restore. There is a gain to be had - parallelising
> > index creation, but this patch doesn't provide parallelisation.
>
> Right, but the parallelization is going to happen sometime, and it is
> going to happen in the context of pg_restore. So I think it's pretty
> silly to argue that no one will ever want this feature to work in
> pg_restore.
>
> To extend the example I just gave to Stephen, I think a fairly probable
> scenario is where you only need to tweak some "before" object
> definitions, and then you could do
>
> pg_restore --schema-before-data whole.dump >before.sql
> edit before.sql
> psql -f before.sql target_db
> pg_restore --data-only --schema-after-data -d target_db whole.dump
>
> which (given a parallelizing pg_restore) would do all the time-consuming
> steps in a fully parallelized fashion.

A few thoughts about pg_restore performance:

To take advantage of non-logged COPY, the create and load should be in
the same transaction.

To take advantage of the file and buffer cache, it would be good to do
indexes immediately after table data. Many tables will be small enough
to fit in cache, and this will avoid re-reading them for index builds.
This effect becomes stronger with more indexes on one table. There may
also be some filesystem placement benefit to building the indexes for a
table immediately after loading the data.

The buffer and file cache advantage also applies to constraint creation,
but this is complicated by the need for indexes and data in the
referenced tables.

It seems that a high performance restore will want to proceed in a
different order than the current sort order or the one proposed by the
before/data/after patch.

 - The simplest unit of work for parallelism may be the table and its
   "decorations", e.g. indexes and relational constraints.

 - Sort tables by foreign key dependency so that referenced tables are
   loaded before referencing tables.

 - Do table creation and data load together in one transaction to use
   non-logged COPY. Index builds and constraint creation should follow
   immediately, either as part of the same transaction or possibly
   parallelized themselves.

Table creation, data load, index builds, and constraint creation could
be packaged up as the unit of work to be done in a subprocess, which
either completes or fails as a unit. The worker process would be called
with connection info, a file pointer to the data, and the DDL for the
table. pg_restore would keep a work queue of tables to be restored in FK
dependency order and would also do the other schema operations, such as
functions and types.

-dg

-- 
David Gould       [EMAIL PROTECTED]      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.
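
The scheduling proposed above can be sketched in Python. It covers per-table units of work run by a pool of workers, with each unit started only once the tables it references have been restored. This is a minimal illustration under stated assumptions and is not pg_restore code. The names unit_of_work, restore_order, fk_deps and run_unit are hypothetical. Threads stand in for the worker subprocesses, and the SQL is only generated, never sent to a server. The ordering uses the standard-library graphlib.TopologicalSorter (Python 3.9+), whose get_ready()/done() interface lets a table start as soon as every table it references has finished. CREATE TABLE and COPY share one transaction because PostgreSQL can skip writing WAL for a COPY into a table created in the same transaction, provided WAL archiving is not enabled.

```python
"""Hypothetical sketch of an FK-ordered, parallel per-table restore scheduler."""
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def unit_of_work(table, ddl, indexes, fk_constraints):
    """Return the statements for one table's unit of work (hypothetical helper).

    CREATE TABLE and COPY share one transaction so the server may skip WAL
    for the load; index builds and FK constraints follow while the table's
    pages are likely still cached. The caller would stream the table data
    after the COPY statement.
    """
    return (["BEGIN;", ddl, f"COPY {table} FROM STDIN;"]
            + list(indexes)
            + list(fk_constraints)
            + ["COMMIT;"])


def restore_order(fk_deps, run_unit, max_workers=4):
    """Run run_unit(table) for every table on a pool of workers.

    fk_deps maps each table to the tables it references. A table's unit is
    submitted only after the units of all referenced tables have finished,
    so their data exists when its FK constraints are added.
    Returns the tables in completion order.
    """
    # A self-reference is satisfied inside the table's own unit, so drop it.
    graph = {t: set(refs) - {t} for t, refs in fk_deps.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular FK references
    finished = []
    running = {}  # future -> table
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Queue every table whose referenced tables are all restored.
            for table in sorter.get_ready():
                running[pool.submit(run_unit, table)] = table
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                table = running.pop(fut)
                fut.result()  # re-raise if this table's unit failed
                sorter.done(table)
                finished.append(table)
    return finished


if __name__ == "__main__":
    schema = {
        "customers": set(),
        "products": set(),
        "orders": {"customers"},
        "order_lines": {"orders", "products"},
    }
    # Completion order varies between runs, but customers always finishes
    # before orders, and orders and products before order_lines.
    print(restore_order(schema, run_unit=lambda t: None, max_workers=2))
```

A pair of tables that reference each other makes graphlib raise CycleError here. A real restore would have to break such cycles, for example by deferring those particular constraints until all data is loaded.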

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches