On 04/23/2013 07:53 PM, Timothy Garnett wrote:
Hi All,
Currently the -j option to pg_restore, which allows for
parallelization in the restore, can only be used if the input file is
a regular file and not, for example, a pipe. However, this is a pretty
common occurrence for us (usually in the form of pg_dump | pg_restore
to copy an individual database or some tables thereof from one machine
to another). While there's no good way to parallelize the data load
steps when reading from a pipe, the index and constraint building can
still be parallelized, and as they are generally CPU-bound on our
machines, we've found quite a bit of speedup from doing so.
Attached are two diffs off of the REL9_2_4 tag that I've been using.
The first is a simple change that serially loads the data section
before handing off the remainder of the restore to the existing
parallelized restore code (the .ALT. diff). The second, which gets
more parallelization but is a bit more of a change, uses the existing
dependency analysis code to allow index building etc. to occur in
parallel with data loading. The data loading tasks are still performed
serially in the main thread, but non-data loading tasks are scheduled
in parallel as their dependencies are satisfied (with the caveat that
the main thread can only dispatch new tasks between data loads).
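
To illustrate the scheduling described for the second diff, here is a rough Python sketch. It is not taken from the attached patches or from pg_restore itself. The function name restore, the (name, kind, deps) item tuples and the run callback are all hypothetical, and the sketch assumes the schema (pre-data) items have already been restored, so data items have no unmet dependencies.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def restore(items, run, jobs=4):
    """Hypothetical scheduler sketch (not pg_restore code).

    items: list of (name, kind, deps) in archive order, where kind is
           "data" (loaded serially from the pipe) or "post" (index or
           constraint builds that can run in a worker).
    run:   callable that performs one item, given its name.
    Returns item names in the order their completion was recorded.
    """
    deps = {name: set(d) for name, _, d in items}
    data_queue = [name for name, kind, _ in items if kind == "data"]
    pending = [name for name, kind, _ in items if kind == "post"]
    done = []      # completed items, in completion-record order
    running = {}   # future -> item name

    with ThreadPoolExecutor(max_workers=jobs) as pool:

        def reap(block):
            # Collect finished workers; optionally wait for at least one.
            if block and running:
                wait(list(running), return_when=FIRST_COMPLETED)
            for fut in [f for f in running if f.done()]:
                fut.result()  # re-raise any worker error
                done.append(running.pop(fut))

        def dispatch():
            # Start every pending item whose dependencies are all done.
            finished = set(done)
            for name in [n for n in pending if deps[n] <= finished]:
                pending.remove(name)
                running[pool.submit(run, name)] = name

        # Data loads run serially in the main thread; new parallel work
        # can only be handed out between two data loads.
        for name in data_queue:
            run(name)
            done.append(name)
            reap(block=False)
            dispatch()

        # After the last data load, drain the remaining parallel work.
        while pending or running:
            reap(block=True)
            dispatch()
            if pending and not running:
                raise ValueError("unsatisfiable dependencies: %r" % pending)

    return done
```

With items such as ("t1", "data", []), ("idx_t1", "post", ["t1"]) and a foreign key that depends on two indexes, the index on t1 can start building while the next table is still loading, and the foreign key is only dispatched once both indexes are recorded as done.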
Anyway, the question is whether people think this is generally useful. If
so I can clean up the preferred choice a bit and rebase it off of
master, etc.
I don't think these are bad ideas at all, and probably worth doing. Note
that there are some fairly hefty changes affecting this code in master,
so your rebasing could be tricky.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)