Re: [HACKERS] Allowing parallel pg_restore from pipe

Andrew Dunstan Wed, 24 Apr 2013 12:34:39 -0700


On 04/23/2013 07:53 PM, Timothy Garnett wrote:

Hi All,
Currently the -j option to pg_restore, which allows for parallelization in the restore, can only be used if the input file is a regular file and not, for ex., a pipe. However this is a pretty common occurrence for us (usually in the form of pg_dump | pg_restore to copy an individual database or some tables thereof from one machine to another). While there's no good way to parallelize the data load steps when reading from a pipe, the index and constraint building can still be parallelized and as they are generally CPU bound on our machines we've found quite a bit of speedup from doing so.
Attached are two diffs off of the REL9_2_4 tag that I've been using. The first is a simple change that serially loads the data section before handing off the remainder of the restore to the existing parallelized restore code (the .ALT. diff). The second, which gets more parallelization but is a bit more of a change, uses the existing dependency analysis code to allow index building etc. to occur in parallel with data loading. The data loading tasks are still performed serially in the main thread, but non-data loading tasks are scheduled in parallel as their dependencies are satisfied (with the caveat that the main thread can only dispatch new tasks between data loads).
Anyways, the question is if people think this is generally useful. If so I can clean up the preferred choice a bit and rebase it off of master, etc.
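
For anyone trying to picture the scheduling in the second diff, here is a rough Python sketch of the idea. It is not the pg_restore C code and not taken from the attached diffs: the restore() function, the run callback, the "data"/"post" kinds and the task names are all made up for illustration. It also assumes data tasks have no unmet dependencies of their own.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def restore(tasks, run, jobs=4):
    """Hypothetical scheduler sketch; not pg_restore's real code.

    tasks maps a name to (kind, deps). kind is "data" or "post";
    data tasks appear in the order they arrive on the pipe.
    Returns names in the order the main thread saw them finish.
    """
    done, order = set(), []
    pending = {}  # future -> task name, for running non-data tasks
    waiting = [n for n, (kind, _) in tasks.items() if kind != "data"]

    def collect(block):
        # Record finished workers; optionally wait for at least one.
        if block and pending:
            wait(list(pending), return_when=FIRST_COMPLETED)
        for fut in [f for f in pending if f.done()]:
            name = pending.pop(fut)
            fut.result()  # re-raise any worker error
            done.add(name)
            order.append(name)

    def dispatch(pool):
        # Hand every non-data task whose dependencies are met to a worker.
        for name in list(waiting):
            if set(tasks[name][1]) <= done:
                waiting.remove(name)
                pending[pool.submit(run, name)] = name

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for name, (kind, _) in tasks.items():
            if kind != "data":
                continue
            run(name)  # data loads stay serial in the main thread
            done.add(name)
            order.append(name)
            collect(block=False)
            dispatch(pool)  # the only dispatch point while data is loading
        while waiting or pending:
            collect(block=True)
            dispatch(pool)
            if waiting and not pending:
                raise ValueError("unsatisfiable dependencies: %r" % waiting)
    return order


# Made-up example: two tables, one index each, and a foreign key
# that needs both indexes.
tasks = {
    "t1_data": ("data", []),
    "t2_data": ("data", []),
    "t1_idx": ("post", ["t1_data"]),
    "t2_idx": ("post", ["t2_data"]),
    "fk": ("post", ["t1_idx", "t2_idx"]),
}
print(restore(tasks, run=lambda name: None))
```

The caveat from the description shows up in the first loop: an index build that becomes ready in the middle of a long data load is not started until that load finishes, because the main thread is busy reading the pipe.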

I don't think these are bad ideas at all, and probably worth doing. Note that there are some fairly hefty changes affecting this code in master, so your rebasing could be tricky.


cheers

andrew


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
