Re: [HACKERS] patch for parallel pg_dump

Joachim Wieland Wed, 14 Mar 2012 22:29:00 -0700

On Wed, Mar 14, 2012 at 4:39 PM, Andrew Dunstan <aduns...@postgresql.org> wrote:
> I've just started looking at the patch, and I'm curious to know why it
> didn't follow the pattern of parallel pg_restore which created a new worker
> for each table rather than passing messages to looping worker threads as
> this appears to do. That might have avoided a lot of the need for this
> message passing infrastructure, if it could have been done. But maybe I just
> need to review the patch and the discussions some more.


The main reason for this design has now been overcome by the
flexibility of the synchronized snapshot feature, which allows to get
the snapshot of a transaction even if this other transaction has been
running for quite some time already. In other previously proposed
implementations of this feature, workers had to connect at the same
time and then could not close their transactions without losing the
snapshot.

The other drawback of the fork-per-tocentry-approach is the somewhat
limited bandwith of information from the worker back to the master,
it's basically just the return code. That's fine if there is no error,
but if there is, then the master can't tell any further details (e.g.
"could not get lock on table foo", or "could not write to file bar: no
space left on device").

This restriction does not only apply to error messages. For example,
what I'd also like to have in pg_dump would be checksums on a
per-TocEntry basis. The individual workers would calculate the
checksums when writing the file and then send them back to the master
for integration into the TOC. I don't see how such a feature could be
implemented in a straightforward way without a message passing
infrastructure.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] patch for parallel pg_dump

Reply via email to