On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> As soon as we have parallel pg_dump, the next big thing is going to be
> parallel dump of the same table using multiple processes. Perhaps we should
> prepare for that in the directory archive format, by allowing the data of a
> single table to be split into multiple files. That way parallel pg_dump is
> simple, you just split the table in chunks of roughly the same size, say
> 10GB each, and launch a process for each chunk, writing to a separate file.

How exactly would you "just split the table in chunks of roughly the
same size"? Which queries should pg_dump send to the backend? If it
just sends a bunch of WHERE queries, the server still ends up scanning
the same data several times, since each pg_dump client triggers a
seqscan over the full table.
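
To make the concern concrete, here's a minimal sketch of the kind of
WHERE-based chunking I mean (table and column names are made up, and
I'm assuming a numeric "id" column to split on):

  -- one query per pg_dump worker, each dumping one chunk
  COPY (SELECT * FROM big_table WHERE id >= 0        AND id < 10000000) TO STDOUT;
  COPY (SELECT * FROM big_table WHERE id >= 10000000 AND id < 20000000) TO STDOUT;
  COPY (SELECT * FROM big_table WHERE id >= 20000000 AND id < 30000000) TO STDOUT;
  -- without an index on "id", each of these is a separate seqscan over
  -- the whole table, so N workers read the table N times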

Ideally, pg_dump would be able to query for all the data in a single
relation segment, so that each segment is scanned by exactly one
backend process. However, this requires backend support, and we would
be sending queries that we wouldn't want clients other than pg_dump to
send...
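
Just to illustrate what I mean by backend support, a purely
hypothetical sketch (no such syntax exists today): with the default
1 GB segment size, each worker would be told to read exactly one
on-disk segment file, e.g. something like

  -- hypothetical syntax, only to illustrate the idea: restrict the
  -- scan to the table's fourth 1 GB segment file, so each worker
  -- reads a disjoint part of the table exactly once
  COPY (SELECT * FROM big_table SEGMENT 3) TO STDOUT;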

If you were thinking about WHERE queries to get equally sized
partitions, how would we deal with unindexed and/or non-numerical data
in a large table?


Joachim
