Many times I've wanted to export a subset of a database, using some sort of
row filter condition on some of the large tables.  E.g. copying a
production database to a staging environment, but with some time series
data only from the past month.

We have the existing options:
  --include-table=table    (and its -t synonym)
  --exclude-table=table
  --exclude-table-data=table

I propose a new option:
  --include-table-data-where=table:filter_clause

One would use this option as follows:

  pg_dump --include-table-data-where=largetable:"created_at >=
'2018-05-01'" database_name

The filter_clause is used as the contents of a WHERE clause when querying
the data to generate the COPY statement produced by pg_dump.

I've prepared a proposed patch for this, which is attached.  The code
changes are rather straightforward.  I did have to add the ability to carry
around an extra pointer-sized object to the simple_list implementation, in
order to allow the filter clause to be associated to the matching oids of
the table pattern.  It seemed the best way to augment the existing
simple_list implementation, but change as little as possible elsewhere in
the codebase.  (Note that SimpleOidList is actually only used by pg_dump).

Feel free to review and propose any amendments.

Attachment: pgdump-include-table-data-where-v1.patch
Description: Binary data

Reply via email to