> On 23 Jun 2026, at 03:17, Kirk Wolak <[email protected]> wrote:
>
> Comments welcome. I am happy to adjust naming, documentation, or scope based
> on feedback.
Hi Kirk,
+1 on the problem. Excluding a few huge tables to keep the dump fast,
but still wanting a loadable slot so a recent slice can be dropped in
and restored in the normal order (data before indexes/FKs, no manual
reindex) is a real, recurring need. I'd be glad to see it solved.
For the archive: the broader "filter table data on dump" idea has come
up many times, and it may help to frame this patch against that history:
- 2008, Simon Riggs: pg_dump -w for sampling [0]. Redirected -
"why not COPY (SELECT ...) TO STDOUT".
- 2018, Carter Thaxton: --include-table-data-where, then --where [1].
Euler Taveira's concerns: a filtered subset "won't restore" (FKs),
and whether pg_dump is the right place for an ETL-ish parameter;
plus table:clause quoting/colon parsing. Returned with feedback.
- 2020, Surafel Temesgen: --where revival [2]. Daniel Gustafsson
flagged arbitrary user SQL as a search_path / CVE-2018-1058-class
hazard; no tests. Returned with feedback.
- 2022, Nikita Starovoitov: partial data dumps [3]. Stalled on
referential integrity - by the author's own account he could not
solve it within the dump.
- 2025, your own --filter-data PoC [4]. Still open; Tom suggested
partitioning, Greg suggested dumping a view (-t foo=view:fooslice).
Two objections recur across those: "just use COPY (SELECT ...) TO",
and "a raw row filter produces a dump that won't restore (FKs)". The
placeholder approach sits entirely clear of both. There is no user
WHERE clause, so no ETL surface, no search_path hazard, and no
consistency promise: as Carter noted in 2018, --where=table:false
already behaves exactly like --exclude-table-data, and the placeholder
is precisely that corner - identical restore-risk, it just keeps a
loadable empty TABLE DATA entry instead of dropping it. And the reason
this belongs in pg_dump rather than a hand-rolled \copy is the point
David G. Johnston made in 2018: it lets you lean on pg_dump's
dependency resolution - which is exactly what the placeholder buys, the
(externally produced) data lands on the normal restore path, ahead of
indexes and FKs, no post-restore reindex. So I'd pitch it as "reserve a
loadable slot", not "lightweight --where".
On the interface, a few options to discuss:
1. As posted: --create-table-data-placeholders as a global modifier
to --exclude-table-data. Works, but "create" already means CREATE
DATABASE in pg_dump, it's all-or-nothing across excluded tables,
and it couples two options.
2. A positive pattern option, e.g. --empty-table-data=PATTERN (plus
--empty-table-data-and-children), mirroring the --exclude-table-data
family. It folds the two current flags into one, reads as "include
the table, empty", and gives per-table control. Slight preference
here.
3. A matching action for the table_data object type in the --filter
file, for people (like you) who keep the big-table list in a file.
Heavier - it's a grammar change to a shared file format - so I'd
see it as an addition to 2, not a replacement.
WDYT?
Thank you!
Best regards, Andrey Borodin.
[0]
https://www.postgresql.org/message-id/flat/[email protected]
[1]
https://www.postgresql.org/message-id/flat/CAGiT_HNav5B=OfCdfyFoqTa+oe5W1vG=pxktetcxxg4kcut...@mail.gmail.com
[2]
https://www.postgresql.org/message-id/flat/calay4q8o00sg5nemi2auqnulvmglar6br+yrvm41zuspjm2...@mail.gmail.com
[3]
https://www.postgresql.org/message-id/flat/cac5einmthnpakugdj0d8-kfn21hpqnouvdamoha4tdmzgyo...@mail.gmail.com
[4]
https://www.postgresql.org/message-id/flat/CACLU5mS07WGPpq6=m8ac5tumusnpl8fhbpnyem86isawegc...@mail.gmail.com