Hello Stepan!

I can see you tested my patch by itself, and with the following command:

On Monday 2025-07-21 07:58, Stepan Neretin wrote:

      pg_restore -d "$DB_NAME" -j "$JOBS" --clean --if-exists --freeze 
"$DUMP_FILE" > /dev/null

On the other hand, I have tested combining --freeze with --data-only --clean, as implemented by another patch of mine at:

https://www.postgresql.org/message-id/flat/413c1cd8-1d6d-90ba-ac7b-b226a4dad5ed%40gmx.net#44f06ce85a14a6238125a0ad8c9767db

In fact I was considering that patch as a requirement to this one, but I see that you are using my patch independently to issue --clean --freeze without --data-only. As far as I understand, this will also issue a TRUNCATE before each COPY, and the benefits of FREEZE will be the same. Thank you for revealing this to me.


The main area of concern is the implementation method in 
`pg_backup_archiver.c`. The patch introduces a `do_copy` function that modifies 
the `COPY` statement string to inject the `WITH (FREEZE)` option.

```c
if (cp_end - 10 > cp &&
    strncmp(cp_end - 10, "FROM stdin", 10) == 0
    )
    cp_is_well_formed = true;
```

This approach of string parsing is quite fragile. It makes a strong assumption 
that the `COPY` statement generated by `pg_dump` will always end with `... FROM 
stdin;` (preceded by optional whitespace). While this is true now, it creates a 
tight and brittle coupling between `pg_dump`'s output format and
`pg_restore`'s parsing logic. Any future changes to `pg_dump` that might add 
options or slightly alter the `COPY` syntax could break this feature silently 
or with a non-obvious warning.

It troubled me too. At least I have a pg_log_warning() message in this case, and nothing detrimental will happen, it's just that the FREEZE option will not be used and the message will be logged.


A more robust solution would be to avoid string manipulation entirely. 
`pg_restore` should assemble the `COPY` command from its constituent parts 
(table name, column list, etc.) and then conditionally append the `WITH 
(FREEZE)` clause before the final `FROM stdin;`. This would decouple the 
feature from
the exact string representation in the dump archive.

Totally agree, but this sounds impossible to implement. Unfortunately pg_dump generates SQL commands, and pg_restore edits them and executes them. I wouldn't know where to start to change this whole architecture.


An alternative—and arguably cleaner—approach might be to shift this logic to 
pg_dump.

Thinking it more thoroughly, pg_dump is not the place to compose the restoration SQL commands. The proper place is pg_restore. I guess it's being done in pg_dump for historical reasons, but I don't think pg_dump should complicate its commands further (e.g. by adding WITH FREEZE), as this is a choice to make during restore. I think I've seen more places in pg_restore that edit the commands before executing them.


One important consideration that needs to be highlighted in the documentation 
for this feature is the impact on WAL generation. `COPY ... WITH (FREEZE)` is 
known to generate significantly more WAL traffic because it must log the 
freezing of tuples, which can be a surprise for users. Maybe we can insert
already frozen pages?

There should be no WAL traffic at all. If the transaction starts with a TRUNCATE TABLE then COPY skips the wal completely. This gives a big performance and stability gain when bulk-loading data, and this is what my 2 patches try to leverage in more cases.

And as far as I can tell, if there is no TRUNCATE in the same transaction, then pg_restore will output error like the following:

ERROR:  cannot perform COPY FREEZE because the table was not created or 
truncated in the current subtransaction

I hope such an erroris acceptable, since the sysadmin can just remove --freeze then and re-run the command.

Additionally, it should be noted that the freeze option only works correctly 
when performing a fresh load of data into empty tables.

Do you think this patch has chances of going in by itself? If so then yes, I should definitely update the docs and submit it again individually.


Thank you for the rebase and review.

Dimitris

Reply via email to