On Mon, 24 Mar 2025, Adrian Klaver wrote:
On 3/24/25 07:24, Dimitrios Apostolou wrote:
On Sun, 23 Mar 2025, Laurenz Albe wrote:
On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:
Performance issues: (important as my db size is >5TB)
* WAL writes: I didn't manage to avoid writing to the WAL, despite
having
setting wal_level=minimal. I even wrote my own function to ALTER all
tables to UNLOGGED, but failed with "could not change table T to
unlogged because it references logged table". I'm out of ideas on
this
one.
You'd have to create an load the table in the same transaction, that is,
you'd have to run pg_restore with --single-transaction.
That would restore the schema from the dump, while I want to create the
schema from the SQL code in version control.
I am not following, from your original post:
"
... create a
clean database by running the SQL schema definition from version control, and
then copy the data for only the tables created.
For this case, I choose to run pg_restore --data-only, and run it as the user
who owns the database (dbowner), not as a superuser, in order to avoid
changes being introduced under the radar.
"
You are running the process in two steps, where the first does not involve
pg_restore. Not sure why doing the pg_restore --data-only portion in single
transaction is not possible?
Laurenz informed me that I could avoid writing to the WAL if I "create and
load the table in a single transaction".
I haven't tried, but here is what I would do to try --single-transaction:
Transaction 1: manually issuing all of CREATE TABLE etc.
Transaction 2: pg_restore --single-transaction --data-only
The COPY command in transaction 2 would still need to write to WAL, since
it's separate from the CREATE TABLE.
Am I wrong somewhere?
Something that might work, would be for pg_restore to issue a TRUNCATE
before the COPY. I believe this would require superuser privelege though,
that I would prefer to avoid. Currently I issue TRUNCATE for all tables
manually before running pg_restore, but of course this is in a different
transaction so it doesn't help.
By the way do you see potential problems with using --single-transaction
to restore billion-rows tables?
COPY is all or none(version 17+ caveat(see
https://www.postgresql.org/docs/current/sql-copy.html ON_ERROR)), so if the
data dump fails in --single-transaction everything rolls back.
So if I restore all tables, then an error about a "table not found" would
not roll back already copied tables, since it's not part of a COPY?
Thank you for the feedback,
Dimitris