I did some further tests:
All numbers reported here are for the ways table only. The simple COPY took 57
minutes. Tests were done using PostgreSQL 15. Supposedly there are some
improvements in COPY performance in Pg 16, so we should also test with Pg 16
and/or 17.
## Using binary format
COPY can be done with the text format (that we currently use) or a binary
format. Using binary format the COPY time drops to 40 minutes, so we are about
30% faster. The binary format is not well documented and the documentation says
it might change, but it hasn't changed since PostgreSQL 7 as far as I can see,
so it is unlikely that this will be a problem. More so because we are only
writing into the database, not reading, so we have more control over the format
than if we have to parse the format the database generates.
This should also save use some time on the osm2pgsql side, because generating
the binary format is probably faster than the text format. For geometries which
don't have to be hex encoded any more, the number of bytes transfered will be
only half, that should also help. I can really see no downside, we should
consider switching.
## Parallel COPYs
I tried simulating parallel COPYs by splitting a COPY file into 2 (or 4) pieces
and doing 2 (or 4) copies simultaneously. This is with the text format. The
import times were 33 minutes (or 23 minutes for 4 COPYs). It is hard to tell
how well this would work in practice, but it is definitly worth a shot. The CPU
usage for the postgres process doing the COPY went from 100% with one COPY to
something like 70%, so it looks like we are not CPU-bound any more but either
I/O bound or hitting some limits on locks or so.
This might make the data format on disk not so efficient though, maybe creating
the index will take longer or usage of the resulting table is slightly slower.
## With FREEZE
Using COPY FREEZE the import takes 49 minutes, using FREEZE and binary format
we are at 32 minutes, almost halfing the current time.
Unfortunately using COPY FREEZE does not work together with parallel COPYs,
because you need to create the table in the same transaction that you do the
COPY FREEZE in. I tried using [snapshot
synchronization](https://www.postgresql.org/docs/17/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION)
to overcome this, but it didn't work. Maybe I didn't do it right, but there is
probably something in there that prevents this from working, my transactions
always got rolled back when I tried doing this.
Using COPY FREEZE would be simpler to implement that having multiple
connections for parallel copies, so this is still something to consider.
## UNLOGGED TABLE
Somewhere on the Internet I found the suggestion to create the table as
UNLOGGED TABLE, then do the import and the ALTER the table to LOGGED. This does
not help. The COPY took about the same time and the ALTER TABLE took quite some
time, so this is a dead end.
--
Reply to this email directly or view it on GitHub:
https://github.com/osm2pgsql-dev/osm2pgsql/issues/2110#issuecomment-2568252878
You are receiving this because you are subscribed to this thread.
Message ID: <osm2pgsql-dev/osm2pgsql/issues/2110/[email protected]>
_______________________________________________
Tile-serving mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/tile-serving