I did some further tests:

All numbers reported here are for the ways table only. The simple COPY took 57 
minutes. Tests were done using PostgreSQL 15. Supposedly there are some 
improvements in COPY performance in Pg 16, so we should also test with Pg 16 
and/or 17.

## Using binary format

COPY can be done with the text format (that we currently use) or a binary 
format. Using binary format the COPY time drops to 40 minutes, so we are about 
30% faster. The binary format is not well documented and the documentation says 
it might change, but it hasn't changed since PostgreSQL 7 as far as I can see, 
so it is unlikely that this will be a problem. More so because we are only 
writing into the database, not reading, so we have more control over the format 
than if we have to parse the format the database generates.

This should also save use some time on the osm2pgsql side, because generating 
the binary format is probably faster than the text format. For geometries which 
don't have to be hex encoded any more, the number of bytes transfered will be 
only half, that should also help. I can really see no downside, we should 
consider switching. 

## Parallel COPYs

I tried simulating parallel COPYs by splitting a COPY file into 2 (or 4) pieces 
and doing 2 (or 4) copies simultaneously. This is with the text format. The 
import times were 33 minutes (or 23 minutes for 4 COPYs). It is hard to tell 
how well this would work in practice, but it is definitly worth a shot. The CPU 
usage for the postgres process doing the COPY went from 100% with one COPY to 
something like 70%, so it looks like we are not CPU-bound any more but either 
I/O bound or hitting some limits on locks or so.

This might make the data format on disk not so efficient though, maybe creating 
the index will take longer or usage of the resulting table is slightly slower.

## With FREEZE

Using COPY FREEZE the import takes 49 minutes, using FREEZE and binary format 
we are at 32 minutes, almost halfing the current time.

Unfortunately using COPY FREEZE does not work together with parallel COPYs, 
because you need to create the table in the same transaction that you do the 
COPY FREEZE in. I tried using [snapshot 
synchronization](https://www.postgresql.org/docs/17/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION)
 to overcome this, but it didn't work. Maybe I didn't do it right, but there is 
probably something in there that prevents this from working, my transactions 
always got rolled back when I tried doing this.

Using COPY FREEZE would be simpler to implement that having multiple 
connections for parallel copies, so this is still something to consider.

## UNLOGGED TABLE

Somewhere on the Internet I found the suggestion to create the table as 
UNLOGGED TABLE, then do the import and the ALTER the table to LOGGED. This does 
not help. The COPY took about the same time and the ALTER TABLE took quite some 
time, so this is a dead end.


-- 
Reply to this email directly or view it on GitHub:
https://github.com/osm2pgsql-dev/osm2pgsql/issues/2110#issuecomment-2568252878
You are receiving this because you are subscribed to this thread.

Message ID: <osm2pgsql-dev/osm2pgsql/issues/2110/[email protected]>
_______________________________________________
Tile-serving mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/tile-serving

Reply via email to