Re: [Tile-serving] [osm2pgsql-dev/osm2pgsql] Improving efficiency of the node location store (Issue #2403)

Sarah Hoffmann via Tile-serving Fri, 22 Aug 2025 11:49:45 -0700

lonvia left a comment (osm2pgsql-dev/osm2pgsql#2403)

(copied from 
https://github.com/osm2pgsql-dev/osm2pgsql/issues/1466#issuecomment-2882929246)


Here are some additional considerations regarding memory efficiency for the 
node location store:

The current RAM middle implements the node location store roughly as this: 
Locations are stored in blocks of 32 nodes. Inside each block the ids of nodes 
and the x and y coordinates are stored delta encoded and using varints. Overall 
we need about 8 to 9 bytes per stored node for this, for larger files (planet) 
it is a bit less, for smaller files a bit more.

This doesn't look that great. The naive implementation (which we use for the on 
disk flat node files) uses 8 bytes x the highest node id. So for the planet 
file we are really close to that. Of course for smaller files the RAM middle is 
much more efficient because it only needs the 8 bytes for actually stored nodes 
and does not depend on the highest node id.

Can we find a more efficient encoding than the delta/varint encoding? What 
makes the delta/varint work reasonably well is that we have runs of nodes that 
are really close to each other, because they were created together for OSM 
ways. But they are not that close either, so varint might not be the best 
encoding. And we don't always have these runs of close nodes. And varint 
encoding for essentially random numbers does not work well.

We should explore other encoding options. One possible idea is to encode the x 
and y coordinates together, interleaving the bits from the x and y coordinates, 
like the index in the main OSM database does it. Then we can store the higher 
order bits that are the same for all (or many) of the nodes in a block 
separately from the lower order bits. This could be done either with a fixed or 
a dynamic separation between those higher/lower order bits. Some 
back-of-the-envelope calculation shows that this might save some space, but 
we'd need to implement it to see whether that's the case with actual data.

We could also have different implementations for the blocks depending on the 
node locations. For each block the most efficient encoding could be chosen, 
similar to how compression engines sometimes work. Of course this adds more 
fixed overhead to store that information...

-- 
Reply to this email directly or view it on GitHub:
https://github.com/osm2pgsql-dev/osm2pgsql/issues/2403#issuecomment-3215292467
You are receiving this because you are subscribed to this thread.

Message ID: <osm2pgsql-dev/osm2pgsql/issues/2403/[email protected]>

_______________________________________________
Tile-serving mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/tile-serving

Re: [Tile-serving] [osm2pgsql-dev/osm2pgsql] Improving efficiency of the node location store (Issue #2403)

Reply via email to