Yes, I threw the idea out slightly unprepared, to start a discussion. Maybe it
shouldn't be as revolutionary as using 30 bits for a geo-location index, but
*16 bits* wouldn't change much.

As I see it, we are talking about *density vs geo-index*. I understand your
point that most (or at least some) software is built to optimize for density,
but it should be able to take advantage of a geo-index as well.

> 33 bits of ids will mean a 56-64-bit space for the geo-index cache (it
> wouldn't fit in operating memory)
As of today we are approaching 33 bits of node ids, and we may never reach 36
bits. However, to build a geolocation cache today we need to associate each id
with at least its location, i.e. an int representing a tile. So on top of the
33 bits we need another 30 or 32 bits, which puts each entry at around 64
bits, and it becomes almost impossible to keep such a geo-index in memory.
Extraction is the most popular operation in OSM, and that is where this could
benefit the most: while iterating over way nodes you could immediately tell
whether a node is relevant for your dataset, which might then fit in operating
memory quite well. On a second pass you can combine the ways you are
interested in with their nodes (rough sketch below).
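A minimal sketch of that first pass, assuming a hypothetical id layout with a
16-bit zoom-8 tile prefix above a 26-bit per-tile part (the class, constants
and numbers here are my invention, not any real OSM API):

    import java.util.BitSet;

    // Hypothetical: if the top 16 bits of a node id were its zoom-8 tile,
    // an extract could test each way-node against its bounding box without
    // any external node-location cache.
    public class ExtractSketch {
        static final int PER_TILE_BITS = 26;                   // assumed id layout
        static final BitSet wantedTiles = new BitSet(1 << 16); // tiles inside the bbox

        static boolean isWanted(long wayNodeId) {
            int tile = (int) ((wayNodeId >>> PER_TILE_BITS) & 0xFFFF);
            return wantedTiles.get(tile);
        }

        public static void main(String[] args) {
            wantedTiles.set(0x8A3F);                           // pretend this tile is wanted
            long inside  = (0x8A3FL << PER_TILE_BITS) | 12_345L;
            long outside = (0x1111L << PER_TILE_BITS) | 67L;
            System.out.println(isWanted(inside));              // true
            System.out.println(isWanted(outside));             // false
        }
    }

The point is only that the per-way-node test needs a 2^16-bit set (8 KB), not
a multi-gigabyte id-to-location map.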

> Z-curve locality
I don't see any problem with the locality of the Z-curve, because it would not
be relied on by any algorithm I have in mind. The algorithm would build an
index of the Z-tiles that are of interest for dataset extraction.
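For reference, here is one way the zoom-8 tile numbers could be laid out on a
Z-curve (a standard Morton interleave of the 8-bit tile x and y; the Munich
tile coordinates are only illustrative):

    // Sketch: interleave the bits of a zoom-8 tile's x and y (0..255 each)
    // into a 16-bit Z-order (Morton) number.
    public class ZTile {
        static int spread(int v) {            // spread 8 bits onto even positions
            v = (v | (v << 4)) & 0x0F0F;
            v = (v | (v << 2)) & 0x3333;
            v = (v | (v << 1)) & 0x5555;
            return v;
        }

        static int zOrder(int x, int y) {     // x, y in 0..255
            return spread(x) | (spread(y) << 1);
        }

        public static void main(String[] args) {
            // Zoom-8 tile around Munich is roughly x=136, y=88 (illustrative,
            // not checked against the live tile grid).
            System.out.printf("z-tile = 0x%04X%n", zOrder(136, 88));
        }
    }

Neighbouring tiles do not always get neighbouring numbers, which is your
locality point, but extraction only tests membership in a precomputed set of
Z-tiles, so that doesn't matter here.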

> Density issue, how many bits are best to store
Of course, we could write an algorithm to find the best ratio between the id
distribution and the bits allocated for the geo-index. Let me speculate with
16 bits.
If we take only 16 bits (zoom-8 tiles), the densest area I can think of is
Munich and its suburbs. As far as I can see, that tile takes around *60 MB in
osm.pbf* and brings roughly 5,000,000 - 10,000,000 ids, which is about *23-24
extra bits* (log2 of 10,000,000 is roughly 23.3). So we could safely assume we
will stay within *26 bits* per tile, and 16 + 26 = *42 bits*, which still
falls under your density assumption, I guess.
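Spelled out as a sketch (the 42-bit layout with a 26-bit per-tile counter is
purely my speculation here, not a proposal anyone has implemented):

    // Sketch of the bit budget: a 16-bit zoom-8 tile prefix plus a per-tile
    // counter sized for the densest tile (~10,000,000 ids in this guess).
    public class BitBudget {
        public static void main(String[] args) {
            long densestTileIds = 10_000_000L;
            int counterBits = 64 - Long.numberOfLeadingZeros(densestTileIds - 1);
            System.out.println("counter bits needed : " + counterBits);          // 24
            System.out.println("with headroom (26)  : " + (16 + 26) + " bits");  // 42

            // Composing and decomposing an id under this hypothetical layout:
            long id = (0x8A3FL << 26) | 1_234_567L;
            System.out.printf("tile = 0x%04X, seq = %d%n",
                    id >>> 26, id & ((1L << 26) - 1));
        }
    }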

The most important thing to say is that the difference between 42 and 34 bits
is not huge for software at all, because storage is always aligned to 8-bit
boundaries anyway.

BTW: I can imagine that working with the whole planet is a different use case,
where you need to maintain all the global indexes and so on. Of course, by
taking on extra work in the OSM DB and the OSM API, this should help a lot of
3rd-party apps which don't process the whole planet.

Best Regards,
Victor

On 25 Nov 2018 06:36:39 -0800, Paul Norman wrote:


> It would be terrible for most software that I am aware of that can
> process the full planet. Current assumptions about density would be
> broken, vastly inflating memory usage and slowing down processing.
>
> The benefits aren't great as I see them. Using a z-order curve encoded
> in the first 30 bits will help cache locality, but like all z-order
> curves, it doesn't guarantee that two nearby places in 2d space have
> nearby places on the curve. This means that an implementation still
> needs to be able to search through the nodes for nearby ones.
>
> Two other problems come to mind. The first of these is implementation.
> IDs are a PostgreSQL bigserial, and to write something custom that
> assigns IDs based on location would be difficult as it would need to get
> MVCC right. The second is the number of bits. Some software is limited
> to 53-59 bits, and other to 63 bits. We're using about 75% of 33 bits
> right now.
>