Yes, I threw the idea out slightly unprepared, to start a discussion. Maybe it shouldn't be as revolutionary as using 30 bits for the geo-location index — but *16 bits* wouldn't change much.
As I see it, we are talking about *density vs. geo-index*. I understand your point that most (or some) software is built to optimize for density, but it should be able to take advantage of a geo-index as well.

> 33 bits of ids will mean 56-64-bit space for geo-index cache (wouldn't fit operating memory)

As of today we are approaching 33 bits, and we may never approach 36 bits. But to build a geolocation cache we need to associate each id with at least its location, i.e. an int representing a tile. So we need to add 30 or 32 bits on top of the 33 bits, and we end up around 64 bits per entry — so it is almost impossible to keep such a geo-index in memory. Extraction is the most popular operation in OSM, and it is where a geo-prefix could benefit the most: iterating over a Way's nodes, you can immediately tell whether it is relevant to your dataset, and that check might fit in operating memory well. On a second pass you can then combine the ways you are interested in with their nodes.

> Z-curve locality

I don't see any problem with the locality of the Z-curve, because it would not be used in any algorithm I have in mind. The algorithm would build an index of Z-tiles, which is what matters for data-set extraction.

> Density issue: how many bits is best to store

Of course, we could write an algorithm and find the best ratio between id distribution and bits allocated for the geo-index. Let me speculate with 16 bits. If we take only 16 bits, the densest area I can think of is Munich and its suburbs (one tile at zoom 8). As I see it, that tile takes around *60 MB in osm.pbf* and contains roughly 5,000,000-10,000,000 ids, which is *23 extra bits*. So we could safely assume we will stay within *26 bits*, and 16 + 26 = *42 bits*, which falls under your assumption about dense software, I guess. Most importantly, the difference between 42 and 34 bits is not huge for software at all, because there is always alignment to 8 bits.

BTW: I can imagine that working with the whole planet is a different use case, where you need to maintain all the global indexes and so on.
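To make the 16 + 26 = 42-bit arithmetic above concrete, here is a minimal sketch of what such an id scheme could look like. The helper names (`tile_xy`, `morton16`, `make_id`) and the choice of a Morton-interleaved zoom-8 tile as the prefix are my assumptions for illustration, not an agreed design:

```python
import math

def tile_xy(lat: float, lon: float, zoom: int = 8) -> tuple[int, int]:
    """Standard slippy-map tile numbering, as used for OSM tiles."""
    n = 1 << zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def morton16(x: int, y: int) -> int:
    """Interleave two 8-bit tile coordinates into a 16-bit z-order index."""
    code = 0
    for bit in range(8):
        code |= ((x >> bit) & 1) << (2 * bit + 1)
        code |= ((y >> bit) & 1) << (2 * bit)
    return code

def make_id(lat: float, lon: float, local_id: int) -> int:
    """16-bit zoom-8 tile prefix + 26-bit per-tile counter = 42-bit id."""
    assert local_id < (1 << 26), "26 bits per tile, per the estimate above"
    x, y = tile_xy(lat, lon)
    return (morton16(x, y) << 26) | local_id

def tile_of(node_id: int) -> int:
    """Extraction case: recover the tile straight from the id, no lookup."""
    return node_id >> 26

# A node somewhere around Munich: its zoom-8 tile is readable off the id.
nid = make_id(48.14, 11.58, 1234567)
assert tile_of(nid) == morton16(*tile_xy(48.14, 11.58))
```

With ids shaped like this, the extraction pass over a Way's node ids only needs a shift and a set-membership test against the tiles of the requested bounding box — no node-location cache at all.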
Of course, by taking this extra work onto the OSM DB and OSM API, it should help a lot of 3rd-party apps which don't process the whole planet.

Best Regards,
Victor

On 25 Nov 2018 06:36:39 -0800, Paul Norman wrote:
> It would be terrible for most software that I am aware of that can
> process the full planet. Current assumptions about density would be
> broken, vastly inflating memory usage and slowing down processing.
>
> The benefits aren't great as I see them. Using a z-order curve encoded
> in the first 30 bits will help cache locality, but like all z-order
> curves, it doesn't guarantee that two nearby places in 2d space have
> nearby places on the curve. This means that an implementation still
> needs to be able to search through the nodes for nearby ones.
>
> Two other problems come to mind. The first of these is implementation.
> IDs are a PostgreSQL bigserial, and to write something custom that
> assigns IDs based on location would be difficult as it would need to get
> MVCC right. The second is the number of bits. Some software is limited
> to 53-59 bits, and other to 63 bits. We're using about 75% of 33 bits
> right now.
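Paul's locality caveat is easy to demonstrate. The snippet below (my own illustration, reusing a hypothetical `morton16` interleaver) shows that two geographically adjacent zoom-8 tiles can land far apart on the z-order curve whenever a high coordinate bit flips:

```python
def morton16(x: int, y: int) -> int:
    """Interleave two 8-bit tile coordinates into a 16-bit z-order index."""
    code = 0
    for bit in range(8):
        code |= ((x >> bit) & 1) << (2 * bit + 1)
        code |= ((y >> bit) & 1) << (2 * bit)
    return code

# Tiles (127, 100) and (128, 100) touch on the map, but the top x-bit
# flips between them, so their z-order codes sit in opposite halves of
# the curve — a nearby-neighbour search can't rely on id proximity alone.
a = morton16(127, 100)
b = morton16(128, 100)
print(a, b, abs(a - b))  # a large jump despite the tiles being adjacent
```

This is exactly why a tile-set membership test (extraction) benefits from the prefix while a nearest-neighbour search does not: the former only asks "is this tile in my set?", the latter needs the curve to preserve distances, which it doesn't.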
_______________________________________________
talk mailing list
talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/talk