> > could sqlite be a suitable format? > > Databases added a lot of data to each record to fasilitate fast insertions and deletions. If you multiply that with the billions of segments that will be in the planet file by the end of next year, it becomes infeasible.
The new version of gosmore will compile the planet into 4 "arrays" or "files" : 1. A UTF-8 text file for storing all way info each way will start with a fixed size header followed by a number of strings : a) way type (highway) integer, b) oneway boolean, c) centerpoint (latitude and longitude) of bbox in mercator space. d) size of bbox e) an empty line which will serve as an end of header marker, f) name (1st variable length string) g) other variable length strings for storing tags like ref=... or amenity=... h) end of way marker (empty line) 2. 20-byte Binary structures called "double half segments" (dhSegs) aka "node parts" : a) Latitude integer in mercator space, b) Longitude integer in mercator space, c) Pointer to way header i.e. offset into way file (1) d) Pointer to next dhSegs (offset into this file) that forms part of this way, or some invalid value if this is the first node in the way. e) Pointer to previous dhSegs i.e. offset into this file, or some invalid value if this is the first node in the way. So gosmore will now be able to traverse ways as doubly linked lists. All dhSegs with the same latitude and longitude are stored next to each other, i.e. a node a number of consequtive dhSegs. For example a normal T junction or 4way stop will be represented by 2 dhSegs. Routing will use this together with knowledge of the ways to work. When routing be between to arbitrary locations, we also need to find the staring and ending segments. To do this (and rendrering) efficiently, nodes are grouped into tiles of say a few km^2 each (simply mask lower bits out). 3. Since most tiles will be empty (unmapped or sea) and there are billions of tiles, we group a number of tiles into buckets using a hash function f(tile_lat, tile_lon)->bucket number. The hash function means all buckets will have approximately the same number of hdSegs. Hipothetically we can now find the largest bucket and reserve that amount of space for each bucket and then we wouldn't need a 3rd array / file. But it's more efficient to remove the unused space and have a table that tells us where each bucket begins and ends in the hdSeg array / file (2). 4. This is an array of indexes into the way file (1). Each index points to a variable length string. The indexes are sorted according to the strings they point to. Where 2 strings are identical the sorting falls back to comparing the center latitude and longitude of the associated ways according to Z-order : The string with the way that lies first on the Z-curve is first. So if you want to find something very specific, like "Gert Potgieter Street", it's a simple binary search. But if you want to find something very general like "amenity=fuel", your binary search will depend on the current lattitude and longitude the user is looking at : 1. Take the range [a,b] of all possible indexes and start with an empty collection of results r. 2. If the range is empty, stop. 3. Choose a pivot p < b and p > a, ussually at the center. 4. If the string at c is less "amenity=fuel", set the range to [p,b] and goto 2 5. If the string at c is more than "amenity=fuel", set the range to [a,p] and goto 2 6. If we still need more results (collection too small), or the way at p is closer to the current location than any member of r, add p to r (possibly removing something from r) 7. Now that the strings are equal, compare the Z-value of current location with Z-value of the way associated with p. (a) If it's less (i) Set the range to [a,p] and recurse. (ii) Compute a bounding for [p,b]. For this purpose b can be the last point on the Z-curve. (iii) If we still need more results (collection too small), or any point in the bounding box is closer to the current location than any member of r, set the range to [p,b] and recurse. (b) Just like (a) So it's like a depth first search, except that 4,5,7(a)(iii) and 7(b)(iii) helps to prune dead branches. If a user wants to search for "amenity=fuel" near Timbaktu, he will first have to search for Timbaktu, select it (to change the current location) and then for "amenity=fuel". I want to make the string comparison "weak", so that words like "Street" and "North" are ignored. But care will be needed to make sure the string comparison is still a proper ordering.
_______________________________________________ Routing mailing list Routing@openstreetmap.org http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/routing