Re: [Routing] New gosmore format

Nic Roets Mon, 17 Dec 2007 05:38:53 -0800

>
> could sqlite be a suitable format?
>
>
Databases added a lot of data to each record to fasilitate fast insertions
and deletions. If you multiply that with the billions of segments that will
be in the planet file by the end of next year, it becomes infeasible.


The new version of gosmore will compile the planet into 4 "arrays" or
"files" :
1. A UTF-8 text file for storing all way info each way will start with a
fixed size header followed by a number of strings :
a) way type (highway) integer,
b) oneway boolean,
c) centerpoint (latitude and longitude) of bbox in mercator space.
d) size of bbox
e) an empty line which will serve as an end of header marker,
f) name (1st variable length string)
g) other variable length strings for storing tags like ref=... or
amenity=...
h) end of way marker (empty line)

2. 20-byte Binary structures called "double half segments" (dhSegs) aka
"node parts" :
a) Latitude integer in mercator space,
b) Longitude integer in mercator space,
c) Pointer to way header i.e. offset into way file (1)
d) Pointer to next dhSegs (offset into this file) that forms part of this
way, or some invalid value if this is the first node in the way.
e) Pointer to previous dhSegs i.e. offset into this file, or some invalid
value if this is the first node in the way.

So gosmore will now be able to traverse ways as doubly linked lists.

All dhSegs with the same latitude and longitude are stored next to each
other, i.e. a node a number of consequtive dhSegs. For example a normal T
junction or 4way stop will be represented by 2 dhSegs. Routing will use this
together with knowledge of the ways to work.

When routing be between to arbitrary locations, we also need to find the
staring and ending segments. To do this (and rendrering) efficiently, nodes
are grouped into tiles of say a few km^2 each (simply mask lower bits out).

3. Since most tiles will be empty (unmapped or sea) and there are billions
of tiles, we group a number of tiles into buckets using a hash function
f(tile_lat, tile_lon)->bucket number. The hash function means all buckets
will have approximately the same number of hdSegs. Hipothetically we can now
find the largest bucket and reserve that amount of space for each bucket and
then we wouldn't need a 3rd array / file. But it's more efficient to remove
the unused space and have a table that tells us where each bucket begins and
ends in the hdSeg array / file (2).

4. This is an array of indexes into the way file (1). Each index points to a
variable length string. The indexes are sorted according to the strings they
point to.

Where 2 strings are identical the sorting falls back to comparing the center
latitude and longitude of the associated ways according to Z-order :  The
string with the way  that lies first on the Z-curve is first.

So if you want to find something very specific, like "Gert Potgieter
Street", it's a simple binary search. But if you want to find something very
general like "amenity=fuel", your binary search will depend on the current
lattitude and longitude the user is looking at :
1. Take the range [a,b] of all possible indexes and start with an empty
collection of results r.
2. If the range is empty, stop.
3. Choose a pivot p < b and p > a, ussually at the center.
4. If the string at c is less "amenity=fuel", set the range to [p,b] and
goto 2
5. If the string at c is more than "amenity=fuel", set the range to [a,p]
and goto 2
6. If we still need more results (collection too small), or the way at p is
closer to the current location than any member of r, add p to r (possibly
removing something from r)
7. Now that the strings are equal, compare the Z-value of current location
with Z-value of the way associated with p.
(a) If it's less
(i) Set the range to [a,p] and recurse.
(ii) Compute a bounding for [p,b]. For this purpose b can be the last point
on the Z-curve.
(iii) If we still need more results (collection too small), or any point in
the bounding box is closer to the current location than any member of r, set
the range to [p,b] and recurse.
(b) Just like (a)

So it's like a depth first search, except that 4,5,7(a)(iii) and 7(b)(iii)
helps to prune dead branches.

If a user wants to search for "amenity=fuel" near Timbaktu, he will first
have to search for Timbaktu, select it (to change the current location) and
then for "amenity=fuel".

I want to make the string comparison "weak", so that words like "Street" and
"North" are ignored. But care will be needed to make sure the string
comparison is still a proper ordering.

_______________________________________________
Routing mailing list
Routing@openstreetmap.org
http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/routing

Re: [Routing] New gosmore format

Reply via email to