Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Jochen Topf
On Fri, Oct 17, 2008 at 10:30:41AM +0200, Marcus Wolschon wrote: I am trying to create a binary-format for storing OSM-data. ( http://wiki.openstreetmap.org/index.php/User:MarcusWolschon%5Cosmbin_draft#nodes.obm ) I am looking for advise on how to create an on-disk index in one dimension

Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Sascha Silbe
On Fri, Oct 17, 2008 at 10:30:41AM +0200, Marcus Wolschon wrote: I am looking for advise on how to create an on-disk index in one dimension (element-id-offset where it is stored) and 2 dimensions (bounding-box-nodeIDs and boundingBox-intersecting bouding-boxes of ways). For the second point

Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Sascha Silbe
On Fri, Oct 17, 2008 at 11:30:07AM +0200, Marcus Wolschon wrote: Do you have any Idea how such an index can be constructed, given that * the indexed dataset is mutable * At no time can the complete index be loaded into memory (e.g. for rebuilding the index). Unfortunately not. I suppose once

Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Freek
On Friday 17 October 2008, Sascha Silbe wrote: On Fri, Oct 17, 2008 at 11:30:07AM +0200, Marcus Wolschon wrote: Do you have any Idea how such an index can be constructed, given that * the indexed dataset is mutable See for example this follow-up paper on Hilbert R-trees:

Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Sascha Silbe
On Fri, Oct 17, 2008 at 02:31:26PM +0200, Freek wrote: See for example this follow-up paper on Hilbert R-trees: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.9180 Interesting paper. Thanks for the pointer! The whole point of R-trees is that they are efficient when stored in

Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Freek
On Friday 17 October 2008, Sascha Silbe wrote: Actually it seems I've been limited in view by my own application. The problem I'd have with updating data isn't how to map data updates to tree updates (the topic covered by the paper mentioned above) but how to efficiently do inserts and deletes

Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Sascha Silbe
On Fri, Oct 17, 2008 at 03:25:59PM +0200, Freek wrote: Generally you keep a bit of free space in all nodes to accommodate a number of insertions and deletions, and only split or merge nodes when they overflow or underflow so that you don't need to move around all kinds of elements all the

Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Freek
On Friday 17 October 2008, Sascha Silbe wrote: On Fri, Oct 17, 2008 at 03:25:59PM +0200, Freek wrote: Generally you keep a bit of free space in all nodes to accommodate a number of insertions and deletions, and only split or merge nodes when they overflow or underflow so that you don't need

Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Sascha Silbe
On Fri, Oct 17, 2008 at 05:20:52PM +0200, Freek wrote: But how do you do those splits, i.e. node inserts? Just pick a new block somewhere else where you have free space (for example at the end of your mmap()'ed file), then split the pointers to the child nodes evenly over the two blocks (old

Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Freek
On Friday 17 October 2008, Sascha Silbe wrote: On Fri, Oct 17, 2008 at 05:20:52PM +0200, Freek wrote: But how do you do those splits, i.e. node inserts? Just pick a new block somewhere else where you have free space (for example at the end of your mmap()'ed file), then split the pointers

Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Sascha Silbe
On Fri, Oct 17, 2008 at 06:10:51PM +0200, Freek wrote: OK, so you just use pointers (= index gets much larger) Ah, you're using the fact that your tree is a complete fan-out-8 tree (except for the last part)? Exactly. and often give up locality (even if you spread some free blocks across

Re: [OSM-dev] on-disk indexing of geodata

2008-10-17 Thread Freek
On Friday 17 October 2008, Sascha Silbe wrote: [constant bulk database with invalidation + dynamic database with pointer-based indices for updates] That might be quite a good idea in OSM practice, you just rebuild the bulk database every so often. Perhaps you can even just use an