[OSM-dev] Distributed Data Store Follow-Up
Hi all,

Earlier I posted about how my friend and I were creating a distributed data store for OSM data. We've finished our project and gotten the most difficult queries going. All of our code is freely available, along with a report about our design and findings, on our GitHub wiki at http://wiki.github.com/tannewt/menzies.

As it says in our report, we were able to do bounding box and regular get queries faster than the production 0.5 OSM server. However, we did not manage to get our own instance of the OSM API running on the machines we had, because of a number of planet import errors. Thus, we only have a rough idea of how well we do latency-wise, and no idea how the two solutions differ under varying loads.

Please let us know what you think. We firmly believe that distributing the data over a number of computers is a far better solution than one single supercomputer.

Thanks,
Scott

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Distributed Data Store Follow-Up
Scott Shawcroft wrote:
> Please let us know what you think. We firmly believe that distributing the data over a number of computers is a far better solution than one single supercomputer.

This conclusion (divide and conquer) is right for fetch. What was your update performance? Did you explore the performance of within queries?

Stefan
Re: [OSM-dev] Distributed Data Store Follow-Up
Stefan,

Our update performance shouldn't be too different. We simply send the update request to all the node machines. By within do you mean a bounding box query? Could you be more specific?

Thanks,
Scott
Re: [OSM-dev] Distributed Data Store Follow-Up
Scott Shawcroft wrote:
> Our update performance shouldn't be too different. We simply send the update request to all the node machines.

And your node machines do not cache their partition results? (Thus, is a scan always required?)

> By within do you mean a bounding box query? Could you be more specific?

For bbox you will have results for this:

    +------+
    |      |
    |   o--+--o
    |      |
    +------+

For within/touches you will have results for this:

       +------+
       |      |
    o--+------+--o
       |      |
       +------+

Now, the above example is trivial to support; the interesting case is diagonal lines. This would allow perfect viewport calls.

Stefan
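The distinction Stefan draws can be made concrete with a small sketch. This is not code from the menzies project or from the OSM API; the function names `node_in_bbox` and `segment_touches_bbox` are hypothetical, coordinates are treated as plain planar x/y values, and the segment test uses Liang-Barsky clipping as one possible way to handle the diagonal case.

```python
from typing import Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def node_in_bbox(p: Point, box: BBox) -> bool:
    """Node-based test: True if the point lies inside or on the edge of the box."""
    x, y = p
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


def segment_touches_bbox(a: Point, b: Point, box: BBox) -> bool:
    """Within/touches test: True if segment a-b intersects or touches the box.

    Uses Liang-Barsky clipping: the segment is a + t * (b - a) for t in [0, 1],
    and each box edge narrows the range of t that lies inside the box.
    """
    min_x, min_y, max_x, max_y = box
    dx, dy = b[0] - a[0], b[1] - a[1]
    t0, t1 = 0.0, 1.0
    # One (p, q) pair per box edge: the point at t is inside that edge
    # when p * t <= q.
    for p, q in ((-dx, a[0] - min_x), (dx, max_x - a[0]),
                 (-dy, a[1] - min_y), (dy, max_y - a[1])):
        if p == 0:
            if q < 0:
                return False  # parallel to this edge and entirely outside it
            continue
        t = q / p
        if p < 0:
            t0 = max(t0, t)  # segment enters the half-plane here
        else:
            t1 = min(t1, t)  # segment leaves the half-plane here
        if t0 > t1:
            return False  # the inside range is empty
    return True


box = (0.0, 0.0, 10.0, 10.0)
# A way segment crossing the box with both nodes outside (second diagram):
a, b = (-5.0, 5.0), (15.0, 5.0)
print(node_in_bbox(a, box) or node_in_bbox(b, box))  # False: node-based query misses it
print(segment_touches_bbox(a, b, box))               # True: touches query finds it
# A diagonal segment that clips the top-left corner of the box:
print(segment_touches_bbox((-2.0, 5.0), (5.0, 12.0), box))  # True
```

A purely node-based bounding box query corresponds to the first function applied to each node, so a way that only passes through the viewport is not returned.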
Re: [OSM-dev] Distributed Data Store Follow-Up
Stefan de Konink wrote:
> Scott Shawcroft wrote:
>> Our update performance shouldn't be too different. We simply send the update request to all the node machines.
>
> And your node machines do not cache their partition results? (Thus, is a scan always required?)

We don't do any caching ourselves, but the underlying BerkeleyDB does. Therefore, we can update as we please.

>> By within do you mean a bounding box query? Could you be more specific?
>
> For bbox you will have results for this:
>
>     +------+
>     |      |
>     |   o--+--o
>     |      |
>     +------+
>
> For within/touches you will have results for this:
>
>        +------+
>        |      |
>     o--+------+--o
>        |      |
>        +------+
>
> Now, the above example is trivial to support; the interesting case is diagonal lines. This would allow perfect viewport calls.

We don't do within. It is purely node-based. I suppose a spatial way index could be built to do within queries, though.
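As a rough illustration of what a spatial way index might look like, here is a minimal grid-based sketch. It is an assumption for discussion only, not something the project implements: the class `WayGridIndex`, its methods, and the `cell_size` parameter are all hypothetical, and coordinates are treated as plain planar values. Each way is registered in every grid cell its own bounding box overlaps, so a viewport query returns a superset of the ways that actually cross it; an exact segment-versus-box test would still be needed to discard false positives.

```python
from collections import defaultdict
from math import floor
from typing import Dict, Iterator, List, Set, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Cell = Tuple[int, int]


class WayGridIndex:
    """Hypothetical coarse index: grid cell -> ids of ways whose bbox overlaps it."""

    def __init__(self, cell_size: float = 0.01) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Cell, Set[int]] = defaultdict(set)

    def _cells_for(self, box: BBox) -> Iterator[Cell]:
        """Yield every grid cell overlapped by the given box."""
        min_x, min_y, max_x, max_y = box
        x0, x1 = floor(min_x / self.cell_size), floor(max_x / self.cell_size)
        y0, y1 = floor(min_y / self.cell_size), floor(max_y / self.cell_size)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                yield (cx, cy)

    def add_way(self, way_id: int, nodes: List[Point]) -> None:
        """Register a way under every cell covered by the bbox of its nodes."""
        xs = [x for x, _ in nodes]
        ys = [y for _, y in nodes]
        way_box = (min(xs), min(ys), max(xs), max(ys))
        for cell in self._cells_for(way_box):
            self.cells[cell].add(way_id)

    def candidates(self, box: BBox) -> Set[int]:
        """Return ids of ways that MAY cross the box (a superset, not exact)."""
        found: Set[int] = set()
        for cell in self._cells_for(box):
            found |= self.cells.get(cell, set())
        return found


index = WayGridIndex(cell_size=1.0)
index.add_way(1, [(-5.0, 5.0), (15.0, 5.0)])    # crosses the viewport, both nodes outside
index.add_way(2, [(20.0, 20.0), (21.0, 21.0)])  # far away from the viewport
index.add_way(3, [(-5.0, 8.0), (2.0, 15.0)])    # diagonal that passes outside the corner
print(sorted(index.candidates((0.0, 0.0, 10.0, 10.0))))  # [1, 3]
```

Way 3 shows why the index is only a first filter: its bounding box overlaps the viewport, but the diagonal segment itself does not, which is the case Stefan calls the interesting one.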