[OSM-dev] Distributed Data Store Follow-Up

2009-03-24 Thread Scott Shawcroft
Hi all,
Earlier I posted about how my friend and I were creating a distributed 
data store for OSM data.  We've finished our project and gotten the most 
difficult queries working.  All of our code is freely available, along 
with a report about our design and findings, on our GitHub wiki at 
http://wiki.github.com/tannewt/menzies.

As our report says, we were able to do bounding box and regular get 
queries faster than the production 0.5 OSM server.  However, we did not 
manage to get our own instance of the OSM API running on the machines we 
had because of a number of planet import errors.  Thus, we only have a 
rough idea of how well we do latency-wise and no idea how the two 
solutions differ under varying loads.

Please let us know what you think.  We firmly believe that distributing 
the data over a number of computers is a far better solution than one 
single supercomputer.

Thanks,
Scott



Re: [OSM-dev] Distributed Data Store Follow-Up

2009-03-24 Thread Stefan de Konink
Scott Shawcroft wrote:
> Please let us know what you think.  We firmly believe that distributing 
> the data over a number of computers is a far better solution than one 
> single supercomputer.

This conclusion (divide and conquer) is right for fetch. What was your 
update performance?

Did you explore the performance of within queries?


Stefan



Re: [OSM-dev] Distributed Data Store Follow-Up

2009-03-24 Thread Scott Shawcroft
Stefan,
Our update performance shouldn't be too different.  We simply send the 
update request to all the node machines.
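
Concretely, the write path is roughly the sketch below (Python, with 
made-up host names and endpoint; not our actual code, and there is no 
retry or partial-failure handling here):

import json
import urllib.request

NODE_MACHINES = ["http://node1:8000", "http://node2:8000"]   # illustrative hosts

def broadcast_update(element):
    # Fan one changed element out to every node machine and insist on
    # success from each of them.
    payload = json.dumps(element).encode("utf-8")
    for host in NODE_MACHINES:
        req = urllib.request.Request(host + "/update", data=payload,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 200:
                raise RuntimeError("update failed on " + host)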

By within do you mean a bounding box query?  Could you be more specific?
Thanks,
Scott

Stefan de Konink wrote:
> Scott Shawcroft wrote:
>> Please let us know what you think.  We firmly believe that 
>> distributing the data over a number of computers is a far better 
>> solution than one single supercomputer.
>
> This conclusion (divide and conquer) is right for fetch. What was your 
> update performance?
>
> Did you explore the performance of within queries?
>
>
> Stefan




Re: [OSM-dev] Distributed Data Store Follow-Up

2009-03-24 Thread Stefan de Konink
Scott Shawcroft wrote:
> Our update performance shouldn't be too different.  We simply send the 
> update request to all the node machines.

And your node machines do not cache their partition results? (Thus is a 
scan always required?)

> By within do you mean a bounding box query?  Could you be more specific?

For bbox you will have results for this:

  ______
 |      |
 |  o---+--o
 |______|

for within/touches you will have results for this:

     ______
    |      |
 o--+------+--o
    |______|

Now, the above example is trivial to support; the interesting case is 
diagonal lines. This would allow perfect viewport calls.
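
In rough Python (purely illustrative; a way here is just a list of 
(lon, lat) node coordinates, and none of this is anyone's actual code) 
the difference is:

def node_in_bbox(p, bbox):
    min_x, min_y, max_x, max_y = bbox
    x, y = p
    return min_x <= x <= max_x and min_y <= y <= max_y

def bbox_hits(way, bbox):
    # A purely node-based bbox query: ways with at least one node
    # inside the box (first picture).
    return any(node_in_bbox(p, bbox) for p in way)

def segment_crosses_bbox(p, q, bbox):
    # Liang-Barsky clip of segment p-q against the box: true if any part
    # of the segment lies inside, even when both endpoints are outside.
    min_x, min_y, max_x, max_y = bbox
    (x0, y0), (x1, y1) = p, q
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for denom, dist in ((-dx, x0 - min_x), (dx, max_x - x0),
                        (-dy, y0 - min_y), (dy, max_y - y0)):
        if denom == 0:
            if dist < 0:
                return False          # parallel to this edge and outside it
        else:
            t = dist / denom
            if denom < 0:
                t0 = max(t0, t)
            else:
                t1 = min(t1, t)
            if t0 > t1:
                return False
    return True

def within_hits(way, bbox):
    # Within/touches: the node-based hits plus ways whose segments cross
    # the box with every node outside (second picture).
    return bbox_hits(way, bbox) or any(
        segment_crosses_bbox(way[i], way[i + 1], bbox)
        for i in range(len(way) - 1))

The diagonal case is exactly what the segment clip catches and a 
node-only index misses.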


Stefan



Re: [OSM-dev] Distributed Data Store Follow-Up

2009-03-24 Thread Scott Shawcroft
Stefan de Konink wrote:
> Scott Shawcroft wrote:
>> Our update performance shouldn't be too different.  We simply send 
>> the update request to all the node machines.
>
> And your node machines do not cache their partition results? (Thus is 
> a scan always required?)
We don't do any caching ourselves, but the underlying BerkeleyDB does.  
Therefore, we can update as we please.
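
For what it's worth, the cache we rely on is just BerkeleyDB's own memory 
pool.  A minimal sketch of the idea (Python bsddb3 bindings; the cache 
size, paths and key format are illustrative, not our real settings):

from bsddb3 import db

env = db.DBEnv()
env.set_cachesize(0, 64 * 1024 * 1024)                # 0 GB + 64 MB page cache
env.open("/var/lib/menzies", db.DB_CREATE | db.DB_INIT_MPOOL)

nodes = db.DB(env)
nodes.open("nodes.db", dbtype=db.DB_BTREE, flags=db.DB_CREATE)

# Writes go straight to the database; BerkeleyDB keeps hot pages in its
# cache, so there is nothing of ours to invalidate on an update.
nodes.put(b"node:12345", b"47.6097,-122.3331")
print(nodes.get(b"node:12345"))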


>> By within do you mean a bounding box query?  Could you be more specific?

> For bbox you will have results for this:
>
>   ______
>  |      |
>  |  o---+--o
>  |______|

> for within/touches you will have results for this:
>
>      ______
>     |      |
>  o--+------+--o
>     |______|

> Now, the above example is trivial to support; the interesting case is 
> diagonal lines. This would allow perfect viewport calls.
We don't do within.  It is purely node based.  I suppose a spatial way 
index could be built to do within queries though.
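
If we did build one, I imagine something grid-based, along these lines 
(very rough sketch, nothing we have implemented; the 0.1 degree cell size 
and the names are made up):

from collections import defaultdict

CELL = 0.1   # arbitrary cell size in degrees, for illustration only

def cells_for_bbox(min_lon, min_lat, max_lon, max_lat):
    # Every grid cell a bounding box overlaps.
    x0, x1 = int(min_lon // CELL), int(max_lon // CELL)
    y0, y1 = int(min_lat // CELL), int(max_lat // CELL)
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]

class WayIndex:
    def __init__(self):
        self.grid = defaultdict(set)   # (cell_x, cell_y) -> way ids

    def add_way(self, way_id, node_coords):
        lons = [lon for lon, lat in node_coords]
        lats = [lat for lon, lat in node_coords]
        for cell in cells_for_bbox(min(lons), min(lats), max(lons), max(lats)):
            self.grid[cell].add(way_id)

    def candidates(self, bbox):
        # Way ids that might be within or touch the query bbox; each one
        # still needs an exact geometry test afterwards.
        found = set()
        for cell in cells_for_bbox(*bbox):
            found.update(self.grid.get(cell, ()))
        return found

The index only narrows the scan to ways near the viewport; the exact 
within/touches test per candidate would still be needed on top.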


> Stefan

