Re: [OSM-dev] Spatial vs. multi-column indexes for points
On Fri, Sep 12, 2008 at 04:50:00PM +0200, Andreas Kalsch wrote: Subject: Re: [OSM-dev] Spatial vs. multi-column indexes for points Thanks! And this really outperforms a simple query like ... latitude between y1 and y2 and longitude between x1 and x2 ? This would either need a combined multi-column index or be expensive because 2 seperated indexes and a merge of the results which the database will probably do as a full_seq_scan. I will use 2^24 tiles as length. It looks like tiles_for_area goes through all tiles which are in the bounding box. So if I have an area with e.g. 5x5 kilometers, it will loop through 2000x2000 = 4 Million tiles. And so the SQL for this are will be quite big, won't it? IIRC it uses the top most 16 bit of lat and lon and shifts them together in an int32. As lat lon bits get shifted in round-robin you have most likely a very good distribution and a very efficient index on lat and lon. For the query bbox=8.3426546875,51.88222734375,8.4305453125,51.92617265625 it results in this: tile BETWEEN 3496212715 AND 3496212719 OR tile BETWEEN 3496212728 AND 3496212735 OR tile BETWEEN 3496212904 AND 3496212911 OR tile BETWEEN 3496212920 AND 3496212927 OR tile BETWEEN 3496212968 AND 3496212971 OR tile BETWEEN 3496213059 AND 3496213063 OR tile BETWEEN 3496213067 AND 3496213087 OR tile BETWEEN 3496213091 AND 3496213095 OR tile BETWEEN 3496213099 AND 3496213119 OR tile BETWEEN 3496213187 AND 3496213191 OR tile BETWEEN 3496213195 AND 3496213215 OR tile BETWEEN 3496213219 AND 3496213223 OR tile BETWEEN 3496213232 AND 3496213241 OR tile BETWEEN 3496213248 AND 3496213315 OR tile BETWEEN 3496213320 AND 3496213323 OR tile BETWEEN 3496213344 AND 3496213347 OR tile BETWEEN 3496213352 AND 3496213355 OR tile BETWEEN 3496213376 AND 3496213417 OR tile BETWEEN 3496213424 AND 3496213433 OR tile BETWEEN 3496213440 AND 3496213443 OR tile BETWEEN 3496213448 AND 3496213451 OR tile BETWEEN 3496213472 AND 3496213475 OR tile IN (3496212713,3496213057,3496213065,3496213089, 3496213097,3496213185,3496213193,3496213217,3496213225,3496213228, 3496213229,3496213244,3496213245,3496213420,3496213421,3496213436, 3496213437,3496213480,3496213481) Anyway - postgres/postgis is faster than mysql and i prefer postgres anyway so i'd not have to deal with this. Flo -- Florian Lohoff [EMAIL PROTECTED] +49-171-2280134 Those who would give up a little freedom to get a little security shall soon have neither - Benjamin Franklin signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Spatial vs. multi-column indexes for points
On Thu, Sep 11, 2008 at 08:31:23PM +0200, Andreas Kalsch wrote: All, thanks for your quick responses! Quad tiles look like a smart way to create an index. So to lookup a single point or a quad tile, this is fine. But for my application I need another lookup - by bounding box with any ratio and size. Is there a way to look up a special bounding box with this index? I think this will be a little more complicated without conventional multi-column indices. Hmm, I think you could take a maximum number of quad lookups which contain the requested box, what do you think? The code in the api calculates all possible quadtile areas and does a quadtile in ( x, y, z, a, b, c, d, e, f ) or quadtile between x and y etc Can you give me the position of this code in SVN, I am sure will understand it more deeply then. -- Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen! Ideal für Modem und ISDN: http://www.gmx.net/de/go/smartsurfer ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Spatial vs. multi-column indexes for points
My current optimization includes: - using mediumint for lat/lon - enough for ~2 meters resolution - using a bounding box first for point+radius calculation and then selecting the circle with a Pythagoras approximation, which is exact enough For the first, I want to use MySQL. If you are interested in some benchmark data, I can post ... As an aside, there is currently a lack of good benchmarks on the different optimizations hacks for spatial querying, in different circumstances, so anything contributing towards guidelines for developers would be most helpful OK, some things I have quickly writte down while testing, made on: - MySQL 5.0.67 - MacBook Intel Core 2 Duo - 2 GHz - Mac OS 10.4 Time in secs There are several ways to get the points within a circle ... - I precompute some values which will be constant during the query - this is quicker - I use mediumints for lat/lon 1) correct, bust most expensive: SET @latitude=48; SET @longitude=13; SET @latitudeM = getMed(@latitude); /* MySQL function */ SET @longitudeM = getMed(@longitude); SET @latitudeSin=sin(radians(@latitude)); SET @latitudeCos=cos(radians(@latitude)); SELECT BENCHMARK( 100, degrees(acos( @latitudeSin*sin(radians(230)) + @latitudeCos*cos(radians(230))*cos(radians([EMAIL PROTECTED])) ))); .62 secs 2) approximation - really good results for radius up to 10-20°: SET @longitudeFactor=cos(radians(@latitude)); SELECT BENCHMARK( 100, sqrt(pow(@latitudeM - 230, 2) + pow(@longitudeFactor * (@longitudeM - 25), 2)) ); .40 secs 3) OK, sqrt can be omitted in the resulting query SELECT BENCHMARK( 100, pow(@latitudeM - 230, 2) + pow(@longitudeFactor * (@longitudeM - 25), 2) ); .36 secs If there are more people interested in concrete computation based on 500.000 rows I can post them, too. Andi -- Der GMX SmartSurfer hilft bis zu 70% Ihrer Onlinekosten zu sparen! Ideal für Modem und ISDN: http://www.gmx.net/de/go/smartsurfer ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Spatial vs. multi-column indexes for points
On Fri, Sep 12, 2008 at 08:20:18AM +0200, Andreas Kalsch wrote: Subject: Re: [OSM-dev] Spatial vs. multi-column indexes for points On Thu, Sep 11, 2008 at 08:31:23PM +0200, Andreas Kalsch wrote: All, thanks for your quick responses! Quad tiles look like a smart way to create an index. So to lookup a single point or a quad tile, this is fine. But for my application I need another lookup - by bounding box with any ratio and size. Is there a way to look up a special bounding box with this index? I think this will be a little more complicated without conventional multi-column indices. Hmm, I think you could take a maximum number of quad lookups which contain the requested box, what do you think? The code in the api calculates all possible quadtile areas and does a quadtile in ( x, y, z, a, b, c, d, e, f ) or quadtile between x and y etc Can you give me the position of this code in SVN, I am sure will understand it more deeply then. applications/utils/export/osm2ai/osm2ai.pl Is a perl variant - look for sql_for_area In the rails_ports its done with a C helper - look in sites/rails_port/lib/quad_tile/ Flo -- Florian Lohoff [EMAIL PROTECTED] +49-171-2280134 Those who would give up a little freedom to get a little security shall soon have neither - Benjamin Franklin signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Spatial vs. multi-column indexes for points
On Thu, Sep 11, 2008 at 01:49:55PM +0200, Andreas Kalsch wrote: Has anybody used GIS successfully in MySQL or PGSQL and can tell me how the performance compares between the two techniques? At least in PostgreSQL/PostGIS, the geometric indices are WAY faster. Just remember to either use the ST_* family of predicates (implicit bounding box scan) or add an term to do an explicit bounding box scan so the index is actually used. CU Sascha -- http://sascha.silbe.org/ http://www.infra-silbe.de/ signature.asc Description: Digital signature ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Spatial vs. multi-column indexes for points
All, thanks for your quick responses! Quad tiles look like a smart way to create an index. So to lookup a single point or a quad tile, this is fine. But for my application I need another lookup - by bounding box with any ratio and size. Is there a way to look up a special bounding box with this index? I think this will be a little more complicated without conventional multi-column indices. Hmm, I think you could take a maximum number of quad lookups which contain the requested box, what do you think? My current optimization includes: - using mediumint for lat/lon - enough for ~2 meters resolution - using a bounding box first for point+radius calculation and then selecting the circle with a Pythagoras approximation, which is exact enough For the first, I want to use MySQL. If you are interested in some benchmark data, I can post ... Instead of indexing by precise coordinates you could index by virtual tiles as it is done in OSM's main DB since a year ago with nice performance boost: http://wiki.openstreetmap.org/index.php/QuadTiles good luck, Stefan On Thu, Sep 11, 2008 at 1:49 PM, Andreas Kalsch [EMAIL PROTECTED] wrote: Hey, last week I made some experiments with huge datasets of lat/lon points. I use MySQL 5.0, which partially support GIS extensions, including R-trees. But it is still not able to make queries based on the GIS features, so I have to use the normal way - multi-column indexes on lat/lon columns. It works well but probably there is a way to make it even quicker ;) Has anybody used GIS successfully in MySQL or PGSQL and can tell me how the performance compares between the two techniques? Thanks, Andi -- Pt! Schon das coole Video vom GMX MultiMessenger gesehen? Der Eine für Alle: http://www.gmx.net/de/go/messenger03 ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev -- Pt! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Spatial vs. multi-column indexes for points
On Thu, Sep 11, 2008 at 1:31 PM, Andreas Kalsch [EMAIL PROTECTED]wrote: All, thanks for your quick responses! Quad tiles look like a smart way to create an index. So to lookup a single point or a quad tile, this is fine. But for my application I need another lookup - by bounding box with any ratio and size. Is there a way to look up a special bounding box with this index? Take a look at the MySQL 5 documentation here: http://dev.mysql.com/doc/refman/5.0/en/relations-on-geometry-mbr.html When I was working on a read-only mirror of the OSM API, I created a LINESTRING with the top left and bottom right as the end points. This created a bounding box in MySQL that I could then use in one of the MBR functions on that page. Check out http://svn.yellowbkpk.com/geo/trunk/bbox.php to see the SQL queries I was using. They aren't very good, but they'll show you how to use the MBR stuff in MySQL. ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Spatial vs. multi-column indexes for points
My current optimization includes: - using mediumint for lat/lon - enough for ~2 meters resolution - using a bounding box first for point+radius calculation and then selecting the circle with a Pythagoras approximation, which is exact enough For the first, I want to use MySQL. If you are interested in some benchmark data, I can post ... As an aside, there is currently a lack of good benchmarks on the different optimizations hacks for spatial querying, in different circumstances, so anything contributing towards guidelines for developers would be most helpful ___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev