Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)
On 2/6/07, Mark Stosberg [EMAIL PROTECTED] wrote: It's also notable that the units used are meters, not miles like geo_distance(). That's what the magic number of 16093.44 is-- 10 miles converted to meters. When I benchmarked this query against the old geo_distance() variation, it was about 200 times faster (~100ms vs .5ms). However, my next step was to try a more real world query that involved a more complex where clause and a couple of table joins. So far, that result is coming out /slower/ with the new approach, even though the index is being used. I believe this may be cause of the additional results found that are outside of the sphere, but inside the cube. This causes additional rows that need processing in the joined tables. Could someone post an example of how to further refine this so the results more closely match what geo_distance returns() ? I agree with bruno...the extra time is probably not what you are thinking...please post explain analyze results, etc. However bruno's ratio, while correct does not tell the whole story because you have to recheck distance to every point in the returned set. There is a small optimization you can make. The query you wrote automatically excludes points within a certain box. you can also include points in the set which is the largest box that fits in the circle: select * from zipcodes where earth_box('(436198.322855334, 4878562.8732218, 4085386.43843934)'::cube,inner_radius) @ earth_coords or ( earth_box('(436198.322855334, 4878562.8732218, 4085386.43843934)'::cube,16093.44) @ earth_coords and geo_dist... ); you can also choose to omit the earth_coords column and calculate it on the fly...there is no real performance hit for this but it does make the sql a bit ugly. merlin ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)
Merlin Moncure wrote: On 2/5/07, Mark Stosberg [EMAIL PROTECTED] wrote: Bruno Wolff III wrote: On Sat, Feb 03, 2007 at 14:00:26 -0500, Mark Stosberg [EMAIL PROTECTED] wrote: I'm using geo_distance() from contrib/earthdistance would like to find a way to spend up the geo distance calculation if possible. This is for a proximity search: Show me adoptable pets within 250 miles of this zipcode. If you are using the cube based part of the earth distance package, then you can use gist indexes to speed those searches up. Thanks for the tip. Any idea what kind of improvement I can expect to see, compared to using geo_distance()? a lot. be aware that gist takes longer to build than btree, but very fast to search. Index search and filter to box is basically an index lookup (fast!). for mostly static datasets that involve a lot of searching, gist is ideal. The documentation in contrib/ didn't provide examples of how to create or the index or actually a the proximity search. Here's what I figured out to do: I added a new column as type 'cube': ALTER table zipcodes add column earth_coords cube; Next I converted the old lat/lon data I had stored in a 'point' type to the new format: -- Make to get lat/lon in the right order for your data model! UPDATE zipcodes set earth_coords = ll_to_earth( lon_lat[1], lon_lat[0] ); Now I added a GIST index on the field: CREATE index earth_coords_idx on zipcodes using gist (earth_coords); Finally, I was able to run a query, which I could see used the index (by checking EXPLAIN ANALYZE ... select * from zipcodes where earth_box('(436198.322855334, 4878562.8732218, 4085386.43843934)'::cube,16093.44) @ earth_coords; It's also notable that the units used are meters, not miles like geo_distance(). That's what the magic number of 16093.44 is-- 10 miles converted to meters. When I benchmarked this query against the old geo_distance() variation, it was about 200 times faster (~100ms vs .5ms). However, my next step was to try a more real world query that involved a more complex where clause and a couple of table joins. So far, that result is coming out /slower/ with the new approach, even though the index is being used. I believe this may be cause of the additional results found that are outside of the sphere, but inside the cube. This causes additional rows that need processing in the joined tables. Could someone post an example of how to further refine this so the results more closely match what geo_distance returns() ? Any other indexing or optimization tips would be appreciated. Mark ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)
On Mon, Feb 05, 2007 at 18:01:05 -0500, Mark Stosberg [EMAIL PROTECTED] wrote: It's also notable that the units used are meters, not miles like geo_distance(). That's what the magic number of 16093.44 is-- 10 miles converted to meters. You can change the earth() function in earthdistance.sql before running it to use some other unit other than meters: -- earth() returns the radius of the earth in meters. This is the only -- place you need to change things for the cube base distance functions -- in order to use different units (or a better value for the Earth's radius). CREATE OR REPLACE FUNCTION earth() RETURNS float8 LANGUAGE 'sql' IMMUTABLE AS 'SELECT ''6378168''::float8'; However, my next step was to try a more real world query that involved a more complex where clause and a couple of table joins. So far, that result is coming out /slower/ with the new approach, even though the index is being used. I believe this may be cause of the additional results found that are outside of the sphere, but inside the cube. This causes additional rows that need processing in the joined tables. This is unlikely to be the cause. The ratio of the area of the cube to the circle for small radii (compared to the radius of the earth, so that we can consider thinsg flat) is 4/pi = 1.27 which shouldn't cause that much of a change. It might be that you are getting a bad plan. The guess on the selectivity of the gist constraint may not be very good. Some people here may be able to tell you more if you show us explain analyze output. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster