Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)

2007-02-06 Thread Merlin Moncure

On 2/6/07, Mark Stosberg [EMAIL PROTECTED] wrote:

It's also notable that the units used are meters, not miles like
geo_distance(). That's what the magic number of 16093.44 is-- 10 miles
converted to meters.

When I benchmarked this query against the old geo_distance() variation,
it was about 200 times faster (~100ms vs .5ms).

However, my next step was to try a more real world query that involved
 a more complex where clause and a couple of table joins. So far, that
result is coming out /slower/ with the new approach, even though the
index is being used. I believe this may be cause of the additional
results found that are outside of the sphere, but inside the cube. This
causes additional rows that need processing in the joined tables.

Could someone post an example of how to further refine this so the
results more closely match what geo_distance returns() ?


I agree with bruno...the extra time is probably  not what you are
thinking...please post explain analyze results, etc.  However bruno's
ratio, while correct does not tell the whole story because you have to
recheck distance to every point in the returned set.

There is a small optimization you can make.  The query you wrote
automatically excludes points within a certain box.  you can also
include points in the set which is the largest box that fits in the
circle:

select * from zipcodes
where
earth_box('(436198.322855334,
4878562.8732218, 4085386.43843934)'::cube,inner_radius) @ earth_coords
or
(
earth_box('(436198.322855334,
4878562.8732218, 4085386.43843934)'::cube,16093.44) @ earth_coords
and
geo_dist...
);

you can also choose to omit the earth_coords column and calculate it
on the fly...there is no real performance hit for this but it does
make the sql a bit ugly.

merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)

2007-02-05 Thread Mark Stosberg
Merlin Moncure wrote:
 On 2/5/07, Mark Stosberg [EMAIL PROTECTED] wrote:
 Bruno Wolff III wrote:
  On Sat, Feb 03, 2007 at 14:00:26 -0500,
Mark Stosberg [EMAIL PROTECTED] wrote:
  I'm using geo_distance() from contrib/earthdistance would like to
 find a
  way to spend up the geo distance calculation if possible. This is
 for a
  proximity search: Show me adoptable pets within 250 miles of this
  zipcode.
 
  If you are using the cube based part of the earth distance package,
  then you can use gist indexes to speed those searches up.

 Thanks for the tip. Any idea what kind of improvement I can expect to
 see, compared to using geo_distance()?
 
 a lot. be aware that gist takes longer to build than btree, but very
 fast to search.  Index search and filter to box is basically an index
 lookup (fast!). for mostly static datasets that involve a lot of
 searching, gist is ideal.

The documentation in contrib/ didn't provide examples of how to create
or the index or actually a the proximity search. Here's what I figured
out to do:

I added a new column as type 'cube':

 ALTER table zipcodes add column earth_coords cube;

Next I converted the old lat/lon data I had stored in a 'point'
type to the new format:

-- Make to get lat/lon in the right order for your data model!
 UPDATE zipcodes set earth_coords = ll_to_earth( lon_lat[1], lon_lat[0] );

Now I added a GIST index on the field:

 CREATE index earth_coords_idx on zipcodes using gist (earth_coords);

Finally, I was able to run a query, which I could see used the index (by
checking EXPLAIN ANALYZE ...

   select * from zipcodes where earth_box('(436198.322855334,
4878562.8732218, 4085386.43843934)'::cube,16093.44) @ earth_coords;

It's also notable that the units used are meters, not miles like
geo_distance(). That's what the magic number of 16093.44 is-- 10 miles
converted to meters.

When I benchmarked this query against the old geo_distance() variation,
it was about 200 times faster (~100ms vs .5ms).

However, my next step was to try a more real world query that involved
 a more complex where clause and a couple of table joins. So far, that
result is coming out /slower/ with the new approach, even though the
index is being used. I believe this may be cause of the additional
results found that are outside of the sphere, but inside the cube. This
causes additional rows that need processing in the joined tables.

Could someone post an example of how to further refine this so the
results more closely match what geo_distance returns() ?

Any other indexing or optimization tips would be appreciated.

   Mark

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)

2007-02-05 Thread Bruno Wolff III
On Mon, Feb 05, 2007 at 18:01:05 -0500,
  Mark Stosberg [EMAIL PROTECTED] wrote:
 
 It's also notable that the units used are meters, not miles like
 geo_distance(). That's what the magic number of 16093.44 is-- 10 miles
 converted to meters.

You can change the earth() function in earthdistance.sql before running it
to use some other unit other than meters:

-- earth() returns the radius of the earth in meters. This is the only
-- place you need to change things for the cube base distance functions
-- in order to use different units (or a better value for the Earth's radius).

CREATE OR REPLACE FUNCTION earth() RETURNS float8
LANGUAGE 'sql' IMMUTABLE
AS 'SELECT ''6378168''::float8';

 However, my next step was to try a more real world query that involved
  a more complex where clause and a couple of table joins. So far, that
 result is coming out /slower/ with the new approach, even though the
 index is being used. I believe this may be cause of the additional
 results found that are outside of the sphere, but inside the cube. This
 causes additional rows that need processing in the joined tables.

This is unlikely to be the cause. The ratio of the area of the cube to
the circle for small radii (compared to the radius of the earth, so that
we can consider thinsg flat) is 4/pi = 1.27 which shouldn't cause that
much of a change.
It might be that you are getting a bad plan. The guess on the selectivity
of the gist constraint may not be very good.
Some people here may be able to tell you more if you show us explain
analyze output.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster