Re: [postgis-users] ST_Distance_Sphere too slow

Stephen Woodbridge Thu, 24 Mar 2011 21:46:04 -0700

On 3/24/2011 7:35 PM, Julian Perelli wrote:

2011/3/24 Stephen Woodbridge<[email protected]>:

On 3/24/2011 6:14 PM, Julian Perelli wrote:


Thanks!

I don't know if st_expand is faster than st_distance_sphere but with
EXPLAIN I saw 2 seq scan, and with your approach I see now one seq
scan, and one index scan, so I suppose your approach is much better,
and I think that it couldn't be better.

Could you explain me why that happens? why now is a index scan where
before was a seq scan?


Spatial indexes only work on the bbox of the geometry and the&&  is try if
the two geometry bboxes interact with one another.

So if you hav geometry A and B then A&&  B compares their bboxes. But in
your case you want to find any two geometries that might be as far apart as
300 meters. If A and B are 300 meters apart then their bboxes will not
interact and you will never be able to select them for computing the
distance. But if we expand the bbox of one of the geometries by 300 meters,
then it will interact with the bbox in question and we will check the
distance. If the distance is greater then 300m we will throw it out anyway.


Ok, I understand now.&&  works with bboxes, and st_expand doesn't have
almost any cost.


Right, st_expand only grows the bbox size, so no cost in doing that.


So&&  is a fast select using indexes and bboxes.
The distance is not performed on all the pairs that are obviously too far
apart, and since this is an expense computation this is good.

Performance is based more on the false index matches that get rejected by
the more costly distance calculation. If there is not a high percentage of
these then index searchs are extremely fast.>
-Steve


Unfortunately, I have too many overlapping paths, but you are giving
me here an idea:

I can first see what paths cross each other, and that (I suppose) Is
faster than the distance calculation. Then with the rest of the
results I can take the distance. (I don't know if the internals on
ST_distance do that already, then my idea is useless :P) I'll try it,
post the code and say how it goes.

No this will not work, intersects is more costly than distance. Are youpaths, straight line segments or linstrings with multiple vertices? Areyou paths fairly random or mostly parallel at a mostly common angle?

45 degrees lines have a large bbox, but horizontal and vertical lineshave small bboxes. If you rotate your geometry so most of the paths arehorizontal or vertical and save it in a temp table and index it, itwould then probably run much faster even given building the temp table.


-Steve

Thanks!

I made a subset table (only 10 paths!!) to test.
with your SQL it takes 2,8 sec
with mine it takes 5,0 sec to complete.

with a subset of 100 paths I let it for an hour and it doesn't finished.
the problem is that it's almost unacceptable, with 1000 paths it will
take eternity!
(I think it takes exponential time as the subset grows up because of
the self join)

maybe is there another way to do this faster... I dont know...

I want to run this just once, to make a table with this temporary
results for a larger query in my application, so if it takes 12 hours
to complete this operation, is ok.

2011/3/24 Stephen Woodbridge<[email protected]>:


I would make sure there is an gist index on path and use a query like:

  select
    t1.id,
    t2.id
  from
    table t1
      inner join table t2
        on (t1.path&&    st_expand(t2.path, 300/111120) and t1.id!=t2.id and
ST_Distance_Sphere(t1.path, t2.path)<    300);

the 300/111120 is to convert 300 meters into approximately degrees. If
you
are worried about some near misses you can expand it a little more and
distance_sphere will filter extras out of the results.

-Steve W

On 3/24/2011 3:43 PM, Julian Perelli wrote:


Hello Postgis List!

I'm trying to get the pair of paths that crosses each other or are 300
meters or less distant from each other, making this SQL.

   select
     t1.id,
     t2.id
   from
     table t1
       inner join table t2
         on (t1.id!=t2.id and ST_Distance_Sphere(t1.path, t2.path)<
  300);

it was 14 hours running and it doesn't finish... I have 1200+ rows in
the table, each path has between 100 and 500 points.

I tried to make an index on the path column, but when I use explain on
the query, it seems that pg doesn't use the index.

should I increase the memory assigned to pgsql?

I don't know where to begin, what to do, to make this query faster.
Maybe I have an error and it just hangs up.
It would be nice to know how to debug the query.. to see it running or
something like that. EXPLAIN helps, but not too much.

Regards!
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users


_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: [postgis-users] ST_Distance_Sphere too slow

Reply via email to