Hi Lars, I'm just a user, but we have handled some pretty big datasets so I can understand your problem. I might be misunderstanding, but to confirm you are asking the system to cross join all of the possible points (something like 63483 * 4 * 542 * 4), use st_distance to convert them to geography and calculate the distance between every combination of these points, store this in an unindexed in memory table and and then sort it by a different but as yet uncalculated value (using the knn operator) so that you can find the largest value?
I wonder if the search strategy could be refined. a) The Explain output suggests to me that most of the time in the query is calculating all the distances in the join table (and I'm guessing calculations). Having done this, why then invoke the knn approach to the same points to find the distances to sort, when you have already calculated this in the first join operation. It may be more efficient to sort by the result of the st_distance function. b) rather than force the calculation of distance between all the points geographically, find the furthest apart polygons using their inherent geometry, then calculated the exact distance for only the two points. I'd play around a bit, but the srid you are using covers a small area, so differences between using spheroid and not should not create massive errors. cheers Ben On Wed, 22 Jun 2022 at 22:22, Lars Aksel Opsahl <[email protected]> wrote: > Hi > > I have two simple tables. > > Table "pg_temp_30.g1temp" > Column | Type | Collation | Nullable | > Default > > --------+------------------------+-----------+----------+------------------------------------ > geo | geometry(Polygon,4258) | | | > id | integer | | not null | nextval( > 'g1temp_id_seq'::regclass) > Indexes: > "g1temp_pkey" PRIMARY KEY, btree (id) > "g1temp_geo_idx" gist (geo) > > > Table "pg_temp_30.g2temp" > Column | Type | Collation | Nullable | > Default > > --------+------------------------+-----------+----------+------------------------------------ > geo | geometry(Polygon,4258) | | | > id | integer | | not null | nextval( > 'g2temp_id_seq'::regclass) > Indexes: > "g2temp_pkey" PRIMARY KEY, btree (id) > "g2temp_geo_idx" gist (geo) > > > And I need to find the polygon in g2temp that has the longest distance to > any polygon in d1temp. > g1temp has 542 rows and table g2temp has 63483 rows. All polygons are > simple rectangles. > > I run analyze > analyze g1temp; > analyze g2temp; > > > And then run this query > > EXPLAIN ANALYZE > WITH index_query AS ( > SELECT g2.geo , ST_Distance(g1.geo, g2.geo,true) AS d, 1 as runnning_jos > FROM > g1temp AS g1, > g2temp AS g2 > ORDER BY g1.geo <-> g2.geo desc > LIMIT 1 > ) > SELECT * > FROM index_query; > > And I get this > > QUERY PLAN > > > ----------------------------------------------------------------------------------------------------------------------------------------------- > Subquery Scan on index_query (cost=43613127.36..43613153.64 rows=1 > width=132) (actual time=69029.773..69029.775 rows=1 loops=1) > -> Limit (cost=43613127.36..43613153.63 rows=1 width=140) (actual > time=69029.771..69029.772 rows=1 loops=1) > -> Result (cost=43613127.36..947419646.12 rows=34407786 > width=140) (actual time=69029.769..69029.770 rows=1 loops=1) > -> Sort (cost=43613127.36..43699146.83 rows=34407786 > width=292) (actual time=69019.575..69019.576 rows=1 loops=1) > Sort Key: ((g1.geo <-> g2.geo)) DESC > Sort Method: top-N heapsort Memory: 29kB > -> Nested Loop (cost=0.00..43441088.43 > rows=34407786 width=292) (actual time=0.184..58446.116 rows=34407786 > loops=1) > -> Seq Scan on g2temp g2 (cost=0.00..1245.33 > rows=63483 width=120) (actual time=0.020..31.744 rows=63483 loops=1) > -> Materialize (cost=0.00..14.63 rows=542 > width=160) (actual time=0.000..0.030 rows=542 loops=63483) > -> Seq Scan on g1temp g1 (cost=0.00..11.92 > rows=542 width=160) (actual time=0.014..0.202 rows=542 loops=1) > Planning Time: 0.320 ms > Execution Time: 69029.872 ms > (12 rows) > > I do understand that this may take time, but I wonder why I see this "Seq > scan" both on g1 and g2 ? > > From what I read on https://postgis.net/docs/geometry_distance_knn.html it > should use indexes in some way. > > Have also tested with casting geography when creating table g1 and g2 and > that did not make any difference. > > Tested also with big workmem (1000 MB ) and the result was the same. > > I am running on POSTGIS="3.3.0dev 3.1.0alpha2-1532-gc8eedf3ae" > [EXTENSION] PGSQL="120" GEOS="3.9.0-CAPI-1.16.2" PROJ="7.2.1" > LIBXML="2.9.10" LIBJSON="0.13.1" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 > (Internal)" TOPOLOGY > (1 row) > > Thanks. > > Lars > _______________________________________________ > postgis-users mailing list > [email protected] > https://lists.osgeo.org/mailman/listinfo/postgis-users > -- [image: Ausvet Logo] <https://www.ausvet.com.au/> Dr Ben Madin BVMS MVPHMgmt PhD MANZCVS GAICD Managing Director Mobile: +61 448 887 220 <+61448887220> E-mail: [email protected] Website: www.ausvet.com.au Skype: benmadin Address: 5 Shuffrey Street Fremantle, WA 6160 Australia
_______________________________________________ postgis-users mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/postgis-users
