I didn’t realize you are building the geography from longitude / latitude of your building.
Is there a reason you don’t have a geography column in your building table, instead of building it from scratch each time? Right now you query can’t use a spatial index at all. If you had the geography in your building table: ALTER TABLE building ADD geog geography(POINT, 4326); UPDATE building SET geog = ST_Point(longitude, latitude)::geography; CREATE INDEX ix_building_geog ON building USING gist(geog); SELECT c.* FROM buildings AS b INNER JOIN c ON b.id = c.building_id WHERE ST_DWithin(b.geog, st_makepoint(-96.7804060, 33.2471770)::geography, 50000); Then that should use a spatial index and be much faster Now why one of your queries is running at 5000 ms vs. 35 ms, since I can’t see the TEST EXPRESSIOn you are using, I’m not sure if those are equivalent. I assume the test expression is true. If for some reason, you can’t have a geography column in your building table, you could try to force the planners behavior with a subselect AS FROM (SELECT *, ST_MakePoint(longitude, latitude) AS geog FROM building ) AS b INNER JOIN customer AS c ON b.id = c.building_id Also make sure you have a primary key on your building.id column and an index on your customer.building_id. Your plans don’t appear to be using a building_id index either, which might be the correct thing to do, or could be cause you don’t have an index on those columns. From: Cameron McCloud <[email protected]> Sent: Tuesday, November 28, 2023 9:45 AM To: Regina Obe <[email protected]> Subject: Re: [postgis-users] ST_DWithin slow if query is joined to another table, but fast if ST_DWithin cast to an integer This is what I get from that query: IMMUTABLE PARALLEL SAFE STRICT COST 10000 SUPPORT postgis_index_supportfn Putting the ST_DWithin into the join gives the same result. On Mon, Nov 27, 2023 at 5:15 PM Regina Obe <[email protected] <mailto:[email protected]> > wrote: I’m guessing it has to do with the costing we have on ST_DWithin. I recall we increased that back recently because of some complaints but can’t recall the version. What does your costing read from ST_DWithin(geography, geography, …) read? Run this query to see the code: SELECT pg_get_functiondef(oid) FROM pg_proc WHERE proname = 'st_dwithin' AND prosrc = 'geography_dwithin'; Mine reads COST 5000 IMMUTABLE STRICT PARALLEL SAFE SUPPORT postgis_index_supportf Also what happens if you switch your query to put the ST_Dwithin in the JOIN clause instead of the WHERE? from building b join customer c ON (b.id <http://b.id> = c.building_id AND st_dwithin(st_makepoint(b.longitude, b.latitude)::geography, st_makepoint(-96.7804060, 33.2471770)::geography, 50000) ); From: postgis-users <[email protected] <mailto:[email protected]> > On Behalf Of Cameron McCloud via postgis-users Sent: Monday, November 27, 2023 8:32 AM To: PostGIS Users Discussion <[email protected] <mailto:[email protected]> > Cc: Cameron McCloud <[email protected] <mailto:[email protected]> > Subject: Re: [postgis-users] ST_DWithin slow if query is joined to another table, but fast if ST_DWithin cast to an integer Hi, Just pinging this again to see if anyone has an idea of what's going on. >From the EXPLAIN output of the second query in the DBFiddle example it does seem that there is a cross-join going on: Rows Removed by Join Filter: 101705643 , even though there is an "ON" clause in the join. Cam. On Thu, Nov 9, 2023 at 10:27 AM Cameron McCloud <[email protected] <mailto:[email protected]> > wrote: https://dbfiddle.uk/qJDW-DjP On Thu, Nov 9, 2023 at 9:11 AM Cameron McCloud <[email protected] <mailto:[email protected]> > wrote: Here's the EXPLAIN for the second query with the join and WHERE on ST_DWithin: Nested Loop (cost=0.00..403093.00 rows=2 width=20) Join Filter: (b.id <http://b.id> = c.building_id) -> Seq Scan on customer c (cost=0.00..231.00 rows=16000 width=4) -> Materialize (cost=0.00..402382.01 rows=2 width=20) -> Seq Scan on building b (cost=0.00..402382.00 rows=2 width=20) Filter: st_dwithin((st_makepoint((longitude)::double precision, (latitude)::double precision))::geography, '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography, '50000'::double precision, true) JIT: Functions: 7 Options: Inlining false, Optimization false, Expressions true, Deforming true Here's the EXPLAIN ANALYZE: Nested Loop (cost=0.00..403093.00 rows=2 width=20) (actual time=29.391..5196.713 rows=6453 loops=1) Join Filter: (b.id <http://b.id> = c.building_id) Rows Removed by Join Filter: 82424169 -> Seq Scan on customer c (cost=0.00..231.00 rows=16000 width=4) (actual time=0.004..2.484 rows=16000 loops=1) -> Materialize (cost=0.00..402382.01 rows=2 width=20) (actual time=0.000..0.149 rows=5152 loops=16000) -> Seq Scan on building b (cost=0.00..402382.00 rows=2 width=20) (actual time=3.292..28.549 rows=6453 loops=1) Filter: st_dwithin((st_makepoint((longitude)::double precision, (latitude)::double precision))::geography, '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography, '50000'::double precision, true) Rows Removed by Filter: 9547 Planning Time: 0.182 ms JIT: Functions: 7 Options: Inlining false, Optimization false, Expressions true, Deforming true Timing: Generation 0.256 ms, Inlining 0.000 ms, Optimization 0.158 ms, Emission 3.133 ms, Total 3.547 ms Execution Time: 5197.586 ms Here's the EXPLAIN for the 3rd query (JOIN but convert ST_DWithin to integer) Hash Join (cost=431.00..402854.10 rows=80 width=20) Hash Cond: (b.id <http://b.id> = c.building_id) -> Seq Scan on building b (cost=0.00..402422.00 rows=80 width=20) Filter: (CASE st_dwithin((st_makepoint((longitude)::double precision, (latitude)::double precision))::geography, '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography, '50000'::double precision, true) WHEN CASE_TEST_EXPR THEN 1 ELSE 0 END = 1) -> Hash (cost=231.00..231.00 rows=16000 width=4) -> Seq Scan on customer c (cost=0.00..231.00 rows=16000 width=4) JIT: Functions: 12 Options: Inlining false, Optimization false, Expressions true, Deforming true And the EXPLAIN ANALYZE: Hash Join (cost=431.00..402854.10 rows=80 width=20) (actual time=6.470..33.691 rows=6453 loops=1) Hash Cond: (b.id <http://b.id> = c.building_id) -> Seq Scan on building b (cost=0.00..402422.00 rows=80 width=20) (actual time=4.280..30.210 rows=6453 loops=1) Filter: (CASE st_dwithin((st_makepoint((longitude)::double precision, (latitude)::double precision))::geography, '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography, '50000'::double precision, true) WHEN CASE_TEST_EXPR THEN 1 ELSE 0 END = 1) Rows Removed by Filter: 9547 -> Hash (cost=231.00..231.00 rows=16000 width=4) (actual time=2.173..2.174 rows=16000 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 691kB -> Seq Scan on customer c (cost=0.00..231.00 rows=16000 width=4) (actual time=0.007..0.931 rows=16000 loops=1) Planning Time: 0.132 ms JIT: Functions: 12 Options: Inlining false, Optimization false, Expressions true, Deforming true Timing: Generation 0.311 ms, Inlining 0.000 ms, Optimization 0.184 ms, Emission 4.085 ms, Total 4.580 ms Execution Time: 34.239 ms On Wed, Nov 8, 2023 at 6:10 PM Regina Obe <[email protected] <mailto:[email protected]> > wrote: First of all is that really your join clause? You seem to be missing an ON from building b join customer c Also please output EXPLAIN And the EXPLAIN ANALYZE of each query From: postgis-users <[email protected] <mailto:[email protected]> > On Behalf Of Cameron McCloud via postgis-users Sent: Wednesday, November 8, 2023 5:38 AM To: [email protected] <mailto:[email protected]> Cc: Cameron McCloud <[email protected] <mailto:[email protected]> > Subject: Re: [postgis-users] ST_DWithin slow if query is joined to another table, but fast if ST_DWithin cast to an integer Test code is here: https://github.com/cameronmccloud/postgis_slow_st_dwithin/blob/main/test.sql On Wed, Nov 8, 2023 at 9:25 AM Cameron McCloud <[email protected] <mailto:[email protected]> > wrote: Hi, Postgis Version: 3.3.4 Postgres Version: 14.9 We have 2 tables, "building" with lat/long and "customer" with a FK to building. There's a 1:1 relationship between the two. The test tables we're using have 16K rows each. Our production data has a lot more, but we could reproduce this on a smaller dataset. We found some odd behaviour when using ST_DWITHIN in a WHERE clause but only when the "building" table is joined to the "customer" table. We also found that converting the result of ST_DWITHIN to an integer (1/0 for true/false) and using the integer in the WHERE is fast. -- This query works as expected and takes 60ms returning 6K rows. select b.* from building b where st_dwithin(st_makepoint(b.longitude, b.latitude)::geography, st_makepoint(-96.7804060, 33.2471770)::geography, 50000); -- This query is orders of magnitude slower - 3000ms, even though joining the two tables without the WHERE takes 30ms select b.* from building b join customer c where st_dwithin(st_makepoint(b.longitude, b.latitude)::geography, st_makepoint(-96.7804060, 33.2471770)::geography, 50000); -- This query converts the result of ST_DWITHIN to an integer. It's fast and takes 80ms select b.* from building b join customer c where case st_dwithin(st_makepoint(b.longitude, b.latitude)::geography, st_makepoint(-96.7804060, 33.2471770)::geography, 50000) when true then 1 else 0 end = 1; We have no idea why this is the case, but we're curious. In our production scenario using the "case...when" brought a query down from 6 minutes to 6 seconds. We'd love to know why this might be and if there are other scenarios like this that could increase the performance of our Postgis queries. Thanks, Cam.
_______________________________________________ postgis-users mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/postgis-users
