I have performance related question as I noticed strange PostGIS behaviour I can not explain, that degrades query performance.
I do have a table with North America continent split into 1mln irregular sectors based on population density. Some of the sectors are 100mx100m (cities) and some are 100kmx100km (north Canada) in size. Table definition: db=> \d+ table Table "table” Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description -----------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+------------- id | character(36) | | not null | | extended | | | polygon | geometry(MultiPolygon,4326) | | | | main | | 10000 | … Indexes: "table_pkey" PRIMARY KEY, btree (id) "table_gist_idx" gist (polygon) … Access method: heap db=> select version(); version -------------------------------------------------------------------------------------------------------------- PostgreSQL 14.7 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), 64-bit (1 row) db=> select postgis_full_version(); postgis_full_version ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- POSTGIS="3.1.7 aafe1ff" [EXTENSION] PGSQL="120" (procs need upgrade for use with PostgreSQL "140") GEOS="3.9.1-CAPI-1.14.2" PROJ="8.0.1" GDAL="GDAL 3.4.3, released 2022/04/22" LIBXML="2 .9.1" LIBJSON="0.15" LIBPROTOBUF="1.3.2" WAGYU="0.5.0 (Internal)" (core procs from "3.1.5 c60e4e3" need upgrade) RASTER (raster procs from "3.1.5 c60e4e3" need upgrade) (1 row) db=> select count(*) from table; count --------- 1083887 (1 row) db=> select oid::regclass, pg_size_pretty(pg_relation_size(oid)) from pg_class WHERE oid::regclass::text like ‘table%’; oid | pg_size_pretty ---------------------------------------------+---------------- table_pkey | 61 MB table_gist_idx | 63 MB table | 774 MB (6 rows) And now the problem - the first symptoms are visible with ST_Contains function - the index is used correctly but the row count estimate (1075) is way too high. I think it’s causing parallelism to kick off breaking query performance: db=> EXPLAIN ANALYZE SELECT id FROM table WHERE ST_Contains(polygon, ST_PointFromText('POINT(-84.68998 39.1622)', 4326)); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1044.35..20805.92 rows=1 width=36) (actual time=0.479..52.232 rows=1 loops=1) Workers Planned: 1 Workers Launched: 1 -> Parallel Bitmap Heap Scan on table (cost=44.35..19805.82 rows=1 width=36) (actual time=0.154..0.154 rows=0 loops=2) Filter: st_contains(polygon, '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry) Rows Removed by Filter: 0 Heap Blocks: exact=1 -> Bitmap Index Scan on table_gist_idx (cost=0.00..44.35 rows=1075 width=0) (actual time=0.066..0.066 rows=2 loops=1) Index Cond: (polygon ~ '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry) Planning Time: 0.287 ms Execution Time: 52.282 ms The root cause of the problem with ST_Contains row estimation sources from bounding box ~ operator estimate (1075 as before). The parallelism doesn’t kick in here however as the BB operation is much cheaper then ST_Contains function: db=> EXPLAIN ANALYZE SELECT id FROM table WHERE polygon ~ ST_PointFromText('POINT(-84.68998 39.1622)', 4326); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on table (cost=44.62..4004.37 rows=1075 width=36) (actual time=0.068..0.069 rows=2 loops=1) Recheck Cond: (polygon ~ '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry) Heap Blocks: exact=1 -> Bitmap Index Scan on table_gist_idx (cost=0.00..44.35 rows=1075 width=0) (actual time=0.063..0.063 rows=2 loops=1) Index Cond: (polygon ~ '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry) Planning Time: 0.210 ms Execution Time: 0.106 ms (7 rows) And here comes the surprise - theoretically less restrictive && operator does it kind a correct predicting single row: db=> EXPLAIN ANALYZE SELECT id FROM table WHERE polygon && ST_PointFromText('POINT(-84.68998 39.1622)', 4326); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Index Scan using table_gist_idx on table (cost=0.29..8.30 rows=1 width=36) (actual time=0.068..0.069 rows=2 loops=1) Index Cond: (polygon && '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry) Planning Time: 0.371 ms Execution Time: 0.117 ms (4 rows) As a workaround combining && with ST_Contains does the job for me but imaybe there is room for improvement? db=> EXPLAIN ANALYZE SELECT id FROM table WHERE polygon && ST_PointFromText('POINT(-84.68998 39.1622)', 4326) AND ST_Contains(polygon, ST_PointFromText('POINT(-84.68998 39.1622)', 4326)); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using table_gist_idx on table (cost=0.29..33.30 rows=1 width=36) (actual time=0.077..0.078 rows=1 loops=1) Index Cond: ((polygon && '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry) AND (polygon ~ '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry)) Filter: st_contains(polygon, '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry) Rows Removed by Filter: 1 Planning Time: 0.451 ms Execution Time: 0.141 ms (6 rows) So the question is - why less restrictive intersect && operator gives lower row count prediction than more restrictive contains ~ operator? Shouldn’t the row count estimate for ~ operator be the same or lower as for &&? Is it a bug in PostGIS (or PostgreSQL core)? Best regards Wojtek Strzałka
_______________________________________________ postgis-users mailing list postgis-users@lists.osgeo.org https://lists.osgeo.org/mailman/listinfo/postgis-users