Re: [HACKERS] Another try at reducing repeated detoast work for PostGIS

Mark Cave-Ayland Tue, 18 Aug 2009 02:33:35 -0700

Tom Lane wrote:

There was recently another go-round on the postgis-devel list about
the same problem Mark Cave-Ayland complained about last year:
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00384.php
Basically, what is happening is a nestloop join where the inner
indexscan gets a comparison argument from the outer table, and that
argument is toasted.  Every single call of an index support function
detoasts the argument again :-(, leading to upwards of 90% of the
runtime being spent in detoasting.


I made a proposal to fix it
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00709.php
but that failed due to excessive ambition --- the cost/benefit/risk
tradeoffs just weren't good enough.

Thinking about it again, it seems to me that a much narrower patch
could solve the specific forms of the problem that the PostGIS folk
are seeing.  Instead of trying to have a general-purpose method of
preventing repeat de-toasting, we could just prevent it for inner
indexscans by having ExecIndexEvalRuntimeKeys() detoast anything it's
passing to the index AM.  The attached patch accomplishes this with
a net addition of about three statements.  (It looks bigger, because
I had to move a hunk of code to have the datatype info available when
needed.)  Paul Ramsey reports that this fixes the problem for him:
http://postgis.refractions.net/pipermail/postgis-devel/2009-August/006659.html

The only downside I can see offhand is that it will detoast short-header
values that might not actually need to be detoasted.  But we don't have
any very good way to know whether a datatype's index support functions
use PG_DETOAST_DATUM or PG_DETOAST_DATUM_PACKED.  In the former case we
do need to detoast short-header values.  The extra overhead in this case
amounts to only one palloc and a fairly short memcpy, which should be
pretty negligible in comparison to the other setup costs of an
indexscan, so I'm not very concerned about it.

Comments?  Better ideas?

                        regards, tom lane



Hi Tom,

Thanks for the patch. Fortunately enough I was able to find the datasetfrom the original report above, and so I've tested your patch againstPostgreSQL 8.4. Unfortunately in the original test case, it doesn't seemto give the same performance improvement for me that Paul was seeing :(



postgis13=# \d geography
                                   Table "public.geography"
   Column   |          Type          |                        Modifiers
------------+------------------------+---------------------------------------------------------

gid | integer | not null defaultnextval('geography_gid_seq'::regclass)

 type       | character varying(1)   |
 id         | numeric(20,0)          |
 name       | character varying(32)  |
 population | numeric(20,0)          |
 abbreviati | character varying(2)   |
 po_name    | character varying(100) |
 id_geo_sta | numeric(20,0)          |
 the_geom   | geometry               |
 centroid   | geometry               |
Indexes:
    "geography_pkey" PRIMARY KEY, btree (gid)
    "idx_geography_centroid_z" gist (centroid)
Check constraints:
    "enforce_dims_the_geom" CHECK (ndims(the_geom) = 2)

"enforce_geotype_the_geom" CHECK (geometrytype(the_geom) ='MULTIPOLYGON'::text OR the_geom IS NULL)

    "enforce_srid_the_geom" CHECK (srid(the_geom) = (-1))


PostgreSQL 8.4 vanilla:

postgis13=# explain analyze select count(*) from geography wheretype='Z' and centroid && (select the_geom from geography where id=69495);


QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actualtime=1770.333..1770.334 rows=1 loops=1)

   InitPlan 1 (returns $0)

-> Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4432)(actual time=19.529..43.893 rows=1 loops=1)

           Filter: (id = 69495::numeric)

-> Index Scan using idx_geography_centroid_z on geography(cost=0.00..8.28 rows=1 width=0) (actual time=48.897..1730.004rows=29687 loops=1)

         Index Cond: (centroid && $0)
         Filter: ((type)::text = 'Z'::text)
 Total runtime: 1770.417 ms
(8 rows)


PostgreSQL 8.4 with your patch applied:

postgis13=# explain analyze select count(*) from geography wheretype='Z' and centroid && (select the_geom from geography where id=69495);


QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actualtime=1499.414..1499.416 rows=1 loops=1)

   InitPlan 1 (returns $0)

-> Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4432)(actual time=18.901..42.648 rows=1 loops=1)

           Filter: (id = 69495::numeric)

-> Index Scan using idx_geography_centroid_z on geography(cost=0.00..8.28 rows=1 width=0) (actual time=43.133..1456.944rows=29687 loops=1)

         Index Cond: (centroid && $0)
         Filter: ((type)::text = 'Z'::text)
 Total runtime: 1499.498 ms
(8 rows)


PostgreSQL 8.4 with force_2d() wrapper function:

postgis13=# explain analyze select count(*) from geography wheretype='Z' and centroid && (select force_2d(the_geom) from geography whereid=69495);


QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actualtime=285.974..285.975 rows=1 loops=1)

   InitPlan 1 (returns $0)

-> Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4432)(actual time=19.278..48.056 rows=1 loops=1)

           Filter: (id = 69495::numeric)

-> Index Scan using idx_geography_centroid_z on geography(cost=0.00..8.28 rows=1 width=0) (actual time=48.375..196.404 rows=29687loops=1)

         Index Cond: (centroid && $0)
         Filter: ((type)::text = 'Z'::text)
 Total runtime: 286.057 ms

So while there is some improvement with your patch, it still comesnowhere close to the speed obtained with wrapping the column in functionthat deTOASTs the input. Could it be that your patch misses out thisparticular case?



ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs

--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Another try at reducing repeated detoast work for PostGIS

Reply via email to