Re: [postgis-users] question on gist performance

Stefan Zweig Fri, 07 Sep 2007 01:06:55 -0700

hi mark,

thanks for your quick reply. i have tried your advice and rebuilded all my gist 
indexes on the table (via drop index, create index) and run a vacuum full 
afterwards. but that did not change the fact, that the query on the_geom is 
much slower than the same one on the_geom_1.


after that my colleague has tried another thing out:

"I have a small table (about 200 entries) containing large
(multipolygon) geometries (as discussed in 
http://postgis.refractions.net/docs/ch05.html),
stored in the_geom. Another column the_geom_envelope holds its
envelope geometries. Each column is indexed using GIST.

I experienced that doing a simple bounding box query results in
different times, dependent on the queried column:

  '...WHERE the_geom && expr;' takes about 250ms,
  '...WHERE the_geom_envelope && expr;' takes about 20ms.

ANALYZE says that GIST is used. As far as I know, the GIST makes use
only of the bounding boxes of Geometry objects, which are equal for
each object. How does it come that the query time using the_geom is more than
*ten times slower* than the query time using the_geom_envelope?"

which to me basically seems to be the same problem.

any ideas?

thanks in advance, stefan


> -----Ursprüngliche Nachricht-----
> Von: PostGIS Users Discussion <postgis-users@postgis.refractions.net>
> Gesendet: 06.09.07 09:40:18
> An: PostGIS Users Discussion <postgis-users@postgis.refractions.net>
> Betreff: Re: [postgis-users] question on gist performance


> 
> On Wed, 2007-09-05 at 18:12 +0200, Stefan Zweig wrote:
> > hi list,
> > 
> > i have noticed a strange thing while selecting geometries from a table and 
> > using the && operator (gist):
> > 
> > i have a table which holds all countrys of the world (somewhat more than 
> > 200) an their boundaries in column the_geom
> > the boundaries have a quite good resolution, so the total amount of points 
> > is huge.
> > for better performance (if low resolution is needed) i created a second 
> > column 
> > 
> > the_geom_1 as:
> > UPDATE table SET the_geom_1=simplify(the_geom,0.01)
> > 
> > which results in a less total amount of points compared to the original 
> > data.
> > 
> > i have set up a gist index on both columns
> > 
> > and now the strange thing:
> > SELECT name FROM _g2965 WHERE the_geom && SetSrid('BOX(13.760675 
> > 51.171104,15.231802 51.83737)'::box2d,4326);
> > takes 297ms
> > 
> > SELECT name FROM _g2965 WHERE the_geom_1 && SetSrid('BOX(13.760675 
> > 51.171104,15.231802 51.83737)'::box2d,4326);
> > takes 15ms
> > 
> > actually i would have thought that the query using the gist index should 
> > take the same time on both columns, because gist index strategy is 
> > comparing bounding boxes of the geometries with the bounding box of my 
> > query, isn't it?
> > can anybody give me a hint why there is such a massive time difference when 
> > selecting from different columns?
> > 
> > 
> > thanks in advance, stefan
> 
> 
> Hi Stefan,
> 
> I suspect that this because of your UPDATE statement. For each row that
> is touched during an UPDATE, a new entry is created in the GiST index -
> for various transactional reasons, the old version cannot be deleted.
> 
> For example: imagine I have your table with containing 200 countries of
> the world. When first created, your heap/indices look like this:
> 
> table: 200 rows
> the_geom_gist: 200 rows
> 
> Now you do your UPDATE table SET the_geom_1=simplify(the_geom,0.01):
> this creates new versions of each row in both the heap and the index.
> 
> table: 400 rows (200 live)
> the_geom_gist: 400 rows (200 live)
> 
> If you now build your second index:
> 
> table: 400 rows (200 live)
> the_geom_gist: 400 rows (200 live)
> the_geom_1_gist: 200 rows
> 
> The second index is smaller because the table only contained 200 live
> rows, and so these were the only ones used to build the index. Now I see
> from your schema that you've done this another two times which gives
> this:
> 
> table: 800 rows (200 live)
> the_geom_gist: 800 rows (200 live)
> the_geom_1_gist: 600 rows (200 live)
> the_geom_2_gist: 400 rows (200 live)
> the_geom_3_gist: 200 rows
> 
> So as you can see the original index is 4 times the size of the original
> which is why it takes longer to traverse. The quick solution would be to
> rebuild all of the GiST indices on your table (if these tables are
> fairly static you may wish to do a VACUUM FULL too). However, I would
> probably create one table with just a primary key and geometry column
> for each different resolution, and then perform a join onto your
> original table based upon the primary key, dropping the geometry columns
> from your original table.
> 
> 
> HTH,
> 
> Mark.
> 
> -- 
> ILande - Open Source Consultancy
> http://www.ilande.co.uk
> 
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users@postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 




_______________________________________________________________________
Jetzt neu! Schützen Sie Ihren PC mit McAfee und WEB.DE. 3 Monate
kostenlos testen. http://www.pc-sicherheit.web.de/startseite/?mc=022220

_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: [postgis-users] question on gist performance

Reply via email to