On Wed, 2007-09-05 at 18:12 +0200, Stefan Zweig wrote: > hi list, > > i have noticed a strange thing while selecting geometries from a table and > using the && operator (gist): > > i have a table which holds all countrys of the world (somewhat more than 200) > an their boundaries in column the_geom > the boundaries have a quite good resolution, so the total amount of points is > huge. > for better performance (if low resolution is needed) i created a second > column > > the_geom_1 as: > UPDATE table SET the_geom_1=simplify(the_geom,0.01) > > which results in a less total amount of points compared to the original data. > > i have set up a gist index on both columns > > and now the strange thing: > SELECT name FROM _g2965 WHERE the_geom && SetSrid('BOX(13.760675 > 51.171104,15.231802 51.83737)'::box2d,4326); > takes 297ms > > SELECT name FROM _g2965 WHERE the_geom_1 && SetSrid('BOX(13.760675 > 51.171104,15.231802 51.83737)'::box2d,4326); > takes 15ms > > actually i would have thought that the query using the gist index should take > the same time on both columns, because gist index strategy is comparing > bounding boxes of the geometries with the bounding box of my query, isn't it? > can anybody give me a hint why there is such a massive time difference when > selecting from different columns? > > > thanks in advance, stefan
Hi Stefan, I suspect that this because of your UPDATE statement. For each row that is touched during an UPDATE, a new entry is created in the GiST index - for various transactional reasons, the old version cannot be deleted. For example: imagine I have your table with containing 200 countries of the world. When first created, your heap/indices look like this: table: 200 rows the_geom_gist: 200 rows Now you do your UPDATE table SET the_geom_1=simplify(the_geom,0.01): this creates new versions of each row in both the heap and the index. table: 400 rows (200 live) the_geom_gist: 400 rows (200 live) If you now build your second index: table: 400 rows (200 live) the_geom_gist: 400 rows (200 live) the_geom_1_gist: 200 rows The second index is smaller because the table only contained 200 live rows, and so these were the only ones used to build the index. Now I see from your schema that you've done this another two times which gives this: table: 800 rows (200 live) the_geom_gist: 800 rows (200 live) the_geom_1_gist: 600 rows (200 live) the_geom_2_gist: 400 rows (200 live) the_geom_3_gist: 200 rows So as you can see the original index is 4 times the size of the original which is why it takes longer to traverse. The quick solution would be to rebuild all of the GiST indices on your table (if these tables are fairly static you may wish to do a VACUUM FULL too). However, I would probably create one table with just a primary key and geometry column for each different resolution, and then perform a join onto your original table based upon the primary key, dropping the geometry columns from your original table. HTH, Mark. -- ILande - Open Source Consultancy http://www.ilande.co.uk _______________________________________________ postgis-users mailing list postgis-users@postgis.refractions.net http://postgis.refractions.net/mailman/listinfo/postgis-users