Query
Execution time
Predicted Rows
Rows Returned
select
obj_id,
msg_date_rec,
pos_point
from
feed_all.common_pos_messages inner join
feed_all.messages on (common_pos_messages.msg_id = messages.msg_id)
where
pos_lat between 30721085 and 31012505 and
pos_lon between 2601 and 867037 and
msg_date_rec BETWEEN '2011-03-21' AND '2011-03-22'
ORDER BY
obj_id, msg_date_rec
133146.196 ms
1751339
663769
select
obj_id,
msg_date_rec,
pos_point
from
feed_all.common_pos_messages inner join
feed_all.messages on (common_pos_messages.msg_id = messages.msg_id)
where
st_contains(ST_GeomFromText('POLYGON((0.00433541700872238
51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687
51.2018083846302,0.00433541700872238
51.2018083846302,0.00433541700872238 51.6875086022247))', 4326),
pos_point) and
msg_date_rec BETWEEN '2011-03-21' AND '2011-03-22'
ORDER BY
obj_id, msg_date_rec
271619.013 ms
911564
663769
select
obj_id,
msg_date_rec,
pos_point
from
feed_all.common_pos_messages inner join
feed_all.messages on (common_pos_messages.msg_id = messages.msg_id)
where
ST_GeomFromText('POLYGON((0.00433541700872238
51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687
51.2018083846302,0.00433541700872238
51.2018083846302,0.00433541700872238 51.6875086022247))', 4326) &&
pos_point AND
msg_date_rec BETWEEN '2011-03-21' AND '2011-03-22'
ORDER BY
obj_id, msg_date_rec
249301.858 ms
2734692
663769All in all, the second query takes longer but has the closest prediction when explain - analyze is performed while the quickest one is the first with the custom btree index. What really concerns me is that regardless of the Vacuum Analyze I perform times and predictions remain unchanged. Thus I will try to persevere with the first query for this development cycle aiming to implement in the future a gist index not on points but on clusters of points.
Nicklas, I had a reading for your suggestion but if I got it right, st_contains and st_within both run a && operator on the bounding box before, so I guess that the improved execution time of the 3rd query may be attributed to the missing cycle of running the st_contains.
As always any suggestion is more than welcomed. Thanks and kind regards Yiannis On 30/03/2011 15:15, Nicklas Avén wrote:
Ok, I continue my spamming :-) Your compare isn't fair. "Your" index is just doing a bounding box comparasion not a recheck to see what points is actually inside your geometry. It looks like your polygon is a box, but PostGIS don't know that so it will do a recheck on all rows fetched by the index and do a "real" calculation. To only do the bounding box test you can use&& as operator. And using count(*) instead of retrieving all the rows is good to avoid the I/O bottleneck Do a vacuum analyze on the table and try: select count(*) from feed_all.common_pos_messages where ST_GeomFromText('POLYGON((0.00433541700872238 51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687 51.2018083846302,0.00433541700872238 51.2018083846302,0.00433541700872238 51.6875086022247))', 4326 && pos_point; Little more thought through I hope :-) /Nicklas On Wed, 2011-03-30 at 16:01 +0200, Nicklas Avén wrote:Sorry, I was too fast on the keys. before I saw the comparasion with your own index /Nicklas On Wed, 2011-03-30 at 14:40 +0100, Ioannis Anagnostopoulos wrote:The precise numbers are as follows: Total Rows in the table: 45922415 if I use: select pos_lat, pos_lon from feed_all.common_pos_messages where st_contains(ST_GeomFromText('POLYGON((0.00433541700872238 51.6875086022247,1.4450615615687 51.6875086022247,1.4450615615687 51.2018083846302,0.00433541700872238 51.2018083846302,0.00433541700872238 51.6875086022247))', 4326), pos_point) I get back: 4493678 in 4.77 mins Doing explain analyze I get: "Bitmap Heap Scan on common_pos_messages (cost=82262.99..522647.01 rows=771600 width=8) (actual time=127735.424..198141.843 rows=4493678 loops=1)" " Recheck Cond: ('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry&& pos_point)" " Filter: _st_contains('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry, pos_point)" " -> Bitmap Index Scan on idx_pos (cost=0.00..82070.09 rows=2314801 width=0) (actual time=127732.000..127732.000 rows=4493679 loops=1)" " Index Cond: ('0103000020E61000000100000005000000D29145A403C2713F0B23294800D84940E04E3ADFF81EF73F0B23294800D84940E04E3ADFF81EF73FCC056EDBD4994940D29145A403C2713FCC056EDBD4994940D29145A403C2713F0B23294800D84940'::geometry&& pos_point)" "Total runtime: 199206.428 ms" Obviously the times and the rows do not seem to much with the actual results. However after creating my own index based on lat/lon values (integers) on the same table, executing this: select pos_lat, pos_lon from feed_all.common_pos_messages where pos_lat between 30721085 and 31012505 and pos_lon between 2601 and 867037 I get back 4493680 in 2.8 mins Doing explain analyze I get: "Bitmap Heap Scan on common_pos_messages (cost=161748.26..601144.64 rows=1686719 width=8) (actual time=10064.427..60738.808 rows=4493680 loops=1)" " Recheck Cond: ((pos_lat>= 30721085) AND (pos_lat<= 31012505) AND (pos_lon>= 2601) AND (pos_lon<= 867037))" " -> Bitmap Index Scan on idx_lat_lon (cost=0.00..161326.58 rows=1686719 width=0) (actual time=10061.108..10061.108 rows=4493680 loops=1)" " Index Cond: ((pos_lat>= 30721085) AND (pos_lat<= 31012505) AND (pos_lon>= 2601) AND (pos_lon<= 867037))" "Total runtime: 61850.720 ms" The predictions are still "out" from the actual but the btree index seems to behave better. Any suggestions? Probably I need to bring up to date my statistics. Kind Regards Yiannis On 30/03/2011 13:30, Sandro Santilli wrote:On Wed, Mar 30, 2011 at 10:58:57AM +0100, Ioannis Anagnostopoulos wrote:I am involved in a heavy database design initiative where the only kind of geometries I am dealing with are points. I have recently hit a 50million rows long table with those points and my default gist index on the points does not seem to be working very fast (if not at all to be honest). I have started now thinking that probably for "points" an index may not be the best option since in a 50million rows long table most of the points are unique so the index may just duplicate the actual table, of course I may be wrong and I may just missing a very important part of the concept.How many points from the 50M set does your tipical query hits ? Does PostgreSQL selectivity estimator make a good guess about that ? Use EXPLAIN ANALYZE<your query here> to see. --strk; () Free GIS& Flash consultant/developer /\ http://strk.keybit.net/services.html_______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users_______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users_______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users
pla_queries.ods
Description: application/vnd.oasis.opendocument.spreadsheet
_______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users
