Bruce, Thanks, this is certainly relevant to my case -- I tried to CLUSTER the database on date to achieve some economics, but this seems to be an even more explicit way to achieve this. I will experiment with this for certain.
Still, though, it seems as if the index is not functioning correctly, or at least not functioning the way that I want it to. Depending on the size of the partitions, say if I did it monthly as you suggest, I still will end up scanning many more records than I need to. For a test, I did my query as written before (just looking at a bounding box containment), and it took 15 minutes (900,000 ms). This is with spatial indices on the rainfall point table. Now, I dumped the records on the day of interest into a temp table (took only 156 ms), and did a full-blown Within() query on the temp records, and it only took 2.5 seconds (2,579 ms). My impression is that if the indexing were working as desired, it should only take my query less than 3 seconds to complete. Does anyone know if I am missing something with regards to my understanding of index behavior? Robert W. Burgholzer Surface Water Modeler Office of Water Supply and Planning Virginia Department of Environmental Quality [email protected] 804-698-4405 Open Source Modeling Tools: http://sourceforge.net/projects/npsource/ -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Bruce Rindahl Sent: Wednesday, March 18, 2009 11:22 AM To: PostGIS Users Discussion Subject: Re: [postgis-users] Index Use Craziness Robert I don't know how far back your data goes but you might look at this: http://www.postgresql.org/docs/8.2/interactive/ddl-partitioning.html Another application similar to yours used this inherited tables idea to partition by months and a query for rainfall 15 years ago is now the same as within the last month. Before it was several orders of magnitude slower. Bruce Burgholzer,Robert wrote: > I am having much trouble getting postgres to use the proper index. > > I have a table with about 20 million daily rainfall observation points, > with between 0 and 19,000 points on any given day. In order to speed up > querying, I have created a GIST index on the geometry column, and an > index on the date column. > > What I want to do is to summarize the observed rainfall in another shape > table on any given day. If I just want to query points by date, this > uses the date index, and performs very quickly (see "CASE 1" below). > > When I try to introduce a geometry operator, the GIST index takes over, > and the index on the "thisdate" column is ignored. The weird thing is, > I can hear it churning the heck out of my hard drive checking on all the > points. (See "CASE 2" below). > > I even tried to sub-query the points for the given date, but the index > did not seem to be fooled ("CASE 3"). It gave me the exact same query > plan!! I am unsure if I really understand the way a sub-query functions > given this result. > > Any help would be appreciated! > > Robert > > CASE 1: > wsp=# explain select count(*) from precip_gridded where thisdate = > '2009-01-12'; > > QUERY PLAN > ------------------------------------------------------------------------ > ---- > Aggregate (cost=479.43..479.44 rows=1 width=0) > -> Index Scan using pg_tdix on precip_gridded (cost=0.00..451.92 > rows=11004 > width=0) > Index Cond: (thisdate = '2009-01-12 00:00:00'::timestamp > without time zone) > (3 rows) > > > CASE 2: > explain select count(*) from precip_gridded where thisdate = > '2009-01-12' and the_geom && setsrid((select extent(the_geom) from > proj_seggroups where gid = 448),4326); > > QUERY PLAN > ------------------------------------------------------------------------ > ---- > Aggregate (cost=100.23..100.24 rows=1 width=0) > InitPlan > -> Aggregate (cost=5.87..5.88 rows=1 width=2807) > -> Index Scan using psg_giix on proj_seggroups > (cost=0.00..5.87 rows=1 width=2807) > Index Cond: (gid = 448) > -> Index Scan using pg_gix on precip_gridded (cost=0.00..94.35 > rows=1 width=0) > Index Cond: (the_geom && setsrid(($0)::geometry, 4326)) > Filter: ((thisdate = '2009-01-12 00:00:00'::timestamp without > time zone) AND (the_geom && setsrid(($0)::geometry, 4326))) > (8 rows) > > > CASE 3: Sub-query by date to try and force the use of date index > wsp=# explain select count(a.*) from (select * from precip_gridded where > thisdate = '2009-01-12') as a where a.the_geom && setsrid((select > extent(the_geom) from proj_seggroups where gid = 448),4326); > > QUERY PLAN > > ------------------------------------------------------------------------ > ---- > Aggregate (cost=100.23..100.24 rows=1 width=81) > InitPlan > -> Aggregate (cost=5.87..5.88 rows=1 width=2807) > -> Index Scan using psg_giix on proj_seggroups > (cost=0.00..5.87 rows=1 width=2807) > Index Cond: (gid = 448) > -> Index Scan using pg_gix on precip_gridded (cost=0.00..94.35 > rows=1 width=81) > Index Cond: (the_geom && setsrid(($0)::geometry, 4326)) > Filter: ((thisdate = '2009-01-12 00:00:00'::timestamp without > time zone) AND (the_geom && setsrid(($0)::geometry, 4326))) > (8 rows) > > > Robert W. Burgholzer > Surface Water Modeler > Office of Water Supply and Planning > Virginia Department of Environmental Quality > [email protected] > 804-698-4405 > Open Source Modeling Tools: > http://sourceforge.net/projects/npsource/ > > > _______________________________________________ > postgis-users mailing list > [email protected] > http://postgis.refractions.net/mailman/listinfo/postgis-users > > > _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users
