Hi,

The problem is now solved (in theory).

Well, it's not the functions per se that's the problem, it's the lack of
> a useful index on the expression.
>

The measurement table indexes (on date and weather station) were not being
used because the only given date ranges (e.g., 1900 - 2009) were causing the
planner to do a full table scan, which is correct. What I had to do was find
a way to reduce the dates so that the planner would actually use the index,
rather than doing a full table scan on 43 million records. By passing in
1955 - 1960 the full table scan went away in favour of an index scan, as
expected.

Each weather station has a known lifespan (per climate category). That is,
not all weather stations between 1880 and 2009 collected data.  For example,
one weather station monitored the maximum daily temperature between
2006-11-29 and 2009-12-31. Some stations span more than 30 years, but I
believe those are in the minority (e.g., 1896-12-01 to 1959-01-31). (I'll be
able to verify once the analysis is finished.)

I will add another table that maps the stations to category and min/max
dates. I can then use this reference table which should (theory part here)
tell the planner to use the index.

What is *really impressive*, though... If my understanding is correct...

PostgreSQL scanned 43 million rows 78 times, returning results in ~90 sec.

Thanks again for all your help, everybody. I sincerely appreciate your
patience, comments, and ideas.

Dave

Reply via email to