Summary
===
We are writing to the db pretty much 24 hours a day.
Recently the amount of data we write has increased, and query speed, formerly
okay, has taken a dive.
The query is using the indexes as expected, so I don't _think_ I have a
query-tuning issue, just an I/O problem.
The first time a query is run it takes about 60 seconds; the second time it
runs in about 6 seconds.
What I know I need advice on is I/O settings and various buffer settings.
I may also need advice on other things, but just don't know it yet!
Below are:
- an explain analyze
- details of the db setup and hardware
- some vmstat and iostat output showing the disks are very busy
- the SHOW ALL output for the db config.
Details
===
Postgres 8.0.3
Below is a sample query. (This is actually implemented as a prepared statement.
Here I fill in the '?'s with actual values.)
electric=# EXPLAIN ANALYZE
electric-# SELECT datavalue, logfielddatatype, timestamp FROM logdata_recent
electric-# WHERE (logfielddatatype = 70 OR logfielddatatype = 71 OR logfielddatatype = 69)
electric-# AND graphtargetlog = 1327
electric-# AND timestamp >= 1123052400 AND timestamp <= 1123138800
electric-# ORDER BY timestamp;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=82.48..82.50 rows=6 width=14) (actual time=60208.968..60211.232 rows=2625 loops=1)
   Sort Key: public.logdata_recent.timestamp
   ->  Result  (cost=0.00..82.41 rows=6 width=14) (actual time=52.483..60200.868 rows=2625 loops=1)
         ->  Append  (cost=0.00..82.41 rows=6 width=14) (actual time=52.476..60189.929 rows=2625 loops=1)
               ->  Seq Scan on logdata_recent  (cost=0.00..46.25 rows=1 width=14) (actual time=0.003..0.003 rows=0 loops=1)
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND (timestamp >= 1123052400) AND (timestamp <= 1123138800))
               ->  Index Scan using logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl on logdata_recent_stale logdata_recent  (cost=0.00..18.08 rows=3 width=14) (actual time=52.465..60181.624 rows=2625 loops=1)
                     Index Cond: (((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR ((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR ((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND (timestamp >= 1123052400) AND (timestamp <= 1123138800))
               ->  Index Scan using logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl on logdata_recent_active logdata_recent  (cost=0.00..18.08 rows=2 width=14) (actual time=0.178..0.178 rows=0 loops=1)
                     Index Cond: (((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR ((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR ((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND (timestamp >= 1123052400) AND (timestamp <= 1123138800))
 Total runtime: 60214.545 ms
(13 rows)
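As mentioned, the query is actually issued as a prepared statement. A rough sketch of that form, with the statement name and parameter types being my guesses:

```sql
PREPARE recent_logdata (int, int, int, int, int, int) AS
    SELECT datavalue, logfielddatatype, timestamp
    FROM logdata_recent
    WHERE (logfielddatatype = $1 OR logfielddatatype = $2 OR logfielddatatype = $3)
      AND graphtargetlog = $4
      AND timestamp >= $5 AND timestamp <= $6
    ORDER BY timestamp;

-- Filling in the '?'s with the values used above:
EXECUTE recent_logdata (70, 71, 69, 1327, 1123052400, 1123138800);
```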
60 seconds is much longer than it used to be. I would guess it used to be under
10 seconds. The second time the above query is run we see the magic of caching
as the time goes down to 6 seconds.
logdata_recent_active and logdata_recent_stale are child tables that inherit
from logdata_recent, which itself never holds any data. (This is
pseudo-partitioning in action!)
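A minimal sketch of that setup, with the column types assumed on my part:

```sql
CREATE TABLE logdata_recent (
    logfielddatatype integer,
    graphtargetlog   integer,
    timestamp        integer,   -- unix epoch seconds
    datavalue        numeric
);

-- The child tables hold the actual rows; the parent stays empty,
-- and queries against logdata_recent scan both children via Append.
CREATE TABLE logdata_recent_active () INHERITS (logdata_recent);
CREATE TABLE logdata_recent_stale  () INHERITS (logdata_recent);
```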
So the very quick seq_scan on the empty logdata_recent parent table is okay
with me.
The index is built on timestamp, logfielddatatype, graphtargetlog. I am curious
as to why the same index shows up 3 times in the using clause, but can live
without knowing the details as long as it doesn't indicate that something's
wrong.
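For reference, the index on each child table is built along these lines (the real names carry the partition timestamps seen in the plan):

```sql
CREATE INDEX logdata_recent_1123085306_ix_t_fld_gtl
    ON logdata_recent_stale (timestamp, logfielddatatype, graphtargetlog);

CREATE INDEX logdata_recent_1123139634_ix_t_fld_gtl
    ON logdata_recent_active (timestamp, logfielddatatype, graphtargetlog);
```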
The logdata_recent_stale table has 5 million rows. The table itself is 324MB
on disk; the index is 210MB.
The disks use ext3 with ordered journalling, but this was later changed
to