Hi,
A perfectly normal and healthy PG 8.2.3 instance I have started acting funny -
a query that normally takes only a few milliseconds sometimes runs for minutes.
I thought maybe something's funny with indices, so I reindexed them all.
Didn't help. Thought maybe something's funky with the whole box, so I
rebooted, but that didn't help either.
The symptoms are:
- high load (5-10 vs. the usual 0.2 or so)
- lots of processes waiting for IO. See the last ("wa") column in the vmstat
output:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 192 477508 7620 420336 0 0 1074 1279 714 204 7 14 40 39
0 0 192 476488 7628 422928 0 0 1132 90 1335 407 4 2 49 45
0 0 192 476548 7644 423952 0 0 566 118 1424 393 6 2 76 16
0 0 192 475716 7644 424732 0 0 394 4 1166 201 2 1 87 10
I run the problematic query with EXPLAIN ANALYZE and saw that it uses indices
correctly and returns quickly. But once in a while it just runs for minutes,
which I can see with something like this:
select procpid as "PID", datname as "Database", usename as "User",
query_start, now(), current_query from pg_stat_activity where ((now() -
query_start) > 0) and current_query NOT LIKE '<IDLE>%'
Now the interesting part. I monitor this box and can see when exactly the load
went up - it went up suddenly and rapidly at *exactly* 3 AM last night, which
is when my db maintenance scripts run. These scripts do:
1. VACUUM VERBOSE
2. ANALYZE
3. pg_dump (for backup)
I checked the logs - nothing funky in there, nothing with "ERROR" or "WARN".
But that is exactly when this problem started.
Is there any way I can see why sometimes a speedy SELECT runs for several
minutes?
Maybe my tables grew too big and something doesn't have enough space/memory?
Maybe I can see that from the vacuum verbose output?
Thanks,
Otis
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster