Re: [PERFORM] Config review

2004-12-07 Thread Andrew Sullivan
On Tue, Dec 07, 2004 at 10:07:54AM -0500, Tom Lane wrote:
> If you are using a RAID configuration it might just be that you need
> to adjust the configuration (IIRC, there are some RAID setups that
> are not very write-friendly).  Otherwise you may have little alternative
> but to buy faster disks.

It might be that altering the Clariion array from RAID 5 to RAID 1+0
would make a difference; but I'd be very surprised to learn that you
could get that array to go a whole lot faster.

One thing that might also be worth investigating is whether
performance actually goes up by moving the WAL onto the array.
We've had some remarkably good experiences with our recently-acquired
EMC.
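
For what it's worth, a minimal sketch of relocating pg_xlog for such a
test (the data directory and mount point here are assumptions, and the
postmaster must be stopped while the directory is moved):

# Hypothetical paths; run as the postgres user with the server stopped.
pg_ctl -D /var/lib/postgres/data stop
mv /var/lib/postgres/data/pg_xlog /mnt/array/pg_xlog
ln -s /mnt/array/pg_xlog /var/lib/postgres/data/pg_xlog
pg_ctl -D /var/lib/postgres/data start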

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
When my information changes, I alter my conclusions.  What do you do sir?
--attr. John Maynard Keynes



Re: [PERFORM] Config review

2004-12-07 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> Bryan Vest <[EMAIL PROTECTED]> writes:
>> Currently the database is at a size of 87.6Gig. A Vacuum Analyze runs every
>> night and has been taking 4 or 5 hours to complete. Everything seems to run
>> fine for a while, then at some point the load goes through the roof and the
>> iowait % also goes way up. It will recover after a little bit and then do the
>> same thing all over again.

> While others have pointed out problems with the config, I don't think any of
> them explains this irregular behaviour.

As Greg says, it might be checkpoints or a background query.  If it
actually is the vacuum itself causing the variation in load, the theory
that comes to mind is that the performance tanks when the vacuum run
switches from find-dead-tuples to clean-indexes mode; clean-indexes is
usually a lot more I/O intensive.

ISTM it actually doesn't matter much which of these explanations is
correct, because all three imply the same thing: not enough disk I/O
bandwidth.  The disk is near saturation already and any increase in
demand drives response time over the knee of the curve.
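
One quick way to confirm that is to watch iowait during one of the bad
periods; a sketch, assuming a Linux box with vmstat available:

# 'wa' is CPU time stuck waiting on I/O and 'b' counts processes blocked
# on it; both staying high through a spike points to a saturated array.
vmstat 5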

If you are using a RAID configuration it might just be that you need
to adjust the configuration (IIRC, there are some RAID setups that
are not very write-friendly).  Otherwise you may have little alternative
but to buy faster disks.

regards, tom lane



Re: [PERFORM] Config review

2004-12-07 Thread Greg Stark

Bryan Vest <[EMAIL PROTECTED]> writes:

> Currently the database is at a size of 87.6Gig. A Vacuum Analyze runs every
> night and has been taking 4 or 5 hours to complete. Everything seems to run
> fine for a while, then at some point the load goes through the roof and the
> iowait % also goes way up. It will recover after a little bit and then do the
> same thing all over again. When this happens access to the web based user
> interface slows way down for our customers. Any input for improvements to this
> config would be appreciated, Thanks.

While others have pointed out problems with the config, I don't think any of
them explains this irregular behaviour. From what you're describing, the
response time is OK most of the time except for these particular bursts?

Do they occur at regular intervals? Is it possible it's just the
checkpointing? Can you see which volumes the i/o traffic is on? Is it on the
local transaction log files or is it on the data files? Does the write i/o
spike upwards or is it just a storm of read i/o? Also, incidentally, is it
possible you have a cron job running vacuum and don't realize it?
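
A rough sketch of how one might answer those questions from the shell
(the postgres user and the five-second interval are just placeholders):

# Look for a vacuum scheduled behind your back.
crontab -l -u postgres | grep -i vacuum
grep -ri vacuum /etc/cron* 2>/dev/null

# Per-device statistics: compare the volume holding pg_xlog against the
# data volume, and see whether the spike shows up in the read or the
# write columns.
iostat -x 5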

If it happens at irregular intervals then it could be a single bad query
that's causing the problem. One bad query would cause a sequential scan of
your 87G and potentially push out a lot of data from the cache. I imagine this
might also be especially bad with the shared_buffers being out of whack.

You might start by checking the easiest thing first: set
log_min_duration_statement to something high and slowly lower it until it's
printing a handful of queries during the heaviest period.
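
As a sketch (the paths and the starting threshold are assumptions; the
setting is in milliseconds and can be picked up with a reload, no restart
needed):

# In postgresql.conf, start high and ratchet down between reloads:
#   log_min_duration_statement = 5000
pg_ctl -D /var/lib/postgres/data reload

# Then watch the server log for statements crossing the threshold
# (log location depends on how logging was set up; this path is a guess).
tail -f /var/lib/postgres/data/serverlog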

You could also look for a pgsql_tmp directory that indicates a disk sort is
happening, which would mean some query is trying to sort a lot of data. You
might have to lower sort_mem to a conservative value before you can see that,
though.

The pgsql_tmp directory appears (and disappears?) as needed; it looks
something like this:

bash-2.05b# ls /var/lib/postgres/data/base/17150/pgsql_tmp
pgsql_tmp22184.0
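
To catch a spill in the act, something like this (path generalised from the
listing above) can poll the temp directories during a spike:

# Growing files here mean some query is spilling a sort to disk.
watch -n 5 'ls -lh /var/lib/postgres/data/base/*/pgsql_tmp 2>/dev/null'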

-- 
greg

