Great response, Thanks.
Regarding 12GB memory and 13G db, and almost no I/O, one thing I don't
understand is that even though the OS caches most of the memory and PG
can use it if it needs it, why would the system swap (not much, only
during peak times)? The SHMMAX is set to 512MB, shared_buffers is 150MB,
effective cache size is 2GB, sort mem is 2MB, rest is default values. It
also happens that a large query (reporting type) can hold up the other
queries, and the load averages shoot up during peak times.
Regarding a baseline -
-We have docs and monitor for frequency of sql statements, most
expensive ones, etc. (IronEye)
-I am monitoring disk reads/writes using iostat
-How do I measure commit frequency, and system events like checkpoint?
(vacuum is done nightly during less or no load)
From: Aaron W [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 04, 2004 8:58 AM
To: [EMAIL PROTECTED]; Anjan Dave
Subject: Re: Scaling further up
I'd look at adding more disks first. Depending on what
type of query
load you get, that box sounds like it will be very
much I/O bound....
Given a a 13G database on a 12G system, with a low
growth rate, it is likely that there is almost no I/O
for most activities. The exception is checkpointing.
The first thing I'd do is try to build a spreadsheet
- select frequency, and # logical and physical reads
- insert/delete/update frequency, and # logical and
physical read and writes involved
- commit frequency, etc.
(start out with simplistic assumptions, and do it for
- system events (checkpoints, vacuum)
I assume that the only high I/O you will see will be
for logging. The RAID issue there is basically
obviated by the sequential write nature of WAL. If
that is the case, EMC is not the most cost effective
or obvious solution - since the value they provide is
mostly manageability for disaster recovery. The goal
in this case is to write at the application max speed,
and with mimimal latency. Any responsible battery
backed up write through (mirrored) cached controller
can do that for you.
On the other hand, if your requests are not *all*
trivial, you are going to test the hardware and
scheduling algorithms of OS and pg. Even if 0.1% of
3,000 tps take a second - that ends up generating 3
seconds of load.... Any, even slightly, slow
transactions will generate enormous queues which slow
In most systems of this volume I've seen, the mix of
activities is constantly invalidating cache, making L2
caching less important. Memory to CPU bus speed is a
limiting factor, as well as raw CPU speed in
processing the requests. Xeon is not a great
architecture for this because of FSB contention; I
suspect a 4-way will be completely FSB bottlenecked so
a more than 4 way would likely not change performance.
I would try to get a simple model/benchmark going and
test against it. You should be talking to the big iron
vendors for their take on your issues and get their
Do you Yahoo!?
Yahoo! Search - Find what you're looking for faster
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]