Re: [PERFORM] possible improvement between G4 and G5

2004-04-20 Thread Aaron Werman
There are a few things that you can do to help force yourself to be I/O
bound. These include:

- RAID 5 for write intensive applications, since multiple writes per synch
write is good. (There is a special case for logging or other streaming
sequential writes on RAID 5)

- Data journaling file systems are helpful in stress testing your
checkpoints

- Using midsized battery backed up write through buffering controllers. In
general, if you have a small cache, you see the problem directly, and a huge
cache will balance out load and defer writes to quieter times. That is why a
midsized cache is so useful in showing stress in your system only when it is
being stressed.

Only partly in jest,
/Aaron

BTW - I am truly curious about what happens to your system if you use
separate RAID 0+1 for your logs, disk sorts, and at least the most active
tables. This should reduce I/O load by an order of magnitude.
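In PostgreSQL of that era, putting the logs on their own spindles was commonly done by relocating the pg_xlog directory and leaving a symlink behind. A minimal sketch, with hypothetical paths (the data directory and the RAID 0+1 mount point are assumptions, not taken from the thread):

```shell
# Illustrative paths only; stop the server before touching pg_xlog.
pg_ctl stop -D /var/lib/pgsql/data

# Move the WAL directory onto the dedicated RAID 0+1 volume and symlink it back.
mv /var/lib/pgsql/data/pg_xlog /raid01/pg_xlog
ln -s /raid01/pg_xlog /var/lib/pgsql/data/pg_xlog

pg_ctl start -D /var/lib/pgsql/data
```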

"Vivek Khera" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> > "JB" == Josh Berkus <[EMAIL PROTECTED]> writes:
>
> JB> Aaron,
> >> I do consulting, so they're all over the place and tend to be complex.
> >> Very few fit in RAM, but still are very buffered. These are almost all
> >> backed with very high end I/O subsystems, with dozens of spindles with
> >> battery backed up writethrough cache and gigs of buffers, which may be
> >> why I worry so much about CPU. I have had this issue with multiple
> >> servers.
>
> JB> Aha, I think this is the difference.  I never seem to be able to
> JB> get my clients to fork out for adequate disk support.  They are
> JB> always running off single or double SCSI RAID in the host server;
> JB> not the sort of setup you have.
>
> Even when I upgraded my system to a 14-spindle RAID5 with 128M cache
> and 4GB RAM on a dual Xeon system, I still wind up being I/O bound
> quite often.
>
> I think it depends on what your "working set" turns out to be.  My
> workload really spans a lot more of the DB than I can end up caching.
>
> -- 
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Vivek Khera, Ph.D.Khera Communications, Inc.
> Internet: [EMAIL PROTECTED]   Rockville, MD  +1-301-869-4449 x806
> AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] possible improvement between G4 and G5

2004-04-19 Thread Vivek Khera
> "JB" == Josh Berkus <[EMAIL PROTECTED]> writes:

JB> Aaron,
>> I do consulting, so they're all over the place and tend to be complex. Very
>> few fit in RAM, but still are very buffered. These are almost all backed
>> with very high end I/O subsystems, with dozens of spindles with battery
>> backed up writethrough cache and gigs of buffers, which may be why I worry
>> so much about CPU. I have had this issue with multiple servers.

JB> Aha, I think this is the difference.  I never seem to be able to
JB> get my clients to fork out for adequate disk support.  They are
JB> always running off single or double SCSI RAID in the host server;
JB> not the sort of setup you have.

Even when I upgraded my system to a 14-spindle RAID5 with 128M cache
and 4GB RAM on a dual Xeon system, I still wind up being I/O bound
quite often.

I think it depends on what your "working set" turns out to be.  My
workload really spans a lot more of the DB than I can end up caching.

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/



Re: [PERFORM] possible improvement between G4 and G5

2004-04-06 Thread Josh Berkus
Aaron,

> I do consulting, so they're all over the place and tend to be complex. Very
> few fit in RAM, but still are very buffered. These are almost all backed
> with very high end I/O subsystems, with dozens of spindles with battery
> backed up writethrough cache and gigs of buffers, which may be why I worry
> so much about CPU. I have had this issue with multiple servers.

Aha, I think this is the difference.  I never seem to be able to get my 
clients to fork out for adequate disk support.   They are always running off 
single or double SCSI RAID in the host server; not the sort of setup you 
have.

> What my CPU tends to be doing is a combination of general processing,
> complex SQL processing: nested loops and sorting and hashing and triggers
> and SPs.

I haven't noticed SPs to be particularly CPU-hoggish, more RAM.

> I'm curious about you having flat CPU, which is not my experience. Are your
> apps mature and stable?

Well, "flat" was a bit of an exaggeration ... there are spikes ... but average 
CPU load is < 30%.  I think the difference is that your clients listen to 
you about disk access.  Mine are all too liable to purchase a quad-Xeon 
machine but with an Adaptec RAID-5 card with 4 drives, and *then* call me and 
ask for advice.

As a result, most intensive operations don't tend to swamp the CPU because 
they are waiting for disk.   

I have noticed the limitations on RAM for 64- vs. 32-bit, as I find it easier to 
convince a client to get 8GB RAM than four-channel RAID with 12 drives, 
mostly because the former is cheaper.   Linux 2.4 + Bigmem just doesn't cut 
it for making effective use of > 3GB of RAM.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] possible improvement between G4 and G5

2004-04-06 Thread Aaron Werman


- Original Message - 
From: "Josh Berkus" <[EMAIL PROTECTED]>
To: "Aaron Werman" <[EMAIL PROTECTED]>; "Qing Zhao" <[EMAIL PROTECTED]>;
"Tom Lane" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, April 06, 2004 2:52 PM
Subject: Re: [PERFORM] possible improvement between G4 and G5


> Aaron,
>
> > I'm surprised by this thought. I tend to hit CPU bottlenecks more often
> > than I/O ones. In most applications, db I/O is a combination of buffer
> > misses and logging, which are both reasonably constrained.
>
> Not my experience at all.  In fact, the only times I've seen modern
> platforms max out the CPU was when:
> a) I had bad queries with bad plans, or
> b) I had reporting queries that did a lot of calculation for display
> (think OLAP).
>
> Otherwise, on the numerous servers I administrate, RAM spikes, and I/O
> bottlenecks, but the CPU stays almost flat.
>
> Of course, most of my apps are large databases (i.e. too big for RAM)
> with a heavy transaction-processing component.
>
> What kind of applications are you running?
>
> -- 
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
>
>



I do consulting, so they're all over the place and tend to be complex. Very
few fit in RAM, but still are very buffered. These are almost all backed
with very high end I/O subsystems, with dozens of spindles with battery
backed up writethrough cache and gigs of buffers, which may be why I worry
so much about CPU. I have had this issue with multiple servers.

Consider an analysis db with 10G data. Of that, 98% of the access is read
and only 2% write (that is normal for almost anything that is not order
entry, even transaction processing with thorough cross validation). Almost
all the queries access 10%, or 1G, of the data. Of the reads, they average
~3-level b-trees, with the first 2 levels certainly cached, and the last one
often cached. Virtually all the I/O activity is logical reads against
buffer. A system with 100 transactions which on average access 200 rows
does 98% of 200 rows x 100 transactions x 3 logical I/Os per read = 58,800
logical reads, of which actually maybe a hundred are physical reads.  It
also does 2% of 200 rows x 100 transactions x (1 table logical I/O and say 2
index logical writes) per write = 1,200 logical writes to log, of which
there are 100 transaction commit synch writes, and in reality less than that
because of queuing against logs (there are also 1,200 logical writes
deferred to checkpoint, of which it is likely to only be 40 physical writes
because of page overlaps).
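The back-of-envelope arithmetic above can be sketched as follows (the 98/2 read/write split, 100 transactions, 200 rows per transaction, and 3 logical I/Os per access are the figures from the paragraph above; everything else is illustrative):

```python
# Back-of-envelope model of the workload described above: 100 transactions,
# 200 rows each, 98% reads / 2% writes, and ~3 logical I/Os per access
# (a 3-level b-tree for reads; 1 table page + 2 index pages for writes).
transactions = 100
rows_per_txn = 200
read_pct, write_pct = 98, 2
ios_per_access = 3

# Integer arithmetic keeps the totals exact.
logical_reads = read_pct * rows_per_txn * transactions * ios_per_access // 100
logical_writes = write_pct * rows_per_txn * transactions * ios_per_access // 100

print(logical_reads)   # 58800 -- almost all satisfied from buffer cache
print(logical_writes)  # 1200 -- of which only ~100 are commit sync writes
```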

Transaction processing is a spectrum between activity logging and
database-centric design. The former, where actions are simply stored in the
database, is totally I/O bound, with the engine acting as a thin layer of
logical-to-physical mapping. Database-centric processing makes the engine a
functional server of discrete actions - and is a big CPU hog.

What my CPU tends to be doing is a combination of general processing,
complex SQL processing: nested loops and sorting and hashing and triggers
and SPs.

I'm curious about you having flat CPU, which is not my experience. Are your
apps mature and stable?



/Aaron



Re: [PERFORM] possible improvement between G4 and G5

2004-04-06 Thread Josh Berkus
Aaron,

> I'm surprised by this thought. I tend to hit CPU bottlenecks more often than
> I/O ones. In most applications, db I/O is a combination of buffer misses and
> logging, which are both reasonably constrained. 

Not my experience at all.  In fact, the only times I've seen modern platforms 
max out the CPU was when:
a) I had bad queries with bad plans, or
b) I had reporting queries that did a lot of calculation for display (think 
OLAP).

Otherwise, on the numerous servers I administrate, RAM spikes, and I/O 
bottlenecks, but the CPU stays almost flat.

Of course, most of my apps are large databases (i.e. too big for RAM) with a 
heavy transaction-processing component.

What kind of applications are you running?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] possible improvement between G4 and G5

2004-04-06 Thread Aaron Werman

- Original Message - 
From: "Tom Lane" <[EMAIL PROTECTED]>
To: "Qing Zhao" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, April 06, 2004 1:47 AM
Subject: Re: [PERFORM] possible improvement between G4 and G5


> Qing Zhao <[EMAIL PROTECTED]> writes:
> > We have got a G5 64-bit processor to replace an old G4 32-bit
> > processor.  Given everything else equal, should we see a big
> > improvement on PG's performance?
>
> Nope.  Database performance typically depends on disk performance first,
> and RAM size second.

I'm surprised by this thought. I tend to hit CPU bottlenecks more often than
I/O ones. In most applications, db I/O is a combination of buffer misses and
logging, which are both reasonably constrained. RAM size seems to me to be
the best way to improve performance, and then CPU which is needed to perform
the in-memory searching, locking, versioning, and processing, and finally
I/O (this is not the case in small I/O subsystems - if you have less than a
dozen drives, you're easily I/O bound). I/O is often the thing I tune first,
because I can do it in place without buying hardware.

Conceptually, an RDBMS converts slow random I/O into in memory processing
and sequential logging writes. If successful, it should reduce the I/O
overhead.

/Aaron



Re: [PERFORM] possible improvement between G4 and G5

2004-04-06 Thread Jim C. Nasby
On Tue, Apr 06, 2004 at 01:47:22AM -0400, Tom Lane wrote:
> Qing Zhao <[EMAIL PROTECTED]> writes:
> > We have got a G5 64-bit processor to replace an old G4 32-bit
> > processor.  Given everything else equal, should we see a big
> > improvement on PG's performance?
> 
> Nope.  Database performance typically depends on disk performance first,
> and RAM size second.  A 64-bit processor might help by allowing you to
> install more RAM, but you didn't say that you had.

Memory bandwidth is a consideration too, so you might see some
performance improvements on a G5. We recently debated between Xeons and
Opterons in a new PGSQL server and a little poking around on the lists
indicated that the Opterons did perform better, presumably due to the
increased memory bandwidth. Incidentally, this is why you need about 2x
the CPUs on Sun hardware vs RS6000 hardware for database stuff (and that
gets expensive if you're paying per CPU!).
-- 
Jim C. Nasby, Database Consultant  [EMAIL PROTECTED]
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



Re: [PERFORM] possible improvement between G4 and G5

2004-04-05 Thread Tom Lane
Qing Zhao <[EMAIL PROTECTED]> writes:
> We have got a G5 64-bit processor to replace an old G4 32-bit
> processor.  Given everything else equal, should we see a big
> improvement on PG's performance?

Nope.  Database performance typically depends on disk performance first,
and RAM size second.  A 64-bit processor might help by allowing you to
install more RAM, but you didn't say that you had.

> The other question I have is that, when I tried different size for
> shared_buffer ( i used 10,000, 1,000, 528, 256) and Max
> connections=32, it gives me error when I tried to start PG using
> pg_ctl start as postgres. It kept saying this is bigger than the
> system Shared Memory.

Out-of-the-box, Mac OS X has a very low SHMMAX limit.  See the PG admin
docs or the mail list archives about how to increase it.  You should do
this --- most people find that you want to set shared_buffers to 1000 or
10000 or so for best performance.
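A sketch of the kind of adjustment Tom describes (the values are illustrative, not a recommendation; on the Mac OS X of that era the limits were raised via sysctl, and the exact keys and defaults depend on the OS version):

```shell
# Raise the kernel shared-memory ceiling (illustrative values; see the PG
# admin docs for your platform). On Mac OS X these were kern.sysv.* keys:
sysctl -w kern.sysv.shmmax=134217728   # 128 MB max shared memory segment
sysctl -w kern.sysv.shmall=32768       # total shared memory, in 4 kB pages

# Then, in postgresql.conf (8 kB pages, so 10000 buffers is ~80 MB):
#   shared_buffers = 10000
#   max_connections = 32
```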

regards, tom lane
