Re: [PERFORM] High CPU Utilization

2009-03-24 Thread Joe Uhl


On Mar 20, 2009, at 4:58 PM, Scott Marlowe wrote:


On Fri, Mar 20, 2009 at 2:49 PM, Joe Uhl joe...@gmail.com wrote:


On Mar 20, 2009, at 4:29 PM, Scott Marlowe wrote:



What does the cs entry on vmstat say at this time?  If your cs is
skyrocketing then you're getting a context switch storm, which is
usually a sign that there are just too many things going on at once /
you've got an old kernel, things like that.


cs column (plus cpu columns) of vmstat 1 30 reads as follows:

cs    us  sy id wa
11172 95  4  1  0
12498 94  5  1  0
14121 91  7  1  1
11310 90  7  1  1
12918 92  6  1  1
10613 93  6  1  1
9382  94  4  1  1
14023 89  8  2  1
10138 92  6  1  1
11932 94  4  1  1
15948 93  5  2  1
12919 92  5  3  1
10879 93  4  2  1
14014 94  5  1  1
9083  92  6  2  0
11178 94  4  2  0
10717 94  5  1  0
9279  97  2  1  0
12673 94  5  1  0
8058  82 17  1  1
8150  94  5  1  1
11334 93  6  0  0
13884 91  8  1  0
10159 92  7  0  0
9382  96  4  0  0
11450 95  4  1  0
11947 96  3  1  0
8616  95  4  1  0
10717 95  3  1  0

We are running on the 2.6.28.7-2 kernel.  I am unfamiliar with vmstat
output but reading the man page (and that cs = context switches per
second) makes my numbers seem very high.


No, those aren't really all that high.  If you were hitting cs
contention, I'd expect it to be in the 25k to 100k range.  10k
average under load is pretty reasonable.

Our JDBC pools currently top out at a combined 400 connections (and we
are doing work on all 400 right now).  I may try dropping those pools
down even smaller.  Are there any general rules of thumb for figuring
out how many connections you should service at maximum?  I know of the
memory constraints, but I am thinking more along the lines of
connections per CPU core.


Well, maximum efficiency is usually somewhere in the range of 1 to 2
times the number of cores you have, so trying to get the pool down to
a dozen or two connections would generally be the direction to head.
That may not be reasonable or doable, though.


Turns out we may have an opportunity to purchase a new database server
given this increased load.  It seems that the best route, based on
feedback to this thread, is to go whitebox, get quad Opterons, and get
a very good disk controller.


Can anyone recommend a whitebox vendor?

Is there a current controller anyone on this list has experience with  
that they could recommend?


This will be a bigger purchase, so we will be doing research and
benchmarking, but any general pointers to a vendor/controller are
greatly appreciated.




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High CPU Utilization

2009-03-24 Thread Greg Smith

On Tue, 24 Mar 2009, Joe Uhl wrote:


Can anyone recommend a whitebox vendor?


I dumped a list of recommended vendors from a discussion here a while
back at http://wiki.postgresql.org/wiki/SCSI_vs._IDE/SATA_Disks that
you could use to get started.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] High CPU Utilization

2009-03-24 Thread Scott Marlowe
On Tue, Mar 24, 2009 at 1:29 PM, Greg Smith gsm...@gregsmith.com wrote:
 On Tue, 24 Mar 2009, Joe Uhl wrote:

 Can anyone recommend a whitebox vendor?

 I dumped a list of recommended vendors from a discussion here a while back
 at http://wiki.postgresql.org/wiki/SCSI_vs._IDE/SATA_Disks you could get
 started with.

I'd add Aberdeen Inc to that list.  They supply quality white box
servers with 3ware, Areca, or LSI controllers, and provide a 5-year
all-inclusive warranty.  Their customer service is top notch too.



Re: [PERFORM] High CPU Utilization

2009-03-24 Thread Ron

At 02:47 PM 3/24/2009, Joe Uhl wrote:

Turns out we may have an opportunity to purchase a new database 
server with this increased load.  Seems that the best route, based 
on feedback to this thread, is to go whitebox, get quad opterons, 
and get a very good disk controller.


Can anyone recommend a whitebox vendor?

I'll 2nd the Aberdeen recommendation.  I'll add Pogolinux to that list as well.


Is there a current controller anyone on this list has experience 
with that they could recommend?

The two best-performing RAID controller vendors at this time are AMCC 
(AKA 3Ware) and Areca.  In general, the 8+ port Arecas with their 
battery-backed cache maxed out outperform every other controller 
available.



This will be a bigger purchase so will be doing research and 
benchmarking but any general pointers to a vendor/controller greatly 
appreciated.


Be =very= careful to thoroughly bench both the AMD and Intel CPU 
options.  It is far from clear which is the better purchase.


I'd be very interested to see the results of your research and 
benchmarks posted here on pgsql-performance.


Ron Peacetree 





Re: [PERFORM] High CPU Utilization

2009-03-24 Thread Scott Marlowe
On Tue, Mar 24, 2009 at 4:58 PM, Ron rjpe...@earthlink.net wrote:
 At 02:47 PM 3/24/2009, Joe Uhl wrote:

 Turns out we may have an opportunity to purchase a new database server
 with this increased load.  Seems that the best route, based on feedback to
 this thread, is to go whitebox, get quad opterons, and get a very good disk
 controller.

 Can anyone recommend a whitebox vendor?

 I'll 2nd the Aberdeen recommendation.  I'll add Pogolinux to that list as
 well.


 Is there a current controller anyone on this list has experience with that
 they could recommend?

 The 2 best performing RAID controller vendors at this time are AMCC (AKA
 3Ware) and Areca.
 In general, the 8+ port Areca's with their BB cache maxed outperform every
 other controller available.


 This will be a bigger purchase so will be doing research and benchmarking
 but any general pointers to a vendor/controller greatly appreciated.

 Be =very= careful to thoroughly bench both the AMD and Intel CPU options.
  It is far from clear which is the better purchase.

My anecdotal experience has been that the Opterons stay afloat longer
as load increases, but I haven't had machines with similar enough
hardware to really test that.

 I'd be very interested to see the results of your research and benchmarks
 posted here on pgsql-performance.

Me too.  I'm gonna spend some time this summer benchmarking and tuning
the database servers that I pretty much had to burn in and put in
production this year due to time pressures.



Re: [PERFORM] High CPU Utilization

2009-03-24 Thread Scott Carey

On 3/24/09 4:16 PM, Scott Marlowe scott.marl...@gmail.com wrote:

 On Tue, Mar 24, 2009 at 4:58 PM, Ron rjpe...@earthlink.net wrote:
 At 02:47 PM 3/24/2009, Joe Uhl wrote:
 
 Turns out we may have an opportunity to purchase a new database server
 with this increased load.  Seems that the best route, based on feedback to
 this thread, is to go whitebox, get quad opterons, and get a very good disk
 controller.
 
 Can anyone recommend a whitebox vendor?
 
 I'll 2nd the Aberdeen recommendation.  I'll add Pogolinux to that list as
 well.
 
 
 Is there a current controller anyone on this list has experience with that
 they could recommend?
 
 The 2 best performing RAID controller vendors at this time are AMCC (AKA
 3Ware) and Areca.
 In general, the 8+ port Areca's with their BB cache maxed outperform every
 other controller available.

I personally have had rather bad performance experiences with 3Ware
9550/9650 SATA cards.  I have no experience with the AMCC SAS stuff though.
Adaptec demolished the 9650 on arrays larger than 4 drives, and Areca will
do better at the very high end.

However, if CPU is the issue for this particular case, then the RAID
controller details are less significant.

I don't know how much data you have, but don't forget the option of SSDs, or
a mix of hard drives and SSDs for different data.  Ideally, you would want
the OS to just extend its page cache onto an SSD, but only OpenSolaris can do
that right now and it is rather new (needs to be persistent across reboots).

http://blogs.sun.com/brendan/entry/test
http://blogs.sun.com/brendan/entry/l2arc_screenshots
 

 
 
 This will be a bigger purchase so will be doing research and benchmarking
 but any general pointers to a vendor/controller greatly appreciated.
 
 Be =very= careful to thoroughly bench both the AMD and Intel CPU options.
  It is far from clear which is the better purchase.
 
 My anecdotal experience has been that the Opterons stay afloat longer
 as load increases, but I haven't had machines with similar enough
 hardware to really test that.
 

One may want to note that Intel's next generation servers are due out within
45 days from what I can sense ('Q2' traditionally means ~April 1 for Intel
when on time).  These should be a rather significant bump for a database, as
they adopt the AMD / Alpha style memory-controller-on-CPU architecture and
add a lot of cache.  Other relevant improvements: increased performance on
compare-and-swap operations, the return of Hyper-Threading, and ridiculous
memory bandwidth per CPU (3 DDR3 memory channels per CPU).

 I'd be very interested to see the results of your research and benchmarks
 posted here on pgsql-performance.
 
 Me too.  I'm gonna spend some time this summer benchmarking and tuning
 the database servers that I pretty much had to burn in and put in
 production this year due to time pressures.
 
 




Re: [PERFORM] High CPU Utilization

2009-03-20 Thread Joe Uhl

On Mar 17, 2009, at 12:19 AM, Greg Smith wrote:


On Tue, 17 Mar 2009, Gregory Stark wrote:

Hm, well the tests I ran for posix_fadvise were actually on a Perc5 --
though who knows if it was the same under the hood -- and I saw better
performance than this.  I saw about 4MB/s for a single drive and up to
about 35MB/s for 15 drives.  However this was using linux md raid-0,
not hardware raid.


Right, it's the hardware RAID on the Perc5 I think people mainly  
complain about.  If you use it in JBOD mode and let the higher  
performance CPU in your main system drive the RAID functions it's  
not so bad.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com  
Baltimore, MD


I have not yet had a chance to try software raid on the standby server  
(still planning to) but wanted to follow up to see if there was any  
good way to figure out what the postgresql processes are spending  
their CPU time on.


We are under peak load right now, and I have Zabbix plotting CPU  
utilization and CPU wait (from vmstat output) along with all sorts of  
other vitals on charts.  CPU utilization is a sustained 90% - 95% and  
CPU Wait is hanging below 10%.  Since being pointed at vmstat by this  
list I have been watching CPU Wait and it does get high at times  
(hence still wanting to try Perc5 in JBOD) but then there are  
sustained periods, right now included, where our CPUs are just getting  
crushed while wait and IO (only doing about 1.5 MB/sec right now) are  
very low.


This high CPU utilization only occurs when under peak load and when  
our JDBC pools are fully loaded.  We are moving more things into our  
cache and constantly tuning indexes/tables but just want to see if  
there is some underlying cause that is killing us.


Any recommendations for figuring out what our database is spending its  
CPU time on?




Re: [PERFORM] High CPU Utilization

2009-03-20 Thread Scott Marlowe
On Fri, Mar 20, 2009 at 2:26 PM, Joe Uhl joe...@gmail.com wrote:
 On Mar 17, 2009, at 12:19 AM, Greg Smith wrote:

 On Tue, 17 Mar 2009, Gregory Stark wrote:

 Hm, well the tests I ran for posix_fadvise were actually on a Perc5 --
 though who knows if it was the same under the hood -- and I saw better
 performance than this. I saw about 4MB/s for a single drive and up to
 about 35MB/s for 15 drives. However this was using linux md raid-0,
 not hardware raid.

 Right, it's the hardware RAID on the Perc5 I think people mainly complain
 about.  If you use it in JBOD mode and let the higher performance CPU in
 your main system drive the RAID functions it's not so bad.

 --
 * Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

 I have not yet had a chance to try software raid on the standby server
 (still planning to) but wanted to follow up to see if there was any good way
 to figure out what the postgresql processes are spending their CPU time on.

 We are under peak load right now, and I have Zabbix plotting CPU utilization
 and CPU wait (from vmstat output) along with all sorts of other vitals on
 charts.  CPU utilization is a sustained 90% - 95% and CPU Wait is hanging
 below 10%.  Since being pointed at vmstat by this list I have been watching
 CPU Wait and it does get high at times (hence still wanting to try Perc5 in
 JBOD) but then there are sustained periods, right now included, where our
 CPUs are just getting crushed while wait and IO (only doing about 1.5 MB/sec
 right now) are very low.

 This high CPU utilization only occurs when under peak load and when our JDBC
 pools are fully loaded.  We are moving more things into our cache and
 constantly tuning indexes/tables but just want to see if there is some
 underlying cause that is killing us.

 Any recommendations for figuring out what our database is spending its CPU
 time on?

What does the cs entry on vmstat say at this time?  If your cs is
skyrocketing then you're getting a context switch storm, which is
usually a sign that there are just too many things going on at once /
you've got an old kernel, things like that.



Re: [PERFORM] High CPU Utilization

2009-03-20 Thread Joe Uhl


On Mar 20, 2009, at 4:29 PM, Scott Marlowe wrote:


On Fri, Mar 20, 2009 at 2:26 PM, Joe Uhl joe...@gmail.com wrote:

On Mar 17, 2009, at 12:19 AM, Greg Smith wrote:


On Tue, 17 Mar 2009, Gregory Stark wrote:

Hm, well the tests I ran for posix_fadvise were actually on a Perc5 --
though who knows if it was the same under the hood -- and I saw better
performance than this.  I saw about 4MB/s for a single drive and up to
about 35MB/s for 15 drives.  However this was using linux md raid-0,
not hardware raid.


Right, it's the hardware RAID on the Perc5 I think people mainly
complain about.  If you use it in JBOD mode and let the higher
performance CPU in your main system drive the RAID functions it's
not so bad.

--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com  
Baltimore, MD


I have not yet had a chance to try software raid on the standby server
(still planning to) but wanted to follow up to see if there was any
good way to figure out what the postgresql processes are spending
their CPU time on.


We are under peak load right now, and I have Zabbix plotting CPU
utilization and CPU wait (from vmstat output) along with all sorts of
other vitals on charts.  CPU utilization is a sustained 90% - 95% and
CPU Wait is hanging below 10%.  Since being pointed at vmstat by this
list I have been watching CPU Wait and it does get high at times
(hence still wanting to try Perc5 in JBOD) but then there are
sustained periods, right now included, where our CPUs are just getting
crushed while wait and IO (only doing about 1.5 MB/sec right now) are
very low.

This high CPU utilization only occurs when under peak load and when
our JDBC pools are fully loaded.  We are moving more things into our
cache and constantly tuning indexes/tables but just want to see if
there is some underlying cause that is killing us.

Any recommendations for figuring out what our database is spending
its CPU time on?


What does the cs entry on vmstat say at this time?  If your cs is
skyrocketing then you're getting a context switch storm, which is
usually a sign that there are just too many things going on at once /
you've got an old kernel, things like that.


cs column (plus cpu columns) of vmstat 1 30 reads as follows:

cs    us  sy id wa
11172 95  4  1  0
12498 94  5  1  0
14121 91  7  1  1
11310 90  7  1  1
12918 92  6  1  1
10613 93  6  1  1
9382  94  4  1  1
14023 89  8  2  1
10138 92  6  1  1
11932 94  4  1  1
15948 93  5  2  1
12919 92  5  3  1
10879 93  4  2  1
14014 94  5  1  1
9083  92  6  2  0
11178 94  4  2  0
10717 94  5  1  0
9279  97  2  1  0
12673 94  5  1  0
8058  82 17  1  1
8150  94  5  1  1
11334 93  6  0  0
13884 91  8  1  0
10159 92  7  0  0
9382  96  4  0  0
11450 95  4  1  0
11947 96  3  1  0
8616  95  4  1  0
10717 95  3  1  0

We are running on the 2.6.28.7-2 kernel.  I am unfamiliar with vmstat
output but reading the man page (and that cs = context switches per
second) makes my numbers seem very high.


Our JDBC pools currently top out at a combined 400 connections (and we
are doing work on all 400 right now).  I may try dropping those pools
down even smaller.  Are there any general rules of thumb for figuring
out how many connections you should service at maximum?  I know of the
memory constraints, but I am thinking more along the lines of
connections per CPU core.





Re: [PERFORM] High CPU Utilization

2009-03-20 Thread Scott Marlowe
On Fri, Mar 20, 2009 at 2:49 PM, Joe Uhl joe...@gmail.com wrote:

 On Mar 20, 2009, at 4:29 PM, Scott Marlowe wrote:

 What does the cs entry on vmstat say at this time?  If your cs is
 skyrocketing then you're getting a context switch storm, which is
 usually a sign that there are just too many things going on at once /
 you've got an old kernel, things like that.

 cs column (plus cpu columns) of vmstat 1 30 reads as follows:

 cs    us  sy id wa
 11172 95  4  1  0
 12498 94  5  1  0
 14121 91  7  1  1
 11310 90  7  1  1
 12918 92  6  1  1
 10613 93  6  1  1
 9382  94  4  1  1
 14023 89  8  2  1
 10138 92  6  1  1
 11932 94  4  1  1
 15948 93  5  2  1
 12919 92  5  3  1
 10879 93  4  2  1
 14014 94  5  1  1
 9083  92  6  2  0
 11178 94  4  2  0
 10717 94  5  1  0
 9279  97  2  1  0
 12673 94  5  1  0
 8058  82 17  1  1
 8150  94  5  1  1
 11334 93  6  0  0
 13884 91  8  1  0
 10159 92  7  0  0
 9382  96  4  0  0
 11450 95  4  1  0
 11947 96  3  1  0
 8616  95  4  1  0
 10717 95  3  1  0

 We are running on 2.6.28.7-2 kernel.  I am unfamiliar with vmstat output but
 reading the man page (and that cs = context switches per second) makes my
 numbers seem very high.

No, those aren't really all that high.  If you were hitting cs
contention, I'd expect it to be in the 25k to 100k range.  10k
average under load is pretty reasonable.
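For anyone wanting to keep an eye on this over time, here is a rough sketch of averaging the cs column from a saved vmstat capture; the 25k threshold is the one quoted just above, and the sample numbers are simply the first three rows Joe posted.

```shell
# Average the cs column of a captured `vmstat 1 N` run and flag a possible
# context switch storm.  The sample rows below are the first three rows
# posted in this thread; on a live box you would redirect real vmstat
# output (cs column first) into the file instead.
cat > /tmp/cs_sample.txt <<'EOF'
11172 95 4 1 0
12498 94 5 1 0
14121 91 7 1 1
EOF

avg=$(awk '{ sum += $1 } END { printf "%d", sum / NR }' /tmp/cs_sample.txt)
echo "average cs/sec: $avg"

# 25k-100k is the contention range suggested above; ~10k is reasonable.
if [ "$avg" -ge 25000 ]; then
  echo "possible context switch storm"
else
  echo "cs rate looks reasonable"
fi
```

With the thread's own numbers this lands around 12-13k, comfortably below the storm range.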

 Our sum JDBC pools currently top out at 400 connections (and we are doing
 work on all 400 right now).  I may try dropping those pools down even
 smaller. Are there any general rules of thumb for figuring out how many
 connections you should service at maximum?  I know of the memory
 constraints, but thinking more along the lines of connections per CPU core.

Well, maximum efficiency is usually somewhere in the range of 1 to 2
times the number of cores you have, so trying to get the pool down to
a dozen or two connections would generally be the direction to head.
That may not be reasonable or doable, though.
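The rule of thumb above can be sketched in a couple of lines of shell; the core count of 8 matches the dual quad-core box in this thread, and `nproc` is only a suggestion for doing the same on a live system.

```shell
# Rule-of-thumb pool sizing from the advice above: 1-2x the core count.
# cores=8 matches the dual quad-core Xeon server in this thread; on a
# live Linux system you could use cores=$(nproc) instead.
cores=8
min_pool=$(( cores * 1 ))
max_pool=$(( cores * 2 ))
echo "suggested JDBC pool total: ${min_pool}-${max_pool} (currently 400)"
```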



Re: [PERFORM] High CPU Utilization

2009-03-20 Thread Joe Uhl


On Mar 20, 2009, at 4:58 PM, Scott Marlowe wrote:


On Fri, Mar 20, 2009 at 2:49 PM, Joe Uhl joe...@gmail.com wrote:


On Mar 20, 2009, at 4:29 PM, Scott Marlowe wrote:



What does the cs entry on vmstat say at this time?  If your cs is
skyrocketing then you're getting a context switch storm, which is
usually a sign that there are just too many things going on at once /
you've got an old kernel, things like that.


cs column (plus cpu columns) of vmstat 1 30 reads as follows:

cs    us  sy id wa
11172 95  4  1  0
12498 94  5  1  0
14121 91  7  1  1
11310 90  7  1  1
12918 92  6  1  1
10613 93  6  1  1
9382  94  4  1  1
14023 89  8  2  1
10138 92  6  1  1
11932 94  4  1  1
15948 93  5  2  1
12919 92  5  3  1
10879 93  4  2  1
14014 94  5  1  1
9083  92  6  2  0
11178 94  4  2  0
10717 94  5  1  0
9279  97  2  1  0
12673 94  5  1  0
8058  82 17  1  1
8150  94  5  1  1
11334 93  6  0  0
13884 91  8  1  0
10159 92  7  0  0
9382  96  4  0  0
11450 95  4  1  0
11947 96  3  1  0
8616  95  4  1  0
10717 95  3  1  0

We are running on the 2.6.28.7-2 kernel.  I am unfamiliar with vmstat
output but reading the man page (and that cs = context switches per
second) makes my numbers seem very high.


No, those aren't really all that high.  If you were hitting cs
contention, I'd expect it to be in the 25k to 100k range.  10k
average under load is pretty reasonable.

Our JDBC pools currently top out at a combined 400 connections (and we
are doing work on all 400 right now).  I may try dropping those pools
down even smaller.  Are there any general rules of thumb for figuring
out how many connections you should service at maximum?  I know of the
memory constraints, but I am thinking more along the lines of
connections per CPU core.


Well, maximum efficiency is usually somewhere in the range of 1 to 2
times the number of cores you have, so trying to get the pool down to
a dozen or two connections would generally be the direction to head.
That may not be reasonable or doable, though.


Thanks for the info.  I figure I can tune our pools down, monitor
throughput/CPU/IO, and look for a sweet spot with our existing
hardware.  I just wanted to see if tuning connections down could
potentially help.


I feel as though we are going to have to replicate this DB before too
long.  We've got an almost identical server doing nothing but PITR
with 8 CPU cores mostly idle that could be better spent.  Our pgfouine
reports, though only logging queries that take over 1 second, show
90% reads.


I have heard much about Slony, but has anyone used the newer version
of Mammoth Replicator (which now looks to be called PostgreSQL +
Replication) on 8.3?  From the documentation, it appears to be easier
to set up and less invasive, but I struggle to find usage
information/stories online.





[PERFORM] High CPU Utilization

2009-03-16 Thread Joe Uhl
Our production database is seeing very heavy CPU utilization - anyone  
have any ideas/input considering the following?


CPU utilization gradually increases during the day until it approaches
90%-100% at our peak time.  When this happens our transactions/sec
drops and our site becomes very slow.  When in this state, I can see
hundreds of queries in pg_stat_activity that are not waiting on locks
but sit there for minutes.  When the database is not in this state,
those same queries can complete in fractions of a second - faster than
my script that watches pg_stat_activity can keep track of them.
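As a sketch of how such a watcher might look against an 8.3 server: the column names below (procpid, current_query, waiting) are the pre-9.2 spellings that match the 8.3.5 instance in this thread, while the commented-out psql invocation and database name are placeholders, not anything from the original post.

```shell
# List the longest-running, non-idle backends from pg_stat_activity.
# Column names (procpid, current_query, waiting) are the pre-9.2
# spellings matching the 8.3.5 server discussed here; the psql
# connection details are placeholders.
SQL="SELECT procpid,
            now() - query_start AS runtime,
            waiting,
            substr(current_query, 1, 60) AS query
       FROM pg_stat_activity
      WHERE current_query <> '<IDLE>'
      ORDER BY runtime DESC
      LIMIT 20;"

echo "$SQL"
# psql -U postgres -d yourdb -c "$SQL"   # uncomment against a live server
```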


This server has dual quad-core Xeon 5310s, 32 GB RAM, and a few
different disk arrays (all managed in hardware by either the Perc5/i
or Perc5/e adapter).  The Postgres data is on a 14-disk 7.2k SATA
RAID 10.  This server runs nothing but Postgres.


The PostgreSQL database (according to pg_database_size) is 55GB and we  
are running PostgreSQL 8.3.5 and the 2.6.28.7-2 kernel under Arch Linux.


Right now (not under peak load) this server is running at 68% CPU
utilization and its SATA RAID 10 is doing about 2MB/s writes and
11MB/s reads.  When I run dd I can hit 200+MB/s writes and 230+MB/s
reads, so we are barely using the available IO.  Further, when I run
dd the CPU utilization of that process only approaches 20%-30% of one
core.


Additionally, when I view top -c I generally see a dozen or so idle
postgres processes (they appear and drop away quickly though)
consuming very large chunks of CPU (as much as 60% of a core).  At any
given time we have hundreds of idle postgres processes due to the JDBC
connection pooling, but most of them are at 0% as I would expect them
to be.  I also see selects and inserts consuming very large
percentages of CPU, but I view that as less strange since they are
doing work.


Any ideas as to what is causing our CPUs to struggle?  Is the fact  
that our RAM covers a significant portion of the database causing our  
CPUs to do a bunch of thrashing as they work with memory while our  
disk controllers sit idle?  According to top we barely use any swap.


We currently have max_connections set to 1000 (roughly the sum of the  
JDBC pools on our application servers).  Would decreasing this value  
help?  We can decrease the JDBC pools or switch to pgbouncer for  
pooling if this is the case.


Really just looking for any input/ideas.  Our workload is primarily
OLTP in nature - essentially a social network.  By transactions/sec
above I mean the xact_commit value in pg_stat_database.  Please let me
know if this value is not appropriate for getting a tps guess.  Right
now, with 60% CPU utilization and low IO use, xact_commit is
increasing at a rate of 1070 a second.
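Since xact_commit is a cumulative counter, the tps guess comes from differencing two snapshots. A minimal sketch, with made-up snapshot values chosen to reproduce the ~1070/sec rate reported above (the psql command in the comment is only a suggested way to pull real values):

```shell
# Turn two xact_commit snapshots into a commits/sec estimate.  The
# numbers here are illustrative; on a live system each snapshot would
# come from something like:
#   psql -At -c "SELECT xact_commit FROM pg_stat_database WHERE datname = 'yourdb'"
before=123400
after=134100
interval=10          # seconds between the two snapshots

tps=$(( (after - before) / interval ))
echo "approx commits/sec: $tps"
```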


I have an identical PITR slave I can pause the PITR sync on to run any  
test against.  I will happily provide any additional information that  
would be helpful.


Any assistance is greatly appreciated.

Joe Uhl



Re: [PERFORM] High CPU Utilization

2009-03-16 Thread Alan Hodgson
On Monday 16 March 2009, Joe Uhl joe...@gmail.com wrote:
 Right now (not under peak load) this server is running at 68% CPU
 utilization and its SATA raid 10 is doing about 2MB/s writes and
 11MB/s reads.  When I run dd I can hit 200+MB/s writes and 230+ MB/s
 reads, so we are barely using the available IO.  Further when I run
 dd the CPU utilization of that process only approaches 20%-30% of one
 core.

What does vmstat say when it's slow? The output of vmstat 1 30 would be 
informative. 

note: dd is sequential I/O. Normal database usage is random I/O. 

-- 
Even a sixth-grader can figure out that you can't borrow money to pay
off your debt



Re: [PERFORM] High CPU Utilization

2009-03-16 Thread Joe Uhl
Here is vmstat 1 30.  We are under peak load right now so I can gather  
information from the real deal :)


Had an almost complete lockup a moment ago, number of non-idle  
postgres connections was 637.  Going to drop our JDBC pool sizes a bit  
and bounce everything.


procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd     free   buff    cache  si  so    bi    bo    in    cs us sy id wa
12 35  95056 11102380  56856 14954948   3   4   669 54112   23  3 54 19
12 39  95056 11092484  56876 14963204   0   0  6740  1204 10066 13277 91  5  0  4
 8 42  95056 11081712  56888 14972244   0   0  8620  1168 10659 17020 78  6  0 15
10 30  95052 11069768  56904 14982628   0   0  8944   976  9809 15109 81  6  1 12
 4 27  95048 11059576  56916 14991296   0   0  8852   440  7652 13294 63  4  2 32
 5 42  95048 11052524  56932 14996496   0   0  4700   384  6383 11249 64  4  4 28
 5 33  95048 11047492  56956 15001428   0   0  3852   572  6029 14010 36  4  5 56
 7 35  95048 11041184  56960 15005480   0   0  3964   136  5042 10802 40  3  1 56
 1 33  95048 11037988  56968 15009240   0   0  3892   168  3384  6479 26  1  3 69
 3 28  95048 11029332  56980 15015744   0   0  6724   152  4964 12844 11  2  8 79
 0 34  95048 11025880  56988 15020168   0   0  3852   160  3616  8614 11  1  6 82
 3 25  95048 10996356  57044 15044796   0   0  7892   456  3126  7115  4  3  8 85
 1 26  95048 10991692  57052 15050100   0   0  5188   176  2566  5976  3  2 12 83
 0 29  95048 10985408  57060 15054968   0   0  4200    80  2586  6582  4  1 12 83
 1 29  95048 10980828  57064 15058992   0   0  4560    64  2966  7557  7  2  6 85
 2 28  95048 10977192  57072 15063176   0   0  3860    72  2695  6742 11  1  7 81
 2 29  95048 10969120  57088 15067808   0   0  5084    84  3296  8067 14  1  0 84
 0 25  95048 10962096  57104 15072984   0   0  4440   500  2721  6263 12  1  6 80
 0 23  95044 10955320  57108 15079260   0   0  5712   232  2678  5990  6  1  6 87
 2 25  95044 10948644  57120 15084524   0   0  5120   184  3499  8143 20  3  9 69
 3 21  95044 10939744  57128 15090644   0   0  5756   264  4724 10272 32  3  5 60
 1 19  95040 10933196  57144 15095024  12   0  4440   180  2585  5244 13  2 15 70
 0 21  95040 10927596  57148 15098684   0   0  3248   136  2973  7292  8  1  9 81
 1 20  95040 10920708  57164 15104244   0   0  5192   360  1865  4547  3  1  9 87
 1 24  95040 10914552  57172 15105856   0   0  2308    16  1948  4450  6  1  1 93
 0 24  95036 10909148  57176 15110240   0   0  3824   152  1330  2632  3  1  6 90
 1 21  95036 10900628  57192 15116332   0   0  5680   180  1898  3986  4  1 11 84
 0 19  95036 10888356  57200 15121736   0   0  5952   120  2252  3991 12  1  8 79
 2 22  95036 10874336  57204 15128252   0   0  6320   112  2831  6755  5  2  8 85
 3 26  95036 10857592  57220 15134020   0   0  5124   216  3067  5296 32  6  3 59


Alan, my apologies if you get this twice.  Didn't reply back to the  
list on first try.


On Mar 16, 2009, at 3:52 PM, Alan Hodgson wrote:


On Monday 16 March 2009, Joe Uhl joe...@gmail.com wrote:

Right now (not under peak load) this server is running at 68% CPU
utilization and its SATA raid 10 is doing about 2MB/s writes and
11MB/s reads.  When I run dd I can hit 200+MB/s writes and 230+ MB/s
reads, so we are barely using the available IO.  Further when I run
dd the CPU utilization of that process only approaches 20%-30% of one
core.


What does vmstat say when it's slow? The output of vmstat 1 30 would
be informative.

note: dd is sequential I/O. Normal database usage is random I/O.

--
Even a sixth-grader can figure out that you can't borrow money to pay
off your debt






Re: [PERFORM] High CPU Utilization

2009-03-16 Thread Greg Smith

On Mon, 16 Mar 2009, Joe Uhl wrote:

Here is vmstat 1 30.  We are under peak load right now so I can gather 
information from the real deal


Quite helpful, reformatting a bit and picking an informative section:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd     free   buff    cache  si  so    bi   bo   in   cs us sy id wa
 0 34  95048 11025880  56988 15020168   0   0  3852  160 3616 8614 11  1  6 82
 3 25  95048 10996356  57044 15044796   0   0  7892  456 3126 7115  4  3  8 85
 1 26  95048 10991692  57052 15050100   0   0  5188  176 2566 5976  3  2 12 83

This says that your server is spending all its time waiting for I/O, 
actual CPU utilization is minimal.  You're only achieving around 3-8MB/s 
of random I/O.  That's the reality of what your disk I/O subsystem is 
capable of, regardless of what its sequential performance with dd looks 
like.  If you were to run a more complicated benchmark like bonnie++ 
instead, I'd bet that your seeks/second results are very low, even 
though sequential read/write is fine.
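To put those throughput numbers in perspective, here is a back-of-the-envelope sketch (mine, not from the thread) converting the observed random-read rate into the seek rate the array must be sustaining, assuming every random read pulls in one 8 kB PostgreSQL page:

```python
# Hypothetical estimate: seeks/second implied by a given random-read
# throughput, assuming each read fetches one 8 kB PostgreSQL page.
PAGE_KB = 8  # default PostgreSQL block size

def implied_seeks_per_sec(mb_per_s: float, page_kb: int = PAGE_KB) -> float:
    """Seeks/second needed to sustain mb_per_s of page-sized random reads."""
    return mb_per_s * 1024 / page_kb

for mbps in (3, 8):  # the 3-8 MB/s range seen in the vmstat output above
    print(f"{mbps} MB/s random I/O ~ {implied_seeks_per_sec(mbps):.0f} seeks/sec")
```

A few hundred to a thousand seeks/second is close to the ceiling of a small array of 7200 rpm SATA disks (each good for maybe 100-150 seeks/second), which matches the point about seek-limited performance.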


The Perc5 controllers have a pretty bad reputation for performance on this 
list, even in RAID10.  Not much you can do about that beyond scrapping the 
controller and getting a better one.


What you might do in order to reduce the total number of writes needed is 
some standard postgresql.conf tuning; see 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server


What you could do here is increase shared_buffers, checkpoint_segments, 
and checkpoint_completion_target as described there.  Having more buffers 
dedicated to the database and having fewer checkpoints can result in less 
random I/O showing up, as popular data pages will stay in RAM for longer 
without getting written out so much.
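As a rough sketch only (the values below are illustrative, not recommendations from this thread, and checkpoint_segments is the old pre-9.5 name for what later became max_wal_size):

```
# postgresql.conf -- illustrative starting points, tune to your RAM/workload
shared_buffers = 4GB                 # let Postgres keep more hot pages in RAM
checkpoint_segments = 30             # fewer, larger checkpoints
checkpoint_completion_target = 0.9   # spread checkpoint writes over the interval
```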


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] High CPU Utilization

2009-03-16 Thread Joe Uhl
I dropped the pool sizes and brought things back up.  Things are 
stable, the site is fast, and CPU utilization is still high.  Probably just a 
matter of time before the issue comes back (we get slammed as kids get out 
of school in the US).


Now when I run vmstat 1 30 it looks very different (below).  Waiting 
is minimal, user is very high.  Under nontrivial load, according to 
xact_commit in pg_stat_database we are doing 1800+ tps.


Appreciate the input and explanation on vmstat.  I am going to throw  
some of these numbers into zabbix so I can keep a better eye on them.   
This server is a couple years old so the purchase of a new controller  
and/or disks is not out of the question.


On a final note, I have made several changes to postgresql.conf.  Some of 
those here:

max_connections = 1000
shared_buffers = 7680MB
work_mem = 30MB
synchronous_commit = off
checkpoint_segments = 50
effective_cache_size = 2MB

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff    cache   si   so    bi    bo    in    cs us sy id wa
 9  8  73036 500164  82200 23497748    3    4   669   541     1     1 23  3 54 19
20  4  73036 497252  82200 23500836    0    0  2500   680 11145 15168 91  4  2  2
21  1  73036 491416  82204 23503832    0    0  1916   920 10303 14032 94  4  1  1
23  5  73036 489580  82212 23505860    0    0  1348  3296 11682 15970 94  5  1  0
31  1  73036 481408  82220 23507752    0    0   984  8988 10123 11289 97  3  0  0
25  4  73036 483248  82232 23509420    0    0  1268  1312 10705 14063 96  4  0  0
23  4  73036 480096  82232 23512380    0    0  2372   472  9805 13996 94  5  1  1
24  4  73036 476732  82236 23515196    0    0  2012   720 10365 14307 96  3  1  0
22  1  73036 474468  82236 23516584    0    0   944  3108  9838 12831 95  4  1  0
14  1  73036 455756  82284 23534548    0    0   908  3284  9096 11333 94  4  1  0
10  2  73036 455224  82292 23536304    0    0  1760   416 12454 17736 89  6  3  2
17  0  73036 460620  82292 235300     1292   968 12030 18333 90  7  2  1
13  4  73036 459764  82292 23539724    0    0   332   288  9722 14197 92  5  2  1
17  5  73036 457516  82292 23542176    0    0  1872 17752 10458 15465 91  5  2  1
19  4  73036 450804  82300 23545640    0    0  2980   640 10602 15621 90  6  2  2
24  0  73036 447660  82312 23547644    0    0  1736 10724 12401 15413 93  6  1  0
20  6  73036 444380  82320 23550692    0    0  2064   476  9008 10985 94  4  1  0
22  2  73036 442880  82328 23553640    0    0  2496  3156 10739 15211 93  5  1  1
11  1  73036 441448  82328 23555632    0    0  1452  3552 10812 15337 93  5  2  1
 6  2  73036 439812  82348 23557420    0    0  1052  1128  8603 10514 91  3  3  2
 6  3  73036 433456  82348 23560860    0    0  2484   656  7636 13033 68  4 14 14
 6  3  73036 433084  82348 23562628    0    0  1400   408  6046 11778 70  3 18  9
 5  0  73036 430776  82356 23564264    0    0  1108  1300  7549 13754 73  4 16  7
 5  2  73036 430124  82360 23565580    0    0  1016  2216  7844 14507 72  4 18  6
 4  2  73036 429652  82380 23567480    0    0  1168  2468  7694 15466 58  4 24 14
 6  2  73036 427304  82384 23569668    0    0  1132   752  5993 13606 49  5 36 10
 7  1  73036 423020  82384 23571932    0    0  1244   824  8085 18072 56  3 30 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff    cache   si   so    bi    bo    in    cs us sy id wa
 4  0  73036 420816  82392 23573824    0    0  1292   820  5370 10958 46  2 41 10
 9  1  73020 418048  82392 23576900   52    0  1632  2592  5931 11629 60  3 29  8
 4  2  73004 415164  82424 23578620   56    0  1812  4116  7503 14674 71  3 15 12


On Mar 16, 2009, at 4:19 PM, Dave Youatt wrote:
Last column wa is % cpu time spent waiting (for IO to complete).  80s
and 90s is pretty high, probably too high.

Might also want to measure the IO/s performance of your RAID
controller.  From the descriptions, it will be much more important than
long sequential reads/writes for characterizing your workload.

There are also some disappointing HW RAID controllers out there.
Generally, Areca and Promise are good, Adaptec good, depending on
model, and the ones that Dell ship w/their servers haven't had good
reviews/reports.


On 03/16/2009 01:10 PM, Joe Uhl wrote:
Here is vmstat 1 30.  We are under peak load right now so I can gather
information from the real deal :)

Had an almost complete lockup a moment ago, number of non-idle
postgres connections was 637.  Going to drop our JDBC pool sizes a bit
and bounce everything.

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd     free   buff    cache   si   so    bi    bo    in    cs us sy id wa
12 35  95056 11102380  56856 14954948    3    4   669   541     1     2 23  3 54 19
12 39  95056 11092484  56876 14963204    0    0  6740  1204 10066 13277 91  5  0  4
 8 42  95056 11081712  56888 

Re: [PERFORM] High CPU Utilization

2009-03-16 Thread Scott Marlowe
On Mon, Mar 16, 2009 at 2:50 PM, Joe Uhl joe...@gmail.com wrote:
 I dropped the pool sizes and brought things back up.  Things are stable,
 site is fast, CPU utilization is still high.  Probably just a matter of time
 before issue comes back (we get slammed as kids get out of school in the
 US).

Yeah, I'm guessing your server (or more specifically its RAID card)
just isn't up to the task.  We had the same problem last year with a
machine with 16 Gig ram and dual dual core 3.0GHz xeons with a Perc 5
something or other.  No matter how we tuned it or played with it, we
just couldn't get good random performance out of it.  It's since been
replaced by a white box unit with a tyan mobo and dual 4 core opterons
and an Areca 1680 and a 12 drive RAID-10. We can sustain 30 to 60 Megs
a second random access with 0 to 10% iowait.

Here's a typical vmstat 10 output when our load factor is hovering around 8...
 r  b   swpd   free   buff    cache   si   so    bi    bo    in    cs us sy id wa st
 4  1    460 170812  92856 29928156    0    0   604  3986  4863 10146 74  3 20  3  0
 7  1    460 124160  92912 29939660    0    0   812  5701  4829  9733 70  3 23  3  0
13  0    460 211036  92984 29947636    0    0   589  3178  4429  9964 69  3 25  3  0
 7  2    460  90968  93068 29963368    0    0  1067  4463  4915 11081 78  3 14  5  0
 7  3    460 115216  93100 29963336    0    0  3008  3197  4032 11812 69  4 15 12  0
 6  1    460 142120  93088 29923736    0    0  1112  6390  4991 11023 75  4 15  6  0
 6  0    460 157896  93208 29932576    0    0   698  2196  4151  8877 71  2 23  3  0
11  0    460 124868  93296 29948824    0    0   963  3645  4891 10382 74  3 19  4  0
 5  3    460  95960  93272 29918064    0    0   592 30055  5550  7430 56  3 18 23  0
 9  0    460  95408  93196 29914556    0    0  1090  3522  4463 10421 71  3 21  5  0
 9  0    460 128632  93176 29916412    0    0   883  4774  4757 10378 76  4 17  3  0

Note the bursty parts where we're shoving out 30 Megs a second and the
wait jumps to 23%.  That's about as bad as it gets during the day for
us.  Note that in your output your bi column appears to be dominating
your bo column, so it looks like you're reaching a point where the
write cache on the controller gets full and your real throughput is
shown to be ~1 megabyte a second outbound, and the inbound traffic
either has priority or is just filling in the gaps.  It looks to me
like your RAID card is prioritizing reads over writes, and the whole
system is just slowing to a crawl.  I'm willing to bet that if you
were running pure SW RAID with no RAID controller you'd get better
numbers.



Re: [PERFORM] High CPU Utilization

2009-03-16 Thread Gregory Stark
Greg Smith gsm...@gregsmith.com writes:

 On Mon, 16 Mar 2009, Joe Uhl wrote:

 Here is vmstat 1 30.  We are under peak load right now so I can gather
 information from the real deal

 Quite helpful, reformatting a bit and picking an informative section:

 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
  r  b   swpd     free   buff    cache   si   so    bi    bo   in   cs us sy id wa
  0 34  95048 11025880  56988 15020168    0    0  3852   160 3616 8614 11  1  6 82
  3 25  95048 10996356  57044 15044796    0    0  7892   456 3126 7115  4  3  8 85
  1 26  95048 10991692  57052 15050100    0    0  5188   176 2566 5976  3  2 12 83

 This says that your server is spending all its time waiting for I/O, actual
 CPU utilization is minimal.  You're only achieving around 3-8MB/s of random I/O.
 That's the reality of what your disk I/O subsystem is capable of, regardless of
 what its sequential performance with dd looks like.  If you were to run a more
 complicated benchmark like bonnie++ instead, I'd bet that your seeks/second
 results are very low, even though sequential read/write is fine.

 The Perc5 controllers have a pretty bad reputation for performance on this
 list, even in RAID10.  Not much you can do about that beyond scrapping the
 controller and getting a better one.

Hm, well the tests I ran for posix_fadvise were actually on a Perc5 -- though
who knows if it was the same under the hood -- and I saw better performance
than this. I saw about 4MB/s for a single drive and up to about 35MB/s for 15
drives. However this was using linux md raid-0, not hardware raid.

But you shouldn't get your hopes up too much for random i/o. 3-8MB seems low
but consider the following:

 $ units
 2445 units, 71 prefixes, 33 nonlinear units

 You have: 8kB / .5|7200min
 You want: MB/s
 * 1.92
 / 0.5208
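The same arithmetic as a quick sketch (mine, not from the thread), using SI kB/MB as units(1) does: one 8 kB page per average rotational latency, i.e. half a revolution at 7200 rpm, ignoring seek and transfer time:

```python
# Rough per-spindle random-read ceiling from rotational latency alone:
# one 8 kB page per half revolution of a 7200 rpm disk.
PAGE_KB = 8
RPM = 7200

half_rev_s = 0.5 / (RPM / 60)             # half a revolution = ~4.17 ms
mb_per_s = (PAGE_KB / 1000) / half_rev_s  # SI units, matching units(1)
print(f"~{mb_per_s:.2f} MB/s per disk")   # ~1.92 MB/s per disk
```

Scaled across ~15 spindles that is the right order of magnitude for the ~35MB/s observed with the md raid-0.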

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [PERFORM] High CPU Utilization

2009-03-16 Thread Greg Smith

On Mon, 16 Mar 2009, Joe Uhl wrote:


Now when I run vmstat 1 30 it looks very different (below).


That looks much better.  Obviously you'd like some more headroom on the 
CPU situation than you're seeing, but that's way better than having so 
much time spent waiting for I/O.



max_connections = 1000
work_mem = 30MB


Be warned that you need to be careful with this combination.  If all 1000 
connections were to sort something at once, you could end up with 30GB 
worth of RAM used for that purpose.  It's probably quite unlikely that 
will happen, but 30MB is on the high side with that many connections.
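That worst case is simple multiplication; a quick sanity check (and note that a single complex query can actually use work_mem more than once, so this is if anything optimistic):

```python
# Worst-case sort memory if every connection runs one work_mem-sized sort
# at the same time, using the settings quoted above.
max_connections = 1000
work_mem_mb = 30

worst_case_gb = max_connections * work_mem_mb / 1024
print(f"worst case ~{worst_case_gb:.1f} GB of sort memory")  # ~29.3 GB
```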


I wonder if your pool might work better, in terms of lowering total CPU 
usage, if you reduced the number of incoming connections.  Each connection 
adds some overhead, and now that you've got the I/O situation under better 
control you might get by with fewer simultaneous ones.  Something to 
consider.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] High CPU Utilization

2009-03-16 Thread Greg Smith

On Tue, 17 Mar 2009, Gregory Stark wrote:


Hm, well the tests I ran for posix_fadvise were actually on a Perc5 -- though
who knows if it was the same under the hood -- and I saw better performance
than this. I saw about 4MB/s for a single drive and up to about 35MB/s for 15
drives. However this was using linux md raid-0, not hardware raid.


Right, it's the hardware RAID on the Perc5 I think people mainly complain 
about.  If you use it in JBOD mode and let the higher performance CPU in 
your main system drive the RAID functions it's not so bad.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
