Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-06-01 Thread Kevin Grittner
Claudio Freire  wrote:
 Stephen Frost  wrote:
 Rajesh Kumar. Mallah (mal...@tradeindia.com) wrote:
 
 we are actually also running out of DB connections (max_connections,
 which is currently at 600). When that happens, something at the
 beginning of the application stack also becomes dysfunctional, and
 it changes the very input to the system (think of negative
 feedback systems).

 Oh. Yeah, have you considered pgbouncer?
 
 Or pooling at the application level. Many ORMs support connection
 pooling and limiting out-of-the-box.
 
 In essence, postgres should never have to bounce connections; that
 should all be handled by the application or by a pgbouncer in front
 of it, both of which would do it more efficiently and effectively.
 
Stephen and Claudio have, I think, pointed you in the right
direction.  For more detail on why, see this Wiki page:
 
http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
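That wiki page gives a widely quoted starting point for sizing the number of *active* connections: ((core_count * 2) + effective_spindle_count). A minimal sketch that only evaluates that rule of thumb; the spindle count in the example is a hypothetical value, since the thread only says "lots of 15K spindles".

```python
def suggested_active_connections(core_count: int, effective_spindle_count: int) -> int:
    """Starting-point estimate for the number of *active* connections.

    Rule of thumb from the PostgreSQL wiki page above:
    (core_count * 2) + effective_spindle_count. It is a first guess to
    benchmark around, not a hard limit.
    """
    if core_count < 1 or effective_spindle_count < 0:
        raise ValueError("core_count must be >= 1 and spindle count >= 0")
    return core_count * 2 + effective_spindle_count


# Hypothetical example: 64 cores and an assumed 24 effective spindles.
print(suggested_active_connections(64, 24))  # 152
```

Even with a generous spindle count, the estimate is far below the 600 max_connections mentioned above, which is the argument for a pool (pgbouncer or application-side) in front of the server.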
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-24 Thread Rajesh Kumar. Mallah
| 
| Load avg is the number of processes in the running queue, which can
| be either waiting to be run or actually running.
| 
| So if you had 100% CPU usage, then you'd most definitely have a load
| avg of 64, which is neither good nor bad. It may simply mean that
| you're using your hardware's full potential.


Dear Claudio,

Thanks for the reply and for clarifying the "actually running" part.

Below is a snapshot of the top output while the system was loaded.

top - 12:15:13 up 101 days, 19:01,  1 user,  load average: 23.50, 18.89, 21.74
Tasks: 650 total,  11 running, 639 sleeping,   0 stopped,   0 zombie
Cpu(s): 26.5%us,  5.7%sy,  0.0%ni, 67.2%id,  0.0%wa,  0.0%hi,  0.6%si,  0.0%st
Mem:  131971752k total, 122933996k used,  9037756k free,   251544k buffers
Swap: 33559780k total,   251916k used, 33307864k free, 116356252k cached
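A rough cross-check on this snapshot: the non-idle, non-iowait share is 100 - 67.2 - 0.0 = 32.8% of 64 cores, about 21 cores' worth of busy CPU, which is close to the 1-minute load of 23.50. The comparison is approximate, since the load average is a damped 1-minute figure while top's percentages cover only its last refresh interval. The sketch below just redoes that arithmetic with the numbers from the snapshot.

```python
# Figures copied from the top snapshot above (64 cores).
CORES = 64
LOAD_1MIN = 23.50
CPU_PCT = {"us": 26.5, "sy": 5.7, "ni": 0.0, "id": 67.2, "wa": 0.0,
           "hi": 0.0, "si": 0.6, "st": 0.0}


def busy_cores(cpu_pct, cores):
    """Cores' worth of CPU time that was neither idle nor waiting on I/O."""
    busy_share = (100.0 - cpu_pct["id"] - cpu_pct["wa"]) / 100.0
    return busy_share * cores


print(f"busy cores ~ {busy_cores(CPU_PCT, CORES):.1f}, 1-min load {LOAD_1MIN}")
# busy cores ~ 21.0, 1-min load 23.5
```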

Our applications do slow down when the load is at that level. Can you please
tell us what else can be metered?


| 
| If your processes are waiting but not using CPU or I/O time... all I
| can think of is mcelog (it's the only application I've ever witnessed
| doing that). Do check ps/top and try to find out which processes are
| in a waiting state to have a little more insight.


I will read more on process states and try to keep a close
eye on them. I shall respond in a few hours.

regds
mallah.



Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-24 Thread Andy Colson

On 05/24/2012 12:26 AM, Rajesh Kumar. Mallah wrote:


- Claudio Freire klaussfre...@gmail.com wrote:

| From: Claudio Freire klaussfre...@gmail.com
| To: Rajesh Kumar. Mallah mal...@tradeindia.com
| Cc: pgsql-performance@postgresql.org
| Sent: Thursday, May 24, 2012 9:23:43 AM
| Subject: Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
|
| On Thu, May 24, 2012 at 12:39 AM, Rajesh Kumar. Mallah
| mal...@tradeindia.com wrote:
|  The problem is that sometimes there are spikes in the load average,
|  which jumps to 50 very rapidly (i.e. from 0.5 to 50 within 10 secs),
|  stays there for some time and slowly comes back to the normal value.
|
|  During such times of high load average we observe that there is no
|  I/O wait in the system and the CPU is even 50% idle. In any case, the
|  I/O wait always remains < 1.0 % and is mostly 0. Hence the load is
|  not due to high I/O wait, which was generally the case with our
|  previous hardware.
|
| Do you experience decreased query performance?


Yes we do experience substantial application performance degradations.




Maybe you are hitting some locks? If it's not I/O and not CPU, then maybe
something is getting locked and queries are piling up.
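One way to test this hypothesis is to sample pg_locks during a spike and count lock requests that have not been granted. A minimal Python sketch, assuming a DB-API driver such as psycopg2; the query text and function name are illustrative, not something from the thread. Note that pg_locks only shows heavyweight locks: contention on internal lightweight locks (such as the lock-manager partitions discussed in the blog post linked elsewhere in this thread) will not appear there.

```python
from collections import Counter

# Lock requests that have not been granted yet. locktype, relation, mode,
# pid and granted are all columns of pg_locks in 9.1.
WAITING_LOCKS_SQL = """
SELECT locktype, relation::regclass, mode, pid
FROM pg_locks
WHERE NOT granted
"""


def summarize_waiting_locks(cursor):
    """Run the query on a DB-API cursor; count waiters per (locktype, mode)."""
    cursor.execute(WAITING_LOCKS_SQL)
    return Counter((locktype, mode) for locktype, _rel, mode, _pid in cursor.fetchall())


# Usage with psycopg2 (assumed driver; the DSN is a placeholder):
#   import psycopg2
#   conn = psycopg2.connect("dbname=yourdb")
#   print(summarize_waiting_locks(conn.cursor()))
```

If the counts stay near zero while the load is high, heavyweight lock waits are probably not what is piling up.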

-Andy



Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-24 Thread Rajesh Kumar. Mallah

Dear Andy,

Following the discussion on load average, we are now investigating some
other parts of the stack (other than the DB).

Essentially, we are raising the limits (on the app server) so that more
requests go to the DB server.


| 
| Maybe you are hitting some locks? If it's not I/O and not CPU, then
| maybe something is getting locked and queries are piling up.




| 
| -Andy



Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-24 Thread Steve Crawford

On 05/24/2012 05:58 AM, Rajesh Kumar. Mallah wrote:

Dear Andy,

Following the discussion on load average, we are now investigating some
other parts of the stack (other than the DB).

Essentially, we are raising the limits (on the app server) so that more
requests go to the DB server.

Which leads to the question: what, other than the db, runs on this machine?

Cheers,
Steve



Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-24 Thread Stephen Frost
Rajesh,

* Rajesh Kumar. Mallah (mal...@tradeindia.com) wrote:
 We are puzzled why the CPU and disk I/O systems are not being fully
 utilized and would seek the list's wisdom on that.

What OS is this?  What kernel version?

 Just a thought: would it be a good idea to partition the host hardware
 into 4 equal virtual environments, i.e. 1 master (r/w) and 3 read-only
 slaves, and distribute the read-only load across the 3 slaves?

Actually, it might help with 9.1, if you're really running into some
scalability issues in our locking area. You might review this:

http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-about-64.html

That's a pretty contrived test case, but I suppose it's possible your
case is actually close enough to be getting affected too.

Thanks,

Stephen




Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-24 Thread Rajesh Kumar. Mallah
 
| From: Steve Crawford scrawf...@pinpointresearch.com
| To: Rajesh Kumar. Mallah mal...@tradeindia.com
| Cc: Andy Colson a...@squeakycode.net, Claudio Freire klaussfre...@gmail.com, pgsql-performance@postgresql.org
| Sent: Thursday, May 24, 2012 9:23:47 PM
| Subject: Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
|
| On 05/24/2012 05:58 AM, Rajesh Kumar. Mallah wrote:
|  Dear Andy,
| 
|  Following the discussion on load average, we are now investigating
|  some other parts of the stack (other than the DB).
| 
|  Essentially, we are raising the limits (on the app server) so that
|  more requests go to the DB server.
| Which leads to the question: what, other than the db, runs on this
| machine?

No, nothing else runs on *this* machine.
We are lucky to have such beefy hardware dedicated to postgres :)
We have a separate machine for the application server, which has 2 tiers.
I am trying to get to the point of maxing out the DB machine; for that
to happen, we need to work on the other parts.

regds
mallah.


| 
| Cheers,
| Steve



Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-24 Thread Rajesh Kumar. Mallah
- Stephen Frost sfr...@snowman.net wrote:

| From: Stephen Frost sfr...@snowman.net
| To: Rajesh Kumar. Mallah mal...@tradeindia.com
| Cc: pgsql-performance@postgresql.org
| Sent: Thursday, May 24, 2012 9:27:37 PM
| Subject: Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
|
| Rajesh,
| 
| * Rajesh Kumar. Mallah (mal...@tradeindia.com) wrote:
|  We are puzzled why the CPU and disk I/O systems are not being fully
|  utilized and would seek the list's wisdom on that.
| 
| What OS is this?  What kernel version?

Dear Frost,

We are running Linux with kernel 3.2.X
(which has the lseek improvements).

| 
|  Just a thought: would it be a good idea to partition the host
|  hardware into 4 equal virtual environments, i.e. 1 master (r/w) and
|  3 read-only slaves, and distribute the read-only load across the 3
|  slaves?
| 
| Actually, it might help with 9.1, if you're really running into some
| scalability issues in our locking area. You might review this:
| 
| http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-about-64.html
| 
| That's a pretty contrived test case, but I suppose it's possible your
| case is actually close enough to be getting affected too.

Thanks for the reference; I thought so too (LockManager),
but we are actually also running out of DB connections (max_connections,
which is currently at 600). When that happens, something at
the beginning of the application stack also becomes dysfunctional, and it
changes the very input to the system (think of negative feedback systems).

It is sort of complicated, but I will definitely update the list
when I get to the point of putting the blame on the DB :-).

Regds
Mallah.

| 
|   Thanks,
| 
|   Stephen



Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-24 Thread Stephen Frost
* Rajesh Kumar. Mallah (mal...@tradeindia.com) wrote:
 We are running Linux with kernel 3.2.X
 (which has the lseek improvements).

Ah, good.

 Thanks for the reference; I thought so too (LockManager),
 but we are actually also running out of DB connections (max_connections,
 which is currently at 600). When that happens, something at
 the beginning of the application stack also becomes dysfunctional, and it
 changes the very input to the system (think of negative feedback systems).

Oh.  Yeah, have you considered pgbouncer?

 It is sort of complicated, but I will definitely update the list
 when I get to the point of putting the blame on the DB :-).

Ok. :)

Thanks,

Stephen




Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-24 Thread Claudio Freire
On Thu, May 24, 2012 at 2:09 PM, Stephen Frost sfr...@snowman.net wrote:
 * Rajesh Kumar. Mallah (mal...@tradeindia.com) wrote:
 We are running Linux with kernel 3.2.X
 (which has the lseek improvements).

 Ah, good.

 Thanks for the reference; I thought so too (LockManager),
 but we are actually also running out of DB connections (max_connections,
 which is currently at 600). When that happens, something at
 the beginning of the application stack also becomes dysfunctional, and it
 changes the very input to the system (think of negative feedback systems).

 Oh.  Yeah, have you considered pgbouncer?

Or pooling at the application level. Many ORMs support connection
pooling and limiting out-of-the-box.

In essence, postgres should never have to bounce connections; that should
all be handled by the application or by a pgbouncer in front of it, both
of which would do it more efficiently and effectively.
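A minimal sketch of application-side pooling with a hard cap, assuming a DB-API driver; the class name and parameters are hypothetical and are not any ORM's or pgbouncer's API. Callers beyond the cap wait briefly for a free connection instead of opening a new server connection, which is what keeps the server away from max_connections.

```python
import queue
import threading
from contextlib import contextmanager


class BoundedPool:
    """Hand out at most `max_size` connections; extra callers wait up to `timeout` seconds."""

    def __init__(self, connect, max_size, timeout=5.0):
        self._connect = connect          # zero-argument factory, e.g. lambda: psycopg2.connect(dsn)
        self._idle = queue.LifoQueue()   # idle connections, most recently used first
        self._slots = threading.BoundedSemaphore(max_size)
        self._timeout = timeout

    @contextmanager
    def connection(self):
        # Wait for a free slot rather than opening connection number max_size + 1.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("no database connection available")
        try:
            try:
                conn = self._idle.get_nowait()   # reuse an idle connection if there is one
            except queue.Empty:
                conn = self._connect()           # otherwise open a new one (still within the cap)
            try:
                yield conn
            finally:
                self._idle.put(conn)             # sketch only: a real pool would roll back or discard broken connections
        finally:
            self._slots.release()
```

In practice an existing pool (the ORM's, psycopg2's `pool` module, or pgbouncer) is preferable; the sketch only shows the capping behaviour.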



[PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-23 Thread Rajesh Kumar. Mallah

Dear List,

We are having scalability issues with high-end hardware.

The hardware is:
CPU  = 4 x Opteron 6272 with 16 cores each, i.e. 64 cores in total
RAM  = 128 GB DDR3
Disk = high-performance RAID10 with lots of 15K spindles and a working BBU cache

Normally the 1-minute load average of the system stays between 0.5 and 1.0.

The problem is that sometimes there are spikes in the load average, which
jumps to 50 very rapidly (i.e. from 0.5 to 50 within 10 secs), stays there
for some time and slowly comes back to the normal value.

During such times of high load average we observe that there is no I/O wait
in the system and the CPU is even 50% idle. In any case, the I/O wait always
remains < 1.0 % and is mostly 0. Hence the load is not due to high I/O wait,
which was generally the case with our previous hardware.
 
We are puzzled why the CPU and disk I/O systems are not being fully utilized
and would seek the list's wisdom on that.

We have set up sar to poll the system parameters every minute, and the data
is graphed with Cacti. If required, any system or PostgreSQL parameter can
easily be put under Cacti monitoring and graphed.
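A 1-minute sar interval is coarse compared with a spike that climbs from 0.5 to 50 in about 10 seconds. A minimal Linux-only sketch (an addition, not part of the existing sar/Cacti setup) reads the instantaneous counters from /proc/stat once per second: procs_running (runnable tasks) and procs_blocked (tasks blocked waiting for I/O), as described in proc(5). The function names are hypothetical.

```python
import time


def parse_proc_stat(text):
    """Return (procs_running, procs_blocked) parsed from /proc/stat contents."""
    counts = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key in ("procs_running", "procs_blocked"):
            counts[key] = int(value)
    return counts.get("procs_running", 0), counts.get("procs_blocked", 0)


def sample(samples=10, interval=1.0, path="/proc/stat"):
    """Print one timestamped line per interval (Linux only: reads /proc/stat)."""
    for _ in range(samples):
        with open(path) as f:
            running, blocked = parse_proc_stat(f.read())
        print(f"{time.strftime('%H:%M:%S')} running={running} blocked={blocked}")
        time.sleep(interval)
```

Calling `sample(60)` during a spike gives one reading per second for a minute; the output could be appended to a file and graphed alongside the sar data.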

The query load is mostly read only.
 
It is also possible to replicate the problem with pgbench to some
extent. I chose -s 100 and -t 1; the load does shoot up, but not as
spectacularly as with real-world usage.
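With -t 1 each pgbench client runs a single transaction, so that run is very short. Since the production load is mostly read-only, a closer approximation might initialize at scale 100 (-i -s 100) and then run the built-in select-only script (-S) with many clients (-c), several worker threads (-j) and a fixed duration (-T). A minimal sketch that only builds the argument lists; the client, thread and duration values and the database name are hypothetical, to be varied. Some older pgbench releases require the client count to be a multiple of the thread count, hence 64 and 8.

```python
import shlex


def pgbench_commands(dbname, scale=100, clients=64, threads=8, seconds=60):
    """Return (init_cmd, run_cmd) argument lists for a read-only pgbench test."""
    init_cmd = ["pgbench", "-i", "-s", str(scale), dbname]
    run_cmd = ["pgbench", "-S",                # built-in SELECT-only script
               "-c", str(clients),             # concurrent client sessions
               "-j", str(threads),             # pgbench worker threads
               "-T", str(seconds),             # run for a fixed number of seconds
               dbname]
    return init_cmd, run_cmd


init_cmd, run_cmd = pgbench_commands("bench")  # "bench" is a hypothetical database name
print(shlex.join(init_cmd))
print(shlex.join(run_cmd))
# To execute: subprocess.run(init_cmd, check=True); subprocess.run(run_cmd, check=True)
```

`shlex.join` needs Python 3.8 or later.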

Any help would be greatly appreciated.

Just a thought: would it be a good idea to partition the host hardware
into 4 equal virtual environments, i.e. 1 master (r/w) and 3 read-only
slaves, and distribute the read-only load across the 3 slaves?


regds
mallah



Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-23 Thread Claudio Freire
On Thu, May 24, 2012 at 12:39 AM, Rajesh Kumar. Mallah
mal...@tradeindia.com wrote:
 The problem is that sometimes there are spikes in the load average, which
 jumps to 50 very rapidly (i.e. from 0.5 to 50 within 10 secs), stays
 there for some time and slowly comes back to the normal value.

 During such times of high load average we observe that there is no I/O wait
 in the system and the CPU is even 50% idle. In any case, the I/O wait always
 remains < 1.0 % and is mostly 0. Hence the load is not due to high I/O wait,
 which was generally the case with our previous hardware.

Do you experience decreased query performance?

Load can easily get to 64 (1 per core) without reaching its capacity.
So, unless you're experiencing decreased performance I wouldn't think
much of it.

Do you have mcelog running, as a cron job or a daemon?
Sometimes, mcelog tends to crash in that way. We had to disable it on
our servers because it misbehaved like that. It only makes the load avg
meaningless, with no performance impact, but being unable to accurately
measure load is bad enough.



Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-23 Thread Rajesh Kumar. Mallah

- Claudio Freire klaussfre...@gmail.com wrote:

| From: Claudio Freire klaussfre...@gmail.com
| To: Rajesh Kumar. Mallah mal...@tradeindia.com
| Cc: pgsql-performance@postgresql.org
| Sent: Thursday, May 24, 2012 9:23:43 AM
| Subject: Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
|
| On Thu, May 24, 2012 at 12:39 AM, Rajesh Kumar. Mallah
| mal...@tradeindia.com wrote:
|  The problem is that sometimes there are spikes in the load average,
|  which jumps to 50 very rapidly (i.e. from 0.5 to 50 within 10 secs),
|  stays there for some time and slowly comes back to the normal value.
| 
|  During such times of high load average we observe that there is no
|  I/O wait in the system and the CPU is even 50% idle. In any case, the
|  I/O wait always remains < 1.0 % and is mostly 0. Hence the load is
|  not due to high I/O wait, which was generally the case with our
|  previous hardware.
| 
| Do you experience decreased query performance?


Yes we do experience substantial application performance degradations.


| 
| Load can easily get to 64 (1 per core) without reaching its capacity.
| So, unless you're experiencing decreased performance I wouldn't think
| much of it.

As far as I understand,
the load average is the average number of processes waiting to be run over
the past 1, 5 or 15 minutes. A number > 1 would mean that a countable number
of processes were waiting to be run. How can a load of more than 1 and up
to 64 be OK for a 64-core machine?



| 
| Do you have mcelog running, as a cron job or a daemon?


No, we do not have mcelog.

BTW, the PostgreSQL version is 9.1.3, which I forgot to mention
in my last email.


regds
mallah.

| Sometimes, mcelog tends to crash in that way. We had to disable it on
| our servers because it misbehaved like that. It only makes the load avg
| meaningless, with no performance impact, but being unable to accurately
| measure load is bad enough.



Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-23 Thread Claudio Freire
On Thu, May 24, 2012 at 2:26 AM, Rajesh Kumar. Mallah
mal...@tradeindia.com wrote:
 |
 | Load can easily get to 64 (1 per core) without reaching its capacity.
 | So, unless you're experiencing decreased performance I wouldn't think
 | much of it.

 As far as I understand,
 the load average is the average number of processes waiting to be run over
 the past 1, 5 or 15 minutes. A number > 1 would mean that a countable number
 of processes were waiting to be run. How can a load of more than 1 and up
 to 64 be OK for a 64-core machine?

Load avg is the number of processes in the running queue, which can be
either waiting to be run or actually running.

So if you had 100% CPU usage, then you'd most definitely have a load
avg of 64, which is neither good nor bad. It may simply mean that
you're using your hardware's full potential.
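To put the load average on the same scale as the core count, it can be divided by the number of CPUs, so that Claudio's "load avg of 64 at 100% CPU" corresponds to about 1.0 per core. One caveat: on Linux the count behind the load average also includes tasks in uninterruptible sleep (state D), not only runnable ones, so a high load with idle CPU and no iowait can still come from blocked tasks. A minimal sketch using only the standard library; `os.getloadavg()` is available on Unix-like systems, and the function name is hypothetical.

```python
import os


def load_per_core(loadavg, cores):
    """Divide each load-average figure (1, 5, 15 min) by the number of CPUs."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return tuple(round(x / cores, 2) for x in loadavg)


# Figures from the top snapshot quoted in this thread, on 64 cores:
print(load_per_core((23.50, 18.89, 21.74), 64))

# Live values for the current machine (Unix-only):
print(load_per_core(os.getloadavg(), os.cpu_count() or 1))
```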

If your processes are waiting but not using CPU or I/O time... all I
can think of is mcelog (it's the only application I've ever witnessed
doing that). Do check ps/top and try to find out which processes are
in a waiting state to have a little more insight.
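Following the ps/top suggestion, a minimal Linux-only sketch that tallies processes by the state letter in /proc/<pid>/stat (R = running or runnable, D = uninterruptible sleep, S = sleeping, and so on). Sampling it a few times during a spike shows whether the extra load is runnable tasks or D-state tasks; `ps -eo stat,pid,cmd` gives a comparable one-off view. The function names are hypothetical.

```python
import os
from collections import Counter


def state_from_stat(text):
    """Extract the one-letter state from a /proc/<pid>/stat line.

    The command name is in parentheses and may itself contain spaces or ')',
    so split after the *last* ')'.
    """
    return text[text.rindex(")") + 2]


def count_process_states(proc="/proc"):
    """Count processes by state letter; returns an empty Counter if /proc is missing."""
    states = Counter()
    if not os.path.isdir(proc):
        return states
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                states[state_from_stat(f.read())] += 1
        except OSError:
            continue  # process exited while we were scanning
    return states


print(count_process_states().most_common())
```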
