Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
Claudio Freire wrote:
| Stephen Frost wrote:
| | Rajesh Kumar. Mallah (mal...@tradeindia.com) wrote:
| | | we are actually also running out db max connections (also)
| | | ( which is currently at 600) , when that happens something at the
| | | beginning of the application stack also gets dysfunctional and it
| | | changes the very input to the system. ( think of negative
| | | feedback systems )
| |
| | Oh. Yeah, have you considered pgbouncer?
|
| Or pooling at the application level. Many ORMs support connection
| pooling and limiting out-of-the-box.
|
| In essence, postgres should never bounce connections, it should all
| be handled by the application or a previous pgbouncer, both of which
| would do it more efficient and effectively.

Stephen and Claudio have, I think, pointed you in the right direction. For more detail on why, see this Wiki page:

http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

-Kevin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
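As a rough illustration of the sizing argument behind that wiki page: it offers, as a starting point, that the number of active connections should be somewhere near ((core_count * 2) + effective_spindle_count). The sketch below applies that rule of thumb to the 64-core server discussed in this thread. The function name and the spindle counts are assumptions made for illustration; the thread only says "lots of 15K spindles" and gives no number.

```python
def suggested_active_connections(core_count, effective_spindle_count):
    """Rule-of-thumb starting point: (core_count * 2) + effective_spindle_count.

    core_count is meant to be physical cores, not hyperthreads. The result
    is a starting point for benchmarking, not a guaranteed optimum.
    """
    return core_count * 2 + effective_spindle_count


if __name__ == "__main__":
    cores = 64  # 4 x Opteron 6272 with 16 cores each, as stated in the thread
    # Hypothetical spindle counts; the thread does not state the real figure.
    for spindles in (0, 8, 16):
        size = suggested_active_connections(cores, spindles)
        print(f"spindles={spindles:2d} -> about {size} active connections")
```

Even with the largest hypothetical spindle count above, the result stays far below the 600 max_connections mentioned in the thread, which is the case for letting a pooler queue the excess rather than handing every request its own backend.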
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
| Load avg is the number of processes in the running queue, which can
| be either waiting to be run or actually running.
|
| So if you had 100% CPU usage, then you'd most definitely have a load
| avg of 64, which is neither good or bad. It may simply mean that
| you're using your hardware's full potential.

Dear Claudio,

Thanks for the reply and for clarifying the "actually running" part. Below is a snapshot of the top output while the system was loaded.

  top - 12:15:13 up 101 days, 19:01,  1 user,  load average: 23.50, 18.89, 21.74
  Tasks: 650 total,  11 running, 639 sleeping,   0 stopped,   0 zombie
  Cpu(s): 26.5%us,  5.7%sy,  0.0%ni, 67.2%id,  0.0%wa,  0.0%hi,  0.6%si,  0.0%st
  Mem:  131971752k total, 122933996k used,   9037756k free,    251544k buffers
  Swap:  33559780k total,    251916k used,  33307864k free, 116356252k cached

Our application does slow down when the load is at that level. Can you please tell us what else can be measured?

| If your processes are waiting but not using CPU or I/O time... all I
| can think of is mcelog (it's the only application I've ever witnessed
| doing that). Do check ps/top and try to find out which processes are
| in a waiting state to have a little more insight.

I will read up on process states and try to keep a close eye on them. I shall respond on this in a few hours.

regds
mallah.
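One way to read the snapshot above is to convert the Cpu(s) percentages into "cores' worth of work" and set that beside the load average. The following is only a sketch: the helper names are made up for illustration, and the comparison is approximate because the load average is smoothed over a minute while the Cpu(s) line is a short sample.

```python
def busy_cores(idle_pct, iowait_pct, cores):
    """Approximate number of cores doing work, from top's Cpu(s) line."""
    return cores * (100.0 - idle_pct - iowait_pct) / 100.0


def load_per_core(load_avg, cores):
    """Load average divided by core count; 1.0 means one task per core."""
    return load_avg / cores


if __name__ == "__main__":
    cores = 64                      # from the hardware description in the thread
    idle, iowait = 67.2, 0.0        # %id and %wa from the snapshot
    one_minute_load = 23.50         # first load average figure in the snapshot

    print("busy cores   :", round(busy_cores(idle, iowait, cores), 1))
    print("load per core:", round(load_per_core(one_minute_load, cores), 2))
```

With the figures from the snapshot this works out to roughly 21 busy cores against a 1-minute load of 23.5, so in this particular sample the load average is broadly in line with the CPU actually being used, at well under one runnable task per core.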
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
On 05/24/2012 12:26 AM, Rajesh Kumar. Mallah wrote:
| - Claudio Freire klaussfre...@gmail.com wrote:
| | From: Claudio Freire klaussfre...@gmail.com
| | To: Rajesh Kumar. Mallah mal...@tradeindia.com
| | Cc: pgsql-performance@postgresql.org
| | Sent: Thursday, May 24, 2012 9:23:43 AM
| | Subject: Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
| |
| | On Thu, May 24, 2012 at 12:39 AM, Rajesh Kumar. Mallah mal...@tradeindia.com wrote:
| | | The problem is that sometimes there are spikes of load avg which
| | | jumps to 50 very rapidly ( ie from 0.5 to 50 within 10 secs) and
| | | it remains there for sometime and slowly reduces to normal value.
| | |
| | | During such times of high load average we observe that there is no
| | | IO wait in system and even CPU is 50% idle. In any case the IO Wait
| | | always remains 1.0 % and is mostly 0. Hence the load is not due to
| | | high I/O wait which was generally the case with our previous
| | | hardware.
| |
| | Do you experience decreased query performance?
|
| Yes we do experience substantial application performance degradations.

Maybe you are hitting some locks? If it's not IO and not CPU then maybe something is getting locked and queries are piling up.

-Andy
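One way to check Andy's suggestion on the 9.1 server described elsewhere in this thread is to list the backends that pg_stat_activity flags as waiting. The sketch below assumes the 9.1 column names (procpid, current_query, waiting; later releases renamed them) and accepts any Python DB-API cursor, for example one obtained from psycopg2. The constant and function names are illustrative only. Note that the waiting flag covers heavyweight locks; contention on internal lightweight locks would not show up here.

```python
# Query against the 9.1 layout of pg_stat_activity.
WAITING_BACKENDS_SQL = """
SELECT procpid,
       usename,
       current_query
  FROM pg_stat_activity
 WHERE waiting
 ORDER BY query_start
"""


def waiting_backends(cursor):
    """Return (procpid, usename, current_query) rows for backends
    currently blocked on a heavyweight lock.

    `cursor` is any DB-API cursor connected to the server being examined.
    """
    cursor.execute(WAITING_BACKENDS_SQL)
    return cursor.fetchall()
```

Sampling this every few seconds during a load spike, and recording how many rows come back, would show whether queries really are piling up behind locks or whether the cause lies elsewhere.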
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
Dear Andy,

Following the discussion on load average, we are now investigating other parts of the stack (other than the db).

Essentially we are bumping up the limits (on the appserver) so that more requests go to the DB server.

| Maybe you are hitting some locks? If its not IO and not CPU then
| maybe something is getting locked and queries are piling up.
|
| -Andy
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
On 05/24/2012 05:58 AM, Rajesh Kumar. Mallah wrote:
| Dear Andy ,
|
| Following the discussion on load average we are now investigating on
| some other parts of the stack (other than db).
|
| Essentially we are bumping up the limits (on appserver) so that more
| requests goes to the DB server.

Which leads to the question: what, other than the db, runs on this machine?

Cheers,
Steve
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
Rajesh,

* Rajesh Kumar. Mallah (mal...@tradeindia.com) wrote:
| We are puzzled why the CPU and DISK I/O system are not being utilized
| fully and would seek lists' wisdom on that.

What OS is this? What kernel version?

| just a thought, will it be a good idea to partition the host hardware
| to 4 equal virtual environments , ie 1 for master (r/w) and 3 slaves r/o
| and distribute the r/o load on the 3 slaves ?

Actually, it might help with 9.1, if you're really running into some scalability issues in our locking area. You might review this:

http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-about-64.html

That's a pretty contrived test case, but I suppose it's possible your case is actually close enough to be getting affected also.

Thanks,

Stephen
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
| From: Steve Crawford scrawf...@pinpointresearch.com
| To: Rajesh Kumar. Mallah mal...@tradeindia.com
| Cc: Andy Colson a...@squeakycode.net, Claudio Freire klaussfre...@gmail.com, pgsql-performance@postgresql.org
| Sent: Thursday, May 24, 2012 9:23:47 PM
| Subject: Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
|
| On 05/24/2012 05:58 AM, Rajesh Kumar. Mallah wrote:
| | Dear Andy ,
| |
| | Following the discussion on load average we are now investigating
| | on some other parts of the stack (other than db).
| |
| | Essentially we are bumping up the limits (on appserver) so that more
| | requests goes to the DB server.
|
| Which leads to the question: what, other than the db, runs on this
| machine?

No, nothing else runs on *this* machine. We are lucky to have such beefy hardware dedicated to postgres :)

We have a separate machine for the application server, which has 2 tiers. I am trying to get to the point of maxing out the db machine; for that to happen we need to work on the other parts.

regds
mallah.

| Cheers,
| Steve
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
- Stephen Frost sfr...@snowman.net wrote:

| From: Stephen Frost sfr...@snowman.net
| To: Rajesh Kumar. Mallah mal...@tradeindia.com
| Cc: pgsql-performance@postgresql.org
| Sent: Thursday, May 24, 2012 9:27:37 PM
| Subject: Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
|
| Rajesh,
|
| * Rajesh Kumar. Mallah (mal...@tradeindia.com) wrote:
| | We are puzzled why the CPU and DISK I/O system are not being
| | utilized fully and would seek lists' wisdom on that.
|
| What OS is this? What kernel version?

Dear Frost,

We are running Linux with kernel 3.2.X (which has the lseek improvements).

| | just a thought, will it be a good idea to partition the host
| | hardware to 4 equal virtual environments , ie 1 for master (r/w)
| | and 3 slaves r/o and distribute the r/o load on the 3 slaves ?
|
| Actually, it might help with 9.1, if you're really running into some
| scalability issues in our locking area.. You might review this:
|
| http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-about-64.html
|
| That's a pretty contrived test case, but I suppose it's possible your
| case is actually close enough to be getting affected also..

Thanks for the reference. I thought so too (LockManager), but we are also running out of db max connections (currently at 600). When that happens, something at the beginning of the application stack also becomes dysfunctional and it changes the very input to the system (think of negative feedback systems).

It is sort of complicated, but I will definitely update the list when I get to the point of putting the blame on the DB :-)

Regds
Mallah.

| Thanks,
|
| Stephen
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
* Rajesh Kumar. Mallah (mal...@tradeindia.com) wrote:
| We are running linux with kernel 3.2.X
| (which has the lseek improvements)

Ah, good.

| Thanks for the reference , even i thought so (LockManager) ,
| but we are actually also running out db max connections (also)
| ( which is currently at 600) , when that happens something at
| the beginning of the application stack also gets dysfunctional and it
| changes the very input to the system. ( think of negative feedback
| systems )

Oh. Yeah, have you considered pgbouncer?

| It is sort of complicated but i will definitely update list ,
| when i get to the point of putting the blame on DB :-) .

Ok. :)

Thanks,

Stephen
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
On Thu, May 24, 2012 at 2:09 PM, Stephen Frost sfr...@snowman.net wrote:
| * Rajesh Kumar. Mallah (mal...@tradeindia.com) wrote:
| | We are running linux with kernel 3.2.X
| | (which has the lseek improvements)
|
| Ah, good.
|
| | Thanks for the reference , even i thought so (LockManager) ,
| | but we are actually also running out db max connections (also)
| | ( which is currently at 600) , when that happens something at
| | the beginning of the application stack also gets dysfunctional and it
| | changes the very input to the system. ( think of negative feedback
| | systems )
|
| Oh. Yeah, have you considered pgbouncer?

Or pooling at the application level. Many ORMs support connection pooling and limiting out-of-the-box.

In essence, postgres should never have to bounce connections; that should all be handled by the application or by a pgbouncer in front of the database, both of which would do it more efficiently and effectively.
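To make "pooling and limiting at the application level" concrete, here is a minimal sketch of a bounded pool. It is not the API of any particular ORM or of pgbouncer: the class name, the `connect` factory argument and the `timeout` parameter are all hypothetical, and error handling is deliberately simplified (a connection is returned to the pool even if the caller's block raised).

```python
import queue
import threading
from contextlib import contextmanager


class BoundedPool:
    """Hands out at most `max_size` connections at a time; extra callers
    wait in the application instead of opening more database backends."""

    def __init__(self, connect, max_size):
        self._connect = connect                      # zero-argument factory
        self._idle = queue.LifoQueue()               # reusable connections
        self._slots = threading.BoundedSemaphore(max_size)

    @contextmanager
    def connection(self, timeout=None):
        # Block (optionally with a timeout) until a slot is free.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("connection pool exhausted")
        try:
            try:
                conn = self._idle.get_nowait()       # reuse if possible
            except queue.Empty:
                conn = self._connect()               # otherwise open a new one
            try:
                yield conn
            finally:
                self._idle.put(conn)                 # keep it for the next caller
        finally:
            self._slots.release()
```

A caller would wrap each unit of work in `with pool.connection() as conn:`. The point of the design is that the limit is enforced before the database is contacted, so a burst of requests queues up in the application tier rather than pushing the server towards max_connections.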
[PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
Dear List,

We are having scalability issues with high-end hardware.

The hardware is:

CPU  = 4 * Opteron 6272 with 16 cores each, i.e. total = 64 cores
RAM  = 128 GB DDR3
Disk = High performance RAID10 with lots of 15K spindles and a working BBU cache

Normally the 1 min load average of the system remains between 0.5 and 1.0.

The problem is that sometimes there are spikes in load avg, which jumps to 50 very rapidly (i.e. from 0.5 to 50 within 10 secs), stays there for some time and then slowly returns to its normal value.

During such times of high load average we observe that there is no IO wait in the system and the CPU is even 50% idle. In any case the IO wait always remains below 1.0% and is mostly 0. Hence the load is not due to high I/O wait, which was generally the case with our previous hardware.

We are puzzled why the CPU and disk I/O system are not being utilized fully and would seek the list's wisdom on that.

We have set up sar to poll the system parameters every minute, and the data is graphed with cacti. If required, any of the system parameters or postgresql parameters can easily be put under cacti monitoring and graphed.

The query load is mostly read only.

It is also possible to replicate the problem with pgbench to some extent. I chose -s = 100 and -t=1; the load does shoot up, but not as spectacularly as with real-world usage.

Any help shall be greatly appreciated.

Just a thought: would it be a good idea to partition the host hardware into 4 equal virtual environments, i.e. 1 for the master (r/w) and 3 r/o slaves, and distribute the r/o load across the 3 slaves?

regds
mallah
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
On Thu, May 24, 2012 at 12:39 AM, Rajesh Kumar. Mallah mal...@tradeindia.com wrote:
| The problem is that sometimes there are spikes of load avg which
| jumps to 50 very rapidly ( ie from 0.5 to 50 within 10 secs) and
| it remains there for sometime and slowly reduces to normal value.
|
| During such times of high load average we observe that there is no
| IO wait in system and even CPU is 50% idle. In any case the IO Wait
| always remains 1.0 % and is mostly 0. Hence the load is not due to
| high I/O wait which was generally the case with our previous hardware.

Do you experience decreased query performance?

Load can easily get to 64 (1 per core) without the machine reaching its capacity. So, unless you're experiencing decreased performance, I wouldn't think much of it.

Do you have mcelog running, as a cron job or a daemon?

Sometimes mcelog tends to crash in that way. We had to disable it on our servers because it misbehaved like that. It only makes load avg meaningless, with no performance impact, but being unable to accurately measure load is bad enough.
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
- Claudio Freire klaussfre...@gmail.com wrote:

| From: Claudio Freire klaussfre...@gmail.com
| To: Rajesh Kumar. Mallah mal...@tradeindia.com
| Cc: pgsql-performance@postgresql.org
| Sent: Thursday, May 24, 2012 9:23:43 AM
| Subject: Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
|
| On Thu, May 24, 2012 at 12:39 AM, Rajesh Kumar. Mallah mal...@tradeindia.com wrote:
| | The problem is that sometimes there are spikes of load avg which
| | jumps to 50 very rapidly ( ie from 0.5 to 50 within 10 secs) and
| | it remains there for sometime and slowly reduces to normal value.
| |
| | During such times of high load average we observe that there is no
| | IO wait in system and even CPU is 50% idle. In any case the IO Wait
| | always remains 1.0 % and is mostly 0. Hence the load is not due to
| | high I/O wait which was generally the case with our previous
| | hardware.
|
| Do you experience decreased query performance?

Yes, we do experience substantial application performance degradation.

| Load can easily get to 64 (1 per core) without reaching its capacity.
| So, unless you're experiencing decreased performance I wouldn't think
| much of it.

As far as I understand, load avg is the average number of processes waiting to be run over the past 1, 5 or 15 mins. A value of 1 would mean that, on average, one process was waiting to be run. How can a load of more than 1, and up to 64, be OK for a 64-core machine?

| Do you have mcelog running? as a cron or a daemon?

No, we do not have mcelog.

BTW the PostgreSQL version is 9.1.3, which I forgot to mention in my last email.

regds
mallah.

| Sometimes, mcelog tends to crash in that way. We had to disable it in
| our servers because it misbehaved like that. It only makes load avg
| meaningless, no performance impact, but being unable to accurately
| measure load is bad enough.
Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle
On Thu, May 24, 2012 at 2:26 AM, Rajesh Kumar. Mallah mal...@tradeindia.com wrote:
| | Load can easily get to 64 (1 per core) without reaching its capacity.
| | So, unless you're experiencing decreased performance I wouldn't think
| | much of it.
|
| I far as i understand ,
| Load Avg is the average number of processes waiting to be run in past 1 ,
| 5 or 15 mins. A number 1 would mean that countable number of processes
| were waiting to be run. how can load of more than 1 and upto 64 be OK
| for a 64 core machine ?

Load avg is the number of processes in the run queue, which can be either waiting to be run or actually running.

So if you had 100% CPU usage, then you'd most definitely have a load avg of 64, which is neither good nor bad. It may simply mean that you're using your hardware's full potential.

If your processes are waiting but not using CPU or I/O time... all I can think of is mcelog (it's the only application I've ever witnessed doing that). Do check ps/top and try to find out which processes are in a waiting state to get a little more insight.
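The "check ps/top for waiting processes" step can also be scripted, which makes it easier to capture a sample at the moment a spike happens. The following is a sketch that assumes a Linux /proc filesystem; the function names are illustrative. It counts processes by the state letter in /proc/<pid>/stat. On Linux the load average counts tasks in state R (running or runnable) and D (uninterruptible sleep), so those two are the ones to watch.

```python
import glob
from collections import Counter


def state_of(stat_text):
    """Extract the state letter from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so the state is taken as the first field
    after the last ')'.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_states(stat_texts):
    """Count processes per state letter (R, S, D, Z, ...)."""
    return Counter(state_of(text) for text in stat_texts)


def read_all_stats():
    """Read /proc/<pid>/stat for every process currently visible."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as fh:
                texts.append(fh.read())
        except OSError:
            continue  # the process exited between listing and reading
    return texts


if __name__ == "__main__":
    counts = count_states(read_all_stats())
    print("running/runnable (R):", counts.get("R", 0))
    print("uninterruptible  (D):", counts.get("D", 0))
    print("all states          :", dict(counts))
```

Running this in a loop during a spike and comparing the R and D counts with the reported load average would show whether the load is made up of runnable postgres backends or of something stuck in uninterruptible sleep.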