Dear List, Today has been good since morning. Although it is a lean day for us but the indications are nice. I thank everyone who shared the concern. I think the most significant change has been to reduce shared_buffers from 10G to 4G , this has lead to reduced memory usage and some breathing space to the OS.
Although i am yet to incorporate the suggestions from pgtune but i think the issue of max_connection needs to be addressed first. I am investigating application issues and about the mechanism that puts many backend to '<IDLE> in transaction ' mode for significant times. I thank Tom for the script he sent. Once that resolves i shall check pooling as suggested by Kevin, then eventually max_connections can be reduced. I shall also check pgpool and pgbouncer if they are helpful in this regard. I observed that the number of simultaneous connection today (lean day) hovers between 1 to 10 , occasionally shooting to 15 but never more than 20 i would say. I am happy that i/o waits are negligible and cpu is idling also for a while. procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 22 0 18468 954120 13460 28491772 0 0 568 1558 13645 18355 62 10 27 2 0 16 0 18468 208100 13476 28469084 0 0 580 671 14039 17055 67 13 19 1 0 10 2 18812 329032 13400 28356972 0 46 301 1768 13848 17884 68 10 20 1 0 16 2 18812 366596 13416 28361620 0 0 325 535 13957 16649 72 11 16 1 0 50 1 18812 657048 13432 28366548 0 0 416 937 13823 16667 62 9 28 1 0 6 1 18812 361040 13452 28371908 0 0 323 522 14352 16789 74 12 14 0 0 33 0 18812 162760 12604 28210152 0 0 664 1544 14701 16315 66 11 22 2 0 5 0 18812 212028 10764 27921800 0 0 552 648 14567 17737 67 10 21 1 0 6 0 18796 279920 10548 27890388 3 0 359 562 12635 15976 60 9 30 1 0 8 0 18796 438820 10564 27894440 0 0 289 2144 12234 15770 57 8 34 1 0 5 0 18796 531800 10580 27901700 0 0 514 394 12169 16005 59 8 32 1 0 17 0 18796 645868 10596 27890704 0 0 423 948 13369 16554 67 10 23 1 0 9 1 18796 1076540 10612 27898604 0 0 598 403 12703 17363 71 10 18 1 0 8 0 18796 1666508 10628 27904748 0 0 430 1123 13314 17421 57 9 32 1 0 9 1 18776 1541444 10644 27913092 1 0 653 954 13194 16822 75 11 12 1 0 8 0 18776 1526728 10660 27921380 0 0 692 788 13073 16987 74 9 15 1 0 8 0 18776 1482304 10676 27933176 0 0 966 2029 13017 16651 76 12 11 1 0 21 0 18776 1683260 10700 27937492 0 0 298 663 13110 15796 67 10 23 1 0 18 0 18776 2087664 10716 27943512 0 0 406 622 12399 17072 62 9 28 1 0 With 300 connections, I think that either of these could lead you to > experience intermittent bursts of extreme swapping. I'd drop it to > somewhere in the 16MB to 32MB range until I had a connection pool > configured such that it was actually keeping the number of active > connections much lower. > > > (*) wal_buffers = 8MB # pgtune wizard 2010-06-25 > > (64kb , via default) > > Sure, I'd boost this. > > > checkpoint_segments = 16 # pgtune wizard 2010-06-25 > > (30 , specified) > > If you have the disk space for the 30 segments, I wouldn't reduce > it. > > > shared_buffers = 7680MB # pgtune wizard 2010-06-25 > > (4096 MB , specified) > > This one is perhaps the most sensitive to workload. Anywhere > between 1GB and 8GB might be best for you. Greg Smith has some > great advice on how to tune this for your workload. > > > (*) max_connections = 80 # pgtune wizard 2010-06-25 > > (300 , ;-) specified) > > > > when i reduce max_connections i start getting errors, i will see > > again concurrent connections during business hours. > > That's probably a good number to get to, but you have to reduce the > number of actual connections before you set the limit that low. > > > lot of our connections are in <IDLE> in transaction state > > If any of these stay in that state for more than a minute or two, > you need to address that if you want to get your connection count > under control. If any of them persist for hours or days, you need > to fix it to avoid bloat which can kill performance. > > -Kevin >