On March 16, 2015 at 3:24:34 AM, Roxanne Reid-Bennett (r...@tara-lu.com) wrote:
Robert,

Wow - You've engaged the wizards indeed.

I haven't heard or seen anything that would answer my *second* question if 
faced with this (my first would have been "what changed")....

Yes, indeed — I feel honored to have so many people chime into this issue.

The problem was that nothing abnormal was happening — just the normal queries 
were running that hadn’t given us any problems for months. We undid everything 
that had been recently changed in an effort to address “what changed”. Nothing 
helped, which is what had us so perplexed.

However, I am glad to report that our problems are fixed and that our server is 
back to humming along nicely. 

What we changed:

1. As it was pointed out here, max_connections of 500 was in fact insanely 
high, especially in light of using PGbouncer. Before we used PGbouncer we 
needed a lot more connections and when we started using PGbouncer, we never 
reduced this number.

2. Our server_lifetime was set far too high (1 hour). Josh Berkus suggested 
lowering that to 5 minutes.

3. We reduced the number of PGbouncer active connections to the DB.

What we learned:

1. We had too many backends

2. The backends were being kept around for too long by PGbouncer.

3. This caused too many idle backends to kick around. Once we exhausted 
physical ram, we started swapping.

4. Linux 3.2 apparently has some less than desirable swap behaviours. Once we 
started swapping, everything went nuts. 

Going forward we’re going to upgrade our kernel the next time we have down time 
for our site and the rest should be sorted now.

I wanted to thank everyone who contributed their thoughts to this thread — 
THANK YOU.

And as I said to Josh earlier: "Postgres rocks our world. I’m immensely pleased 
that once again the problems were our own stupidity and not PG’s fault. In over 
10 years of us using PG, it has never been PG’s fault. Not once.”

And thus we’re one tiny bit smarter today. Thank you everyone!



P.S. If anyone would still like to get some more information about this problem 
for their own edification, please let me know. Given that we’ve fixed the 
issue, I don’t want to spam this list by responding to all the questions that 
were posed.


--

--ruaok        

Robert Kaye     --     r...@musicbrainz.org     --    http://musicbrainz.org

Reply via email to