On Wed, Jun 4, 2014 at 5:27 PM, vlasmarias <vlasmar...@contigo.com> wrote:
> For the past few days, we've been seeing unexpected extremely high CPU spikes
> in our system. We observed the following: the 'free' memory would go down to
> lower than 300 MB; at that point, 'cached' slowly starts to go down, and
> then CPU starts to go way up.
>
> It's almost as if the OS was not releasing 'cached' memory fast enough for
> Postgres. Is that analysis correct? Is there a way to fix this?
>

This sounds like a kernel problem, probably either the zone reclaim issue
or the transparent huge pages issue. I don't know the exact details off the
top of my head, but both have been discussed a lot on both this list and the
pgsql-hackers list.

>
> Here's the session:
>
> 04:58:37 load average: 2.37, free: 532, cached: 22852
> 04:58:57 load average: 1.91, free: 451, cached: 22859
> 04:59:17 load average: 1.82, free: 469, cached: 22866
> 04:59:57 load average: 1.57, free: 387, cached: 22884
>

What tool is that? I'm not familiar with this output format.

> max_connections | 500
>

While this is probably fundamentally a kernel problem, you are not doing
yourself any favors by allowing 500 connections to a machine with 24 cores.
High numbers of connections can trigger poor kernel behavior.

Cheers,

Jeff
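
For anyone following up on the two kernel settings named above, here is a minimal sketch of how one might read their current values. It assumes a Linux host and the usual upstream paths (/proc/sys/vm/zone_reclaim_mode and /sys/kernel/mm/transparent_hugepage/enabled); some distribution kernels expose transparent huge pages under a different directory, so treat the paths as assumptions. The helper names are made up for illustration, and the script only reports values, it changes nothing.

```python
from pathlib import Path

# Assumed locations on a mainline Linux kernel; adjust for your distribution.
ZONE_RECLAIM = Path("/proc/sys/vm/zone_reclaim_mode")
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed choice from a line like '[always] madvise never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_setting(path):
    """Return the stripped file contents, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def report():
    """Build one human-readable line per setting."""
    zone_reclaim = read_setting(ZONE_RECLAIM)
    thp_raw = read_setting(THP_ENABLED)
    thp_mode = active_thp_mode(thp_raw) if thp_raw is not None else None
    return [
        "zone_reclaim_mode: "
        + (zone_reclaim if zone_reclaim is not None else "unavailable"),
        "transparent_hugepage: "
        + (thp_mode if thp_mode is not None else "unavailable"),
    ]


if __name__ == "__main__":
    print("\n".join(report()))
```

The output is only a starting point for searching the list archives: which values are appropriate depends on the kernel version and the hardware.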