No errors in the logs, except when we hit max_connections No shared memory problems – no associated spike in I/O or system CPU indicating shared memory is either unused or over used. Sufficient memory in cache/buffers, zero swapping or anything indicative of a memory problem.
The box is pretty beefy – 24 core, 768G RAM :) - so yes, an effective cache of 568GB is normal, we arrived at it with months of tuning over time. cpu_tuple_cost of 0.03 – yes, a lot of our settings are tweaked from the defaults based on performance. I don't have the output now, the the 0.03 was based on recommendations from posrgtes user groups, and via testing with setting it up and running explain analyze on queries. None of the settings have changed when this problem began. Thanks, Karthik From: Venkata Balaji Nagothi <vbn...@gmail.com<mailto:vbn...@gmail.com>> Date: Monday, March 10, 2014 7:35 PM To: "Anand Kumar, Karthik" <karthik.anandku...@classmates.com<mailto:karthik.anandku...@classmates.com>> Cc: "pgsql-general@postgresql.org<mailto:pgsql-general@postgresql.org>" <pgsql-general@postgresql.org<mailto:pgsql-general@postgresql.org>> Subject: Re: [GENERAL] Increase in max_connections On Tue, Mar 11, 2014 at 12:04 PM, Anand Kumar, Karthik <karthik.anandku...@classmates.com<mailto:karthik.anandku...@classmates.com>> wrote: Hi all, We're running postgres 9.3.2, server configuration below. Seemingly randomly, we will see the number of active queries in postgres go up until we hit max_connections. The DB will recover after a few minutes. We had the issue a couple of times in Feb 2014. We then upgraded the postgres server from 9.1 to 9.3.2, and the occurrence has gone up significantly – to several times a day. The user CPU goes up as well to a 100%, no increase in I/O or system CPU. We have slow query logging, and there is no dramatic change in the slow queries either. There is a corresponding spike in shared locks, but that seems to be an effect not a cause – it corresponds to an increase in the number of running processes at the time. We had a similar issue in the past – that was solved by disabling transparent_huge_pages – but the difference there was that we'd see queries slow down dramatically. Currently, we don't. Also, transparent_huge_pages is still disabled. I do realize the issue would be caused by a spurt in incoming connections – we do not yet have conclusive evidence on whether that's happening (active queries climbs up, however no conclusive proof on whether thats because of slow down, or because of increase in traffic). Working on getting the information, will update with that information as soon as we have it. I thought I'd send a post out to the group before then, to see if anyone has run into anything similar. Thanks, Karthik site=# SELECT version(); PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit site=# SELECT name, current_setting(name), source site-# FROM pg_settings site-# WHERE source NOT IN ('default', 'override'); application_name|psql|client archive_command|/usr/bin/archiver.sh %f %p|configuration file archive_mode|on|configuration file autovacuum_freeze_max_age|250000000|configuration file autovacuum_max_workers|6|configuration file bgwriter_lru_maxpages|1000|configuration file bgwriter_lru_multiplier|4|configuration file checkpoint_completion_target|0.8|configuration file checkpoint_segments|250|configuration file checkpoint_timeout|15min|configuration file checkpoint_warning|6min|configuration file client_encoding|UTF8|client commit_siblings|25|configuration file cpu_tuple_cost|0.03|configuration file DateStyle|ISO, MDY|configuration file default_statistics_target|300|configuration file default_text_search_config|pg_catalog.english|configuration file effective_cache_size|568GB|configuration file fsync|on|configuration file lc_messages|en_US.UTF-8|configuration file lc_monetary|en_US.UTF-8|configuration file lc_numeric|en_US.UTF-8|configuration file lc_time|en_US.UTF-8|configuration file listen_addresses|*|configuration file log_autovacuum_min_duration|0|configuration file log_checkpoints|on|configuration file log_connections|on|configuration file log_destination|syslog|configuration file log_directory|pg_log|configuration file log_filename|postgresql-%a.log|configuration file log_line_prefix|user=%u,db=%d,ip=%h |configuration file log_min_duration_statement|100ms|configuration file log_min_messages|debug1|configuration file log_rotation_age|1d|configuration file log_rotation_size|0|configuration file log_timezone|US/Pacific|configuration file log_truncate_on_rotation|on|configuration file logging_collector|off|configuration file maintenance_work_mem|1GB|configuration file max_connections|1500|configuration file max_locks_per_transaction|1000|configuration file max_stack_depth|2MB|environment variable max_wal_senders|5|configuration file port|5432|command line random_page_cost|2|configuration file shared_buffers|8GB|configuration file synchronous_commit|off|configuration file syslog_facility|local0|configuration file syslog_ident|postgres|configuration file TimeZone|US/Pacific|configuration file vacuum_freeze_table_age|0|configuration file wal_buffers|32MB|configuration file wal_keep_segments|250|configuration file wal_level|hot_standby|configuration file wal_sync_method|fsync|configuration file work_mem|130MB|configuration file Please let us know your hardware configuration like RAM, CPU (cores) etc. Do you see any messages indicating any processes getting terminated/killed forcibly in the Postgresql logs ? Or do you see any shared memory related error messages ? cpu_tuple_cost=0.03 - which is not default, any reasons for increasing this. effective_cache_size = 568 GB - Please help us know if this is optimal for your system. Venkata Balaji N Sr. Database Administrator Fujitsu Australia