David,

Interesting observations. I hadn't been tracking the interrupts, but perhaps I should take a look. How are you measuring them over a period of time, or are you just getting them in real time?
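For reference, a few ways to sample the interrupt rate over time rather than just at a single instant (a rough sketch; sar needs the sysstat package, and the /tmp file names are just examples):

    # "in" column = interrupts/sec, sampled every 5 seconds
    vmstat 5

    # total interrupt rate, one sample per minute (sysstat)
    sar -I SUM 60

    # or diff the raw per-IRQ counters to see which sources are climbing
    cat /proc/interrupts > /tmp/irq.1; sleep 60; cat /proc/interrupts > /tmp/irq.2
    diff /tmp/irq.1 /tmp/irq.2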
Did you turn off THP altogether, or just THP defrag?

On Wed, Feb 6, 2013 at 10:42 AM, David Whittaker <d...@iradix.com> wrote:

> Josh/Johnny,
>
> We've been seeing a similar problem as well, and had also figured THP was
> involved. We found this in syslog:
> https://gist.github.com/davewhittaker/4723285, which led us to disable
> THP 2 days ago. At first the results seemed good. In particular, our
> issues always seemed interrupt-related, and the average interrupts/sec
> immediately dropped from 7k to around 3k after restarting. The good news
> is that we didn't see any spike in system CPU time yesterday. The bad
> news is that we did see a spike in app latency that originated from the
> DB, but now the spike is in user CPU time and seems to be spread across
> all of the running postgres processes. Interrupts still blew up to
> 21k/sec when it happened. We are still diagnosing, but I'd be curious to
> see if either of you get similar results from turning THP off.
>
>
> On Tue, Feb 5, 2013 at 11:23 PM, Josh Krupka <jkru...@gmail.com> wrote:
>
>> I've been looking into something on our system that sounds similar to
>> what you're seeing. I'm still researching it, but I suspect the memory
>> compaction that runs as part of transparent huge pages when memory is
>> allocated... yet to be proven. The tunable you mentioned controls the
>> compaction process that runs at allocation time so it can try to
>> allocate large pages; there's a separate one that controls whether the
>> compaction is done in khugepaged, and a separate one that controls
>> whether THP is used at all (/sys/kernel/mm/transparent_hugepage/enabled,
>> or perhaps different in your distro).
>>
>> What's the output of this command?
>> egrep 'trans|thp|compact_' /proc/vmstat
>>
>> compact_stall represents the number of processes that were stalled to
>> do a compaction; the other metrics have to do with other parts of THP.
>> If you see compact_stall climbing, from what I can tell those stalls
>> might be causing your spikes. I haven't found a way of telling how long
>> the processes have been stalled. You could probably get a little more
>> insight into the processes with some tracing, assuming you can catch it
>> quickly enough. Running perf top will also show the compaction
>> happening, but that doesn't necessarily mean it's impacting your running
>> processes.
>>
>>
>> On Tue, Feb 5, 2013 at 6:46 PM, Johnny Tan <johnnyd...@gmail.com> wrote:
>>
>>> # cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
>>> [always] never
>>>
>>>
>>> On Tue, Feb 5, 2013 at 5:37 PM, Josh Krupka <jkru...@gmail.com> wrote:
>>>
>>>> Just out of curiosity, are you using transparent huge pages?
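In case it's useful, all three THP knobs can be checked (and THP turned off entirely, not just defrag) from /sys. A sketch for a mainline kernel; on RHEL 6 the directory is /sys/kernel/mm/redhat_transparent_hugepage instead, and the echo settings do not survive a reboot:

    # the value in [brackets] is the active one
    cat /sys/kernel/mm/transparent_hugepage/enabled
    cat /sys/kernel/mm/transparent_hugepage/defrag
    cat /sys/kernel/mm/transparent_hugepage/khugepaged/defrag

    # disable THP altogether
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag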
>>>> On Feb 5, 2013 5:03 PM, "Johnny Tan" <johnnyd...@gmail.com> wrote:
>>>>
>>>>> Server specs:
>>>>> Dell R610
>>>>> dual E5645 hex-core 2.4GHz
>>>>> 192GB RAM
>>>>> RAID 1: 2x400GB SSD (OS + WAL logs)
>>>>> RAID 10: 4x400GB SSD (/var/lib/pgsql)
>>>>>
>>>>>
>>>>> /etc/sysctl.conf:
>>>>> kernel.msgmnb = 65536
>>>>> kernel.msgmax = 65536
>>>>> kernel.shmmax = 68719476736
>>>>> kernel.shmall = 4294967296
>>>>> vm.overcommit_memory = 0
>>>>> vm.swappiness = 0
>>>>> vm.dirty_background_bytes = 536870912
>>>>> vm.dirty_bytes = 536870912
>>>>>
>>>>>
>>>>> postgresql.conf:
>>>>> listen_addresses = '*' # what IP address(es) to listen on;
>>>>> max_connections = 150 # (change requires restart)
>>>>> shared_buffers = 48GB # min 128kB
>>>>> work_mem = 1310MB # min 64kB
>>>>> maintenance_work_mem = 24GB # min 1MB
>>>>> wal_level = hot_standby # minimal, archive, or hot_standby
>>>>> checkpoint_segments = 64 # in logfile segments, min 1, 16MB each
>>>>> checkpoint_timeout = 30min # range 30s-1h
>>>>> checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
>>>>> max_wal_senders = 5 # max number of walsender processes
>>>>> wal_keep_segments = 2000 # in logfile segments, 16MB each; 0 disables
>>>>> hot_standby = on # "on" allows queries during recovery
>>>>> max_standby_archive_delay = 120s # max delay before canceling queries
>>>>> max_standby_streaming_delay = 120s # max delay before canceling queries
>>>>> effective_cache_size = 120GB
>>>>> constraint_exclusion = partition # on, off, or partition
>>>>> log_destination = 'syslog' # Valid values are combinations of
>>>>> logging_collector = on # Enable capturing of stderr and csvlog
>>>>> log_directory = 'pg_log' # directory where log files are written,
>>>>> log_filename = 'postgresql-%a.log' # log file name pattern,
>>>>> log_truncate_on_rotation = on # If on, an existing log file with the
>>>>> log_rotation_age = 1d # Automatic rotation of logfiles will
>>>>> log_rotation_size = 0 # Automatic rotation of logfiles will
>>>>> log_min_duration_statement = 500 # -1 is disabled, 0 logs all statements
>>>>> log_checkpoints = on
>>>>> log_line_prefix = 'user=%u db=%d remote=%r ' # special values:
>>>>> log_lock_waits = on # log lock waits >= deadlock_timeout
>>>>> autovacuum = on # Enable autovacuum subprocess? 'on'
>>>>> log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and
>>>>> autovacuum_max_workers = 5 # max number of autovacuum subprocesses
>>>>> datestyle = 'iso, mdy'
>>>>> lc_messages = 'en_US.UTF-8' # locale for system error message
>>>>> lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
>>>>> lc_numeric = 'en_US.UTF-8' # locale for number formatting
>>>>> lc_time = 'en_US.UTF-8' # locale for time formatting
>>>>> default_text_search_config = 'pg_catalog.english'
>>>>> deadlock_timeout = 300ms
>>>>>
>>>>>
>>>>> per pgtune:
>>>>> #------------------------------------------------------------------------------
>>>>> # pgtune wizard run on 2013-02-05
>>>>> # Based on 198333224 KB RAM in the server
>>>>> #------------------------------------------------------------------------------
>>>>> default_statistics_target = 100
>>>>> maintenance_work_mem = 1GB
>>>>> checkpoint_completion_target = 0.9
>>>>> effective_cache_size = 128GB
>>>>> work_mem = 1152MB
>>>>> wal_buffers = 8MB
>>>>> checkpoint_segments = 16
>>>>> shared_buffers = 44GB
>>>>> max_connections = 80
>>>>>
>>>>> We use pgbouncer (set to 140 connections) in transaction pooling mode
>>>>> in front of our db.
>>>>>
>>>>> The problem:
>>>>>
>>>>> For the most part, the server hums along. No other applications run
>>>>> on this server other than postgres. Load averages rarely break 2.0,
>>>>> it never swaps, and %iowait is usually not more than 0.12.
>>>>>
>>>>> But periodically, there are spikes in our app's db response time.
>>>>> Normally, the app's db response time hovers in the 100ms range for
>>>>> most of the day. During the spike times, it can go up to 1000ms or
>>>>> 1500ms, and the number of pg connections goes to 140 (maxed out at
>>>>> pgbouncer's limit, where normally it's only about 20-40 connections).
>>>>> Also, during these times, which usually last less than 2 minutes, we
>>>>> will see several thousand queries in the pg log (this is with
>>>>> log_min_duration_statement = 500), compared to maybe one or two dozen
>>>>> 500ms+ queries in non-spike times.
>>>>>
>>>>> In between spikes could be an hour, two hours, sometimes half a day.
>>>>> There doesn't appear to be any pattern that we can see:
>>>>> * there are no obvious queries that are locking the db
>>>>> * it doesn't necessarily happen during high-traffic times, though it
>>>>> can
>>>>> * it doesn't happen during any known system, db, or app
>>>>> regularly-scheduled job, including crons
>>>>> * in general, there's no discernible regularity to it at all
>>>>> * it doesn't coincide with checkpoint starts or completions
>>>>> * it doesn't coincide with autovacuums
>>>>> * there are no messages in any system logs that might indicate any
>>>>> system or hardware-related issue
>>>>>
>>>>> Besides spikes in our graphs, the only other visible effect is that
>>>>> %system in sar goes from an average of 0.7 to as high as 10.0 or so
>>>>> (%iowait and all other sar variables remain the same).
>>>>>
>>>>> And according to our monitoring system, web requests get queued up,
>>>>> and our alerting system sometimes either says there's a timeout or
>>>>> that it had multiple web response times greater than 300ms, so we
>>>>> suspect (but have no proof) that some users see either a long hang or
>>>>> possibly a timeout. But since it's almost always less than two
>>>>> minutes, and sometimes less than one, we don't really hear any
>>>>> complaints (guessing that most people hit reload, things work again,
>>>>> and they continue on), and we haven't been able to see any negative
>>>>> effect ourselves.
>>>>>
>>>>> But we want to get in front of the problem, in case it is something
>>>>> that will get worse as traffic continues to grow. We've tweaked
>>>>> various configs on the OS side as well as the postgresql.conf side.
>>>>> What's posted above is our current setup, and the problem persists.
>>>>>
>>>>> Any ideas as to where we could even look?
>>>>>
>>>>> Also, whether related or unrelated to the spikes, are there any
>>>>> recommendations for our postgresql.conf or sysctl.conf based on our
>>>>> hardware? From pgtune's output, I am lowering maintenance_work_mem
>>>>> from 24GB down to maybe 2GB, but I keep reading conflicting things
>>>>> about other settings, such as checkpoints or max_connections.
>>>>>
>>>>> johnny
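One thing that might help catch the next spike in the act: log the THP/compaction counters and the raw interrupt counter on a fixed interval, so they can be lined up against the latency graphs afterward. A rough sketch (the log file path and 10-second interval are just examples):

    while true; do
        # timestamp, THP/compaction counters, and total interrupts since boot
        (date; egrep 'trans|thp|compact_' /proc/vmstat; grep ^intr /proc/stat | cut -d' ' -f1-2) \
            >> /var/tmp/thp_watch.log
        sleep 10
    done

If compact_stall jumps at the same timestamps as the latency spikes, that would point fairly strongly at the compaction path Josh described.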