My primary goal is to determine the current level of utilization of the existing
hardware + software combination, and how much more load it can handle.
It’s quite likely that the volume of work performed by that database will need
to increase by 50% or more over the coming months, and I don’t want to risk
hitting a performance wall unexpectedly when that happens. I want to plan in
advance for more hardware or other solutions in case the current hardware and
software combination is inadequate.
My secondary goal is to determine the optimal settings for performance, and to
learn more about how Postgres works.
From the point of view of the applications using the database, the system
doesn’t currently experience any performance issues. Even when the checkpoints
didn’t finish within the default 5-minute checkpoint_timeout, the database
performed acceptably.
About shared_buffers: decreasing it would indeed make the database keep fewer
dirty buffers, but that would not reduce the overall number of writes to the
disk. It would just shift more of the write work from the checkpointer process
to the bgwriter process, and I don’t see any advantage in doing that.
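For reference, the current split between the two can be seen in the cumulative
statistics; a rough sketch of the query (column names as documented for
pg_stat_bgwriter):

  SELECT buffers_checkpoint,  -- buffers written by the checkpointer
         buffers_clean,       -- buffers written by the bgwriter
         buffers_backend,     -- buffers written directly by backends
         maxwritten_clean     -- times the bgwriter stopped at bgwriter_lru_maxpages
  FROM pg_stat_bgwriter;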
Actually, reducing shared_buffers may even increase the number of writes to the
data files, because according to the documentation the bgwriter process can
write the same buffer multiple times during a checkpoint interval, while the
checkpointer would write it only once. That’s why I want to completely disable
the bgwriter.
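My understanding is that setting bgwriter_lru_maxpages to 0 is what disables
background writing; a sketch of the change, assuming a version with ALTER SYSTEM
(editing postgresql.conf and reloading would work just as well):

  ALTER SYSTEM SET bgwriter_lru_maxpages = 0;  -- 0 disables the background writer's LRU scan
  SELECT pg_reload_conf();                     -- this parameter only needs a reload, not a restart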
About reducing the checkpoint write time from 25 minutes to 20 minutes, you are
correct that the same number of buffers will need to be flushed and that it
will increase the I/O intensity and put more stress on the disk system in that
interval. That’s exactly what I want.
I want to see if the hardware is powerful enough to complete the checkpoint in
just 20 minutes. This will prove there is some headroom and the database can
handle more load. If that works fine, I may even try 15 minutes, to see how it
behaves.
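The arithmetic behind these numbers is simply write window ≈ checkpoint_timeout
× checkpoint_completion_target, so a 20-minute window out of the 30-minute
timeout corresponds to a target of about 0.67. A sketch of that change, using
the same hedged ALTER SYSTEM approach as above:

  ALTER SYSTEM SET checkpoint_completion_target = 0.67;  -- 20 min / 30 min ≈ 0.67
  SELECT pg_reload_conf();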
Since it’s a production system I have to be a bit careful; downtime during the
day can be quite expensive. At the same time, unexpectedly hitting a performance
wall in the future would be even more expensive.
After I finish the tests and reach a conclusion about the capabilities of this
hardware + software combination, I will probably set checkpoint_completion_target
to something like 0.90, so that it distributes the writes over 27 minutes of the
30-minute interval.
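In other words, the eventual setting would look roughly like this:

  ALTER SYSTEM SET checkpoint_completion_target = 0.90;  -- 0.90 × 30 min = 27 min write window
  SELECT pg_reload_conf();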
Thanks,
Vlad
From: Igor Polishchuk
Sent: Friday, October 6, 2017 02:56
To: Vladimir Nicolici
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time
Vladimir,
Just curious, if your goal is to reduce checkpoint overhead, shouldn’t you
decrease shared_buffers instead of increasing it?
With bigger shared_buffers, you can accumulate more dirty buffers for the
checkpoint to take care of.
I remember that in early versions (around 8.4), when checkpoint_completion_target
was not available, one suggested way of fighting heavy checkpoints was setting
shared_buffers very low.
Also, why do you need to reduce your checkpoint write time to 20 minutes from
25? What will you gain? If you have the same number of dirty buffers to flush,
your I/O intensity and overhead will increase during those 20 minutes.
Igor
On Oct 5, 2017, at 12:58, Vladimir Nicolici <vla...@gmail.com> wrote:
Some further updates about the issue.
I did a bit of benchmarking on the disk system with iozone, and during the test
the SSDs seemed to be able to easily sustain 200 MB/second of writes each; they
fluctuated between 200 MB/s and 400 MB/s when doing 96 GB of random writes in a
file. That would mean between 400 and 800 MB/s for the entire RAID volume, since
it’s 1+0 and has 4 SSDs, 2 in each mirror.
I wasn’t able to benchmark reads properly because the machine has so much RAM
that reads are served mostly from cache. But reads shouldn’t be an issue anyway.
I’ll try to do more comprehensive tests tomorrow, but from what I’ve seen so far,
SSD I/O limits shouldn’t be a concern.
I changed some configuration parameters during the night to the values I was
considering yesterday:
• shared_buffers = 144GB #previously 96 GB
• bgwriter_lru_maxpages = 100 #previously 400
• checkpoint_timeout = 30min #previously 5min
• checkpoint_completion_target = 0.83 #previously 0.85; 0.83 means 25 minutes of
writes out of 30 minutes.
• max_wal_size = 96GB #previously 16GB
• wal_buffers = 16MB #previously 32 MB
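A quick way to double-check what the running server actually picked up after the
change, sketched with pg_settings:

  SELECT name, setting, unit
  FROM pg_settings
  WHERE name IN ('shared_buffers', 'bgwriter_lru_maxpages',
                 'checkpoint_timeout', 'checkpoint_completion_target',
                 'max_wal_size', 'wal_buffers');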
With the new settings the checkpoints now finish on time, more or less. One
recent checkpoint looked like this:
2017-10-05 14:16:22.891 EDT [7828] LOG: checkpoint starting: time
2017-10-05 14:42:35.429 EDT [7828] LOG: checkpoint complete: wrote 4770679
buffers (25.3%); 0 transaction log file(s) added, 0 removed, 2088 recycled;
write=1501.567 s, sync=1.844 s, total=1572.538 s; sync files=750, longest=0.029