Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time

2017-10-10 Thread Vladimir Nicolici
No, it didn’t. The delete was done in a single transaction.

From: Achilleas Mantzios
Sent: Tuesday, October 10, 2017 17:18
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time

Hello Vladimir,

Maybe your update triggered autovacuum on those tables? The default 
autovacuum_freeze_max_age is set at exactly 200,000,000. Did you check your 
vacuum stats afterwards (pg_stat_*_tables)?
Can you show the code which performed the deletes?
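
For example, something along these lines (untested here; the columns are from 
9.6's pg_stat_user_tables):

    SELECT relname,
           last_autovacuum,    -- when the last autovacuum ran on the table
           autovacuum_count,   -- how many times autovacuum has processed it
           n_dead_tup          -- dead tuples not yet cleaned up
    FROM   pg_stat_user_tables
    ORDER  BY last_autovacuum DESC NULLS LAST
    LIMIT  10;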

On 10/10/2017 16:56, Vladimir Nicolici wrote:
I experimented some more with the settings this weekend, while doing some large 
write operations (deleting 200 million records from a table), and I realized 
that the database is capable of generating much more WAL than I estimated.
 
And it seems that spikes in write activity, when they last longer than a few 
minutes, can cause the checkpoint process to “panic” and start a checkpoint 
earlier, trying to complete it as soon as possible, estimating, correctly, that 
if that level of activity continues it will hit the max_wal_size limit.
 
Based on that, I reduced checkpoint_timeout from 30 minutes to 20 minutes, 
while keeping max_wal_size at 144GB. Alternatively, I could have increased the 
maximum WAL size further, but I’m not sure it’s a good idea to set it higher 
than the shared buffers, which are also set at 144GB. After this change, on 
Monday all checkpoints were triggered by “time”; I didn’t have any more 
checkpoints triggered by “xlog”.
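
Besides grepping the log for “checkpoint starting: time” versus “checkpoint 
starting: xlog”, a rough way to cross-check this is pg_stat_bgwriter (a minimal 
sketch; the counters are cumulative since the last stats reset):

    SELECT checkpoints_timed,  -- checkpoints started because checkpoint_timeout expired
           checkpoints_req     -- checkpoints requested early, e.g. by WAL volume
    FROM   pg_stat_bgwriter;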
 
I also set checkpoint_completion_target to 0.5 to see if our hardware can 
handle concentrating the write activity for 20 minutes into just 10 minutes, 
and that worked very well too; checkpoints finished on time. The %util (busy%) 
for the SSDs as reported by sar was around 20% when not doing a checkpoint, and 
60% during a checkpoint, so it seems the hardware will be able to handle future 
increases in activity just fine.
 
The lesson I learned here is that max_wal_size needs to be configured based on 
the *maximum* volume of WAL the database can generate in the checkpoint_timeout 
interval. Initially I had set it based on the *average* volume of WAL generated 
in that interval, at 3 times that average, but that was not enough, and it 
triggered the unexpected behavior.
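
For anyone who wants to measure that maximum rather than estimate it, a rough 
sketch for 9.6 follows; the LSN literal is only a placeholder for the value 
sampled at the start of a peak-activity interval:

    SELECT pg_current_xlog_location();  -- run at the start of the interval, note the result
    -- ... wait for an interval of checkpoint_timeout length during peak activity ...
    SELECT pg_size_pretty(
             pg_xlog_location_diff(pg_current_xlog_location(), '1A2B/3C000000'));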
 
Thanks,
Vlad

-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time

2017-10-05 Thread Andres Freund
Hi,

On 2017-10-06 05:53:39 +0300, Vladimir Nicolici wrote:
> Hello, it’s postgres 9.6.

Consider setting checkpoint_flush_after to 16MB or something large like that.
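
A minimal sketch of applying that; note that the 9.6 documentation caps 
checkpoint_flush_after at 2MB, so the server will reject larger values:

    ALTER SYSTEM SET checkpoint_flush_after = '2MB';  -- documented maximum in 9.6
    SELECT pg_reload_conf();                          -- takes effect without a restart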


> I will probably try the compression on Monday or Tuesday, I can only
> experiment with a single set of changes in a day, and I plan to test
> something else tomorrow.
> 
> Thanks for the suggestions, and sorry for the reply style, but my mail
> client is not best suited for replying inline to individual points.

You should consider getting a new mail client then...

- Andres




Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time

2017-10-05 Thread Vladimir Nicolici
Hello, it’s postgres 9.6. I know reducing wal_buffers from 32 MB to 16 MB is 
unlikely to help, but according to the documentation values larger than 16 MB 
are unlikely to help either, at least with the default WAL segment size, so I 
decided to go with 16 MB.

I will probably try the compression on Monday or Tuesday, I can only experiment 
with a single set of changes in a day, and I plan to test something else 
tomorrow.

Thanks for the suggestions, and sorry for the reply style, but my mail client 
is not best suited for replying inline to individual points.

From: Andres Freund
Sent: Friday, October 6, 2017 04:51
To: Vladimir Nicolici
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time

Hi,

On 2017-10-05 22:58:31 +0300, Vladimir Nicolici wrote:
> I changed some configuration parameters during the night to the values I was 
> considering yesterday:
> 
> - shared_buffers = 144GB #previously 96 GB
> - bgwriter_lru_maxpages = 100  #previously 400
> - checkpoint_timeout = 30min  #previously 5min
> - checkpoint_completion_target = 0.83 #previously 0.85; 0.83 means 25 minutes 
> writes out of 30 minutes.
> - max_wal_size = 96GB #previously 16GB

Which version of postgres is this?

> - wal_buffers = 16MB  #previously 32 MB

That seems quite unlikely to help.

You might want to try also enabling wal_compression, sometimes the WAL
volume is a considerable problem.
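
A minimal sketch of enabling it; wal_compression only affects full-page images 
written to WAL, and it can be changed with a plain reload:

    ALTER SYSTEM SET wal_compression = on;  -- compress full-page writes in WAL
    SELECT pg_reload_conf();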

I'd suggest reporting some "pidstat -dl 1" output, so we can see which
processes are doing how much IO.

Regards,

Andres



Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time

2017-10-05 Thread Vladimir Nicolici
My primary goal is to determine the current level of utilization of the 
hardware + software combination, and how much more load it can handle.

It’s quite likely that the volume of work performed by that database will need 
to increase by 50% or more over the next few months, and I don’t want to risk 
hitting a performance wall unexpectedly when that happens; I want to plan in 
advance for more hardware or other solutions in case the current hardware and 
software combination is inadequate.

My secondary goal is to determine optimal settings for best performance, and 
learn more about how postgres works.

From the point of view of the applications using the database, the system 
doesn’t currently experience any performance issues. Even when the checkpoints 
didn’t finish in the default configured 5 minutes, the database performed 
acceptably.
 
About the shared_buffers: decreasing them would indeed make the database keep 
fewer dirty buffers, but that would not reduce the overall number of writes to 
the disk. It would just shift more of the responsibility for writes from the 
checkpoint process to the bgwriter process, and I don’t see any advantage in 
doing that.

Actually, reducing the shared buffers may even increase the number of writes to 
the data files, because according to the documentation the bgwriter process can 
write the same buffer multiple times during a checkpoint interval, while the 
checkpoint would only write it once. That’s why I want to completely disable 
the bgwriter.
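
If I go that route, the plan would be roughly the following; per the 
documentation, a value of 0 disables background writing entirely:

    ALTER SYSTEM SET bgwriter_lru_maxpages = 0;  -- 0 turns off the background writer's LRU scan
    SELECT pg_reload_conf();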

About reducing the checkpoint write time from 25 minutes to 20 minutes: you are 
correct that the same number of buffers will need to be flushed, and that it 
will increase the I/O intensity and put more stress on the disk system in that 
interval. That’s exactly what I want.

I want to see if the hardware is powerful enough to complete the checkpoint in 
just 20 minutes. This will prove there is some headroom and the database can 
handle more load. If that works fine, I may even try 15 minutes, to see how it 
behaves.

Since it’s a production system I have to be a bit careful; downtime during the 
day can be quite expensive, but at the same time hitting a performance wall 
unexpectedly in the future would be even more expensive.

After I finish the tests and reach a conclusion about the capabilities of this 
hardware + software combination, I will probably set the target to something 
like 0.90, so that it distributes the writes over 27 minutes.
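
Something along these lines, assuming checkpoint_timeout stays at 30 minutes 
(0.90 × 30 min ≈ 27 minutes of spread-out writes):

    ALTER SYSTEM SET checkpoint_completion_target = 0.90;
    SELECT pg_reload_conf();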

Thanks,
Vlad

From: Igor Polishchuk
Sent: Friday, October 6, 2017 02:56
To: Vladimir Nicolici
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Strange checkpoint behavior - checkpoints take a long time

Vladimir,
Just curious, if your goal is to reduce checkpoint overhead, shouldn’t you 
decrease shared_buffers instead of increasing it?
With bigger shared_buffers, you can accumulate more dirty buffers for the 
checkpoint to take care of.
I remember that in early versions (around 8.4), when 
checkpoint_completion_target was not available, one suggested way of fighting 
heavy checkpoints was setting very low shared_buffers.
Also, why do you need to reduce your checkpoint write time to 20 minutes from 
25? What will you gain? If you have the same number of dirty buffers to flush, 
your IO intensity and overhead will increase in those 20 minutes.

Igor

  
On Oct 5, 2017, at 12:58, Vladimir Nicolici <vla...@gmail.com> wrote:

Some further updates about the issue.
 
I did a bit of benchmarking on the disk system with iozone, and during the test 
the SSDs seemed to be able to easily sustain 200 MB/second of writes each; they 
fluctuated between 200 MB/s and 400 MB/s when doing 96 GB of random writes to a 
file. That would mean between 400 and 800 MB/s for the entire RAID volume, 
since it’s 1+0 and has 4 SSDs, 2 in each mirror.
 
I wasn’t able to benchmark reads properly because the machine has so much RAM 
that they are served mostly from cache. But reads shouldn’t be an issue anyway. 
I’ll try to do more comprehensive tests tomorrow but, from what I’ve seen so 
far, SSD I/O limits shouldn’t be a concern.
 
I changed some configuration parameters during the night to the values I was 
considering yesterday:
 
• shared_buffers = 144GB #previously 96 GB
• bgwriter_lru_maxpages = 100  #previously 400
• checkpoint_timeout = 30min  #previously 5min
• checkpoint_completion_target = 0.83 #previously 0.85; 0.83 means 25 minutes 
writes out of 30 minutes.
• max_wal_size = 96GB #previously 16GB
• wal_buffers = 16MB  #previously 32 MB
 
With the new settings the checkpoints now finish on time, more or less. One 
recent checkpoint looked like this:
 
2017-10-05 14:16:22.891 EDT [7828] LOG:  checkpoint starting: time
2017-10-05 14:42:35.429 EDT [7828] LOG:  checkpoint complete: wrote 4770679 
buffers (25.3%); 0 transaction log file(s) added, 0 removed, 2088 recycled; 
write=1501.567 s, sync=1.844 s, total=1572.538 s; sync files=750, longest=0.029
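
As a rough back-of-the-envelope check (assuming the default 8 kB block size): 
4,770,679 buffers × 8 kB ≈ 39 GB written during the ~1,502 s write phase, or 
roughly 26 MB/s on average, well below what the iozone numbers above suggest 
the array can sustain.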