Re: [PERFORM] PGSQL, checkpoints, and file system syncs

2014-04-03 Thread Heikki Linnakangas

On 04/03/2014 08:39 PM, Reza Taheri wrote:

Hello PGSQL performance community,
You might remember that I pinged you in July 2012 to introduce the TPC-V
benchmark. I am now back with more data, and a question about checkpoints. As
for the plans for the benchmark, we are hoping to release a benchmarking kit
for multi-VM servers this year (and of course one can always simply configure
it to run on one database).

I am now dealing with a situation of performance dips when checkpoints 
complete. To simplify the discussion, I have reproduced the problem on a single 
VM/single database.

Complete config info is in the attached files. Briefly, it is a 6-vCPU VM with
91GB of memory, and 70GB in PGSQL shared buffers. The host has 512GB of memory
and 4 sockets of Westmere (E7-4870) processors with HT enabled.

The data tablespace is on an ext4 file system on a (virtual) disk which is 
striped on 16 SSD drives in RAID 0. This is obviously overkill for the load we 
are putting on this one VM, but in the usual benchmarking config, the 16 SSDs 
are shared by 24 VMs. Log is on an ext3 file system on 4 spinning drives in 
RAID 1.

We are running PGSQL version 9.2 on RHEL 6.4; here are some parameters of 
interest (postgresql.conf is in the attachment):
checkpoint_segments = 1200
checkpoint_timeout = 360s
checkpoint_completion_target = 0.8
wal_sync_method = open_datasync
wal_buffers = 16MB
wal_writer_delay = 10ms
effective_io_concurrency = 10
effective_cache_size = 1024MB
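
As a rough illustration of what these settings imply, the arithmetic can be
sketched as below. The 16 MB WAL segment size is an assumption (it is the
PostgreSQL default and is not stated in this thread), and the function name
is made up for the example.

```python
# Values copied from the postgresql.conf excerpt above.
CHECKPOINT_SEGMENTS = 1200
CHECKPOINT_TIMEOUT_S = 360
CHECKPOINT_COMPLETION_TARGET = 0.8

# Assumption: default WAL segment size of 16 MB (not stated in the thread).
WAL_SEGMENT_MB = 16


def checkpoint_budget(segments, timeout_s, target, segment_mb=WAL_SEGMENT_MB):
    """Return (WAL MB that triggers a checkpoint, seconds the writes are paced over)."""
    wal_mb = segments * segment_mb
    window_s = timeout_s * target
    return wal_mb, window_s


wal_mb, window_s = checkpoint_budget(
    CHECKPOINT_SEGMENTS, CHECKPOINT_TIMEOUT_S, CHECKPOINT_COMPLETION_TARGET
)
print(f"WAL volume that triggers a checkpoint: {wal_mb} MB ({wal_mb / 1024:.2f} GiB)")
print(f"Checkpoint writes paced over about {window_s:.0f} s of every {CHECKPOINT_TIMEOUT_S} s")
```

Under that assumption, a checkpoint is triggered by roughly 19 GiB of WAL or
by the 360 s timeout, whichever comes first, and the checkpoint writes are
paced over about 288 s of each interval.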

When running tests, I noticed that when a checkpoint completes, we have a big 
burst of writes to the data disk. The log disk has a very steady write rate 
that is not affected by checkpoints except for the known phenomenon of more 
bytes in each log write when a new checkpoint period starts. In a multi-VM 
config with all VMs sharing the same data disks, when these write bursts 
happen, all VMs take a hit.

So I set out to see what causes this write burst.  After playing around with 
PGSQL parameters and observing its behavior, it appears that the bursts aren't 
produced by the database engine; they are produced by the file system. I 
suspect PGSQL has to issue a sync(2)/fsync(2)/sync_file_range(2) system call at 
the completion of the checkpoint to ensure that all blocks are flushed to disk 
before creating a checkpoint marker. To test this, I ran a loop to call sync(8) 
once a second.
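
A minimal sketch of such a once-a-second sync loop, written here in Python
rather than as a shell loop around sync(8). The function name and its
parameters are hypothetical, and os.sync() is only available on Unix
(Python 3.3 or later); the exact loop used for the test is not shown in
this thread.

```python
import os
import time


def sync_loop(interval_s=1.0, iterations=None, sync_fn=os.sync, sleep_fn=time.sleep):
    """Call sync_fn, then sleep interval_s seconds, and repeat.

    Runs forever when iterations is None; otherwise stops after that many
    calls. Returns the number of sync calls made.
    """
    count = 0
    while iterations is None or count < iterations:
        sync_fn()  # ask the kernel to flush all dirty pages to disk
        count += 1
        sleep_fn(interval_s)
    return count
```

Calling sync_loop(iterations=60) would flush roughly once a second for about
a minute; leaving iterations unset keeps it running in the background until
it is killed.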

The graphs in file run280.mht have the throughput, data disk activity, and
checkpoint start/completion timestamps for the baseline case. You can see that
the checkpoint completion, the write burst, and the throughput dip all occur at
the same time, so much so that it is hard to see the checkpoint completion line
under the graph of writes. It looks like the file system does a mini flush
every 30 seconds. The file run274.mht is the case with sync commands running in
the background. You can see that everything is smoother.

Is there something I can set in the PGSQL parameters or in the file system
parameters to force a steady flow of writes to disk rather than waiting for a
sync system call? Mounting with commit=1 did not make a difference.


Try setting the vm.dirty_bytes sysctl. Something like 256MB might be a 
good starting point.
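
For illustration, a small sketch that converts the suggested 256MB into the
byte value the sysctl expects and reads the current setting on a Linux
host. The helper names are made up for this example, and treating "MB" as
MiB (1024 * 1024 bytes) is an assumption.

```python
from pathlib import Path

MIB = 1024 * 1024  # assumption: "256MB" is read as 256 MiB


def sysctl_line(name, mebibytes):
    """Format a line suitable for /etc/sysctl.conf."""
    return f"{name} = {mebibytes * MIB}"


def read_dirty_bytes(path="/proc/sys/vm/dirty_bytes"):
    """Return the current value in bytes, or None if the file is absent.

    A value of 0 means the kernel is using vm.dirty_ratio instead.
    """
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else None


print(sysctl_line("vm.dirty_bytes", 256))
print("current vm.dirty_bytes:", read_dirty_bytes())
```

The printed line can be added to /etc/sysctl.conf and loaded with
"sysctl -p", or the value can be set at runtime with "sysctl -w"; both
require root.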


This comes up fairly often, see e.g.: 
http://www.postgresql.org/message-id/flat/27c32fd4-0142-44fe-8488-9f366dc75...@mr-paradox.net


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PGSQL, checkpoints, and file system syncs

2014-04-03 Thread Reza Taheri
 Try setting the vm.dirty_bytes sysctl. Something like 256MB might be a good
 starting point.
 
 This comes up fairly often, see e.g.:
 http://www.postgresql.org/message-id/flat/27C32FD4-0142-44FE-8488-9f366dc75...@mr-paradox.net
 
 - Heikki

Thanks, Heikki. That sounds like my problem alright. I will play with these 
parameters right away, and will report back.

Cheers,
Reza

