> Uh. I'm not surprised you're facing utterly horrible performance with
> this. Did you try using a *large* checkpoint_segments setting? To
> achieve high performance
I do not seek "high performance" per se, I seek "lower maximum latency".
I think that the current settings and parameters are designed for high
throughput, but do not allow controlling latency, even under a small
load.
> you likely will have to make checkpoint_timeout *longer* and increase
> checkpoint_segments until *all* checkpoints are started because of
> "time".
Well, as I want to test a *small* load in a *reasonable* time, I did
not enlarge the number of segments; otherwise it would take ages.
If I set "checkpoint_timeout = 1min" and "checkpoint_completion_target =
0.9" so that the checkpoints are triggered by the timeout, I get:
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 4476 buffers (27.3%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=53.645 s, sync=5.127 s,
total=58.927 s; sync files=12, longest=2.890 s, average=0.427 s
...
The result is basically the same (well, 18% of transactions lost, but the
results do not seem stable from one run to the next), except that there
are more checkpoints.
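As a sanity check on the log line above, here is a minimal Python sketch (not part of any PostgreSQL tooling) that parses the wrapped "checkpoint complete" message. It shows that the write phase (53.645 s) lands just under checkpoint_completion_target × checkpoint_timeout = 0.9 × 60 s = 54 s, which is consistent with the checkpoint pacing its writes over that fraction of the interval. The regex and the helper name `summarize` are hypothetical, fitted to this one log format; the buffer-pool estimate is only an inference from "4476 buffers (27.3%)".

```python
import re

# The "checkpoint complete" line quoted above, rejoined onto one line
# (the mail client wrapped it).
LOG_LINE = (
    "LOG: checkpoint complete: wrote 4476 buffers (27.3%); 0 transaction log "
    "file(s) added, 0 removed, 0 recycled; write=53.645 s, sync=5.127 s, "
    "total=58.927 s; sync files=12, longest=2.890 s, average=0.427 s"
)

# Assumption: this pattern only targets the fields shown in the mail.
PATTERN = re.compile(
    r"wrote (?P<buffers>\d+) buffers \((?P<pct>[\d.]+)%\).*?"
    r"write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, "
    r"total=(?P<total>[\d.]+) s"
)


def summarize(line, timeout_s, completion_target):
    """Extract phase timings and compare the write phase with its target."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a 'checkpoint complete' line")
    buffers = int(m["buffers"])
    pct = float(m["pct"])
    return {
        # Duration over which the checkpoint is expected to spread its writes.
        "write_target_s": timeout_s * completion_target,
        "write_s": float(m["write"]),
        "sync_s": float(m["sync"]),
        "total_s": float(m["total"]),
        # Rough buffer-pool size implied by "N buffers (P%)"; the percentage
        # is rounded in the log, so this is only approximate.
        "est_pool_buffers": round(buffers / (pct / 100)),
    }


if __name__ == "__main__":
    print(summarize(LOG_LINE, timeout_s=60, completion_target=0.9))
```

The estimated pool comes out at about 16,400 buffers, i.e. close to 16384 buffers, or 128 MB assuming the default 8 kB block size.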
I fail to understand how scaling up both the segments and the timeout would
solve the latency problem. If I set 30 segments, then it takes 20 minutes
to fill them, and if I set the timeout to 15min, then I'll have to wait 15
minutes for each test.
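The trade-off described above can be made concrete with a small Python sketch. Under the pre-9.5 rules assumed here, a checkpoint starts either when checkpoint_segments WAL segments (16 MB each by default) have been filled since the last one (logged as "xlog") or when checkpoint_timeout expires (logged as "time"), whichever comes first. The WAL rate of 24 MB/min is only inferred from "30 segments take 20 minutes", and the helper names are hypothetical.

```python
import math

# Assumption: default 16 MB WAL segments (pre-9.5 checkpoint_segments era).
SEGMENT_MB = 16


def first_trigger(wal_mb_per_min, checkpoint_segments, checkpoint_timeout_min):
    """Return which condition starts the next checkpoint and after how many minutes.

    "xlog" means checkpoint_segments filled up first; "time" means
    checkpoint_timeout expired first (ties are counted as "time").
    """
    minutes_to_fill = checkpoint_segments * SEGMENT_MB / wal_mb_per_min
    if minutes_to_fill < checkpoint_timeout_min:
        return "xlog", minutes_to_fill
    return "time", checkpoint_timeout_min


def min_segments_for_time_trigger(wal_mb_per_min, checkpoint_timeout_min):
    """Smallest checkpoint_segments for which the timeout always fires first."""
    return math.ceil(checkpoint_timeout_min * wal_mb_per_min / SEGMENT_MB)


# Rate inferred from the mail: 30 segments filled in 20 minutes.
WAL_MB_PER_MIN = 30 * SEGMENT_MB / 20  # 24.0 MB/min

if __name__ == "__main__":
    print(first_trigger(WAL_MB_PER_MIN, 30, 15))
    print(min_segments_for_time_trigger(WAL_MB_PER_MIN, 15))
```

At that rate, 30 segments with a 15min timeout gives time-triggered checkpoints, but only one every 15 minutes; 23 segments is the smallest setting that keeps a 15min timeout in charge, while 22 would let the segments trigger first. That is exactly the test-duration cost being objected to.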
> There are three reasons:
> a) If checkpoint_timeout and completion_target are large and the
> checkpoint isn't triggered prematurely, most of the dirty data has
> already been written out by the kernel's background flush processes.
Why would they be written by the kernel if bgwriter has not sent them?
> b) The amount of WAL written with less frequent checkpoints is often
> *significantly* lower because fewer full-page writes need to be
> done. I've seen reductions of *more* than a factor of 4 in production.
Sure, I understand that, but ISTM that this test does not exercise this
issue: the load is small, so full-page writes do not matter much.
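Point b) can be illustrated with a deliberately simplified Python model of full_page_writes: the first modification of a page after a checkpoint logs a full-page image, later modifications before the next checkpoint log only the change. The fixed "checkpoint every N updates" clock, the uniform workload, and the function name are all assumptions for illustration; the real server keys this off the checkpoint's redo position, not an update count.

```python
import random


def count_full_page_images(page_updates, updates_per_checkpoint):
    """Count full-page images under a simplified full_page_writes model.

    Assumptions: a checkpoint happens every `updates_per_checkpoint`
    updates, and only the first update of each page after a checkpoint
    logs a full-page image; later updates log just the change.
    """
    images = 0
    imaged_since_checkpoint = set()
    for i, page in enumerate(page_updates):
        if i > 0 and i % updates_per_checkpoint == 0:
            # New checkpoint: every page must be imaged again on next touch.
            imaged_since_checkpoint.clear()
        if page not in imaged_since_checkpoint:
            images += 1
            imaged_since_checkpoint.add(page)
    return images


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical workload: 100,000 updates spread uniformly over 10,000 pages.
    updates = [rng.randrange(10_000) for _ in range(100_000)]
    for interval in (1_000, 10_000, 50_000):
        print(interval, count_full_page_images(updates, interval))
```

With this workload the shortest interval produces several times as many images as the longest, since almost every update is the first touch of its page since the last checkpoint. Whether that matters for a small load depends on how many distinct pages it dirties per checkpoint interval.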
> c) If checkpoints are infrequent enough, the penalty of them causing
> problems, especially if not using ext4, plays less of a role overall.
I think that what you suggest would only delay the issue, not solve it.
I'll try to run a long test.
--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)