> Uh. I'm not surprised you're facing utterly horrible performance with
> this. Did you try using a *large* checkpoint_segments setting? To
> achieve high performance

I do not seek "high performance" per se, I seek "lower maximum latency".

I think that the current settings and parameters are designed for high throughput, but do not allow controlling latency, even under a small load.

> you likely will have to make checkpoint_timeout *longer* and increase checkpoint_segments until *all* checkpoints are started because of "time".

Well, as I want to test a *small* load in a *reasonable* time, I did not enlarge the number of segments; otherwise it would take ages.

If I set "checkpoint_timeout = 1min" and "checkpoint_completion_target = 0.9" so that the checkpoints are triggered by the timeout, I get:

  LOG:  checkpoint starting: time
  LOG:  checkpoint complete: wrote 4476 buffers (27.3%); 0 transaction log
    file(s) added, 0 removed, 0 recycled; write=53.645 s, sync=5.127 s,
    total=58.927 s; sync files=12, longest=2.890 s, average=0.427 s
  ...

The result is basically the same (well, 18% of transactions lost, but the result does not seem to be stable from one run to the next), except that there are more checkpoints.
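As a rough sanity check on the checkpoint log above, here is a small sketch. It assumes that a time-triggered checkpoint paces its write phase to finish after about checkpoint_completion_target × checkpoint_timeout, that the percentage in "wrote N buffers (P%)" is relative to shared_buffers, and that the default 8 kB block size is in use. The helper names are hypothetical, made up for this illustration.

```python
BLOCK_SIZE = 8192  # default PostgreSQL block size in bytes (assumption: not changed at build time)


def target_write_seconds(checkpoint_timeout_s: float, completion_target: float) -> float:
    """Approximate time a time-triggered checkpoint spreads its writes over (simplified model)."""
    return checkpoint_timeout_s * completion_target


def infer_shared_buffers(buffers_written: int, percent: float) -> int:
    """Invert the log line 'wrote N buffers (P%)', assuming P = 100 * N / shared_buffers."""
    return round(buffers_written * 100.0 / percent)


if __name__ == "__main__":
    # Values taken from the settings and log excerpt in the message.
    print(f"target write phase: {target_write_seconds(60, 0.9):.1f} s (log: write=53.645 s)")
    nbuf = infer_shared_buffers(4476, 27.3)
    print(f"shared_buffers ~ {nbuf} buffers ~ {nbuf * BLOCK_SIZE / 2**20:.0f} MB")
```

Under these assumptions the target write phase is 0.9 × 60 s = 54 s, close to the reported write=53.645 s, and 4476 buffers at 27.3% points to roughly 16,400 buffers, i.e. about 128 MB of shared_buffers.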

I fail to understand how scaling up both the segments and the timeout would solve the latency problem. If I set 30 segments, then it takes 20 minutes to fill them, and if I set the timeout to 15min, then I'll have to wait 15 minutes to test.
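The figures above (30 segments take about 20 minutes to fill) imply a WAL rate of roughly 30 × 16 MB / 1200 s ≈ 0.4 MB/s. The sketch below shows which trigger fires first under a simplified model, assuming the default 16 MB WAL segment size, a constant WAL rate, and that a checkpoint starts once about checkpoint_segments segments have been written ("xlog") or checkpoint_timeout has elapsed ("time"), whichever comes first. A constant rate is a simplification, since it ignores the extra full page writes issued just after each checkpoint. The function names are hypothetical.

```python
import math

SEGMENT_MB = 16  # default WAL segment size in MB (assumption: not changed at build time)


def wal_rate_mb_per_s(segments: int, fill_seconds: float) -> float:
    """WAL generation rate implied by 'segments take fill_seconds to fill'."""
    return segments * SEGMENT_MB / fill_seconds


def checkpoint_interval(segments: int, timeout_s: float, rate_mb_s: float) -> tuple[float, str]:
    """Seconds between checkpoints and the trigger that fires first.

    Simplified model: a checkpoint starts when `segments` WAL segments have
    been written ("xlog") or `timeout_s` seconds have elapsed ("time").
    """
    fill_s = segments * SEGMENT_MB / rate_mb_s
    if fill_s < timeout_s:
        return fill_s, "xlog"
    return timeout_s, "time"


def min_segments_for_time_trigger(timeout_s: float, rate_mb_s: float) -> int:
    """Smallest segment count whose fill time is at least timeout_s."""
    return math.ceil(timeout_s * rate_mb_s / SEGMENT_MB)


if __name__ == "__main__":
    rate = wal_rate_mb_per_s(30, 20 * 60)  # 30 segments filled in ~20 minutes
    print(f"implied WAL rate: {rate:.2f} MB/s")
    for segs, timeout in [(3, 60), (3, 15 * 60), (30, 15 * 60)]:
        interval, trigger = checkpoint_interval(segs, timeout, rate)
        print(f"segments={segs:>2} timeout={timeout:>4}s -> every {interval:.0f} s ({trigger})")
    print("segments needed for 15min time-triggered checkpoints:",
          min_segments_for_time_trigger(15 * 60, rate))
```

With these assumptions and the default of 3 segments, the segments fill in about 120 s, so only a timeout under 2 minutes makes checkpoints start on "time", consistent with the 1min run; a 15min timeout would need about 23 segments.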

> There are three reasons:
> a) if checkpoint_timeout and checkpoint_completion_target are large and the
> checkpoint isn't executed prematurely, most of the dirty data has been
> written out by the kernel's background flush processes.

Why would they be written by the kernel if the bgwriter has not sent them?

> b) The amount of WAL written with less frequent checkpoints is often
> *significantly* lower because fewer full page writes need to be
> done. I've seen reductions of *more* than a factor of 4 in production.

Sure, I understand that, but ISTM that this test does not exercise this issue: the load is small, so full page writes do not matter much.

> c) If checkpoints are infrequent enough, the penalty of them causing
> problems, especially if not using ext4, plays less of a role overall.

I think that what you suggest would only delay the issue, not solve it.

I'll try to run a long test.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)