Re: [HACKERS] postgresql latency & bgwriter not doing its job

Fabien COELHO Wed, 27 Aug 2014 00:33:25 -0700


Hello Andres,

[...]
I think you're misunderstanding how spread checkpoints work.

Yep, definitely :-) On the other hand I thought I was seeking something "simple", namely correct latency under small load, that I would expect out of the box.

What you describe is reasonable, and is more or less what I was hoping for, although I thought that bgwriter was involved from the start and checkpoint would only do what is needed in the end. My mistake.

When the checkpointer process starts a spread checkpoint it first writes all buffers to the kernel in a paced manner. That pace is determined by checkpoint_completion_target and checkpoint_timeout.


This pacing does not seem to work, even at slow pace.

If you have a stall of roughly the same magnitude (say a factor
of two different), the smaller once a minute, the larger once an
hour. Obviously the once-an-hour one will have a better latency in many,
many more transactions.

I do not believe that delaying writes to disk as long as possible is a viable strategy for handling a small load. However, to show my good will, I have tried to follow your advice: I launched a 5000-second test with 50 segments, a 30 min timeout, and a 0.9 completion target, at 25 tps, which is less than 1/10 of the maximum throughput.
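For anyone reproducing this, the settings described above can be sketched as a postgresql.conf fragment. This is only my reading of the description: it assumes that "50 segments" refers to checkpoint_segments, and that log_checkpoints was enabled (the checkpoint log lines in this message suggest so). The function name is purely illustrative.

```shell
#!/bin/sh
# Print a postgresql.conf fragment for the long-checkpoint test run.
# Assumptions: "50 segments" maps to checkpoint_segments, and
# log_checkpoints is on so that checkpoint statistics reach the log.
print_checkpoint_conf() {
    cat <<'EOF'
checkpoint_segments = 50
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9
log_checkpoints = on
EOF
}

print_checkpoint_conf
```

The output can be appended to postgresql.conf before restarting the server.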


There are only two time-triggered checkpoints:

  LOG:  checkpoint starting: time
  LOG:  checkpoint complete: wrote 48725 buffers (47.6%);
      1 transaction log file(s) added, 0 removed, 0 recycled;
      write=1619.750 s, sync=27.675 s, total=1647.932 s;
      sync files=14, longest=27.593 s, average=1.976 s

  LOG:  checkpoint starting: time
  LOG:  checkpoint complete: wrote 22533 buffers (22.0%);
      0 transaction log file(s) added, 0 removed, 23 recycled;
      write=826.919 s, sync=9.989 s, total=837.023 s;
      sync files=8, longest=6.742 s, average=1.248 s

For the first one, 48725 buffers is 380 MB. 1800 * 0.9 = 1620 seconds to complete, so it means 30 buffer writes per second... should be ok. However sync costs 27 seconds nevertheless, and the server was more or less offline for about 30 seconds flat. For the second one, 180 MB to write, 10 seconds offline. For some reason the target time is reduced. I have also tried with the "deadline" IO scheduler, which makes more sense than the default "cfq", but the result was similar. Not sure how software RAID interacts with IO scheduling, though.
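The arithmetic above can be checked with a small sketch. It assumes the default 8 kB PostgreSQL block size and uses integer arithmetic only, so results are truncated; the function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: default PostgreSQL block size of 8 kB per buffer.
BLOCK_KB=8

# Convert a buffer count to whole mebibytes (truncated).
buffers_to_mib() {
    echo $(( $1 * BLOCK_KB / 1024 ))
}

# Target duration of the write phase in seconds:
# checkpoint_timeout (s) * completion target (given in percent).
write_phase_seconds() {
    echo $(( $1 * $2 / 100 ))
}

# Average buffer writes per second over the write phase (truncated).
writes_per_second() {
    echo $(( $1 / $2 ))
}

secs=$(write_phase_seconds 1800 90)
echo "first checkpoint:  $(buffers_to_mib 48725) MiB in ${secs} s," \
     "$(writes_per_second 48725 "$secs") buffers/s"
echo "second checkpoint: $(buffers_to_mib 22533) MiB"
```

This gives 380 MiB at 30 buffers/s for the first checkpoint and 176 MiB for the second, in line with the rounded figures above.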

Overall result: over the 5000s test, I have lost (i.e. more than 200 ms behind schedule) more than 2.5% of transactions (1/40). Due to the unfinished cycle, the long-term average is probably about 3%. Although it is better than 10%, it is not good. I would expect/hope for something pretty close to 0, even with ext4 on Linux, for a dedicated host which has nothing else to do but handle two dozen transactions per second.

Current conclusion: I have not found any way to improve the situation to "good" with parameters from the configuration. Currently a small load results in periodic offline time, which can be delayed but not avoided. The delaying tactic results in less frequent but longer downtime. I prefer frequent, very short downtime instead.

I really think that something is amiss. Maybe pg does not handle pacing as it should.

For the record, a 25 tps bench with a "small" config (default 3 segments, 5 min timeout, 0.5 completion target) and with the following loop running in parallel:


        while true ; do echo "CHECKPOINT;"; sleep 0.2s; done | psql

results in "losing" only 0.01% of transactions (12 transactions out of 125893 were more than 200 ms behind in 5000 seconds). Although you may think it stupid, from my point of view it shows that it is possible to coerce pg to behave.
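The loss figures quoted in this message are simple ratios; a minimal sketch of the computation, with an illustrative function name, could be:

```shell
#!/bin/sh
# Percentage of late transactions, printed with four decimals.
# Arguments: number of late transactions, total number of transactions.
late_percent() {
    awk -v late="$1" -v total="$2" \
        'BEGIN { printf "%.4f\n", 100 * late / total }'
}

late_percent 12 125893   # forced-checkpoint run: about 0.01%
late_percent 1 40        # long-checkpoint run: 2.5%
```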


With respect to the current status:

(1) the ability to set checkpoint_timeout to values smaller than 30s could help, although obviously there would be other consequences. But the ability to avoid periodic offline time looks like a desirable objective.

(2) I still think that a parameter to force bgwriter to write more stuff could help, but this is not tested.


(3) Any other effective idea to configure for responsiveness is welcome!

If someone wants to repeat these tests, it is easy and only takes a few minutes:


  sh> createdb test
  sh> pgbench -i -s 100 -F 95 test
  sh> pgbench -M prepared -N -R 25 -L 200 -c 2 -T 5000 -P 1 test > pgb.out

Note: the -L option to limit latency is a submitted patch. Without it, unresponsiveness shows up as increasing lag time.
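To get a rough count of "offline" seconds from pgb.out, something like the following sketch may help. It assumes that -P 1 produces one line per second shaped like "progress: 12.0 s, 25.0 tps, lat ... ms stddev ..., lag ... ms"; the exact format may differ between pgbench versions and with the -L patch, so treat the field positions as an assumption. The function name and the default threshold are arbitrary.

```shell
#!/bin/sh
# Count progress intervals whose reported tps is below a threshold.
# Assumed line shape (may vary by pgbench version):
#   progress: 12.0 s, 25.0 tps, lat 2.1 ms stddev 0.3, lag 0.1 ms
# Reads progress lines on stdin; optional argument: minimum tps (default 1).
count_stalled_seconds() {
    awk -v min_tps="${1:-1}" '
        $1 == "progress:" && $5 == "tps," {
            if ($4 + 0 < min_tps) stalled++
        }
        END { print stalled + 0 }
    '
}
```

Usage would be along the lines of `count_stalled_seconds < pgb.out`, or `count_stalled_seconds 10 < pgb.out` to count seconds below 10 tps.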


--
Fabien.


