Thank you for comments!
(2013/06/26 5:28), Heikki Linnakangas wrote:
> Hmm, so the write patch doesn't do much, but the fsync patch makes the response
> times somewhat smoother. I'd suggest that we drop the write patch for now, and
> focus on the fsyncs.
The write patch is effective for TPS! I think the delay of checkpoint writes is
caused by long fsyncs and the heavy load in the fsync phase, because dirty pages
go straight to the slow disk in the write phase. Therefore, the combination of
the write patch and the fsync patch suits better than the write patch alone. I
think the amount of WAL written at the beginning of a checkpoint can indicate
the effect of the write patch.
> What checkpointer_fsync_delay_ratio and checkpointer_fsync_delay_threshold
> settings did you use with the fsync patch? It's disabled by default.
I used these parameters:
checkpointer_fsync_delay_ratio = 1
checkpointer_fsync_delay_threshold = 1000ms
In effect, this makes the checkpointer sleep for a long time after slow fsyncs.
The other relevant parameters were:
checkpoint_completion_target = 0.7
checkpoint_smooth_target = 0.3
checkpoint_smooth_margin = 0.5
checkpointer_write_delay = 200ms
> Attached is a quick patch to implement a fixed, 100ms delay between fsyncs,
> and the assumption that the fsync phase is 10% of the total checkpoint
> duration. I suspect 100ms is too small to have much effect, but that happens
> to be what we currently have in CheckpointWriteDelay(). Could you test this
> patch along with yours? If you can, test with different delays (e.g. 100ms,
> 500ms and 1000ms) and different ratios between the write and fsync phase
> (e.g. 0.5, 0.7, 0.9), to get an idea of how sensitive the test case is to
> those settings.
It seems an interesting algorithm! I will test it with the same settings and
study the essence of your patch.
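If I understand the proposal correctly, the scheduling rule could be sketched
roughly like this (a hypothetical sketch in C; the function and constant names
are mine, not the patch's):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the fixed-delay idea: sleep a fixed 100ms
 * between fsyncs unless we have fallen behind the checkpoint schedule.
 * The fsync phase is assumed to be the last 10% of the checkpoint, so
 * the target overall progress after 'files_done' of 'files_total'
 * fsyncs is 0.9 + 0.1 * files_done / files_total.
 */
#define FSYNC_PHASE_RATIO 0.1
#define FSYNC_FIXED_DELAY_MS 100

static bool
fsync_should_sleep(int files_done, int files_total, double elapsed_fraction)
{
	double		target = (1.0 - FSYNC_PHASE_RATIO)
		+ FSYNC_PHASE_RATIO * (double) files_done / files_total;

	/* sleep only while we are on or ahead of schedule */
	return elapsed_fraction <= target;
}
```

Under this rule the sleep is skipped automatically whenever the elapsed
fraction of the checkpoint interval has overtaken the per-file progress target.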
(2013/06/26 5:28), Heikki Linnakangas wrote:
> On 25.06.2013 23:03, Robert Haas wrote:
>> On Tue, Jun 25, 2013 at 1:15 PM, Heikki Linnakangas
>> <hlinnakan...@vmware.com> wrote:
>>> I'm not sure it's a good idea to sleep proportionally to the time it took to
>>> complete the previous fsync. If you have a 1GB cache in the RAID controller,
>>> fsyncing a 1GB segment will fill it up. But since it fits in cache, it
>>> will return immediately. So we proceed fsyncing other files, until the cache
>>> is full and the fsync blocks. But once we fill up the cache, it's likely
>>> that we're hurting concurrent queries. ISTM it would be better to stay under
>>> that threshold, keeping the I/O system busy, but never fill up the cache
>>> completely.
>> Isn't the behavior implemented by the patch a reasonable approximation
>> of just that? When the fsyncs start to get slow, that's when we start
>> to sleep. I'll grant that it would be better to sleep when the
>> fsyncs are *about* to get slow, rather than when they actually have
>> become slow, but we have no way to know that.
> Well, that's the point I was trying to make: you should sleep *before* the
> fsyncs get slow.
Actually, fsync time changes with the progress of the OS's background disk
writes, and we cannot know that progress before calling fsync. I think Robert's
argument is right. Please see the following log messages.
* fsync of a file that had already been written out to disk
DEBUG: 00000: checkpoint sync: number=23 file=base/16384/16413.5 time=2.546 msec
DEBUG: 00000: checkpoint sync: number=24 file=base/16384/16413.6 time=3.174 msec
DEBUG: 00000: checkpoint sync: number=25 file=base/16384/16413.7 time=2.358 msec
DEBUG: 00000: checkpoint sync: number=26 file=base/16384/16413.8 time=2.013 msec
DEBUG: 00000: checkpoint sync: number=27 file=base/16384/16413.9 time=1232.535 msec
DEBUG: 00000: checkpoint sync: number=28 file=base/16384/16413_fsm time=0.005 msec
* fsync of a file that had mostly not yet been written out to disk
DEBUG: 00000: checkpoint sync: number=54 file=base/16384/16419.8 time=3408.759 msec
DEBUG: 00000: checkpoint sync: number=55 file=base/16384/16419.9 time=3857.075 msec
DEBUG: 00000: checkpoint sync: number=56 file=base/16384/16419.10 time=13848.237 msec
DEBUG: 00000: checkpoint sync: number=57 file=base/16384/16419.11 time=898.836 msec
DEBUG: 00000: checkpoint sync: number=58 file=base/16384/16419_fsm time=0.004 msec
DEBUG: 00000: checkpoint sync: number=59 file=base/16384/16419_vm time=0.002 msec
I think it is wasteful to sleep after every fsync, including the short ones, and
fsync performance also varies with the hardware (such as the RAID card and the
kind and number of disks) and with the OS. So it is difficult to choose a good
fixed sleep time. My proposed method will be more adaptive in these cases.
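For reference, the adaptive rule amounts to something like this (a rough
sketch, not the actual patch code; the helper name is hypothetical), using the
settings I reported above:

```c
#include <assert.h>

/* GUCs from the mail above, with the values I tested */
static double	checkpointer_fsync_delay_ratio = 1.0;		/* sleep = ratio * fsync time */
static double	checkpointer_fsync_delay_threshold = 1000;	/* ms; only slow fsyncs trigger a sleep */

/*
 * Hypothetical sketch: after each fsync, sleep proportionally to how
 * long that fsync took, but only when it was slow.  Quick fsyncs
 * (cache hits, tiny _fsm/_vm files) then cost no sleep at all, while
 * a slow fsync gives the I/O system a matching pause to recover.
 */
static int
fsync_sleep_msec(double fsync_msec)
{
	if (fsync_msec < checkpointer_fsync_delay_threshold)
		return 0;				/* fast fsync: no sleep needed */
	return (int) (fsync_msec * checkpointer_fsync_delay_ratio);
}
```

With these settings, the fast fsyncs in the first log excerpt would cause no
sleep at all, while the multi-second fsyncs in the second excerpt would be
followed by a comparably long pause.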
>> The only feedback we have on how bad things are is how long it took
>> the last fsync to complete, so I actually think that's a much better
>> way to go than any fixed sleep - which will often be unnecessarily
>> long on a well-behaved system, and which will often be far too short
>> on one that's having trouble. I'm inclined to think Kondo-san
>> has got it right.
> Quite possible, I really don't know. I'm inclined to first try the simplest
> thing possible, and only make it more complicated if that's not good enough.
> Kondo-san's patch wasn't very complicated, but nevertheless a fixed sleep
> between every fsync, unless you're behind the schedule, is even simpler. In
> particular, it's easier to tie that into the checkpoint scheduler - I'm not
> sure how you'd measure progress or determine how long to sleep unless you
> assume that every fsync is the same.
I think what is important for the fsync phase is that it be as short as
possible without freezing IO, that it keep the checkpoint schedule, and that it
be gentle to executing transactions. I will try to improve the patch from that
point of view. By the way, a DBT-2 benchmark run takes a long time (it may be
four hours). For that reason, I hope you don't mind my late replies very
much! :-)
Best Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers