Re: [HACKERS] Partitioned checkpointing

Fabien COELHO Sat, 26 Sep 2015 05:31:21 -0700


Hello,


These are interesting runs.

> In a situation in which small values are set in dirty_bytes and dirty_background_bytes, a buffer is likely stored on the HDD immediately after the buffer is written to the kernel by the checkpointer. Thus, I tried a quick hack to make the checkpointer invoke the write system call to write a dirty buffer, immediately followed by a store operation for that buffer implemented with the sync_file_range() system call. # For reference, I attach the patch. As shown in file_sync_range.JPG, this strategy appears to have been effective.


Indeed. This approach is part of this current patch:

        https://commitfest.postgresql.org/6/260/

Basically, what you do is call sync_file_range on each block, and you tested on a high-end system, probably with a large BBU-backed disk cache, which I guess allows the disk to reorder writes so as to benefit from sequential write performance.
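A minimal sketch of the per-block approach, assuming Linux, an 8 kB block size (`BLCKSZ`, PostgreSQL's default) and a hypothetical helper name `write_and_flush_block`; this is an illustration, not the code of the attached patch:

```c
#define _GNU_SOURCE          /* sync_file_range() is Linux-specific */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192          /* assumed block size */

/*
 * Hypothetical helper: write one block at its offset, then immediately ask
 * the kernel to start writeback of exactly that range, without waiting.
 * Returns 0 on success, -1 on error or short write.
 */
static int write_and_flush_block(int fd, const char *buf, off_t blockno)
{
    off_t offset = blockno * BLCKSZ;

    if (pwrite(fd, buf, BLCKSZ, offset) != BLCKSZ)
        return -1;

    /* SYNC_FILE_RANGE_WRITE only initiates writeback of the dirty pages in
     * the range; it does not wait for completion and is not a durability
     * guarantee (no metadata sync, no disk cache flush). */
    return sync_file_range(fd, offset, BLCKSZ, SYNC_FILE_RANGE_WRITE);
}
```

Issuing one such call per block keeps the kernel's dirty backlog small, but it also means one system call per block, whatever the disk layout.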

> In conclusion, as far as pgbench execution on Linux is concerned, using sync_file_range() is a promising solution.

I found that calling sync_file_range for every block could degrade performance a bit under some conditions, at least on my low-end systems (just a [raid] disk, no significant disk cache in front of it), so the above patch aggregates neighboring writes so as to issue fewer sync_file_range calls.
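Aggregating neighboring writes can be sketched as merging a sorted list of block numbers into contiguous runs and issuing one sync_file_range() per run rather than per block. The names `BlockRun`, `coalesce_runs` and `flush_runs` and the 8 kB `BLCKSZ` are assumptions for illustration only, not what the commitfest patch actually uses:

```c
#define _GNU_SOURCE          /* sync_file_range() is Linux-specific */
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

#define BLCKSZ 8192          /* assumed block size */

/* A contiguous run of blocks: [start, start + nblocks). */
typedef struct {
    off_t start;
    off_t nblocks;
} BlockRun;

/*
 * Merge an ascending array of block numbers into maximal contiguous runs.
 * Duplicates are folded into the current run. out[] must hold n entries.
 * Returns the number of runs produced.
 */
static size_t coalesce_runs(const off_t *blocks, size_t n, BlockRun *out)
{
    size_t nruns = 0;

    for (size_t i = 0; i < n; i++) {
        if (nruns > 0) {
            BlockRun *cur = &out[nruns - 1];
            off_t end = cur->start + cur->nblocks;

            if (blocks[i] == end) {      /* extends the current run */
                cur->nblocks++;
                continue;
            }
            if (blocks[i] < end)         /* duplicate block number */
                continue;
        }
        out[nruns].start = blocks[i];    /* gap: start a new run */
        out[nruns].nblocks = 1;
        nruns++;
    }
    return nruns;
}

/* One writeback request per run instead of one per block. */
static int flush_runs(int fd, const BlockRun *runs, size_t nruns)
{
    for (size_t i = 0; i < nruns; i++) {
        if (sync_file_range(fd, runs[i].start * BLCKSZ,
                            runs[i].nblocks * BLCKSZ,
                            SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
    }
    return 0;
}
```

For blocks {1, 2, 3, 7, 8, 10} this yields three runs, hence three calls instead of six; the gain grows with how clustered the dirty blocks are.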

> That is, the checkpointer invokes sync_file_range() to store a buffer immediately after it writes the buffer to the kernel.

Yep. It is interesting that sync_file_range alone improves stability a lot on your high-end system, although sorting is mandatory for low-end systems.

My interpretation, already stated above, is that in your system the hardware does the sorting on the cached data at the disk level.
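On a system without a large disk cache, the sorting has to happen in software before the writes are issued. A minimal sketch, assuming a simplified, hypothetical buffer identity `DirtyBuf` (a single `file_id` standing in for tablespace/relation/fork), not PostgreSQL's actual BufferTag or the patch's comparator:

```c
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical, simplified identity of a dirty buffer. */
typedef struct {
    unsigned int file_id;    /* stand-in for tablespace/relation/fork */
    off_t        blockno;    /* block number within that file */
} DirtyBuf;

/* Order by file first, then by ascending block number within the file. */
static int cmp_dirty(const void *a, const void *b)
{
    const DirtyBuf *x = a, *y = b;

    if (x->file_id != y->file_id)
        return x->file_id < y->file_id ? -1 : 1;
    if (x->blockno != y->blockno)
        return x->blockno < y->blockno ? -1 : 1;
    return 0;
}

/* Sort the checkpoint's dirty buffers so each file is written sequentially. */
static void sort_dirty(DirtyBuf *bufs, size_t n)
{
    qsort(bufs, n, sizeof *bufs, cmp_dirty);
}
```

Once sorted, blocks that are neighbors on disk are also neighbors in the write order, which is what lets a plain disk benefit from sequential write performance.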


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
