On 08/30/2014 09:45 PM, Andres Freund wrote:
> On 2014-08-30 14:16:10 -0400, Tom Lane wrote:
>> Andres Freund <and...@2ndquadrant.com> writes:
>>> On 2014-08-30 13:50:40 -0400, Tom Lane wrote:
>>>> A possible compromise is to sort a limited number of
>>>> buffers -- say, collect a few thousand dirty buffers then sort, dump and
>>>> fsync them, repeat as needed.
>>>
>>> Yea, that's what I suggested nearby. But I don't really like it, because
>>> it robs us of the chance to fsync() a relfilenode immediately after
>>> having synced all its buffers.
>>
>> Uh, how so exactly?  You could still do that.  Yeah, you might fsync a rel
>> once per sort-group and not just once per checkpoint, but it's not clear
>> that that's a loss as long as the group size isn't tiny.
>
> Because it wouldn't have the benefit of syncing the minimal amount of
> data anymore. If lots of other relfilenodes have been synced in between,
> the amount of newly dirtied pages in the OS buffer cache (written by
> backends, bgwriter) for an individual relfilenode is much higher.

I wonder how much of the benefit from sorting comes from sorting the pages within each file, and how much just from grouping all the writes of each file together. In other words, how much difference is there between full sorting, and just fsyncing between each file, or the crude patch I posted earlier.

If we're going to fsync between each file, there's no need to sort all the buffers at once. It's enough to pick one file as the target - like in my crude patch - and sort only the buffers for that file. Then fsync that file and move on to the next file. That requires scanning the buffers multiple times, but I think that's OK.
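
To spell that out, here's a rough, self-contained sketch of the loop I have in mind, on a toy buffer pool with made-up names (ToyBuffer, sync_one_file, toy_checkpoint) rather than the real bufmgr/smgr interfaces. It's only meant to show the shape of the algorithm, not actual PostgreSQL code:

/*
 * Rough sketch only: a toy version of "pick one file, sort and write its
 * dirty buffers, fsync it, move to the next file".  All names are made up
 * for illustration; this is not the actual bufmgr/smgr code.
 */
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define TOY_BLCKSZ   8192
#define TOY_NBUFFERS 16

typedef struct ToyBuffer
{
    int     file_id;        /* stands in for a relfilenode */
    int     block_num;      /* block offset within that file */
    int     is_dirty;
    char    data[TOY_BLCKSZ];
} ToyBuffer;

static ToyBuffer pool[TOY_NBUFFERS];

static int
cmp_block(const void *a, const void *b)
{
    const ToyBuffer *ba = *(ToyBuffer *const *) a;
    const ToyBuffer *bb = *(ToyBuffer *const *) b;

    return (ba->block_num > bb->block_num) - (ba->block_num < bb->block_num);
}

/* Collect, sort, write and fsync the dirty buffers of a single file. */
static void
sync_one_file(int file_id)
{
    ToyBuffer  *todo[TOY_NBUFFERS];
    int         n = 0;
    char        path[64];
    int         fd;

    for (int i = 0; i < TOY_NBUFFERS; i++)
        if (pool[i].is_dirty && pool[i].file_id == file_id)
            todo[n++] = &pool[i];

    /* sort by block number so the writes land sequentially */
    qsort(todo, n, sizeof(ToyBuffer *), cmp_block);

    snprintf(path, sizeof(path), "toyrel.%d", file_id);
    fd = open(path, O_WRONLY | O_CREAT, 0644);

    for (int i = 0; i < n; i++)
    {
        pwrite(fd, todo[i]->data, TOY_BLCKSZ,
               (off_t) todo[i]->block_num * TOY_BLCKSZ);
        todo[i]->is_dirty = 0;
    }

    /* fsync right away, while little else has been written in between */
    fsync(fd);
    close(fd);
}

/* The checkpoint loop: rescan the pool, flushing one file per pass. */
static void
toy_checkpoint(void)
{
    for (;;)
    {
        int     target = -1;

        for (int i = 0; i < TOY_NBUFFERS; i++)
        {
            if (pool[i].is_dirty)
            {
                target = pool[i].file_id;
                break;
            }
        }
        if (target < 0)
            break;          /* no dirty buffers left */
        sync_one_file(target);
    }
}

int
main(void)
{
    /* dirty a few buffers spread across two "files", out of order */
    for (int i = 0; i < TOY_NBUFFERS; i++)
    {
        pool[i].file_id = i % 2;
        pool[i].block_num = TOY_NBUFFERS - i;
        pool[i].is_dirty = 1;
    }
    toy_checkpoint();
    return 0;
}

The shape is the important part: each pass picks one file, collects and sorts just that file's dirty buffers, writes them in block order, and fsyncs the file immediately, so little unrelated data has accumulated in the kernel's cache by the time the fsync runs.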

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
