On Tue, Jan 14, 2014 at 5:23 PM, Dave Chinner <da...@fromorbit.com> wrote:
> By default, background writeback doesn't start until 10% of memory
> is dirtied, and on your machine that's 25GB of RAM. That's way too
> high for your workload.
> It appears to me that we are seeing large memory machines much more
> commonly in data centers - a couple of years ago 256GB RAM was only
> seen in supercomputers. Hence machines of this size are moving from
> "tweaking settings for supercomputers is OK" class to "tweaking
> settings for enterprise servers is not OK"....
> Perhaps what we need to do is deprecate dirty_ratio and
> dirty_background_ratio as the default values and move to the byte-based
> values as the defaults and cap them appropriately. e.g.
> 10/20% of RAM for small machines down to a couple of GB for large
> machines.
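To make the arithmetic concrete: 10% of 256GiB is about 25.6GiB, while a capped default would stop at a fixed byte value no matter how much RAM is installed. A minimal C sketch of that capping, assuming a hypothetical helper name capped_dirty_limit() and a 2GiB cap standing in for "a couple of GB" (neither is an existing kernel interface or default):

```c
#include <stdint.h>

/* Hypothetical helper, not a kernel function: the dirty threshold under
 * the proposal, i.e. pct percent of RAM, but never more than cap_bytes. */
uint64_t capped_dirty_limit(uint64_t total_ram_bytes, unsigned pct,
                            uint64_t cap_bytes)
{
    uint64_t by_ratio = total_ram_bytes * pct / 100; /* ratio-based limit */

    return by_ratio < cap_bytes ? by_ratio : cap_bytes;
}
```

With a 2GiB cap, a 16GiB machine at 10% still gets about 1.6GiB (the ratio wins), while a 256GiB machine gets 2GiB instead of about 25.6GiB.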
I think that's right. In our case we know we're going to call fsync()
eventually and that's going to produce a torrent of I/O. If that
torrent fits in downstream caches or can be satisfied quickly without
disrupting the rest of the system too much, then life is good. But
the downstream caches don't typically grow proportionately to the size
of system memory. Maybe a machine with 16GB has 1GB of battery-backed
write cache, but it doesn't follow that a 256GB machine has 16GB of
battery-backed write cache.
> Essentially, changing dirty_background_bytes, dirty_bytes and
> dirty_expire_centisecs to be much smaller should make the kernel
> start writeback much sooner and so you shouldn't have to limit the
> amount of buffers the application has to prevent major fsync
> triggered stalls...
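Before changing anything it helps to see what is currently in effect. A small C sketch that reads the knobs through Linux's /proc/sys/vm interface; the helper names are hypothetical. Per the kernel's vm sysctl documentation, dirty_bytes and dirty_background_bytes are mutually exclusive with their *_ratio counterparts, and whichever one is not in use reads back as 0:

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the single decimal value a /proc/sys/vm file contains,
 * tolerating the trailing newline. Returns 0 on success, -1 on error. */
int parse_sysctl_value(const char *text, unsigned long long *out)
{
    char *end;
    unsigned long long v;

    if (!isdigit((unsigned char) text[0]))
        return -1;
    errno = 0;
    v = strtoull(text, &end, 10);
    if (errno != 0)
        return -1;
    while (*end == '\n' || *end == ' ' || *end == '\t')
        end++;
    if (*end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* Read /proc/sys/vm/<name>, e.g. "dirty_background_bytes" or
 * "dirty_expire_centisecs". Returns 0 on success, -1 on error. */
int read_vm_sysctl(const char *name, unsigned long long *out)
{
    char path[256];
    char buf[64];
    FILE *f;
    char *line;
    int n = snprintf(path, sizeof path, "/proc/sys/vm/%s", name);

    if (n < 0 || (size_t) n >= sizeof path)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    line = fgets(buf, sizeof buf, f);
    fclose(f);
    if (line == NULL)
        return -1;
    return parse_sysctl_value(buf, out);
}
```

Lowering the values themselves needs root, e.g. sysctl -w vm.dirty_background_bytes=268435456; that number is only an illustration, not a recommendation.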
I think this has been tried with some success, but I don't know the
details. I think the bytes values are clearly more useful than the
percentages, because you can set them smaller and with better granularity.
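The granularity point is easy to quantify, because the *_ratio knobs take whole percentages: on a 256GiB machine the smallest nonzero setting is about 2.56GiB, whereas a *_bytes value can be set to, say, 256MiB. A rough C sketch of how the two settings interact; the function name is hypothetical, and it approximates the kernel by applying the ratio to total RAM, whereas the kernel applies it to available memory (free plus reclaimable pages):

```c
#include <stdint.h>

/* Hypothetical approximation of the effective limit: a nonzero bytes
 * setting wins; otherwise the whole-percent ratio applies to memory.
 * (The real kernel computes the ratio against available memory.) */
uint64_t effective_dirty_limit(uint64_t bytes_setting, unsigned ratio_pct,
                               uint64_t mem_bytes)
{
    if (bytes_setting != 0)
        return bytes_setting;
    return mem_bytes * ratio_pct / 100;
}
```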
One thought that occurs to me is that it might be useful to have
PostgreSQL tell the system when we expect to perform an fsync.
Imagine fsync_is_coming(int fd, time_t). We know long in advance
(minutes) when we're gonna do it, so in some sense what we'd like to
tell the kernel is: we're not in a hurry to get this data on disk
right now, but when the indicated time arrives, we are going to do
fsyncs of a bunch of files in rapid succession, so please arrange to
flush the data as close to that time as possible (to maximize
write-combining) while still finishing by that time (so that the
fsyncs are fast and more importantly so that they don't cause a
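No fsync_is_coming() call exists, but a rough user-space approximation of the idea can be sketched with Linux's sync_file_range(), which starts writeback without waiting for it. In the sketch below, the function name flush_ahead_then_fsync() and the lead_secs parameter (how long before the deadline to start writeback) are hypothetical, and a fixed lead time only approximates the kernel-side scheduling the proposal asks for:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Sleep until wall-clock time t; re-checks in case sleep() is interrupted. */
static void sleep_until(time_t t)
{
    time_t now;

    while ((now = time(NULL)) < t)
        sleep((unsigned) (t - now));
}

/* Hypothetical stand-in for fsync_is_coming(): leave data dirty (so it
 * can be write-combined) until lead_secs before the deadline, then start
 * asynchronous writeback, then fsync at the deadline so the fsyncs
 * mostly find clean pages. Returns 0 on success, -1 on the first error. */
int flush_ahead_then_fsync(const int *fds, size_t nfds, time_t deadline,
                           unsigned lead_secs)
{
    size_t i;

    sleep_until(deadline - (time_t) lead_secs);
    for (i = 0; i < nfds; i++) {
        /* offset 0, nbytes 0: the whole file; WRITE: start, don't wait */
        if (sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
    }
    sleep_until(deadline);
    for (i = 0; i < nfds; i++) {
        if (fsync(fds[i]) != 0)
            return -1;
    }
    return 0;
}
```

SYNC_FILE_RANGE_WRITE is used alone so the call only initiates writeback; it gives no durability guarantee and does not write metadata, which is why the fsync() pass is still required.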
The Enterprise PostgreSQL Company
Sent via pgsql-hackers mailing list (email@example.com)