On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote:
> Dave Chinner <da...@fromorbit.com> writes:
> > On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote:
> >> What we'd really like for checkpointing is to hand the kernel a boatload
> >> (several GB) of dirty pages and say "how about you push all this to disk
> >> over the next few minutes, in whatever way seems optimal given the storage
> >> hardware and system situation. Let us know when you're done."
> >
> > The issue there is that the kernel has other triggers for needing to
> > clean data. We have no infrastructure to handle variable writeback
> > deadlines at the moment, nor do we have any infrastructure to do
> > roughly metered writeback of such files to disk. I think we could
> > add it to the infrastructure without too much perturbation of the
> > code, but as you've pointed out that still leaves the fact there's
> > no obvious interface to configure such behaviour. Would it need to
> > be persistent?
>
> No, we'd be happy to re-request it during each checkpoint cycle, as
> long as that wasn't an unduly expensive call to make. I'm not quite
> sure where such requests ought to "live" though. One idea is to tie
> them to file descriptors; but the data to be written might be spread
> across more files than we really want to keep open at one time.

It would be a property of the inode, as that is how writeback is
tracked and timed. Set and queried through a file descriptor,
though - it's basically the same context that fadvise works
through.

> But the only other idea that comes to mind is some kind of global sysctl,
> which would probably have security and permissions issues. (One thing
> that hasn't been mentioned yet in this thread, but maybe is worth pointing
> out now, is that Postgres does not run as root, and definitely doesn't
> want to. So we don't want a knob that would require root permissions
> to twiddle.)

I have assumed all along that requiring root to do stuff would be a
bad thing. :)

> We could probably live with serially checkpointing data
> in sets of however-many-files-we-can-have-open, if file descriptors are
> the place to keep the requests.

Inodes live longer than file descriptors, but there's no guarantee
that they live from one fd context to another. Hence my question
about persistence ;)

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

-- 
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
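
As a rough illustration of the "sets of however-many-files-we-can-have-open" idea quoted above, here is a minimal Python sketch. It is not taken from Postgres or the kernel. The names `batches`, `request_writeback` and `checkpoint_in_batches` are hypothetical, and because the metered-writeback request discussed in this thread does not exist, `os.fsync()` is used purely as a placeholder (unlike the proposed hint, it blocks until the data is on disk).

```python
import os


def batches(paths, max_open):
    """Yield successive groups of at most max_open paths."""
    for start in range(0, len(paths), max_open):
        yield paths[start:start + max_open]


def request_writeback(fd):
    # ASSUMPTION: stand-in for the hypothetical per-inode "write this back
    # over the next few minutes" request. No such call exists today;
    # fsync() is used only so the sketch runs, and it blocks.
    os.fsync(fd)


def checkpoint_in_batches(paths, max_open):
    """Issue the request for every file while holding at most max_open
    descriptors at once. Returns the peak number of descriptors held."""
    peak = 0
    for group in batches(paths, max_open):
        fds = []
        try:
            for path in group:
                fds.append(os.open(path, os.O_RDWR))
            peak = max(peak, len(fds))
            for fd in fds:
                request_writeback(fd)
        finally:
            # The descriptors are closed before the next batch is opened.
            # Whether a real hint would outlive the close is exactly the
            # persistence question raised above.
            for fd in fds:
                os.close(fd)
    return peak
```

With five files and `max_open=2`, the loop opens the files in groups of two, two and one, so no more than two descriptors are held at any point.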