On Wed, Jan 15, 2014 at 07:08:18PM -0500, Tom Lane wrote:
> Dave Chinner <da...@fromorbit.com> writes:
> > On Wed, Jan 15, 2014 at 10:12:38AM -0500, Tom Lane wrote:
> >> What we'd really like for checkpointing is to hand the kernel a boatload
> >> (several GB) of dirty pages and say "how about you push all this to disk
> >> over the next few minutes, in whatever way seems optimal given the storage
> >> hardware and system situation.  Let us know when you're done."
> > The issue there is that the kernel has other triggers for needing to
> > clean data. We have no infrastructure to handle variable writeback
> > deadlines at the moment, nor do we have any infrastructure to do
> > roughly metered writeback of such files to disk. I think we could
> > add it to the infrastructure without too much perturbation of the
> > code, but as you've pointed out that still leaves the fact there's
> > no obvious interface to configure such behaviour. Would it need to
> > be persistent?
> No, we'd be happy to re-request it during each checkpoint cycle, as
> long as that wasn't an unduly expensive call to make.  I'm not quite
> sure where such requests ought to "live" though.  One idea is to tie
> them to file descriptors; but the data to be written might be spread
> across more files than we really want to keep open at one time.

It would be a property of the inode, as that is how writeback is
tracked and timed. It would be set and queried through a file
descriptor, though - it's basically the same context that fadvise
works in.
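
For illustration, this is roughly what the fadvise context looks like from userspace. It is only a sketch of the existing call, not the writeback-deadline request discussed above, which has no interface yet. The helper name advise_via_fd is made up for the example, and it assumes a platform where Python exposes os.posix_fadvise (Linux does). The hint is issued through a short-lived fd, needs no root, and applies to the file's cached pages rather than to the descriptor.

```python
import os
import tempfile


def advise_via_fd(path):
    """Issue an fadvise hint through a short-lived fd; return the inode number.

    Hypothetical helper for illustration only. The fd is just the handle
    used to reach the file; the advice concerns the file's cached pages.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # Dirty pages are not discarded by DONTNEED, so flush them first.
        os.fsync(fd)
        # offset=0, len=0 means "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return os.fstat(fd).st_ino
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 8192)
    try:
        # The inode seen through the fd is the same one a later stat() sees.
        print(advise_via_fd(tmp.name) == os.stat(tmp.name).st_ino)
    finally:
        os.unlink(tmp.name)
```

A per-inode writeback request would presumably be made the same way: open, issue the request, close.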

> But the only other idea that comes to mind is some kind of global sysctl,
> which would probably have security and permissions issues.  (One thing
> that hasn't been mentioned yet in this thread, but maybe is worth pointing
> out now, is that Postgres does not run as root, and definitely doesn't
> want to.  So we don't want a knob that would require root permissions
> to twiddle.)

I have assumed all along that requiring root to do stuff would be a
bad thing. :)

> We could probably live with serially checkpointing data
> in sets of however-many-files-we-can-have-open, if file descriptors are
> the place to keep the requests.

Inodes live longer than file descriptors, but there's no guarantee
that they live from one fd context to another. Hence my question
about persistence ;)
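
The "sets of however-many-files-we-can-have-open" idea quoted above could be sketched as below. This is an assumption-laden illustration, not how Postgres checkpoints: checkpoint_in_batches and max_open are hypothetical names, and a plain fsync stands in for the per-fd writeback request that does not exist yet. Each batch keeps its descriptors open until its writes are done, which sidesteps the question of whether a request would survive on the inode after the fd is closed.

```python
import os


def checkpoint_in_batches(paths, max_open):
    """Flush files in groups of at most max_open open descriptors.

    Returns the number of batches processed. Hypothetical sketch: the
    fsync call stands in for a not-yet-existing writeback request.
    """
    if max_open < 1:
        raise ValueError("max_open must be at least 1")
    batches = 0
    for start in range(0, len(paths), max_open):
        fds = []
        try:
            # Open one batch; never hold more than max_open fds at once.
            for path in paths[start:start + max_open]:
                fds.append(os.open(path, os.O_RDWR))
            # Placeholder for "request writeback, then wait for completion".
            for fd in fds:
                os.fsync(fd)
        finally:
            # Close everything opened so far, even if an open or fsync failed.
            for fd in fds:
                os.close(fd)
        batches += 1
    return batches
```

With five files and max_open=2, this processes three batches of 2, 2 and 1 files.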


Dave Chinner

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)