On Wed, Jan 15, 2014 at 7:12 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Heikki Linnakangas <hlinnakan...@vmware.com> writes:
> > On 01/15/2014 07:50 AM, Dave Chinner wrote:
> >> FWIW [and I know you're probably sick of hearing this by now], but
> >> the blk-io throttling works almost perfectly with applications that
> >> use direct IO.....
> > For checkpoint writes, direct I/O actually would be reasonable.
> > Bypassing the OS cache is a good thing in that case - we don't want the
> > written pages to evict other pages from the OS cache, as we already have
> > them in the PostgreSQL buffer cache.
> But in exchange for that, we'd have to deal with selecting an order to
> write pages that's appropriate depending on the filesystem layout,
> other things happening in the system, etc etc. We don't want to build
> an I/O scheduler, IMO, but we'd have to.
> > Writing one page at a time with O_DIRECT from a single process might be
> > quite slow, so we'd probably need to use writev() or asynchronous I/O to
> > work around that.
> Yeah, and if the system has multiple spindles, we'd need to be issuing
> multiple O_DIRECT writes concurrently, no?
writev effectively does that, doesn't it? But all the buffers have to go
to the same file handle, so that could be a problem. I think we need
something like sorted checkpoints sooner or later, anyway.
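
To make "sorted checkpoints" concrete, here is a rough sketch of the idea, not a patch. It assumes a made-up DirtyPage descriptor (file_id, block) rather than any real PostgreSQL structure. It sorts the dirty pages by file and block, then finds runs of consecutive blocks in the same file. Since writev() writes its buffers to one contiguous range of a single file descriptor, each such run is a candidate for one writev() call:

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical descriptor for a dirty buffer; not PostgreSQL's real struct. */
typedef struct {
    unsigned file_id;   /* which file the page belongs to (assumed id) */
    unsigned block;     /* block number within that file */
} DirtyPage;

/* Order by file first, then by block number within the file. */
static int cmp_dirty_page(const void *a, const void *b)
{
    const DirtyPage *x = a;
    const DirtyPage *y = b;

    if (x->file_id != y->file_id)
        return x->file_id < y->file_id ? -1 : 1;
    if (x->block != y->block)
        return x->block < y->block ? -1 : 1;
    return 0;
}

/* Sort the checkpoint's dirty pages so neighbours on disk become neighbours
 * in the array. */
static void sort_dirty_pages(DirtyPage *pages, size_t n)
{
    qsort(pages, n, sizeof(DirtyPage), cmp_dirty_page);
}

/* Length of the run starting at pages[start]: same file, consecutive blocks.
 * Returns 0 if start is past the end of the array. */
static size_t run_length(const DirtyPage *pages, size_t n, size_t start)
{
    size_t len;

    if (start >= n)
        return 0;

    len = 1;
    while (start + len < n &&
           pages[start + len].file_id == pages[start].file_id &&
           pages[start + len].block == pages[start].block + (unsigned) len)
        len++;
    return len;
}
```

A checkpointer loop would then walk the sorted array, call run_length() at each position, and issue one write per run. Runs in different files still need separate calls, which is the same-file-handle limitation mentioned above.
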
> What we'd really like for checkpointing is to hand the kernel a boatload
> (several GB) of dirty pages and say "how about you push all this to disk
> over the next few minutes, in whatever way seems optimal given the storage
> hardware and system situation. Let us know when you're done."
And most importantly, "Also, please don't freeze up everything else in the