On Tue, Dec 10, 2013 at 9:22 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> Communicating more with the kernel (through posix_fadvise, fallocate,
>> aio, iovec, etc...) would probably be good, but it does expose more
>> kernel issues. posix_fadvise, for instance, is a double-edged sword
>> ATM. I do believe, however, that exposing those issues and prompting a
>> fix is far preferable to silently working around them.
> Getting the kernel to improve those things so PostgreSQL can be changed to
> use them more aggressively seems almost hopeless to me. PostgreSQL would
> have to be coded to take advantage of the improved versions, while defending
> itself from the pre-improved versions. And my understanding is that
> different distributions of Linux cherry pick changes to the kernel back and
> forth into their code, so just looking at the kernel version number without
> also looking at the distribution doesn't mean very much about whether we
> have the improved feature or not. Or am I misinformed about that?
> If we can point out to the kernel hackers things that would be
> absolute improvements, where PostgreSQL and everything else just magically
> start working better if that improvement makes it in, that is great. But if
> both systems have to be changed in sync to derive any benefit, how do we
> coordinate that?
Well, posix_fadvise is one such thing. It's a cheap form of AIO used
by more than a few programs that want I/O performance, and its
current form is sub-optimal. The fix is rather simple; it just needs a
lot of testing.
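
To illustrate what "a cheap form of AIO" can look like in practice, here is a minimal Python sketch (not PostgreSQL code). It hints every block it is about to read with POSIX_FADV_WILLNEED, so the kernel may start fetching them in the background, and only then issues the reads. The function name prefetch_then_read, the 8192-byte block size, and the offsets are assumptions made for illustration.

```python
import os


def prefetch_then_read(path, offsets, block_size=8192):
    """Hint the kernel about upcoming reads, then perform them.

    Hypothetical helper: block_size and offsets are illustrative values.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Issue all hints first so the kernel can overlap the I/O.
        # posix_fadvise is advisory only; the kernel may ignore it.
        if hasattr(os, "posix_fadvise"):  # not available on every platform
            for off in offsets:
                os.posix_fadvise(fd, off, block_size, os.POSIX_FADV_WILLNEED)
        # Then read the blocks; ideally they are already in the page cache.
        return [os.pread(fd, block_size, off) for off in offsets]
    finally:
        os.close(fd)
```

Whether the hints actually help depends on the kernel's behavior, which is the sub-optimal part discussed here.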
But my report on LKML spurred little actual work. So it's possible
this kind of thing will need patches attached.
On Tue, Dec 10, 2013 at 9:34 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2013-12-04 05:39:23 -0200, Claudio Freire wrote:
>> Problem is, Postgres relies on a working kernel cache for checkpoints.
>> Checkpoint logic would have to be heavily reworked to account for an
>> impaired kernel cache.
> I don't think checkpoints are the critical problem with that, they are
> nicely in the background and we could easily add sorting.
Problem is, with DirectIO, they won't be so background.
Currently, checkpoints assume there's a background process catching
all I/O requests, sorting them, and flushing them as optimally as
possible. This makes the checkpoint's slow-paced write pattern
benignly background, since it will be scheduled opportunistically by
If you use DirectIO, however, a write will pretty much physically move
the writing head (when it reaches the queue's head at least) of
rotating media, causing delays on all other pending I/O requests.
That's quite un-backgroundly of it.
A few blocks per second like that can pretty much kill sequential
scans (I've seen that effect happen with fadvise).
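
For reference, the difference between the two write paths can be sketched in Python roughly as follows (Linux-only for the direct case, and not PostgreSQL code). The name write_block and the BLOCK_SIZE constant are assumptions for illustration. A buffered write returns once the data is in the kernel's page cache, while an O_DIRECT write bypasses the cache and needs an aligned buffer.

```python
import mmap
import os

BLOCK_SIZE = 8192  # assumed block size for this sketch


def write_block(path, data, direct=False):
    """Write one block at offset 0, buffered or with O_DIRECT (hypothetical helper)."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly BLOCK_SIZE bytes")
    flags = os.O_WRONLY | os.O_CREAT
    if direct:
        # Bypass the page cache; the request goes to the device queue.
        # os.O_DIRECT exists only on platforms that support it (e.g. Linux).
        flags |= os.O_DIRECT
    fd = os.open(path, flags, 0o600)
    try:
        # An anonymous mmap is page-aligned, which O_DIRECT requires.
        buf = mmap.mmap(-1, BLOCK_SIZE)
        try:
            buf.write(data)
            # Buffered: returns after copying into the page cache, and the
            # kernel decides when to flush. Direct: performs the device I/O.
            os.pwrite(fd, buf, 0)
        finally:
            buf.close()
    finally:
        os.close(fd)
```

In the buffered case the caller would still need an fsync at some point to guarantee durability; it is omitted here to keep the contrast visible.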
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)