Date: Fri, 17 Aug 2012 00:26:37 +0100
From: Peter Geoghegan <pe...@2ndquadrant.com>
To: Jeff Janes <jeff.ja...@gmail.com>
Cc: pgsql-hackers <email@example.com>
Subject: Re: tuplesort memory usage: grow_memtuples
Message-ID:
On 27 July 2012 16:39, Jeff Janes <jeff.ja...@gmail.com> wrote:
>> Can you suggest a benchmark that will usefully exercise this patch?
> I think the given sizes below work on most 64 bit machines.
I think this patch (or at least your observation about I/O waits
within vmstat) may point to a more fundamental issue with our sort
code: Why are we not using asynchronous I/O in our implementation?
There are anecdotal reports of other RDBMS implementations doing far
better than we do here, and I believe asynchronous I/O, pipelining,
and other such optimisations have a lot to do with that. It's
something I'd hoped to find the time to look at in detail, but
probably won't in the 9.3 cycle. One of the more obvious ways of
optimising an external sort is to use asynchronous I/O so that one run
of data can be sorted or merged while other runs are being read from
or written to disk. Our current implementation seems naive about this.
There are some interesting details about how this is exposed by POSIX AIO.
I've recently tried extending the postgresql prefetch mechanism on linux to use the posix (i.e. librt) aio_read and friends where possible. In other words, in PrefetchBuffer(), try getting a buffer and issuing aio_read() before falling back to posix_fadvise(). It gives me about an 8% improvement in throughput relative to the posix_fadvise variety, for a workload of 16 highly-disk-read-intensive applications running against 16 backends. For my test, each application runs a query chosen to have plenty of bitmap heap scans. I can provide more details on my changes if interested.
On whether this technique might improve sort performance:

First, the disk access pattern for sorting is mostly sequential (although I think the sort module does some tricky work with reuse of pages in its tape storage, which may be random-like), and there are several claims on the net that linux buffered file handling already does a pretty good job of read-ahead for sequential access without any need for the application to help it. I can half-confirm that, in that I tried adding calls to PrefetchBuffer in regular heap scan and did not see much improvement. But I am still pursuing that area.
But second, it would be easy enough to add some posix_fadvise calls to sort and see whether that helps. (We can't make use of PrefetchBuffer, since sort does not use the regular relation buffer pool.)
It's already anticipated that we might take advantage of libaio for the benefit of FilePrefetch() (see its accompanying comments - it uses posix_fadvise itself - effective_io_concurrency must be > 0 for this to ever be called). It could perhaps be considered parallel "low-hanging fruit", in that it allows us to offer limited though useful backend parallelism without first resolving thorny issues around what abstraction we might use, or how we might eventually make backends thread-safe. AIO supports registering signal callbacks (a SIGPOLL handler can be called), which seems relatively flexible.
I believe libaio is dead, as it depended on the old linux kernel asynchronous file io, which was problematic and imposed various restrictions on the application. librt aio has no such restrictions and does a good enough job, but it uses pthreads and synchronous io underneath, which can make CPU overhead a bit heavy and also results in more context switching, whereas one of the benefits of kernel async io (in theory) is reduced context switching.
From what I've seen, pthreads aio can give a benefit when there is high IO wait from mostly-read activity, the disk access pattern is not sequential (so kernel readahead can't predict it) but postgresql can predict it, and there's enough spare idle CPU to run the pthreads. So it does seem that bitmap heap scan is a good choice for prefetching.
Platform support for AIO might be a bit lacking, but then you can say the same about posix_fadvise. We don't assume that poll(2) is available, but we already use it where it is within the latch code. Besides, in-kernel support can be emulated if POSIX threads are available, which I believe would make this broadly useful on unix-like platforms.
--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services