On 2016-04-12 14:29:10 -0400, Robert Haas wrote:
> On Wed, Apr 6, 2016 at 6:57 AM, Andres Freund <and...@anarazel.de> wrote:
> > While benchmarking on hydra
> > (c.f. 
> > http://archives.postgresql.org/message-id/20160406104352.5bn3ehkcsceja65c%40alap3.anarazel.de),
> > which has quite slow IO, I was once more annoyed by how incredibly long
> > the vacuum at the end of a pgbench -i takes.
> >
> > The issue is that, even for an entirely shared_buffers resident scale,
> > essentially no data is cached in shared buffers. The COPY to load data
> > uses a 16MB ringbuffer. Then vacuum uses a 256KB ringbuffer. Which means
> > that copy immediately writes and evicts all data. Then vacuum reads &
> > writes the data in small chunks; again evicting nearly all buffers. Then
> > the creation of the primary key has to read that data *again*.
> >
> > That's fairly idiotic.
> >
> > While it's not easy to fix this in the general case, we introduced those
> > ringbuffers for a reason after all, I think we at least should add a
> > special case for loads where shared_buffers isn't fully used yet.  Why
> > not skip using buffers from the ringbuffer if there are buffers on the
> > freelist? If we add buffers gathered from there to the ring, we
> > should have few cases that regress.
> That does not seem like a good idea from here.  One of the ideas I
> still want to explore at some point is having a background process
> identify the buffers that are just about to be evicted and stick them
> on the freelist so that the backends don't have to run the clock sweep
> themselves on a potentially huge number of buffers, at perhaps
> substantial CPU cost.  Amit's last attempt at this didn't really pan
> out, but I'm not convinced that the approach is without merit.

FWIW, I've posted an implementation of this in the checkpoint flushing
thread; I saw quite substantial gains with it. It was just entirely
unrealistic to push that into 9.6.

> And, on the other hand, if we don't do something like that, it will be
> quite an exceptional case to find anything on the free list.  Doing it
> just to speed up developer benchmarking runs seems like the wrong
> idea.

I don't think it's just developer benchmarks. I've seen a number of
customer systems where significant portions of shared buffers were
unused due to this.

Unless you have an OLTP system, you can right now easily end up in a
situation where, after a restart, you'll never fill shared_buffers,
simply because sequential scans for OLAP and COPY use ringbuffers. It
sure isn't a perfect fix to bypass the ring while there's free space in
s_b, but it sure is better than continuing to leave significant portions
of s_b unused.

> > Additionally, maybe we ought to increase the ringbuffer sizes again one
> > of these days? 256kb for VACUUM is pretty damn low.
> But all that does is force the backend to write to the operating
> system, which is where the real buffering happens.

Relying on that has imo proven to be a pretty horrible idea.

> The bottom line
> here, IMHO, is not that there's anything wrong with our ring buffer
> implementation, but that if you run PostgreSQL on a system where the
> I/O is hitting a 5.25" floppy (not to say 8") the performance may be
> less than ideal.  I really appreciate IBM donating hydra - it's been
> invaluable over the years for improving PostgreSQL performance - but I
> sure wish they had donated a better I/O subsystem.

It's really not just hydra. I've seen the same problem on 24-disk
RAID-0 type installations. The small ringbuffer leads to reads and
writes being constantly interspersed, apparently defeating readahead.


Andres Freund

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)