Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-05-02 Thread Robert Haas
On Fri, Apr 29, 2016 at 7:08 AM, Bruce Momjian  wrote:
> On Wed, Apr  6, 2016 at 12:57:16PM +0200, Andres Freund wrote:
>> While benchmarking on hydra
>> (c.f. 
>> http://archives.postgresql.org/message-id/20160406104352.5bn3ehkcsceja65c%40alap3.anarazel.de),
>> which has quite slow IO, I was once more annoyed by how incredibly long
>> the vacuum at the end of a pgbench -i takes.
>>
>> The issue is that, even for an entirely shared_buffers resident scale,
>> essentially no data is cached in shared buffers. The COPY to load data
>> uses a 16MB ringbuffer. Then vacuum uses a 256KB ringbuffer. Which means
>> that copy immediately writes and evicts all data. Then vacuum reads &
>> writes the data in small chunks; again evicting nearly all buffers. Then
>> the creation of the primary key has to read that data *again*.
>>
>> That's fairly idiotic.
>>
>> While it's not easy to fix this in the general case (we introduced those
>> ringbuffers for a reason, after all), I think we should at least add a
>> special case for loads where shared_buffers isn't fully used yet.  Why
>> not skip using buffers from the ringbuffer if there's buffers on the
>> freelist? If we add buffers gathered from there to the ringlist, we
>> should have few cases that regress.
>>
>> Additionally, maybe we ought to increase the ringbuffer sizes again one
>> of these days? 256kb for VACUUM is pretty damn low.
>
> Is this a TODO?

I think we are in agreement that some changes may be needed, but I
don't think we necessarily know what the changes are.  So you could
say something like "improve VACUUM ring buffer logic", for example,
but I think something specific like "increase size of the VACUUM ring
buffer" will just encourage someone to do it as a beginner project,
which it really isn't.  Maybe others disagree, but I don't think this
is a slam-dunk where we can just change the behavior in 10 minutes and
expect to have winners but no losers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
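
For context on the sizes discussed in this thread: the rings come from GetAccessStrategy() in src/backend/storage/buffer/freelist.c.  Below is a minimal, self-contained sketch that mirrors the 9.5/9.6-era constants (assuming the default 8kB BLCKSZ); the real function allocates and returns a BufferAccessStrategy object rather than just a size.

/* Sketch only: the constants mirror src/backend/storage/buffer/freelist.c. */
#include <stdio.h>

#define BLCKSZ 8192                     /* default PostgreSQL block size */

typedef enum
{
    BAS_NORMAL,                         /* no ring: ordinary clock-sweep eviction */
    BAS_BULKREAD,                       /* large sequential scans */
    BAS_BULKWRITE,                      /* COPY IN, CREATE TABLE AS, etc. */
    BAS_VACUUM                          /* VACUUM */
} BufferAccessStrategyType;

static int
ring_size_in_buffers(BufferAccessStrategyType btype)
{
    switch (btype)
    {
        case BAS_BULKREAD:
            return 256 * 1024 / BLCKSZ;         /* 256kB -> 32 buffers */
        case BAS_BULKWRITE:
            return 16 * 1024 * 1024 / BLCKSZ;   /* 16MB  -> 2048 buffers */
        case BAS_VACUUM:
            return 256 * 1024 / BLCKSZ;         /* 256kB -> 32 buffers */
        default:
            return 0;                           /* BAS_NORMAL: no ring */
    }
}

int
main(void)
{
    printf("COPY ring:   %d buffers\n", ring_size_in_buffers(BAS_BULKWRITE));
    printf("VACUUM ring: %d buffers\n", ring_size_in_buffers(BAS_VACUUM));
    return 0;
}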




Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-29 Thread Bruce Momjian
On Wed, Apr  6, 2016 at 12:57:16PM +0200, Andres Freund wrote:
> Hi,
> 
> While benchmarking on hydra
> (c.f. 
> http://archives.postgresql.org/message-id/20160406104352.5bn3ehkcsceja65c%40alap3.anarazel.de),
> which has quite slow IO, I was once more annoyed by how incredibly long
> the vacuum at the end of a pgbench -i takes.
> 
> The issue is that, even for an entirely shared_buffers resident scale,
> essentially no data is cached in shared buffers. The COPY to load data
> uses a 16MB ringbuffer. Then vacuum uses a 256KB ringbuffer. Which means
> that copy immediately writes and evicts all data. Then vacuum reads &
> writes the data in small chunks; again evicting nearly all buffers. Then
> the creation of the primary key has to read that data *again*.
> 
> That's fairly idiotic.
> 
> While it's not easy to fix this in the general case (we introduced those
> ringbuffers for a reason, after all), I think we should at least add a
> special case for loads where shared_buffers isn't fully used yet.  Why
> not skip using buffers from the ringbuffer if there's buffers on the
> freelist? If we add buffers gathered from there to the ringlist, we
> should have few cases that regress.
> 
> Additionally, maybe we ought to increase the ringbuffer sizes again one
> of these days? 256kb for VACUUM is pretty damn low.

Is this a TODO?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-16 Thread Amit Kapila
On Thu, Apr 14, 2016 at 10:22 AM, Peter Geoghegan  wrote:
>
> On Tue, Apr 12, 2016 at 11:38 AM, Andres Freund  wrote:
> >> And, on the other hand, if we don't do something like that, it will be
> >> quite an exceptional case to find anything on the free list.  Doing it
> >> just to speed up developer benchmarking runs seems like the wrong
> >> idea.
> >
> > I don't think it's just developer benchmarks. I've seen a number of
> > customer systems where significant portions of shared buffers were
> > unused due to this.
> >
> > Unless you have an OLTP system, you can right now easily end up in a
> > situation where, after a restart, you'll never fill shared_buffers.
> > Just because sequential scans for OLAP and COPY use ringbuffers. It sure
> > isn't perfect to address the problem while there's free space in s_b,
> > but it sure is better than to just continue to have significant portions
> > of s_b unused.
>
> I agree that the ringbuffer heuristics are rather unhelpful in many
> real-world scenarios. This is definitely a real problem that we should
> try to solve soon.
>
> An adaptive strategy based on actual cache pressure in the recent past
> would be better. Maybe that would be as simple as not using a
> ringbuffer based on simply not having used up all of shared_buffers
> yet. That might not be good enough, but it would probably still be
> better than what we have.
>

I think such a strategy could be helpful in certain cases, but I'm not
sure it would be beneficial every time.  There could be cases where we
extend ring buffers to use unused buffers in the shared buffer pool for
bulk processing workloads, and immediately afterwards there is a demand
for buffers from other statements.  Not sure, but I think the idea of
different kinds of buffer pools could help in some such cases.  The
different kinds of buffer pools could be: ring buffers; extended ring
buffers (relations associated with such a pool can bypass ring buffers
and use unused shared buffers); retain or keep buffers (frequently
accessed relations can be associated with this kind of pool, where
buffers can stay for a longer time); and a default buffer pool (all
relations are associated with it by default, and the behaviour is the
same as today).
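
Purely as an illustration of the above idea (nothing like this exists in PostgreSQL today, and all names below are made up), the association between a relation and a kind of buffer pool might look roughly like this:

/* Hypothetical sketch only; not an existing PostgreSQL API. */
typedef unsigned int Oid;           /* stand-in for PostgreSQL's Oid */

typedef enum BufferPoolKind
{
    BUFPOOL_DEFAULT,                /* current behaviour: shared clock sweep */
    BUFPOOL_RING,                   /* small ring, like BAS_BULKREAD/BAS_VACUUM now */
    BUFPOOL_EXTENDED_RING,          /* may spill into unused shared buffers */
    BUFPOOL_KEEP                    /* hot relations: buffers retained longer */
} BufferPoolKind;

typedef struct RelationBufferPolicy
{
    Oid             relid;          /* relation the policy applies to */
    BufferPoolKind  kind;           /* which pool behaviour to use for it */
} RelationBufferPolicy;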

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-13 Thread Jeff Janes
On Tue, Apr 12, 2016 at 11:38 AM, Andres Freund  wrote:


>
>> The bottom line
>> here, IMHO, is not that there's anything wrong with our ring buffer
>> implementation, but that if you run PostgreSQL on a system where the
>> I/O is hitting a 5.25" floppy (not to say 8") the performance may be
>> less than ideal.  I really appreciate IBM donating hydra - it's been
>> invaluable over the years for improving PostgreSQL performance - but I
>> sure wish they had donated a better I/O subsystem.

When I had this problem some years ago, I traced it down to the fact
that you have to sync the WAL before you can evict a dirty page.  If
your vacuum is doing a meaningful amount of cleaning, you encounter a
dirty page with a not-already-synced LSN about once per trip around the
ring buffer.  That really destroys your vacuuming performance with a
256kB ring if your fsync actually has to reach spinning disk.  What I
ended up doing was hacking it so that it used a BAS_BULKWRITE strategy
when the vacuum was being run with a zero vacuum cost delay.
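
A rough sketch of that hack (not the actual patch; the helper below is imagined rather than the way it was really wired into src/backend/commands/vacuum.c): pick the 16MB BAS_BULKWRITE ring instead of the 256kB BAS_VACUUM ring whenever cost-based delays are disabled, so the WAL flush forced by evicting a dirty page happens far less often per loop around the ring.

/* Hypothetical helper; assumes the PostgreSQL backend environment. */
#include "postgres.h"
#include "storage/bufmgr.h"

static BufferAccessStrategy
choose_vacuum_strategy(int vacuum_cost_delay_ms)
{
    /*
     * With no cost-based delay the vacuum is expected to run flat out, so
     * trade the tiny 256kB ring for the 16MB bulk-write ring and amortize
     * the WAL-flush-before-evict cost over many more buffers.
     */
    if (vacuum_cost_delay_ms == 0)
        return GetAccessStrategy(BAS_BULKWRITE);
    return GetAccessStrategy(BAS_VACUUM);
}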

> It's really not just hydra. I've seen the same problem on 24 disk raid-0
> type installations. The small ringbuffer leads to reads/writes being
> constantly interspersed, apparently defeating readahead.

Was there a BBU on that?  I would think slow fsyncs are more likely
than defeated readahead.  On the other hand, I don't hear about too
many 24-disk RAIDs without a BBU.




Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-13 Thread Peter Geoghegan
On Tue, Apr 12, 2016 at 11:38 AM, Andres Freund  wrote:
>> And, on the other hand, if we don't do something like that, it will be
>> quite an exceptional case to find anything on the free list.  Doing it
>> just to speed up developer benchmarking runs seems like the wrong
>> idea.
>
> I don't think it's just developer benchmarks. I've seen a number of
> customer systems where significant portions of shared buffers were
> unused due to this.
>
> Unless you have an OLTP system, you can right now easily end up in a
> situation where, after a restart, you'll never fill shared_buffers.
> Just because sequential scans for OLAP and COPY use ringbuffers. It sure
> isn't perfect to address the problem while there's free space in s_b,
> but it sure is better than to just continue to have significant portions
> of s_b unused.

I agree that the ringbuffer heuristics are rather unhelpful in many
real-world scenarios. This is definitely a real problem that we should
try to solve soon.

An adaptive strategy based on actual cache pressure in the recent past
would be better. Maybe that would be as simple as not using a
ringbuffer based on simply not having used up all of shared_buffers
yet. That might not be good enough, but it would probably still be
better than what we have.

Separately, I agree that 256KB is way too low for VACUUM these days.
There is a comment in the buffer directory README about that being
"small enough to fit in L2 cache". I'm pretty sure that that's still
true at least one time over with the latest Raspberry Pi model, so it
should be revisited.

-- 
Peter Geoghegan




Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-13 Thread Amit Kapila
On Wed, Apr 13, 2016 at 12:08 AM, Andres Freund  wrote:
>
> On 2016-04-12 14:29:10 -0400, Robert Haas wrote:
> > On Wed, Apr 6, 2016 at 6:57 AM, Andres Freund  wrote:
> > > While benchmarking on hydra
> > > (c.f. http://archives.postgresql.org/message-id/20160406104352.5bn3ehkcsceja65c%40alap3.anarazel.de),
> > > which has quite slow IO, I was once more annoyed by how incredibly long
> > > the vacuum at the end of a pgbench -i takes.
> > >
> > > The issue is that, even for an entirely shared_buffers resident scale,
> > > essentially no data is cached in shared buffers. The COPY to load data
> > > uses a 16MB ringbuffer. Then vacuum uses a 256KB ringbuffer. Which means
> > > that copy immediately writes and evicts all data. Then vacuum reads &
> > > writes the data in small chunks; again evicting nearly all buffers. Then
> > > the creation of the primary key has to read that data *again*.
> > >
> > > That's fairly idiotic.
> > >
> > > While it's not easy to fix this in the general case (we introduced those
> > > ringbuffers for a reason, after all), I think we should at least add a
> > > special case for loads where shared_buffers isn't fully used yet.  Why
> > > not skip using buffers from the ringbuffer if there's buffers on the
> > > freelist? If we add buffers gathered from there to the ringlist, we
> > > should have few cases that regress.
> >
> > That does not seem like a good idea from here.  One of the ideas I
> > still want to explore at some point is having a background process
> > identify the buffers that are just about to be evicted and stick them
> > on the freelist so that the backends don't have to run the clock sweep
> > themselves on a potentially huge number of buffers, at perhaps
> > substantial CPU cost.  Amit's last attempt at this didn't really pan
> > out, but I'm not convinced that the approach is without merit.
>

Yeah, and IIRC I observed that there was a lot of contention in the
dynahash table (when the data doesn't fit in shared buffers), due to
which the improvement didn't show a measurable gain in terms of TPS.
Now that we have reduced the contention (spinlocks) in the dynahash
tables in 9.6, it might be interesting to run the tests again.

> FWIW, I've posted an implementation of this in the checkpoint flushing
> thread; I saw quite substantial gains with it. It was just entirely
> unrealistic to push that into 9.6.
>

Sounds good.  I remember that last time you mentioned such an idea could
benefit the bulk-load case when the data doesn't fit in shared buffers;
is that the same case where you saw a benefit, or did other cases like
the read-only and read-write tests benefit as well?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-13 Thread Andres Freund
On 2016-04-13 06:57:15 -0400, Robert Haas wrote:
> You will eventually, because each scan will pick a new ring buffer,
> and gradually more and more of the relation will get cached.  But it
> can take a while.

You really don't need much new data to make that an unobtainable goal
... :/


> I'd be more inclined to try to fix this by prewarming the buffers that
> were in shared_buffers at shutdown.

That doesn't solve the problem of not reacting to actual new data? It's
not that uncommon to regularly load new data with copy and drop old
partitions, just to keep the workload memory resident...

Andres




Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-13 Thread Robert Haas
On Tue, Apr 12, 2016 at 2:38 PM, Andres Freund  wrote:
>> And, on the other hand, if we don't do something like that, it will be
>> quite an exceptional case to find anything on the free list.  Doing it
>> just to speed up developer benchmarking runs seems like the wrong
>> idea.
>
> I don't think it's just developer benchmarks. I've seen a number of
> customer systems where significant portions of shared buffers were
> unused due to this.
>
> Unless you have an OLTP system, you can right now easily end up in a
> situation where, after a restart, you'll never fill shared_buffers.
> Just because sequential scans for OLAP and COPY use ringbuffers. It sure
> isn't perfect to address the problem while there's free space in s_b,
> but it sure is better than to just continue to have significant portions
> of s_b unused.

You will eventually, because each scan will pick a new ring buffer,
and gradually more and more of the relation will get cached.  But it
can take a while.

I'd be more inclined to try to fix this by prewarming the buffers that
were in shared_buffers at shutdown.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-12 Thread Stephen Frost
Robert, Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2016-04-12 14:29:10 -0400, Robert Haas wrote:
> > On Wed, Apr 6, 2016 at 6:57 AM, Andres Freund  wrote:
> > That does not seem like a good idea from here.  One of the ideas I
> > still want to explore at some point is having a background process
> > identify the buffers that are just about to be evicted and stick them
> > on the freelist so that the backends don't have to run the clock sweep
> > themselves on a potentially huge number of buffers, at perhaps
> > substantial CPU cost.  Amit's last attempt at this didn't really pan
> > out, but I'm not convinced that the approach is without merit.
> 
> FWIW, I've posted an implementation of this in the checkpoint flushing
> thread; I saw quite substantial gains with it. It was just entirely
> unrealistic to push that into 9.6.

That is fantastic to hear and I certainly agree that we should be
working on that approach.

> > And, on the other hand, if we don't do something like that, it will be
> > quite an exceptional case to find anything on the free list.  Doing it
> > just to speed up developer benchmarking runs seems like the wrong
> > idea.
> 
> I don't think it's just developer benchmarks. I've seen a number of
> customer systems where significant portions of shared buffers were
> unused due to this.

Ditto.

I agree that we should be smarter when we have a bunch of free
shared_buffers space and we're doing sequential work.  I don't think we
want to immediately grab all that free space for the sequential work, but
perhaps there's a reasonable heuristic we could use, such as: if the free
space available is twice what we expect our sequential read to be, then
go ahead and load it into shared buffers?

The point here isn't to get rid of the ring buffers but rather to use
the shared buffer space when we have plenty of it and there isn't
contention for it.
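
A minimal sketch of that heuristic (hypothetical; the free-list accessor below does not exist in PostgreSQL): bypass the ring and use the normal allocation path only while the free list still holds comfortably more buffers than the scan is expected to touch.

/* Hypothetical sketch; StrategyFreeListSize() is an imagined accessor. */
#include "postgres.h"
#include "storage/block.h"

extern int  StrategyFreeListSize(void);     /* imagined; no such function today */

static bool
should_bypass_ring(BlockNumber expected_scan_blocks)
{
    int     free_buffers = StrategyFreeListSize();

    /* Stephen's suggestion: require twice the expected read size to be free. */
    return free_buffers >= 2 * (int) expected_scan_blocks;
}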

Thanks!

Stephen




Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-12 Thread Andres Freund
On 2016-04-12 14:29:10 -0400, Robert Haas wrote:
> On Wed, Apr 6, 2016 at 6:57 AM, Andres Freund  wrote:
> > While benchmarking on hydra
> > (c.f. 
> > http://archives.postgresql.org/message-id/20160406104352.5bn3ehkcsceja65c%40alap3.anarazel.de),
> > which has quite slow IO, I was once more annoyed by how incredibly long
> > the vacuum at the end of a pgbench -i takes.
> >
> > The issue is that, even for an entirely shared_buffers resident scale,
> > essentially no data is cached in shared buffers. The COPY to load data
> > uses a 16MB ringbuffer. Then vacuum uses a 256KB ringbuffer. Which means
> > that copy immediately writes and evicts all data. Then vacuum reads &
> > writes the data in small chunks; again evicting nearly all buffers. Then
> > the creation of the primary key has to read that data *again*.
> >
> > That's fairly idiotic.
> >
> > While it's not easy to fix this in the general case (we introduced those
> > ringbuffers for a reason, after all), I think we should at least add a
> > special case for loads where shared_buffers isn't fully used yet.  Why
> > not skip using buffers from the ringbuffer if there's buffers on the
> > freelist? If we add buffers gathered from there to the ringlist, we
> > should have few cases that regress.
> 
> That does not seem like a good idea from here.  One of the ideas I
> still want to explore at some point is having a background process
> identify the buffers that are just about to be evicted and stick them
> on the freelist so that the backends don't have to run the clock sweep
> themselves on a potentially huge number of buffers, at perhaps
> substantial CPU cost.  Amit's last attempt at this didn't really pan
> out, but I'm not convinced that the approach is without merit.

FWIW, I've posted an implementation of this in the checkpoint flushing
thread; I saw quite substantial gains with it. It was just entirely
unrealistic to push that into 9.6.


> And, on the other hand, if we don't do something like that, it will be
> quite an exceptional case to find anything on the free list.  Doing it
> just to speed up developer benchmarking runs seems like the wrong
> idea.

I don't think it's just developer benchmarks. I've seen a number of
customer systems where significant portions of shared buffers were
unused due to this.

Unless you have an OLTP system, you can right now easily end up in a
situation where, after a restart, you'll never fill shared_buffers.
Just because sequential scans for OLAP and COPY use ringbuffers. It sure
isn't perfect to address the problem while there's free space in s_b,
but it sure is better than to just continue to have significant portions
of s_b unused.


> > Additionally, maybe we ought to increase the ringbuffer sizes again one
> > of these days? 256kb for VACUUM is pretty damn low.
> 
> But all that does is force the backend to write to the operating
> system, which is where the real buffering happens.

Relying on that has imo proven to be a pretty horrible idea.


> The bottom line
> here, IMHO, is not that there's anything wrong with our ring buffer
> implementation, but that if you run PostgreSQL on a system where the
> I/O is hitting a 5.25" floppy (not to say 8") the performance may be
> less than ideal.  I really appreciate IBM donating hydra - it's been
> invaluable over the years for improving PostgreSQL performance - but I
> sure wish they had donated a better I/O subsystem.

It's really not just hydra. I've seen the same problem on 24 disk raid-0
type installations. The small ringbuffer leads to reads/writes being
constantly interspersed, apparently defeating readahead.

Greetings,

Andres Freund




Re: [HACKERS] Detrimental performance impact of ringbuffers on performance

2016-04-12 Thread Robert Haas
On Wed, Apr 6, 2016 at 6:57 AM, Andres Freund  wrote:
> While benchmarking on hydra
> (c.f. 
> http://archives.postgresql.org/message-id/20160406104352.5bn3ehkcsceja65c%40alap3.anarazel.de),
> which has quite slow IO, I was once more annoyed by how incredibly long
> the vacuum at the end of a pgbench -i takes.
>
> The issue is that, even for an entirely shared_buffers resident scale,
> essentially no data is cached in shared buffers. The COPY to load data
> uses a 16MB ringbuffer. Then vacuum uses a 256KB ringbuffer. Which means
> that copy immediately writes and evicts all data. Then vacuum reads &
> writes the data in small chunks; again evicting nearly all buffers. Then
> the creation of the primary key has to read that data *again*.
>
> That's fairly idiotic.
>
> While it's not easy to fix this in the general case (we introduced those
> ringbuffers for a reason, after all), I think we should at least add a
> special case for loads where shared_buffers isn't fully used yet.  Why
> not skip using buffers from the ringbuffer if there's buffers on the
> freelist? If we add buffers gathered from there to the ringlist, we
> should have few cases that regress.

That does not seem like a good idea from here.  One of the ideas I
still want to explore at some point is having a background process
identify the buffers that are just about to be evicted and stick them
on the freelist so that the backends don't have to run the clock sweep
themselves on a potentially huge number of buffers, at perhaps
substantial CPU cost.  Amit's last attempt at this didn't really pan
out, but I'm not convinced that the approach is without merit.
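
Sketched very roughly (hypothetical; all names below are invented and no such process exists in core), the background-reclaim idea amounts to a loop like this:

/* Hypothetical sketch of a background buffer-reclaim process. */
#include "postgres.h"

extern int  StrategyFreeListSize(void);         /* imagined accessor */
extern int  ClockSweepFindVictim(void);         /* imagined: returns a usage_count==0 buffer */
extern void StrategyPushFreeBuffer(int buf_id); /* imagined */
extern void WaitForBufferDemandOrTimeout(void); /* imagined */

#define RECLAIM_LOW_WATERMARK   128             /* arbitrary for the sketch */

static void
BgReclaimerMain(void)
{
    for (;;)
    {
        /* Top the free list up so backends rarely run the sweep themselves. */
        while (StrategyFreeListSize() < RECLAIM_LOW_WATERMARK)
            StrategyPushFreeBuffer(ClockSweepFindVictim());

        WaitForBufferDemandOrTimeout();
    }
}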

And, on the other hand, if we don't do something like that, it will be
quite an exceptional case to find anything on the free list.  Doing it
just to speed up developer benchmarking runs seems like the wrong
idea.

> Additionally, maybe we ought to increase the ringbuffer sizes again one
> of these days? 256kb for VACUUM is pretty damn low.

But all that does is force the backend to write to the operating
system, which is where the real buffering happens.  The bottom line
here, IMHO, is not that there's anything wrong with our ring buffer
implementation, but that if you run PostgreSQL on a system where the
I/O is hitting a 5.25" floppy (not to say 8") the performance may be
less than ideal.  I really appreciate IBM donating hydra - it's been
invaluable over the years for improving PostgreSQL performance - but I
sure wish they had donated a better I/O subsystem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

