Re: [HACKERS] Perf Benchmarking and regression.

2016-06-10 Thread Andres Freund
On 2016-06-09 17:19:34 -0700, Andres Freund wrote:
> On 2016-06-09 14:37:31 -0700, Andres Freund wrote:
> > I'm writing a patch right now, planning to post it later today, commit
> > it tomorrow.
> 
> Attached.

And pushed. Thanks to Michael for noticing the missing addition of
header file hunk.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Perf Benchmarking and regression.

2016-06-09 Thread Michael Paquier
On Fri, Jun 10, 2016 at 9:42 AM, Andres Freund  wrote:
> On 2016-06-10 09:41:09 +0900, Michael Paquier wrote:
>> On Fri, Jun 10, 2016 at 9:37 AM, Andres Freund  wrote:
>> > On 2016-06-10 09:34:33 +0900, Michael Paquier wrote:
>> >> On Fri, Jun 10, 2016 at 9:19 AM, Andres Freund  wrote:
>> >> > On 2016-06-09 14:37:31 -0700, Andres Freund wrote:
>> >> >> I'm writing a patch right now, planning to post it later today, commit
>> >> >> it tomorrow.
>> >> >
>> >> > Attached.
>> >>
>> >> -/* see bufmgr.h: OS dependent default */
>> >> -DEFAULT_BACKEND_FLUSH_AFTER, 0, WRITEBACK_MAX_PENDING_FLUSHES,
>> >> +0, 0, WRITEBACK_MAX_PENDING_FLUSHES,
>> >> Wouldn't it be better to still use DEFAULT_BACKEND_FLUSH_AFTER here, and
>> >> just enforce it to 0 for all the OSes at the top of bufmgr.h?
>> >
>> > What would be the point? The only reason for DEFAULT_BACKEND_FLUSH_AFTER
>> > was that it differed between operating systems. Now it doesn't anymore.
>>
>> Then why do you keep it defined?
>
> Ooops. Missing git add.

:)
-- 
Michael




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-09 Thread Andres Freund
On 2016-06-10 09:41:09 +0900, Michael Paquier wrote:
> On Fri, Jun 10, 2016 at 9:37 AM, Andres Freund  wrote:
> > On 2016-06-10 09:34:33 +0900, Michael Paquier wrote:
> >> On Fri, Jun 10, 2016 at 9:19 AM, Andres Freund  wrote:
> >> > On 2016-06-09 14:37:31 -0700, Andres Freund wrote:
> >> >> I'm writing a patch right now, planning to post it later today, commit
> >> >> it tomorrow.
> >> >
> >> > Attached.
> >>
> >> -/* see bufmgr.h: OS dependent default */
> >> -DEFAULT_BACKEND_FLUSH_AFTER, 0, WRITEBACK_MAX_PENDING_FLUSHES,
> >> +0, 0, WRITEBACK_MAX_PENDING_FLUSHES,
> >> Wouldn't it be better to still use DEFAULT_BACKEND_FLUSH_AFTER here, and
> >> just enforce it to 0 for all the OSes at the top of bufmgr.h?
> >
> > What would be the point? The only reason for DEFAULT_BACKEND_FLUSH_AFTER
> > was that it differed between operating systems. Now it doesn't anymore.
> 
> Then why do you keep it defined?

Ooops. Missing git add.

Greetings,

Andres Freund




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-09 Thread Michael Paquier
On Fri, Jun 10, 2016 at 9:37 AM, Andres Freund  wrote:
> On 2016-06-10 09:34:33 +0900, Michael Paquier wrote:
>> On Fri, Jun 10, 2016 at 9:19 AM, Andres Freund  wrote:
>> > On 2016-06-09 14:37:31 -0700, Andres Freund wrote:
>> >> I'm writing a patch right now, planning to post it later today, commit
>> >> it tomorrow.
>> >
>> > Attached.
>>
>> -/* see bufmgr.h: OS dependent default */
>> -DEFAULT_BACKEND_FLUSH_AFTER, 0, WRITEBACK_MAX_PENDING_FLUSHES,
>> +0, 0, WRITEBACK_MAX_PENDING_FLUSHES,
>> Wouldn't it be better to still use DEFAULT_BACKEND_FLUSH_AFTER here, and
>> just enforce it to 0 for all the OSes at the top of bufmgr.h?
>
> What would be the point? The only reason for DEFAULT_BACKEND_FLUSH_AFTER
> was that it differed between operating systems. Now it doesn't anymore.

Then why do you keep it defined?
-- 
Michael




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-09 Thread Andres Freund
On 2016-06-10 09:34:33 +0900, Michael Paquier wrote:
> On Fri, Jun 10, 2016 at 9:19 AM, Andres Freund  wrote:
> > On 2016-06-09 14:37:31 -0700, Andres Freund wrote:
> >> I'm writing a patch right now, planning to post it later today, commit
> >> it tomorrow.
> >
> > Attached.
> 
> -/* see bufmgr.h: OS dependent default */
> -DEFAULT_BACKEND_FLUSH_AFTER, 0, WRITEBACK_MAX_PENDING_FLUSHES,
> +0, 0, WRITEBACK_MAX_PENDING_FLUSHES,
> Wouldn't it be better to still use DEFAULT_BACKEND_FLUSH_AFTER here, and
> just enforce it to 0 for all the OSes at the top of bufmgr.h?

What would be the point? The only reason for DEFAULT_BACKEND_FLUSH_AFTER
was that it differed between operating systems. Now it doesn't anymore.

Andres




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-09 Thread Michael Paquier
On Fri, Jun 10, 2016 at 9:19 AM, Andres Freund  wrote:
> On 2016-06-09 14:37:31 -0700, Andres Freund wrote:
>> I'm writing a patch right now, planning to post it later today, commit
>> it tomorrow.
>
> Attached.

-/* see bufmgr.h: OS dependent default */
-DEFAULT_BACKEND_FLUSH_AFTER, 0, WRITEBACK_MAX_PENDING_FLUSHES,
+0, 0, WRITEBACK_MAX_PENDING_FLUSHES,
Wouldn't it be better to still use DEFAULT_BACKEND_FLUSH_AFTER here, and
just enforce it to 0 for all the OSes at the top of bufmgr.h?
-- 
Michael




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-09 Thread Andres Freund
On 2016-06-09 14:37:31 -0700, Andres Freund wrote:
> I'm writing a patch right now, planning to post it later today, commit
> it tomorrow.

Attached.
>From d86fc0c966efe544b1926652196059539966b137 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 9 Jun 2016 17:15:42 -0700
Subject: [PATCH] Change default of backend_flush_after GUC to 0 (disabled).

While backend_flush_after is beneficial in a significant number of
workloads, both for performance and for average/worst-case latency,
there are other workloads in which it can cause significant performance
regressions in comparison to < 9.6 releases. The regression is most
likely when the hot data set is bigger than shared buffers, but
significantly smaller than the operating system's page cache.

I personally think that the benefit of enabling backend flush control is
bigger than the potential downsides, but a fair argument can be made
that not regressing is more important than improving
performance/latency. As the latter is the consensus, change the default
to 0.

The other settings introduced in 428b1d6b2 do not have the same
potential for regressions, so leave them enabled.

Benchmarks leading up to changing the default have been performed by
Mithun Cy, Ashutosh Sharma and Robert Haas.

Discussion: CAD__OuhPmc6XH=wYRm_+Q657yQE88DakN4=ybh2ovefashk...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 6 +++---
 src/backend/utils/misc/guc.c  | 3 +--
 src/backend/utils/misc/postgresql.conf.sample | 3 +--
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4f93e70..e0e5a1e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2038,9 +2038,9 @@ include_dir 'conf.d'
  than the OS's page cache, where performance might degrade.  This
  setting may have no effect on some platforms.  The valid range is
  between 0, which disables controlled writeback,
- and 2MB.  The default is 128Kb on
- Linux, 0 elsewhere.  (Non-default values of
- BLCKSZ change the default and maximum.)
+ and 2MB.  The default is 0 (i.e. no
+ flush control).  (Non-default values of BLCKSZ
+ change the maximum.)
 

   
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index cf3eb1a..9b02111 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2457,8 +2457,7 @@ static struct config_int ConfigureNamesInt[] =
 			GUC_UNIT_BLOCKS
 		},
 		_flush_after,
-		/* see bufmgr.h: OS dependent default */
-		DEFAULT_BACKEND_FLUSH_AFTER, 0, WRITEBACK_MAX_PENDING_FLUSHES,
+		0, 0, WRITEBACK_MAX_PENDING_FLUSHES,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 3ef2a97..8260e37 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -170,8 +170,7 @@
 #max_parallel_workers_per_gather = 2	# taken from max_worker_processes
 #old_snapshot_threshold = -1		# 1min-60d; -1 disables; 0 is immediate
 	# (change requires restart)
-#backend_flush_after = 0		# 0 disables,
-	# default is 128kb on linux, 0 otherwise
+#backend_flush_after = 0		# 0 disables, default is 0
 
 
 #--
-- 
2.8.1
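
For readers following along, the mechanism this patch disables by default can
be sketched roughly: a backend counts the blocks it writes and, once
backend_flush_after blocks have accumulated, asks the kernel to start
writeback for them (on Linux via sync_file_range()), instead of letting dirty
pages pile up until the kernel's own limits force bulk writeback. The toy
model below is an illustrative sketch only; the class and method names are
hypothetical and this is not PostgreSQL's actual implementation:

```python
# Toy model of the "*_flush_after" writeback control discussed in this
# thread.  Illustrative sketch only -- WritebackControl / note_write /
# issue_flush are hypothetical names, not PostgreSQL code.

class WritebackControl:
    def __init__(self, flush_after_blocks):
        # flush_after_blocks == 0 disables controlled writeback,
        # matching the new default for backend_flush_after.
        self.flush_after = flush_after_blocks
        self.pending = []
        self.flushes = []  # record of issued writeback requests

    def note_write(self, block_no):
        if self.flush_after == 0:
            return  # disabled: leave writeback timing entirely to the OS
        self.pending.append(block_no)
        if len(self.pending) >= self.flush_after:
            self.issue_flush()

    def issue_flush(self):
        # Stand-in for sync_file_range(SYNC_FILE_RANGE_WRITE) on Linux.
        self.flushes.append(tuple(self.pending))
        self.pending.clear()

# With a threshold of 16 blocks (128kB at BLCKSZ=8kB, the old Linux
# default), 64 block writes trigger 4 writeback requests; with 0, none.
enabled = WritebackControl(16)
disabled = WritebackControl(0)
for blk in range(64):
    enabled.note_write(blk)
    disabled.note_write(blk)
print(len(enabled.flushes), len(disabled.flushes))  # 4 0
```

Setting the threshold to 0, as the patch does for backend_flush_after's
default, simply turns this accounting off; the bgwriter, checkpointer and WAL
writer equivalents keep their non-zero defaults.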




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-09 Thread Andres Freund
On 2016-06-08 23:00:15 -0400, Noah Misch wrote:
> On Sun, May 29, 2016 at 01:26:03AM -0400, Noah Misch wrote:
> > On Thu, May 12, 2016 at 10:49:06AM -0400, Robert Haas wrote:
> > > On Thu, May 12, 2016 at 8:39 AM, Ashutosh Sharma  
> > > wrote:
> > > > Please find the test results for the following set of combinations 
> > > > taken at
> > > > 128 client counts:
> > > >
> > > > 1) Unpatched master, default *_flush_after :  TPS = 10925.882396
> > > >
> > > > 2) Unpatched master, *_flush_after=0 :  TPS = 18613.343529
> > > >
> > > > 3) That line removed with #if 0, default *_flush_after :  TPS = 
> > > > 9856.809278
> > > >
> > > > 4) That line removed with #if 0, *_flush_after=0 :  TPS = 18158.648023
> > > 
> > > I'm getting increasingly unhappy about the checkpoint flush control.
> > > I saw major regressions on my parallel COPY test, too:
> > > 
> > > http://www.postgresql.org/message-id/ca+tgmoyouqf9cgcpgygngzqhcy-gcckryaqqtdu8kfe4n6h...@mail.gmail.com
> > > 
> > > That was a completely different machine (POWER7 instead of Intel,
> > > lousy disks instead of good ones) and a completely different workload.
> > > Considering these results, I think there's now plenty of evidence to
> > > suggest that this feature is going to be horrible for a large number
> > > of users.  A 45% regression on pgbench is horrible.  (Nobody wants to
> > > take even a 1% hit for snapshot too old, right?)  Sure, it might not
> > > be that way for every user on every Linux system, and I'm sure it
> > > performed well on the systems where Andres benchmarked it, or he
> > > wouldn't have committed it.  But our goal can't be to run well only on
> > > the newest hardware with the least-buggy kernel...
> > 
> > [This is a generic notification.]
> > 
> > The above-described topic is currently a PostgreSQL 9.6 open item.  Andres,
> > since you committed the patch believed to have created it, you own this open
> > item.  If some other commit is more relevant or if this does not belong as a
> > 9.6 open item, please let us know.  Otherwise, please observe the policy on
> > open item ownership[1] and send a status update within 72 hours of this
> > message.  Include a date for your subsequent status update.  Testers may
> > discover new open items at any time, and I want to plan to get them all 
> > fixed
> > well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> > efforts toward speedy resolution.  Thanks.
> > 
> > [1] 
> > http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com
> 
> This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com

I'm writing a patch right now, planning to post it later today, commit
it tomorrow.

Greetings,

Andres Freund




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-08 Thread Noah Misch
On Sun, May 29, 2016 at 01:26:03AM -0400, Noah Misch wrote:
> On Thu, May 12, 2016 at 10:49:06AM -0400, Robert Haas wrote:
> > On Thu, May 12, 2016 at 8:39 AM, Ashutosh Sharma  
> > wrote:
> > > Please find the test results for the following set of combinations taken 
> > > at
> > > 128 client counts:
> > >
> > > 1) Unpatched master, default *_flush_after :  TPS = 10925.882396
> > >
> > > 2) Unpatched master, *_flush_after=0 :  TPS = 18613.343529
> > >
> > > 3) That line removed with #if 0, default *_flush_after :  TPS = 
> > > 9856.809278
> > >
> > > 4) That line removed with #if 0, *_flush_after=0 :  TPS = 18158.648023
> > 
> > I'm getting increasingly unhappy about the checkpoint flush control.
> > I saw major regressions on my parallel COPY test, too:
> > 
> > http://www.postgresql.org/message-id/ca+tgmoyouqf9cgcpgygngzqhcy-gcckryaqqtdu8kfe4n6h...@mail.gmail.com
> > 
> > That was a completely different machine (POWER7 instead of Intel,
> > lousy disks instead of good ones) and a completely different workload.
> > Considering these results, I think there's now plenty of evidence to
> > suggest that this feature is going to be horrible for a large number
> > of users.  A 45% regression on pgbench is horrible.  (Nobody wants to
> > take even a 1% hit for snapshot too old, right?)  Sure, it might not
> > be that way for every user on every Linux system, and I'm sure it
> > performed well on the systems where Andres benchmarked it, or he
> > wouldn't have committed it.  But our goal can't be to run well only on
> > the newest hardware with the least-buggy kernel...
> 
> [This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 9.6 open item.  Andres,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.
> 
> [1] 
> http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com

This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Andres Freund
On 2016-06-03 20:41:33 -0400, Noah Misch wrote:
> Disabling just backend_flush_after by default works for me, so let's do that.
> Though I would not elect, on behalf of PostgreSQL, the risk of enabling
> {bgwriter,checkpoint,wal_writer}_flush_after by default, a reasonable person
> may choose to do so.  I doubt the community could acquire the data necessary
> to ascertain which choice has more utility.

Note that the flushing wal_writer_flush_after controls was essentially
already happening before, just a lot more *aggressively*.




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Noah Misch
On Fri, Jun 03, 2016 at 03:17:06PM -0400, Robert Haas wrote:
> On Fri, Jun 3, 2016 at 2:20 PM, Andres Freund  wrote:

> >> > I'm inclined to give up and disable backend_flush_after (not the rest),
> >> > because it's new and by far the "riskiest". But I do think it's a
> >> > disservice for the majority of our users.
> >>
> >> I think that's the right course of action.  I wasn't arguing for
> >> disabling either of the other two.
> >
> > Noah was...
> 
> I know, but I'm not Noah.  :-)
> 
> We have no evidence of the other settings causing any problems yet, so
> I see no reason to second-guess the decision to leave them on by
> default at this stage.  Other people may disagree with that analysis,
> and that's fine, but my analysis is that the case for
> disable-by-default has been made for backend_flush_after but not the
> others.  I also agree that backend_flush_after is much more dangerous
> on theoretical grounds; the checkpointer is in a good position to sort
> the requests to achieve locality, but backends are not.

Disabling just backend_flush_after by default works for me, so let's do that.
Though I would not elect, on behalf of PostgreSQL, the risk of enabling
{bgwriter,checkpoint,wal_writer}_flush_after by default, a reasonable person
may choose to do so.  I doubt the community could acquire the data necessary
to ascertain which choice has more utility.




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Andres Freund
On 2016-06-03 15:17:06 -0400, Robert Haas wrote:
> On Fri, Jun 3, 2016 at 2:20 PM, Andres Freund  wrote:
> >> I've always heard that guideline as "roughly 1/4, but not more than
> >> about 8GB" - and the number of people with more than 32GB of RAM is
> >> going to just keep going up.
> >
> > I think that upper limit is wrong.  But even disregarding that:
>
> Many people think the upper limit should be even lower, based on good,
> practical experience.  Like I've seen plenty of people recommend
> 2-2.5GB.

Which largely imo is because of the writeback issue. And the locking
around buffer replacement, if you're doing it highly concurrently (which
is now mostly solved).


> > To hit the issue in that case you have to access more data than
> > shared_buffers (8GB), and very frequently re-dirty already dirtied
> > data. So you're basically (on a very rough approximation) going to have
> > to write more than 8GB within 30s (256MB/s).  Unless your hardware can
> > handle that many mostly random writes, you are likely to hit the worst
> > case behaviour of pending writeback piling up and stalls.
>
> I'm not entirely sure that this is true, because my experience is that
> the background writing behavior under Linux is not very aggressive.  I
> agree you need a working set >8GB, but I think if you have that you
> might not actually need to write data this quickly, because if Linux
> decides to only do background writing (as opposed to blocking
> processes) it may not actually keep up.

But that's *bad*. Then a checkpoint comes around and latency and
throughput is shot to hell while the writeback from the fsyncs is
preventing any concurrent write activity. And if it's not keeping up
before, it's now really bad.


> And in fact I
> think what the testing shows so far is that when they can't achieve
> locality, backend flush control sucks.

FWIW, I don't think that's generally true. For pgbench workloads
bigger than 20% of available memory there's pretty much no locality,
and backend flushing helps considerably.

Andres




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Robert Haas
On Fri, Jun 3, 2016 at 2:20 PM, Andres Freund  wrote:
>> I've always heard that guideline as "roughly 1/4, but not more than
>> about 8GB" - and the number of people with more than 32GB of RAM is
>> going to just keep going up.
>
> I think that upper limit is wrong.  But even disregarding that:

Many people think the upper limit should be even lower, based on good,
practical experience.  Like I've seen plenty of people recommend
2-2.5GB.

> To hit the issue in that case you have to access more data than
> shared_buffers (8GB), and very frequently re-dirty already dirtied
> data. So you're basically (on a very rough approximation) going to have
> to write more than 8GB within 30s (256MB/s).  Unless your hardware can
> handle that many mostly random writes, you are likely to hit the worst
> case behaviour of pending writeback piling up and stalls.

I'm not entirely sure that this is true, because my experience is that
the background writing behavior under Linux is not very aggressive.  I
agree you need a working set >8GB, but I think if you have that you
might not actually need to write data this quickly, because if Linux
decides to only do background writing (as opposed to blocking
processes) it may not actually keep up.

Also, 256MB/s is not actually all that crazy a write rate.  I mean, it's
a lot, but even if each random UPDATE touched only 1 8kB block, that
would be about 32k TPS.  When you add in index updates and TOAST
traffic, the actual number of block writes per TPS could be
considerably higher, so we might be talking about something <10k TPS.
That's well within the range of what people try to do with PostgreSQL,
at least IME.
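
Robert's arithmetic above can be checked directly (a quick sanity check,
not from the thread itself):

```python
# Sanity check: 256MB/s of 8kB block writes.
block_size_kb = 8
write_rate_mb_per_s = 256
blocks_per_sec = write_rate_mb_per_s * 1024 // block_size_kb
print(blocks_per_sec)        # 32768, i.e. ~32k TPS if each transaction
                             # dirtied exactly one block

# With (say) 4 blocks dirtied per transaction once index and TOAST
# writes are counted, the required TPS drops well under 10k:
print(blocks_per_sec // 4)   # 8192
```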

>> > I'm inclined to give up and disable backend_flush_after (not the rest),
>> > because it's new and by far the "riskiest". But I do think it's a
>> > disservice for the majority of our users.
>>
>> I think that's the right course of action.  I wasn't arguing for
>> disabling either of the other two.
>
> Noah was...

I know, but I'm not Noah.  :-)

We have no evidence of the other settings causing any problems yet, so
I see no reason to second-guess the decision to leave them on by
default at this stage.  Other people may disagree with that analysis,
and that's fine, but my analysis is that the case for
disable-by-default has been made for backend_flush_after but not the
others.  I also agree that backend_flush_after is much more dangerous
on theoretical grounds; the checkpointer is in a good position to sort
the requests to achieve locality, but backends are not.  And in fact I
think what the testing shows so far is that when they can't achieve
locality, backend flush control sucks.  When it can, it's neutral or
positive.  But I really see no reason to believe that that's likely to
be true on general workloads.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Andres Freund
On 2016-06-03 13:47:58 -0400, Robert Haas wrote:
> On Fri, Jun 3, 2016 at 1:43 PM, Andres Freund  wrote:
> >> I really don't get it.  There's nothing in any set of guidelines for
> >> setting shared_buffers that I've ever seen which would cause people to
> >> avoid this scenario.
> >
> > The "roughly 1/4" of memory guideline already mostly avoids it? It's
> > hard to constantly re-dirty a written-back page within 30s, before the
> > 10% (background)/20% (foreground) limits apply; if your shared buffers
> > are larger than the 10%/20% limits (which only apply to *available* not
> > total memory btw).
> 
> I've always heard that guideline as "roughly 1/4, but not more than
> about 8GB" - and the number of people with more than 32GB of RAM is
> going to just keep going up.

I think that upper limit is wrong.  But even disregarding that:

To hit the issue in that case you have to access more data than
shared_buffers (8GB), and very frequently re-dirty already dirtied
data. So you're basically (on a very rough approximation) going to have
to write more than 8GB within 30s (256MB/s).  Unless your hardware can
handle that many mostly random writes, you are likely to hit the worst
case behaviour of pending writeback piling up and stalls.
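
The 256MB/s figure is roughly the rate implied by rewriting a full 8GB of
shared_buffers within Linux's dirty-expiry window (a back-of-the-envelope
check; the 30s assumes the kernel default vm.dirty_expire_centisecs=3000):

```python
# Back-of-the-envelope: rewriting 8GB of shared_buffers inside the ~30s
# window before the kernel's expired-dirty-page writeback kicks in.
shared_buffers_gb = 8
writeback_window_s = 30          # assumed: dirty_expire_centisecs = 3000
mb = shared_buffers_gb * 1024
rate = mb / writeback_window_s
print(rate)                      # ~273 MB/s, roughly the 256MB/s cited
```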


> > I'm inclined to give up and disable backend_flush_after (not the rest),
> > because it's new and by far the "riskiest". But I do think it's a
> > disservice for the majority of our users.
> 
> I think that's the right course of action.  I wasn't arguing for
> disabling either of the other two.

Noah was...

Greetings,

Andres Freund




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Robert Haas
On Fri, Jun 3, 2016 at 1:43 PM, Andres Freund  wrote:
>> I really don't get it.  There's nothing in any set of guidelines for
>> setting shared_buffers that I've ever seen which would cause people to
>> avoid this scenario.
>
> The "roughly 1/4" of memory guideline already mostly avoids it? It's
> hard to constantly re-dirty a written-back page within 30s, before the
> 10% (background)/20% (foreground) limits apply; if your shared buffers
> are larger than the 10%/20% limits (which only apply to *available* not
> total memory btw).

I've always heard that guideline as "roughly 1/4, but not more than
about 8GB" - and the number of people with more than 32GB of RAM is
going to just keep going up.

>> You're the first person I've ever heard describe this as a
>> misconfiguration.
>
> Huh? People tried addressing this problem for *years* with bigger /
> smaller shared buffers, but couldn't easily.

I'm saying that setting 8GB of shared_buffers on a system with
lotsamem is not widely regarded as misconfiguration.

> I'm inclined to give up and disable backend_flush_after (not the rest),
> because it's new and by far the "riskiest". But I do think it's a
> disservice for the majority of our users.

I think that's the right course of action.  I wasn't arguing for
disabling either of the other two.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Andres Freund
On 2016-06-03 13:42:09 -0400, Tom Lane wrote:
> Robert Haas  writes:
> > On Fri, Jun 3, 2016 at 12:39 PM, Andres Freund  wrote:
>> Note that other operating systems like windows and freebsd *already*
> >> write back much more aggressively (independent of this change). I seem
> >> to recall you yourself being quite passionately arguing that the linux
> >> behaviour around this is broken.
> 
> > Sure, but being unhappy about the Linux behavior doesn't mean that I
> > want our TPS on Linux to go down.  Whether I like the behavior or not,
> > we have to live with it.
> 
> Yeah.  Bug or not, it's reality for lots of our users.

That means we need to address it. Which is what the feature does. So
yes, some Linux-specific tuning might need to be tweaked in the more
extreme cases. But that's better than relying on Linux's extreme
writeback behaviour, which changes every few releases to boot. From the
tuning side this also makes shared_buffers sizing more consistent
across Unix-like OSs.

Andres




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Andres Freund
On 2016-06-03 13:33:31 -0400, Robert Haas wrote:
> On Fri, Jun 3, 2016 at 12:39 PM, Andres Freund  wrote:
> > On 2016-06-03 12:31:58 -0400, Robert Haas wrote:
> >> Now, what varies IME is how much total RAM there is in the system and
> >> how frequently they write that data, as opposed to reading it.  If
> >> they are on a tightly RAM-constrained system, then this situation
> >> won't arise because they won't be under the dirty background limit.
> >> And if they aren't writing that much data then they'll be fine too.
> >> But even putting all of that together I really don't see why you're
> >> trying to suggest that this is some bizarre set of circumstances that
> >> should only rarely happen in the real world.
> >
> > I'm saying that if that happens constantly, you're better off adjusting
> > shared_buffers, because you're likely already suffering from latency
> > spikes and other issues. Optimizing for massive random write throughput
> in a system that's not configured appropriately, at the cost of making
> well-configured systems suffer, doesn't seem like a good tradeoff to me.
> 
> I really don't get it.  There's nothing in any set of guidelines for
> setting shared_buffers that I've ever seen which would cause people to
> avoid this scenario.

The "roughly 1/4" of memory guideline already mostly avoids it? It's
hard to constantly re-dirty a written-back page within 30s, before the
10% (background)/20% (foreground) limits apply; if your shared buffers
are larger than the 10%/20% limits (which only apply to *available* not
total memory btw).
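
To make those numbers concrete (an illustration only, with the simplifying
assumption that available memory equals total RAM, which as noted above
overstates the real limits): under the "roughly 1/4" guideline,
shared_buffers ends up larger than both the 10% background and 20%
foreground dirty limits, the configuration being described:

```python
# Illustrative only: Linux defaults vm.dirty_background_ratio=10 and
# vm.dirty_ratio=20, applied here to total RAM for simplicity (the real
# limits use *available* memory, so they are lower still in practice).
def sizes_gb(ram_gb):
    shared_buffers = ram_gb / 4           # the "roughly 1/4" guideline
    dirty_background = ram_gb * 10 / 100  # background writeback threshold
    dirty_foreground = ram_gb * 20 / 100  # blocking writeback threshold
    return shared_buffers, dirty_background, dirty_foreground

sb, bg, fg = sizes_gb(32)                 # e.g. a 32GB server
print(sb, bg, fg)                         # 8.0 3.2 6.4
# shared_buffers (8GB) exceeds both dirty limits, the case described.
```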


> You're the first person I've ever heard describe this as a
> misconfiguration.

Huh? People tried addressing this problem for *years* with bigger /
smaller shared buffers, but couldn't easily.

I'm inclined to give up and disable backend_flush_after (not the rest),
because it's new and by far the "riskiest". But I do think it's a
disservice for the majority of our users.

Greetings,

Andres Freund




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jun 3, 2016 at 12:39 PM, Andres Freund  wrote:
>> Note that other operating systems like windows and freebsd *already*
>> write back much more aggressively (independent of this change). I seem
>> to recall you yourself being quite passionately arguing that the linux
>> behaviour around this is broken.

> Sure, but being unhappy about the Linux behavior doesn't mean that I
> want our TPS on Linux to go down.  Whether I like the behavior or not,
> we have to live with it.

Yeah.  Bug or not, it's reality for lots of our users.

regards, tom lane




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Robert Haas
On Fri, Jun 3, 2016 at 12:39 PM, Andres Freund  wrote:
> On 2016-06-03 12:31:58 -0400, Robert Haas wrote:
>> Now, what varies IME is how much total RAM there is in the system and
>> how frequently they write that data, as opposed to reading it.  If
>> they are on a tightly RAM-constrained system, then this situation
>> won't arise because they won't be under the dirty background limit.
>> And if they aren't writing that much data then they'll be fine too.
>> But even putting all of that together I really don't see why you're
>> trying to suggest that this is some bizarre set of circumstances that
>> should only rarely happen in the real world.
>
> I'm saying that if that happens constantly, you're better off adjusting
> shared_buffers, because you're likely already suffering from latency
> spikes and other issues. Optimizing for massive random write throughput
> in a system that's not configured appropriately, at the cost of making
> well-configured systems suffer, doesn't seem like a good tradeoff to me.

I really don't get it.  There's nothing in any set of guidelines for
setting shared_buffers that I've ever seen which would cause people to
avoid this scenario.  You're the first person I've ever heard describe
this as a misconfiguration.

> Note that other operating systems like Windows and FreeBSD *already*
> write back much more aggressively (independent of this change). I seem
> to recall you yourself arguing quite passionately that the Linux
> behaviour around this is broken.

Sure, but being unhappy about the Linux behavior doesn't mean that I
want our TPS on Linux to go down.  Whether I like the behavior or not,
we have to live with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Andres Freund
On 2016-06-03 12:31:58 -0400, Robert Haas wrote:
> Now, what varies IME is how much total RAM there is in the system and
> how frequently they write that data, as opposed to reading it.  If
> they are on a tightly RAM-constrained system, then this situation
> won't arise because they won't be under the dirty background limit.
> And if they aren't writing that much data then they'll be fine too.
> But even putting all of that together I really don't see why you're
> trying to suggest that this is some bizarre set of circumstances that
> should only rarely happen in the real world.

I'm saying that if that happens constantly, you're better off adjusting
shared_buffers, because you're likely already suffering from latency
spikes and other issues. Optimizing for massive random write throughput
in a system that's not configured appropriately, at the cost of well
configured systems to suffer, doesn't seem like a good tradeoff to me.

Note that other operating systems like Windows and FreeBSD *already*
write back much more aggressively (independent of this change). I seem
to recall you yourself arguing quite passionately that the Linux
behaviour around this is broken.

Greetings,

Andres Freund




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Robert Haas
On Fri, Jun 3, 2016 at 2:09 AM, Andres Freund  wrote:
> On 2016-06-03 01:57:33 -0400, Noah Misch wrote:
>> > Which means that transactional workloads that are bigger than the OS
>> > memory, or which have a non-uniform distribution leading to some
>> > locality, are likely to be faster. In practice those are *hugely* more
>> > likely than the uniform distribution that pgbench has.
>>
>> That is formally true; non-benchmark workloads rarely issue uniform writes.
>> However, enough non-benchmark workloads have too little locality to benefit
>> from caches.  Those will struggle against *_flush_after like uniform writes
>> do, so discounting uniform writes wouldn't simplify this project.
>
> But such workloads rarely will hit the point of constantly re-dirtying
> already dirty pages in kernel memory within 30s.

I don't know why not.  It's not exactly uncommon to update the same
data frequently, nor is it uncommon for the hot data set to be larger
than shared_buffers and smaller than the OS cache, even significantly
smaller.  Any workload of that type is going to have this problem
regardless of whether the access pattern is uniform.  If you have a
highly non-uniform access pattern then you just have this problem on
the small subset of the data that is hot.  I think that asserting that
there's something wrong with this test is just wrong.  Many people
have done many tests very similar to this one on Linux systems over
many years to assess PostgreSQL performance.  It's a totally
legitimate test configuration.

Indeed, I'd argue that this is actually a pretty common real-world
scenario.  Most people's hot data fits in memory, because if it
doesn't, their performance sucks so badly that they either redesign
something or buy more memory until it does.  Also, most people have
more hot data than shared_buffers.  There are some who don't because
their data set is very small, and that's nice when it happens; and
there are others who don't because they carefully crank shared_buffers
up high enough that everything fits, but most don't, either because it
causes other problems, or because they just don't think to tinker
with it, or because they set it up that way initially but then the
data grows over time.  There are a LOT of people running with 8GB or
less of shared_buffers and a working set that is in the tens of GB.

Now, what varies IME is how much total RAM there is in the system and
how frequently they write that data, as opposed to reading it.  If
they are on a tightly RAM-constrained system, then this situation
won't arise because they won't be under the dirty background limit.
And if they aren't writing that much data then they'll be fine too.
But even putting all of that together I really don't see why you're
trying to suggest that this is some bizarre set of circumstances that
should only rarely happen in the real world.  I think it clearly does
happen, and I doubt it's particularly uncommon.  If your testing
didn't discover this scenario, I feel rather strongly that that's an
oversight in your testing rather than a problem with the scenario.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Andres Freund
On 2016-06-03 09:24:28 -0700, Andres Freund wrote:
> This unstable performance issue, with the minute-long stalls, is the
> worst and most frequent production problem people hit with postgres in
> my experience, besides issues with autovacuum.  Ignoring that is just
> hurting our users.

Oh, and a good proportion of the "autovacuum causes my overall system
to slow down unacceptably" issues come from exactly this.




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Andres Freund
On 2016-06-03 10:48:18 -0400, Noah Misch wrote:
> On Thu, Jun 02, 2016 at 11:09:22PM -0700, Andres Freund wrote:
> > > Today's defaults for *_flush_after greatly smooth and accelerate 
> > > performance
> > > for one class of plausible workloads while greatly slowing a different 
> > > class
> > > of plausible workloads.
> 
> The usual PostgreSQL handling of a deeply workload-dependent performance
> feature is to disable it by default.

Meh. That's not actually all that often the case.  This unstable
performance issue, with the minute-long stalls, is the worst and most
frequent production problem people hit with postgres in my experience,
besides issues with autovacuum.  Ignoring that is just hurting our
users.


> > I don't think checkpoint_flush_after is in that class, due to the
> > fsync()s we already emit at the end of checkpoints.
> 
> That's a promising hypothesis.  Some future project could impose a nonzero
> default checkpoint_flush_after, having demonstrated that it imposes negligible
> harm in the plausible cases it does not help.

Have you actually looked at the thread with all the numbers? This isn't
an issue that has been decided willy-nilly. It's been discussed *over
months*.

Greetings,

Andres Freund




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Fabien COELHO


Hello Noah,


> The usual PostgreSQL handling of a deeply workload-dependent performance
> feature is to disable it by default.  That's what I'm inclined to do here, for
> every GUC the feature added.  Sophisticated users will nonetheless fully
> exploit this valuable mechanism in 9.6.



>> I don't think checkpoint_flush_after is in that class, due to the
>> fsync()s we already emit at the end of checkpoints.


I agree with Andres that checkpoint_flush_after should not be treated like 
the other _flush_after settings.



> That's a promising hypothesis.


This is not a hypothesis but a proven fact. There have been hundreds of 
hours of pgbench runs to test and demonstrate the positive impact in 
various reasonable configurations.


> Some future project could impose a nonzero default 
> checkpoint_flush_after, having demonstrated that it imposes negligible 
> harm in the plausible cases it does not help.


I think that the significant and general benefit of checkpoint_flush_after 
has been largely demonstrated and reported on the hacker thread at various 
point of the development of the feature, and that it is safe, and even 
highly advisable to keep it on by default.


The key point is that it flushes sorted buffers, so it mostly results in 
sequential writes. In many cases this avoids the situation where the final 
sync at the end of the checkpoint generates so many IOs that PostgreSQL is 
effectively offline until the fsync completes, for seconds to minutes at a 
time.


The other *_flush_after settings do not benefit from any buffer 
reordering, so their positive impact is more questionable; I would be okay 
if these were disabled by default.


--
Fabien.




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Noah Misch
On Thu, Jun 02, 2016 at 11:09:22PM -0700, Andres Freund wrote:
> On 2016-06-03 01:57:33 -0400, Noah Misch wrote:
> > > Which means that transactional workloads that are bigger than the OS
> > > memory, or which have a non-uniform distribution leading to some
> > > locality, are likely to be faster. In practice those are *hugely* more
> > > likely than the uniform distribution that pgbench has.
> > 
> > That is formally true; non-benchmark workloads rarely issue uniform writes.
> > However, enough non-benchmark workloads have too little locality to benefit
> > from caches.  Those will struggle against *_flush_after like uniform writes
> > do, so discounting uniform writes wouldn't simplify this project.
> 
> But such workloads rarely will hit the point of constantly re-dirtying
> already dirty pages in kernel memory within 30s.

Rarely, yes.  Not rarely enough to discount.

> > Today's defaults for *_flush_after greatly smooth and accelerate performance
> > for one class of plausible workloads while greatly slowing a different class
> > of plausible workloads.

The usual PostgreSQL handling of a deeply workload-dependent performance
feature is to disable it by default.  That's what I'm inclined to do here, for
every GUC the feature added.  Sophisticated users will nonetheless fully
exploit this valuable mechanism in 9.6.

> I don't think checkpoint_flush_after is in that class, due to the
> fsync()s we already emit at the end of checkpoints.

That's a promising hypothesis.  Some future project could impose a nonzero
default checkpoint_flush_after, having demonstrated that it imposes negligible
harm in the plausible cases it does not help.




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-03 Thread Andres Freund
On 2016-06-03 01:57:33 -0400, Noah Misch wrote:
> > Which means that transactional workloads that are bigger than the OS
> > memory, or which have a non-uniform distribution leading to some
> > locality, are likely to be faster. In practice those are *hugely* more
> > likely than the uniform distribution that pgbench has.
> 
> That is formally true; non-benchmark workloads rarely issue uniform writes.
> However, enough non-benchmark workloads have too little locality to benefit
> from caches.  Those will struggle against *_flush_after like uniform writes
> do, so discounting uniform writes wouldn't simplify this project.

But such workloads rarely will hit the point of constantly re-dirtying
already dirty pages in kernel memory within 30s.


> Today's defaults for *_flush_after greatly smooth and accelerate performance
> for one class of plausible workloads while greatly slowing a different class
> of plausible workloads.

I don't think checkpoint_flush_after is in that class, due to the
fsync()s we already emit at the end of checkpoints.

Greetings,

Andres Freund




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-02 Thread Noah Misch
On Wed, Jun 01, 2016 at 03:33:18PM -0700, Andres Freund wrote:
> On 2016-05-31 16:03:46 -0400, Robert Haas wrote:
> > On Fri, May 27, 2016 at 12:37 AM, Andres Freund  wrote:
> > > I don't think the situation is quite that simple. By *disabling* backend 
> > > flushing it's also easy to see massive performance regressions.  In 
> > > situations where shared buffers was configured appropriately for the 
> > > workload (not the case here IIRC).
> > 
> > On what kind of workload does setting backend_flush_after=0 represent
> > a large regression vs. the default settings?
> > 
> > I think we have to consider that pgbench and parallel copy are pretty
> > common things to want to do, and a non-zero default setting hurts
> > those workloads a LOT.
> 
> I don't think pgbench's workload has much to do with reality. Even less
> so in the setup presented here.
> 
> The slowdown comes from the fact that default pgbench randomly, but
> uniformly, updates a large table, which is slower with
> backend_flush_after if the workload is considerably bigger than
> shared_buffers, but, and that's a very important restriction, the
> workload at the same time largely fits in to less than
> /proc/sys/vm/dirty_ratio / 20% (probably even 10% /
> /proc/sys/vm/dirty_background_ratio) of the free os memory.

Looking at some of the top hits for 'postgresql shared_buffers':

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
https://www.postgresql.org/docs/current/static/runtime-config-resource.html
http://rhaas.blogspot.com/2012/03/tuning-sharedbuffers-and-walbuffers.html
https://www.keithf4.com/a-large-database-does-not-mean-large-shared_buffers/
http://www.cybertec.at/2014/02/postgresql-9-3-shared-buffers-performance-1/

Choices mentioned (some in comments on a main post):

1. .25 * RAM
2. min(8GB, .25 * RAM)
3. Sizing procedure that arrived at 4GB for 900GB of data
4. Equal to data size

Thus, it is not outlandish to have the write portion of a working set exceed
shared_buffers while remaining under 10-20% of system RAM.  Choice (4) won't
achieve that, but (2) and (3) may achieve it given a mere 64 GiB of RAM.
Choice (1) can go either way; if read-mostly data occupies half of
shared_buffers, then writes passing through the other 12.5% of system RAM may
exhibit the property you describe.

Incidentally, a typical reason for a site to use low shared_buffers is to
avoid the latency spikes that *_flush_after combat:
https://www.postgresql.org/message-id/flat/4DDE270502250003DD4F%40gw.wicourts.gov

> > I have a really hard time believing that the benefits on other
> > workloads are large enough to compensate for the slowdowns we're
> > seeing here.
> 
> As a random example, without looking for good parameters, on my laptop:
> pgbench -i -q -s 1000
> 
> Cpu: i7-6820HQ
> Ram: 24GB of memory
> Storage: Samsung SSD 850 PRO 1TB, encrypted
> postgres -c shared_buffers=6GB -c backend_flush_after=128 -c 
> max_wal_size=100GB -c fsync=on -c synchronous_commit=off
> pgbench -M prepared -c 16 -j 16 -T 520 -P 1 -n -N
> (note the -N)
> disabled:
> latency average = 2.774 ms
> latency stddev = 10.388 ms
> tps = 5761.883323 (including connections establishing)
> tps = 5762.027278 (excluding connections establishing)
> 
> 128:
> latency average = 2.543 ms
> latency stddev = 3.554 ms
> tps = 6284.069846 (including connections establishing)
> tps = 6284.184570 (excluding connections establishing)
> 
> Note the latency dev which is 3x better. And the improved throughput.

That is an improvement.  The workload is no less realistic than the ones
having shown regressions.

> Which means that transactional workloads that are bigger than the OS
> memory, or which have a non-uniform distribution leading to some
> locality, are likely to be faster. In practice those are *hugely* more
> likely than the uniform distribution that pgbench has.

That is formally true; non-benchmark workloads rarely issue uniform writes.
However, enough non-benchmark workloads have too little locality to benefit
from caches.  Those will struggle against *_flush_after like uniform writes
do, so discounting uniform writes wouldn't simplify this project.


Today's defaults for *_flush_after greatly smooth and accelerate performance
for one class of plausible workloads while greatly slowing a different class
of plausible workloads.

nm




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-01 Thread Andres Freund
On 2016-06-01 15:33:18 -0700, Andres Freund wrote:
> Cpu: i7-6820HQ
> Ram: 24GB of memory
> Storage: Samsung SSD 850 PRO 1TB, encrypted
> postgres -c shared_buffers=6GB -c backend_flush_after=128 -c 
> max_wal_size=100GB -c fsync=on -c synchronous_commit=off
> pgbench -M prepared -c 16 -j 16 -T 520 -P 1 -n -N

Using a scale-5000 database, with WAL compression enabled (otherwise the
whole thing is too slow in both cases), and 64 clients gives:

disabled:
latency average = 11.896 ms
latency stddev = 42.187 ms
tps = 5378.025369 (including connections establishing)
tps = 5378.248569 (excluding connections establishing)

128:
latency average = 11.002 ms
latency stddev = 10.621 ms
tps = 5814.586813 (including connections establishing)
tps = 5814.840249 (excluding connections establishing)


With flushing disabled, roughly every 30s you see:
progress: 150.0 s, 6223.3 tps, lat 10.036 ms stddev 9.521
progress: 151.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 152.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 153.0 s, 4952.9 tps, lat 39.050 ms stddev 249.839

progress: 172.0 s, 4888.0 tps, lat 12.851 ms stddev 11.507
progress: 173.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 174.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 175.0 s, 4636.8 tps, lat 41.421 ms stddev 268.416

progress: 196.0 s, 1119.2 tps, lat 9.618 ms stddev 8.321
progress: 197.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 198.0 s, 1920.9 tps, lat 94.375 ms stddev 429.756
progress: 199.0 s, 5260.8 tps, lat 12.087 ms stddev 11.595


With backend flushing enabled there's not a single such pause.


If you use spinning rust instead of SSDs, the pauses aren't 1-2s
anymore, but easily 10-30s.

Andres




Re: [HACKERS] Perf Benchmarking and regression.

2016-06-01 Thread Andres Freund
On 2016-05-31 16:03:46 -0400, Robert Haas wrote:
> On Fri, May 27, 2016 at 12:37 AM, Andres Freund  wrote:
> > I don't think the situation is quite that simple. By *disabling* backend 
> > flushing it's also easy to see massive performance regressions, in 
> > situations where shared_buffers was configured appropriately for the 
> > workload (not the case here IIRC).
> 
> On what kind of workload does setting backend_flush_after=0 represent
> a large regression vs. the default settings?
> 
> I think we have to consider that pgbench and parallel copy are pretty
> common things to want to do, and a non-zero default setting hurts
> those workloads a LOT.

I don't think pgbench's workload has much to do with reality. Even less
so in the setup presented here.

The slowdown comes from the fact that default pgbench randomly, but
uniformly, updates a large table, which is slower with
backend_flush_after if the workload is considerably bigger than
shared_buffers, but, and that's a very important restriction, the
workload at the same time largely fits in to less than
/proc/sys/vm/dirty_ratio / 20% (probably even 10% /
/proc/sys/vm/dirty_background_ratio) of the free os memory.  The "trick"
in that case is that very often, before a buffer has been written back
to storage by the OS, it'll be re-dirtied by postgres.  Which means
triggering flushing by postgres increases the total amount of writes.
That only matters if the kernel doesn't trigger writeback because of the
above ratios, or because of time limits (30s /
dirty_writeback_centisecs).


> I have a really hard time believing that the benefits on other
> workloads are large enough to compensate for the slowdowns we're
> seeing here.

As a random example, without looking for good parameters, on my laptop:
pgbench -i -q -s 1000

Cpu: i7-6820HQ
Ram: 24GB of memory
Storage: Samsung SSD 850 PRO 1TB, encrypted
postgres -c shared_buffers=6GB -c backend_flush_after=128 -c max_wal_size=100GB 
-c fsync=on -c synchronous_commit=off
pgbench -M prepared -c 16 -j 16 -T 520 -P 1 -n -N
(note the -N)
disabled:
latency average = 2.774 ms
latency stddev = 10.388 ms
tps = 5761.883323 (including connections establishing)
tps = 5762.027278 (excluding connections establishing)

128:
latency average = 2.543 ms
latency stddev = 3.554 ms
tps = 6284.069846 (including connections establishing)
tps = 6284.184570 (excluding connections establishing)

Note the latency dev which is 3x better. And the improved throughput.

That's for a workload which even fits into the OS memory. Without
backend flushing there's several periods looking like

progress: 249.0 s, 7237.6 tps, lat 1.997 ms stddev 4.365
progress: 250.0 s, 0.0 tps, lat -nan ms stddev -nan
progress: 251.0 s, 1880.6 tps, lat 17.761 ms stddev 169.682
progress: 252.0 s, 6904.4 tps, lat 2.328 ms stddev 3.256

i.e. moments in which no transactions are executed. And that's on
storage that can do 500MB/sec, and tens of thousands of IOPS.


If you change to a workload that uses synchronous_commit, is bigger
than OS memory, and/or doesn't have very fast storage, the differences
can be a *LOT* bigger.


In general, any workload which doesn't either a) fit the above criteria
of likely re-dirtying blocks it already dirtied before kernel-triggered
writeback happens, or b) concurrently COPY into an individual file, is
likely to be faster (or unchanged, if within s_b) with backend flushing.


Which means that transactional workloads that are bigger than the OS
memory, or which have a non-uniform distribution leading to some
locality, are likely to be faster. In practice those are *hugely* more
likely than the uniform distribution that pgbench has.

Similarly, this *considerably* reduces the impact a concurrently running
vacuum or COPY has on concurrent queries. Because suddenly VACUUM/COPY
can't create a couple gigabytes of dirty buffers which will be written
back at some random point in time later, stalling everything.


I think the benefits of more predictable (and often faster!)
performance in a bunch of actual real-world-ish workloads outweigh
optimizing for benchmarks.



> We have nobody writing in to say that
> backend_flush_after>0 is making things way better for them, and
> Ashutosh and I have independently hit massive slowdowns on unrelated
> workloads.

Actually, we do have some evidence of that, just so far not in this
thread, which I don't find particularly surprising.


- Andres




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-31 Thread Robert Haas
On Fri, May 27, 2016 at 12:37 AM, Andres Freund  wrote:
> I don't think the situation is quite that simple. By *disabling* backend 
> flushing it's also easy to see massive performance regressions, in 
> situations where shared_buffers was configured appropriately for the workload 
> (not the case here IIRC).

On what kind of workload does setting backend_flush_after=0 represent
a large regression vs. the default settings?

I think we have to consider that pgbench and parallel copy are pretty
common things to want to do, and a non-zero default setting hurts
those workloads a LOT.  I have a really hard time believing that the
benefits on other workloads are large enough to compensate for the
slowdowns we're seeing here.  We have nobody writing in to say that
backend_flush_after>0 is making things way better for them, and
Ashutosh and I have independently hit massive slowdowns on unrelated
workloads.  We weren't looking for slowdowns in this patch.  We were
trying to measure other stuff, and ended up tracing the behavior back
to this patch.  That really, really suggests that other people will
have similar experiences.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-28 Thread Noah Misch
On Thu, May 12, 2016 at 10:49:06AM -0400, Robert Haas wrote:
> On Thu, May 12, 2016 at 8:39 AM, Ashutosh Sharma  
> wrote:
> > Please find the test results for the following set of combinations taken at
> > 128 client counts:
> >
> > 1) Unpatched master, default *_flush_after :  TPS = 10925.882396
> >
> > 2) Unpatched master, *_flush_after=0 :  TPS = 18613.343529
> >
> > 3) That line removed with #if 0, default *_flush_after :  TPS = 9856.809278
> >
> > 4) That line removed with #if 0, *_flush_after=0 :  TPS = 18158.648023
> 
> I'm getting increasingly unhappy about the checkpoint flush control.
> I saw major regressions on my parallel COPY test, too:
> 
> http://www.postgresql.org/message-id/ca+tgmoyouqf9cgcpgygngzqhcy-gcckryaqqtdu8kfe4n6h...@mail.gmail.com
> 
> That was a completely different machine (POWER7 instead of Intel,
> lousy disks instead of good ones) and a completely different workload.
> Considering these results, I think there's now plenty of evidence to
> suggest that this feature is going to be horrible for a large number
> of users.  A 45% regression on pgbench is horrible.  (Nobody wants to
> take even a 1% hit for snapshot too old, right?)  Sure, it might not
> be that way for every user on every Linux system, and I'm sure it
> performed well on the systems where Andres benchmarked it, or he
> wouldn't have committed it.  But our goal can't be to run well only on
> the newest hardware with the least-buggy kernel...

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Andres,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] 
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-26 Thread Andres Freund


Hi,

On May 26, 2016 9:29:51 PM PDT, Ashutosh Sharma  wrote:
>Hi All,
>
>As we have seen a regression of more than 45% with "*backend_flush_after*"
>enabled and set to its default value (128KB), or even when it is set to
>some higher value like 2MB, I think we should disable it so that it does
>not impact read-write performance, and here is the attached patch for
>the same.  Please have a look and let me know your thoughts on this.
>Thanks!

I don't think the situation is quite that simple. By *disabling* backend 
flushing it's also easy to see massive performance regressions, in situations 
where shared_buffers was configured appropriately for the workload (not the 
case here IIRC).

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-26 Thread Ashutosh Sharma
Hi All,

As we have seen a regression of more than 45% with "*backend_flush_after*"
enabled and set to its default value (128KB), or even when it is set to
some higher value like 2MB, I think we should disable it so that it does
not impact read-write performance; here is the attached patch for the
same.  Please have a look and let me know your thoughts on this. Thanks!

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Sun, May 15, 2016 at 1:26 AM, Fabien COELHO  wrote:

>
> These raw tps suggest that {backend,bgwriter}_flush_after should better be
>>> zero for this kind of load. Whether it should be the default is unclear
>>> yet,
>>> because as Andres pointed out this is one kind of load.
>>>
>>
>> FWIW, I don't think {backend,bgwriter} are the same here. It's primarily
>> backend that matters.
>>
>
> Indeed, I was a little hasty to put bgwriter together based on this report.
>
> I'm a little wary of "bgwriter_flush_after" though, I would not be
> surprised if someone reports some regressions, although probably not with a
> pgbench tpcb kind of load.
>
> --
> Fabien.
>
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 38b6027..fff7104 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -60,7 +60,7 @@ extern PGDLLIMPORT int NBuffers;
 /* FIXME: Also default to on for mmap && msync(MS_ASYNC)? */
 #ifdef HAVE_SYNC_FILE_RANGE
 #define DEFAULT_CHECKPOINT_FLUSH_AFTER 32
-#define DEFAULT_BACKEND_FLUSH_AFTER 16
+#define DEFAULT_BACKEND_FLUSH_AFTER 0
 #define DEFAULT_BGWRITER_FLUSH_AFTER 64
 #else
 #define DEFAULT_CHECKPOINT_FLUSH_AFTER 0



Re: [HACKERS] Perf Benchmarking and regression.

2016-05-14 Thread Fabien COELHO



>> These raw tps suggest that {backend,bgwriter}_flush_after should better be
>> zero for this kind of load. Whether it should be the default is unclear yet,
>> because as Andres pointed out this is one kind of load.

> FWIW, I don't think {backend,bgwriter} are the same here. It's primarily
> backend that matters.


Indeed, I was a little hasty to put bgwriter together based on this 
report.


I'm a little wary of "bgwriter_flush_after" though; I would not be 
surprised if someone reports some regressions, although probably not with 
a pgbench tpcb kind of load.


--
Fabien.




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-14 Thread Andres Freund
On 2016-05-14 18:49:27 +0200, Fabien COELHO wrote:
> 
> Hello,
> 
> > Please find the results for the following 3 scenarios with unpatched master:
> > 
> > 1. Default settings for *_flush_after : TPS = 10677.662356
> > 2. backend_flush_after=0, rest defaults : TPS = 18452.655936
> > 3. backend_flush_after=0, bgwriter_flush_after=0,
> > wal_writer_flush_after=0, checkpoint_flush_after=0 : TPS = 18614.479962
> 
> Thanks for these runs.

Yes!

> These raw tps suggest that {backend,bgwriter}_flush_after should better be
> zero for this kind of load. Whether it should be the default is unclear yet,
> because as Andres pointed out this is one kind of load.

FWIW, I don't think {backend,bgwriter} are the same here. It's primarily
backend that matters.  This is treating the OS page cache as an
extension of postgres' buffer cache. That really primarily matters for
backend_, because otherwise backends spend time waiting for IO.


Andres




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-14 Thread Fabien COELHO


Hello,


Please find the results for the following 3 scenarios with unpatched master:

1. Default settings for *_flush_after : TPS = 10677.662356
2. backend_flush_after=0, rest defaults : TPS = 18452.655936
3. backend_flush_after=0, bgwriter_flush_after=0,
wal_writer_flush_after=0, checkpoint_flush_after=0 : TPS = 18614.479962


Thanks for these runs.

These raw tps suggest that {backend,bgwriter}_flush_after should better be 
zero for this kind of load. Whether it should be the default is unclear
yet, because as Andres pointed out this is one kind of load.


Note: these options have been added to smooth IOs over time and to help 
avoid "IO panics" on sync, especially with HDDs without a large BBU cache 
in front. The real benefit is that performance is much more constant 
over time, and pg is much more responsive.


If you do other runs, it would be nice to report some stats about tps 
variability (e.g. latency & latency stddev, which should be in the report). 
For the experiments I did, I used to log "-P 1" output (tps every second) 
and to compute stats on these tps (avg, stddev, min, q1, median, q3, max, 
percentage of time with tps below a low threshold...), which provides some 
indication of the overall tps distribution.
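A minimal sketch of that kind of post-processing, assuming the stock
"progress: N.n s, NNN.n tps, ..." line format that pgbench emits with -P 1
(the function name and the low-tps threshold are illustrative, not part of
pgbench):

```python
import re
import statistics

def tps_stats(progress_lines, low_threshold=100.0):
    """Distribution stats over per-second tps from pgbench -P 1 output."""
    # Pull the per-second tps value out of each "progress:" line.
    tps = [float(m.group(1))
           for line in progress_lines
           if (m := re.search(r"([0-9.]+) tps", line))]
    q1, med, q3 = statistics.quantiles(tps, n=4)  # quartile cut points
    return {
        "avg": statistics.mean(tps),
        "stddev": statistics.stdev(tps),
        "min": min(tps), "q1": q1, "median": med, "q3": q3, "max": max(tps),
        # fraction of seconds the server was effectively unresponsive
        "pct_below": 100.0 * sum(t < low_threshold for t in tps) / len(tps),
    }
```

Feeding it the -P 1 log of a run with sync stalls makes the problem visible
even when the final "tps" figure looks fine: min and pct_below expose the
seconds of near-zero throughput.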


--
Fabien




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-14 Thread Ashutosh Sharma
Hi,

Please find the results for the following 3 scenarios with unpatched master:

1. Default settings for *_flush_after : TPS = 10677.662356
2. backend_flush_after=0, rest defaults : TPS = 18452.655936
3. backend_flush_after=0, bgwriter_flush_after=0,
wal_writer_flush_after=0, checkpoint_flush_after=0 : TPS = 18614.479962

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Fri, May 13, 2016 at 7:50 PM, Robert Haas  wrote:

> On Fri, May 13, 2016 at 7:08 AM, Ashutosh Sharma 
> wrote:
> > Following are the performance results for read write test observed with
> > different numbers of "backend_flush_after".
> >
> > 1) backend_flush_after = 256kb (32*8kb), tps = 10841.178815
> > 2) backend_flush_after = 512kb (64*8kb), tps = 11098.702707
> > 3) backend_flush_after = 1MB (128*8kb), tps = 11434.964545
> > 4) backend_flush_after = 2MB (256*8kb), tps = 13477.089417
>
> So even at 2MB we don't come close to recovering all of the lost
> performance.  Can you please test these three scenarios?
>
> 1. Default settings for *_flush_after
> 2. backend_flush_after=0, rest defaults
> 3. backend_flush_after=0, bgwriter_flush_after=0,
> wal_writer_flush_after=0, checkpoint_flush_after=0
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Perf Benchmarking and regression.

2016-05-13 Thread Amit Kapila
On Fri, May 13, 2016 at 11:13 PM, Andres Freund  wrote:
>
> On 2016-05-13 10:20:04 -0400, Robert Haas wrote:
> > On Fri, May 13, 2016 at 7:08 AM, Ashutosh Sharma 
wrote:
> > > Following are the performance results for read write test observed
with
> > > different numbers of "backend_flush_after".
> > >
> > > 1) backend_flush_after = 256kb (32*8kb), tps = 10841.178815
> > > 2) backend_flush_after = 512kb (64*8kb), tps = 11098.702707
> > > 3) backend_flush_after = 1MB (128*8kb), tps = 11434.964545
> > > 4) backend_flush_after = 2MB (256*8kb), tps = 13477.089417
> >
> > So even at 2MB we don't come close to recovering all of the lost
> > performance.  Can you please test these three scenarios?
> >
> > 1. Default settings for *_flush_after
> > 2. backend_flush_after=0, rest defaults
> > 3. backend_flush_after=0, bgwriter_flush_after=0,
> > wal_writer_flush_after=0, checkpoint_flush_after=0
>
> 4) 1) + a shared_buffers setting appropriate to the workload.
>

If by the 4th point you mean to test the case when data fits in shared
buffers, then Mithun has already reported above [1] that he didn't see any
regression for that case.


[1] -
http://www.postgresql.org/message-id/cad__ouiobznvtt_ho__p5aenu4inqcfwgarxr4tblke-uxy...@mail.gmail.com
Read line - Even for READ-WRITE when data fits into shared buffer
(scale_factor=300 and shared_buffers=8GB) performance has improved.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Perf Benchmarking and regression.

2016-05-13 Thread Andres Freund
On 2016-05-13 14:43:15 -0400, Robert Haas wrote:
> On Fri, May 13, 2016 at 1:43 PM, Andres Freund  wrote:
> > I just want to emphasize what we're discussing here is a bit of an
> > extreme setup. A workload that's bigger than shared buffers, but smaller
> > than the OS's cache size; with a noticeable likelihood of rewriting
> > individual OS page cache pages within 30s.
>
> You're just describing pgbench with a scale factor too large to fit in
> shared_buffers.

Well, that *and* a scale factor smaller than 20% of the memory
available, *and* a scale factor small enough that it makes re-dirtying of
already written out pages likely.


> I think it's unfair to paint that as some kind of niche use case.

I'm not saying we don't need to do something about it. Just that it's a
hard tradeoff to make. The massive performance / latency issues we've observed
originate from the kernel caching too much dirty IO. The fix is making
it cache fewer dirty pages.  But there's workloads where the kernel's
buffer cache works as an extension of our page cache.




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-13 Thread Robert Haas
On Fri, May 13, 2016 at 1:43 PM, Andres Freund  wrote:
> On 2016-05-13 10:20:04 -0400, Robert Haas wrote:
>> On Fri, May 13, 2016 at 7:08 AM, Ashutosh Sharma  
>> wrote:
>> > Following are the performance results for read write test observed with
>> > different numbers of "backend_flush_after".
>> >
>> > 1) backend_flush_after = 256kb (32*8kb), tps = 10841.178815
>> > 2) backend_flush_after = 512kb (64*8kb), tps = 11098.702707
>> > 3) backend_flush_after = 1MB (128*8kb), tps = 11434.964545
>> > 4) backend_flush_after = 2MB (256*8kb), tps = 13477.089417
>>
>> So even at 2MB we don't come close to recovering all of the lost
>> performance.  Can you please test these three scenarios?
>>
>> 1. Default settings for *_flush_after
>> 2. backend_flush_after=0, rest defaults
>> 3. backend_flush_after=0, bgwriter_flush_after=0,
>> wal_writer_flush_after=0, checkpoint_flush_after=0
>
> 4) 1) + a shared_buffers setting appropriate to the workload.
>
>
> I just want to emphasize what we're discussing here is a bit of an
> extreme setup. A workload that's bigger than shared buffers, but smaller
> than the OS's cache size; with a noticeable likelihood of rewriting
> individual OS page cache pages within 30s.

You're just describing pgbench with a scale factor too large to fit in
shared_buffers.  I think it's unfair to paint that as some kind of
niche use case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-13 Thread Andres Freund
On 2016-05-13 10:20:04 -0400, Robert Haas wrote:
> On Fri, May 13, 2016 at 7:08 AM, Ashutosh Sharma  
> wrote:
> > Following are the performance results for read write test observed with
> > different numbers of "backend_flush_after".
> >
> > 1) backend_flush_after = 256kb (32*8kb), tps = 10841.178815
> > 2) backend_flush_after = 512kb (64*8kb), tps = 11098.702707
> > 3) backend_flush_after = 1MB (128*8kb), tps = 11434.964545
> > 4) backend_flush_after = 2MB (256*8kb), tps = 13477.089417
> 
> So even at 2MB we don't come close to recovering all of the lost
> performance.  Can you please test these three scenarios?
>
> 1. Default settings for *_flush_after
> 2. backend_flush_after=0, rest defaults
> 3. backend_flush_after=0, bgwriter_flush_after=0,
> wal_writer_flush_after=0, checkpoint_flush_after=0

4) 1) + a shared_buffers setting appropriate to the workload.


I just want to emphasize what we're discussing here is a bit of an
extreme setup. A workload that's bigger than shared buffers, but smaller
than the OS's cache size; with a noticeable likelihood of rewriting
individual OS page cache pages within 30s.

Greetings,

Andres Freund




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-13 Thread Robert Haas
On Fri, May 13, 2016 at 7:08 AM, Ashutosh Sharma  wrote:
> Following are the performance results for read write test observed with
> different numbers of "backend_flush_after".
>
> 1) backend_flush_after = 256kb (32*8kb), tps = 10841.178815
> 2) backend_flush_after = 512kb (64*8kb), tps = 11098.702707
> 3) backend_flush_after = 1MB (128*8kb), tps = 11434.964545
> 4) backend_flush_after = 2MB (256*8kb), tps = 13477.089417

So even at 2MB we don't come close to recovering all of the lost
performance.  Can you please test these three scenarios?

1. Default settings for *_flush_after
2. backend_flush_after=0, rest defaults
3. backend_flush_after=0, bgwriter_flush_after=0,
wal_writer_flush_after=0, checkpoint_flush_after=0

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-13 Thread Ashutosh Sharma
Hi,

Following are the performance results for the read-write test observed with
different values of "backend_flush_after".

1) backend_flush_after = 256kb (32*8kb), tps = 10841.178815
2) backend_flush_after = 512kb (64*8kb), tps = 11098.702707
3) backend_flush_after = 1MB (128*8kb), tps = 11434.964545
4) backend_flush_after = 2MB (256*8kb), tps = 13477.089417


Note: The above test was performed on unpatched master with default
values for checkpoint_flush_after, bgwriter_flush_after
and wal_writer_flush_after.
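As a side note on the units: the *_flush_after GUCs are specified in blocks,
so the kilobyte figures above assume the default 8kB block size (BLCKSZ =
8192). A minimal sketch of the conversion (the helper names are mine, not
PostgreSQL's):

```python
# Assumption: default PostgreSQL block size; *_flush_after is set in blocks.
BLCKSZ = 8192

def flush_after_blocks(window_bytes):
    """Blocks covered by a flush window given in bytes."""
    return window_bytes // BLCKSZ

def flush_window_bytes(blocks):
    """Flush window in bytes for a *_flush_after value given in blocks."""
    return blocks * BLCKSZ
```

For example, the default backend_flush_after of 16 blocks corresponds to a
128kB window, which is why a 2MB test setting above is 256 blocks.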

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Thu, May 12, 2016 at 9:20 PM, Andres Freund  wrote:

> On 2016-05-12 11:27:31 -0400, Robert Haas wrote:
> > On Thu, May 12, 2016 at 11:13 AM, Andres Freund 
> wrote:
> > > Could you run this one with a number of different backend_flush_after
> > > settings?  I'm suspecting the primary issue is that the default is
> too low.
> >
> > What values do you think would be good to test?  Maybe provide 3 or 4
> > suggested values to try?
>
> 0 (disabled), 16 (current default), 32, 64, 128, 256?
>
> I'm suspecting that only backend_flush_after_* has these negative
> performance implications at this point.  One path is to increase that
> option's default value, another is to disable only backend guided
> flushing. And add a strong hint that if you care about predictable
> throughput you might want to enable it.
>
> Greetings,
>
> Andres Freund
>


Re: [HACKERS] Perf Benchmarking and regression.

2016-05-12 Thread Fabien COELHO



I'm getting increasingly unhappy about the checkpoint flush control.
I saw major regressions on my parallel COPY test, too:


Yes, I'm concerned too.


A few thoughts:

 - focussing on raw tps is not a good idea, because it may be a lot of tps
   followed by a sync panic, with an unresponsive database. I wish the
   performance reports would include some indication of the distribution
   (eg min/q1/median/q3/max tps per second seen, standard deviation), not
   just the final "tps" figure.

 - checkpoint flush control (checkpoint_flush_after) should almost always be
   beneficial because it flushes sorted data. I would be surprised
   to see significant regressions with this on. A lot of tests showed
   maybe improved tps, but mostly greatly improved performance stability,
   where a database unresponsive 60% of the time (60% of the seconds in
   the tps log show very low or zero tps) then becomes always responsive.

 - other flush controls ({backend,bgwriter}_flush_after) may just increase
   random writes, so are more risky in nature because the data is not
   sorted, and it may or may not be a good idea depending on detailed
   conditions. A "parallel copy" would be just such a special IO load
   which degrades performance under these settings.

   Maybe these two should be disabled by default because they lead to
   possibly surprising regressions?

 - for any particular load, the admin can decide to disable these if
   they think it is better not to flush. Also, as suggested by Andres,
   with 128 parallel queries the default value may not be appropriate
   at all.

--
Fabien.




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-12 Thread Andres Freund
On 2016-05-12 10:49:06 -0400, Robert Haas wrote:
> On Thu, May 12, 2016 at 8:39 AM, Ashutosh Sharma  
> wrote:
> > Please find the test results for the following set of combinations taken at
> > 128 client counts:
> >
> > 1) Unpatched master, default *_flush_after :  TPS = 10925.882396
> >
> > 2) Unpatched master, *_flush_after=0 :  TPS = 18613.343529
> >
> > 3) That line removed with #if 0, default *_flush_after :  TPS = 9856.809278
> >
> > 4) That line removed with #if 0, *_flush_after=0 :  TPS = 18158.648023
> 
> I'm getting increasingly unhappy about the checkpoint flush control.
> I saw major regressions on my parallel COPY test, too:

Yes, I'm concerned too.

The workload in this thread is a bit of an "artificial" workload (all
data is constantly updated, doesn't fit into shared_buffers, fits into
the OS page cache), and only measures throughput not latency.  But I
agree that that's way too large a regression to accept, and that there's
a significant number of machines with way undersized shared_buffers
values.


> http://www.postgresql.org/message-id/ca+tgmoyouqf9cgcpgygngzqhcy-gcckryaqqtdu8kfe4n6h...@mail.gmail.com
> 
> That was a completely different machine (POWER7 instead of Intel,
> lousy disks instead of good ones) and a completely different workload.
> Considering these results, I think there's now plenty of evidence to
> suggest that this feature is going to be horrible for a large number
> of users.  A 45% regression on pgbench is horrible.

I asked you over there whether you could benchmark with just different
values for backend_flush_after... I chose the current value because it
gives the best latency / most consistent throughput numbers, but 128kb
isn't a large window.  I suspect we might need to disable backend guided
flushing if that's not sufficient :(


> > Here, That line points to "AddWaitEventToSet(FeBeWaitSet,
> > WL_POSTMASTER_DEATH, -1, NULL, NULL); in pq_init()."
> 
> Given the above results, it's not clear whether that is making things
> better or worse.

Yea, me neither. I think it's doubtful that you'd see a performance
difference due to the original ac1d7945f866b1928c2554c0f80fd52d7f92,
independent of the WaitEventSet stuff, at these throughput rates.

Greetings,

Andres Freund




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-12 Thread Andres Freund
On 2016-05-12 11:27:31 -0400, Robert Haas wrote:
> On Thu, May 12, 2016 at 11:13 AM, Andres Freund  wrote:
> > Could you run this one with a number of different backend_flush_after
> > settings?  I'm suspecting the primary issue is that the default is too low.
> 
> What values do you think would be good to test?  Maybe provide 3 or 4
> suggested values to try?

0 (disabled), 16 (current default), 32, 64, 128, 256?

I'm suspecting that only backend_flush_after_* has these negative
performance implications at this point.  One path is to increase that
option's default value, another is to disable only backend guided
flushing. And add a strong hint that if you care about predictable
throughput you might want to enable it.

Greetings,

Andres Freund




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-12 Thread Robert Haas
On Thu, May 12, 2016 at 11:13 AM, Andres Freund  wrote:
> Could you run this one with a number of different backend_flush_after
> settings?  I'm suspecting the primary issue is that the default is too low.

What values do you think would be good to test?  Maybe provide 3 or 4
suggested values to try?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-12 Thread Andres Freund
Hi,

On 2016-05-12 18:09:07 +0530, Ashutosh Sharma wrote:
> Please find the test results for the following set of combinations taken at
> 128 client counts:

Thanks.


> 1) Unpatched master, default *_flush_after :  TPS = 10925.882396

Could you run this one with a number of different backend_flush_after
settings?  I'm suspecting the primary issue is that the default is too low.

Greetings,

Andres Freund




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-12 Thread Robert Haas
On Thu, May 12, 2016 at 8:39 AM, Ashutosh Sharma  wrote:
> Please find the test results for the following set of combinations taken at
> 128 client counts:
>
> 1) Unpatched master, default *_flush_after :  TPS = 10925.882396
>
> 2) Unpatched master, *_flush_after=0 :  TPS = 18613.343529
>
> 3) That line removed with #if 0, default *_flush_after :  TPS = 9856.809278
>
> 4) That line removed with #if 0, *_flush_after=0 :  TPS = 18158.648023

I'm getting increasingly unhappy about the checkpoint flush control.
I saw major regressions on my parallel COPY test, too:

http://www.postgresql.org/message-id/ca+tgmoyouqf9cgcpgygngzqhcy-gcckryaqqtdu8kfe4n6h...@mail.gmail.com

That was a completely different machine (POWER7 instead of Intel,
lousy disks instead of good ones) and a completely different workload.
Considering these results, I think there's now plenty of evidence to
suggest that this feature is going to be horrible for a large number
of users.  A 45% regression on pgbench is horrible.  (Nobody wants to
take even a 1% hit for snapshot too old, right?)  Sure, it might not
be that way for every user on every Linux system, and I'm sure it
performed well on the systems where Andres benchmarked it, or he
wouldn't have committed it.  But our goal can't be to run well only on
the newest hardware with the least-buggy kernel...

> Here, That line points to "AddWaitEventToSet(FeBeWaitSet,
> WL_POSTMASTER_DEATH, -1, NULL, NULL); in pq_init()."

Given the above results, it's not clear whether that is making things
better or worse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-12 Thread Ashutosh Sharma
Hi,

Please find the test results for the following set of combinations taken at
128 client counts:

1) Unpatched master, default *_flush_after :  TPS = 10925.882396

2) Unpatched master, *_flush_after=0 :  TPS = 18613.343529

3) That line removed with #if 0, default *_flush_after :  TPS = 9856.809278

4) That line removed with #if 0, *_flush_after=0 :  TPS = 18158.648023

Here, "That line" points to "AddWaitEventToSet(FeBeWaitSet,
WL_POSTMASTER_DEATH, -1, NULL, NULL);" in pq_init().

Please note that earlier I had taken readings with the data directory and
pg_xlog directory at the same location on HDD. But this time I have changed
the location of pg_xlog to SSD and taken the readings. With pg_xlog and
the data directory at the same location on HDD I was seeing much lower
performance; for the "That line removed with #if 0, *_flush_after=0"
case I was getting 7367.709378 tps.


Also, the commit id on which I have taken the above readings, along with the
pgbench commands used, is mentioned below:

commit 8a13d5e6d1bb9ff9460c72992657077e57e30c32
Author: Tom Lane 
Date:   Wed May 11 17:06:53 2016 -0400

Fix infer_arbiter_indexes() to not barf on system columns.

Non-default settings and test:
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9 &

./pgbench -i -s 1000 postgres

./pgbench -c 128 -j 128 -T 1800 -M prepared postgres

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Thu, May 12, 2016 at 9:22 AM, Robert Haas  wrote:

> On Wed, May 11, 2016 at 12:51 AM, Ashutosh Sharma 
> wrote:
> > I am extremely sorry for the delayed response.  As suggested by you, I
> have
> > taken the performance readings at 128 client counts after making the
> > following two changes:
> >
> > 1). Removed AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL,
> > NULL); from pq_init(). Below is the git diff for the same.
> >
> > diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
> > index 8d6eb0b..399d54b 100644
> > --- a/src/backend/libpq/pqcomm.c
> > +++ b/src/backend/libpq/pqcomm.c
> > @@ -206,7 +206,9 @@ pq_init(void)
> > AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE,
> > MyProcPort->sock,
> >   NULL, NULL);
> > AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
> > +#if 0
> > AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL,
> NULL);
> > +#endif
> >
> > 2). Disabled the guc vars "bgwriter_flush_after",
> "checkpointer_flush_after"
> > and "backend_flush_after" by setting them to zero.
> >
> > After doing the above two changes below are the readings I got for 128
> > client counts:
> >
> > CASE : Read-Write Tests when data exceeds shared buffers.
> >
> > Non Default settings and test
> > ./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
> > max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB
> -c
> > checkpoint_completion_target=0.9 &
> >
> > ./pgbench -i -s 1000 postgres
> >
> > ./pgbench -c 128 -j 128 -T 1800 -M prepared postgres
> >
> > Run1 : tps = 9690.678225
> > Run2 : tps = 9904.320645
> > Run3 : tps = 9943.547176
> >
> > Please let me know if I need to take readings with other client counts as
> > well.
>
> Can you please take four new sets of readings, like this:
>
> - Unpatched master, default *_flush_after
> - Unpatched master, *_flush_after=0
> - That line removed with #if 0, default *_flush_after
> - That line removed with #if 0, *_flush_after=0
>
> 128 clients is fine.  But I want to see four sets of numbers that were
> all taken by the same person at the same time using the same script.
>
> Thanks,
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Perf Benchmarking and regression.

2016-05-11 Thread Robert Haas
On Wed, May 11, 2016 at 12:51 AM, Ashutosh Sharma  wrote:
> I am extremely sorry for the delayed response.  As suggested by you, I have
> taken the performance readings at 128 client counts after making the
> following two changes:
>
> 1). Removed AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL,
> NULL); from pq_init(). Below is the git diff for the same.
>
> diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
> index 8d6eb0b..399d54b 100644
> --- a/src/backend/libpq/pqcomm.c
> +++ b/src/backend/libpq/pqcomm.c
> @@ -206,7 +206,9 @@ pq_init(void)
> AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE,
> MyProcPort->sock,
>   NULL, NULL);
> AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
> +#if 0
> AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
> +#endif
>
> 2). Disabled the guc vars "bgwriter_flush_after", "checkpointer_flush_after"
> and "backend_flush_after" by setting them to zero.
>
> After doing the above two changes below are the readings I got for 128
> client counts:
>
> CASE : Read-Write Tests when data exceeds shared buffers.
>
> Non Default settings and test
> ./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
> max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
> checkpoint_completion_target=0.9 &
>
> ./pgbench -i -s 1000 postgres
>
> ./pgbench -c 128 -j 128 -T 1800 -M prepared postgres
>
> Run1 : tps = 9690.678225
> Run2 : tps = 9904.320645
> Run3 : tps = 9943.547176
>
> Please let me know if I need to take readings with other client counts as
> well.

Can you please take four new sets of readings, like this:

- Unpatched master, default *_flush_after
- Unpatched master, *_flush_after=0
- That line removed with #if 0, default *_flush_after
- That line removed with #if 0, *_flush_after=0

128 clients is fine.  But I want to see four sets of numbers that were
all taken by the same person at the same time using the same script.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-10 Thread Ashutosh Sharma
Hi Andres,

I am extremely sorry for the delayed response.  As suggested by you, I have
taken the performance readings at 128 client counts after making the
following two changes:

1). Removed AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL,
NULL); from pq_init(). Below is the git diff for the same.

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 8d6eb0b..399d54b 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -206,7 +206,9 @@ pq_init(void)
AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE,
MyProcPort->sock,
  NULL, NULL);
AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
+#if 0
AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
+#endif

2). Disabled the guc vars "bgwriter_flush_after",
"checkpointer_flush_after" and "backend_flush_after" by setting them to
zero.

After doing the above two changes below are the readings I got for 128
client counts:

CASE : Read-Write Tests when data exceeds shared buffers.

Non Default settings and test
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9 &

./pgbench -i -s 1000 postgres

./pgbench -c 128 -j 128 -T 1800 -M prepared postgres

Run1 : tps = 9690.678225
Run2 : tps = 9904.320645
Run3 : tps = 9943.547176

Please let me know if I need to take readings with other client counts as
well.

Note: I have taken these readings on postgres master head at,

commit 91fd1df4aad2141859310564b498a3e28055ee28
Author: Tom Lane 
Date:   Sun May 8 16:53:55 2016 -0400

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

On Wed, May 11, 2016 at 3:53 AM, Andres Freund  wrote:

> Hi,
>
> On 2016-05-06 21:21:11 +0530, Mithun Cy wrote:
> > I will try to run the tests as you have suggested and will report the
> same.
>
> Any news on that front?
>
> Regards,
>
> Andres


Re: [HACKERS] Perf Benchmarking and regression.

2016-05-10 Thread Andres Freund
Hi,

On 2016-05-06 21:21:11 +0530, Mithun Cy wrote:
> I will try to run the tests as you have suggested and will report the same.

Any news on that front?

Regards,

Andres




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-06 Thread Mithun Cy
On Fri, May 6, 2016 at 8:35 PM, Andres Freund  wrote:
> Also, do you see read-only workloads to be affected too?
Thanks. I have not tested with the specific commit id that reported the
performance issue, but at HEAD commit 72a98a639574d2e25ed94652848555900c81a799
Author: Andres Freund 
Date:   Tue Apr 26 20:32:51 2016 -0700

In READ-only (prepared) tests (both when data fits into shared buffers and
when it exceeds shared_buffers=8GB), performance of master has improved over 9.5:

Sessions   PostgreSQL-9.5 scale 300   PostgreSQL-9.6 scale 300   %diff
1          5287.561594                5213.723197                -1.396454598
8          84265.389083               84871.305689               0.719057507
16         148330.4155                158661.128315              6.9646624936
24         207062.803697              219958.12974               6.2277366155
32         265145.089888              290190.501443              9.4459269699
40         311688.752973              34.551772                  9.0833559212
48         327169.9673                372408.073033              13.8270960829
56         274426.530496              390629.24948               42.3438356248
64         261777.692042              384613.9666                46.9238893505
72         210747.55937               376390.162022              78.5976374517
80         220192.818648              398128.779329              80.8091570713
88         185176.91888               423906.711882              128.9198429512
96         161579.719039              421541.656474              160.8877271115
104        146935.568434              450672.740567              206.7145316618
112        136605.466232              432047.309248              216.2738074582
120        127687.175016              455458.086889              256.6983816753
128        120413.936453              428127.879242              255.5467845776

Sessions   PostgreSQL-9.5 scale 1000   PostgreSQL-9.6 scale 1000   %diff
1          5103.812202                 5155.434808                 1.01145191
8          47741.9041                  53117.805096                11.2603405694
16         89722.57031                 86965.10079                 -3.0733287182
24         130914.537373               153849.634245               17.5191367836
32         197125.725706               212454.474264               7.7761279017
40         248489.551052               270304.093767               8.7788571482
48         291884.652232               317257.836746               8.6928806705
56         304526.216047               359676.785476               18.1102862489
64         301440.463174               388324.710185               28.8230206709
72         194239.941979               393676.628802               102.6754254511
80         144879.527847               383365.678053               164.6099719885
88         122894.325326               372905.436117               203.4358463076
96         109836.31148                362208.867756               229.7715144249
104        103791.981583               352330.402278               239.4582094921
112        105189.206682               345722.499429               228.6672752217
120        108095.811432               342597.969088               216.939171416
128        113242.59492                333821.98763                194.7848270925
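The %diff column in these tables is the relative change of 9.6 over 9.5, in
percent. As a sketch of the arithmetic (function name is mine; the example
values come from the first row of the scale-300 table):

```python
def pct_diff(old_tps, new_tps):
    """Relative change of new over old, in percent (the %diff column)."""
    return (new_tps - old_tps) / old_tps * 100.0
```

So, for instance, pct_diff(5287.561594, 5213.723197) reproduces the
-1.396454598 shown for one session at scale 300.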

Even for READ-WRITE, when data fits into shared buffers (scale_factor=300 and
shared_buffers=8GB), performance has improved.
The only case where I see some regression is when data exceeds shared_buffers
(scale_factor=1000 and shared_buffers=8GB).

I will try to run the tests as you have suggested and will report the same.


Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Perf Benchmarking and regression.

2016-05-06 Thread Andres Freund
Hi,

Thanks for benchmarking!

On 2016-05-06 19:43:52 +0530, Mithun Cy wrote:
> 1. # first bad commit: [ac1d7945f866b1928c2554c0f80fd52d7f92] Make idle
> backends exit if the postmaster dies.
> this made performance drop from
>
> 15947.21546 (15K+) to 13409.758510 (around 13K+).

Let's debug this one first, it's a lot more local.  I'm rather surprised
that you're seeing a big effect with that "few" TPS/socket operations;
and even more that our efforts to address that problem haven't been
fruitful (given we've verified the fix on a number of machines).

Can you verify that removing
AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
in src/backend/libpq/pqcomm.c : pq_init() restores performance?

I think it'd be best to test the back/forth on master with
bgwriter_flush_after = 0
checkpointer_flush_after = 0
backend_flush_after = 0
to isolate the issue.

Also, do you see read-only workloads to be affected too?

> 2. # first bad commit: [428b1d6b29ca599c5700d4bc4f4ce4c5880369bf] Allow to
> trigger kernel writeback after a configurable number of writes.

FWIW, it'd be very interesting to test again with a bigger
backend_flush_after setting.


Greetings,

Andres Freund

