Re: [HACKERS] subscription worker signalling wal writer too much

2017-10-05 Thread Robert Haas
On Tue, Sep 19, 2017 at 8:48 AM, Kyotaro HORIGUCHI
 wrote:
> Though I'm a bit uneasy about having similar code in both places
> (XLogBackgroundFlush and XLogSetAsyncXactLSN), +1 to this from
> me.

This patch wasn't formatted very nicely; attached is a version that
pgindent likes better and doesn't bust past 80 columns.

I think this patch needs significantly better comments.  There's no
explanation of the rather complex formula the patch proposed to use.
Maybe it's just copied from someplace else that does have a comment,
but in that case, the comments here should mention that.  And the
comment in the other place should probably be updated too, like:

/* Make sure to keep this in sync with the similar logic in
XLogDoAwesomeStuff() */

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


change_walwriter_wakeup_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] subscription worker signalling wal writer too much

2017-09-19 Thread Kyotaro HORIGUCHI
At Sat, 26 Aug 2017 14:45:20 -0700, Jeff Janes  wrote in 

Re: [HACKERS] subscription worker signalling wal writer too much

2017-08-26 Thread Jeff Janes
On Mon, Jul 3, 2017 at 8:26 PM, Jeff Janes  wrote:

> On Sat, Jun 24, 2017 at 5:09 PM, Andres Freund  wrote:
>
>> On 2017-06-15 15:06:43 -0700, Jeff Janes wrote:
>>
>>
>>
>
>> > Wouldn't it
>> > better, and much simpler, just to have reverted the first change once
>> the
>> > second one was done?
>>
>> I think both can actually happen independently, no? It's pretty common
>> for the page lsn to be *older* than the corresponding commit.  In fact
>> you'll always hit that case unless there are also concurrent writes
>> touching said page.
>>
>
> That is true, but I don't see any residual regression when going from
> pre-db76b1efbbab2441428 to 7975c5e0a992ae9a4-with-my-patch applied.  So I
> think the hint bit issue only shows up when hitting the same page
> aggressively, in which case the LSN does get advanced quickly enough that
> db76b1efbba solves it.  Perhaps a different test case could show that
> issue.  I also didn't see any improvement from the original 4de82f7d7c50a81e,
> so maybe 8 CPUs just isn't enough to detect the problem with setting
> hint-buts with permanent tables with synchronous_commit=false.
>

I've refined my benchmarking, using a smaller scaling factor (8, same as my
CPU count) and running the tests for longer. The smaller size means there
are more intensive updates on individual pgbench_accounts pages, and the
longer run times allow unset hint bits to build up for longer (usually
I like short test runs with a large number of randomized repetitions, but
that doesn't work well here).  Since the scale factor is the same as the
number of CPUs, I've bumped the thread count so that it is more likely
someone will choose a non-contended value of pgbench_branches.bid to update
at any given moment.

pgbench -c16 -j16 -T1200 -M prepared -b tpcb-func --startup='set synchronous_commit=false'

This testing did show the importance of waking up the wal writer so that
hint bits can be set:

commit TPS
cfafd8beadcd6f 22200.48
cfafd8beadcd6f_nowake 30766.83
median of 14 runs reported, p-val on difference of means is 2e-22.

Where nowake is a hackish disabling of the wake-up introduced in
4de82f7d7c50a8, forward-ported to the 9.6 branch.  (I still wake it if it
is doing the big sleep; I just took out the part that wakes it up even
from small sleeps.)

So my revised benchmark is capable of detecting this effect, even with only
8 CPUs.

When I move to the next commit, where we set hint bits as long as the page
was re-dirtied afterward, I get these numbers:

db76b1efbbab24 30742.69
db76b1efbbab24_nowake 31072.97

The difference between these two is not statistically significant, nor is
the difference between them and cfafd8beadcd6f_nowake.

So I think the logic you introduced in db76b1efbbab24 captures all the gain
there is to be captured, and 4de82f7d7c50a8 can be reverted.  The only
reason to wake up the wal writer is if it is taking the big sleep.

Of course, I can't prove a negative.  There could always be some test case
which demonstrates that 4de82f7d7c50a8 still matters in spite of
db76b1efbbab24.
So to be ultra-conservative, attached is a patch which should preserve all
wake-ups other than the ones known to be useless.

7975c5e0a 27810.66
7975c5e0a_patched 30821.41

So 7975c5 causes a significant regression (10% regression, p-val 7e-16 on
difference of the means).  It repeatedly wakes the wal-writer up to flush a
page which the wal-writer simply refuses to flush.  This can't serve any
purpose.  My patch still wakes it up to write the page if it has not been
written yet, and also still wakes it to flush a range of pages if
WalWriterFlushAfter is met.  But it just doesn't wake it up to flush a page
which has already been written and will not get flushed.
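
The conservative rule described above might be sketched roughly as follows.
The names are invented and do not match the real XLogSetAsyncXactLSN code:
"written"/"flushed" stand in for LogwrtResult.Write/Flush, and flush_after
for WalWriterFlushAfter expressed in bytes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* byte positions, as in PostgreSQL */

/* Illustrative only: decide whether an async commit should wake the
 * walwriter under the conservative patch described above. */
static bool
should_wake_walwriter(XLogRecPtr write_rqst, XLogRecPtr written,
                      XLogRecPtr flushed, XLogRecPtr flush_after)
{
    if (write_rqst > written)
        return true;            /* page not written yet: wake to write it */
    if (write_rqst - flushed >= flush_after)
        return true;            /* enough unflushed WAL: wake to flush it */
    return false;               /* already written; a flush would be refused */
}
```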

I'd prefer the code-simplifying change of just not waking it up anymore
except for when it is doing the big sleep, but I can't reliably detect a
performance difference between that and the more conservative change posted
here.

Cheers,

Jeff


change_walwriter_wakeup_v2.patch
Description: Binary data



Re: [HACKERS] subscription worker signalling wal writer too much

2017-08-25 Thread Robert Haas
On Mon, Jul 3, 2017 at 11:26 PM, Jeff Janes  wrote:
> My understanding is that you can't start on a new file until the old file is
> completely synced, because the bookkeeping can currently only handle one
> file at a time.  So if you signal the wal writer to do the sync, you would
> just end up immediately blocking on it to finish.  Getting around that would
> require substantially more changes; we would need to implement a queue of
> full log files not yet synced so that we could move on without waiting.

Yeah.  I bet that effort would be reasonably worthwhile.  I believe
that the forced end-of-segment syncs are costing us a noticeable
amount of performance.  Once or twice I took a very half-hearted run
at doing what you describe here, but gave up pretty quickly; it seems
like a somewhat tricky problem.
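
A sketch of what such a queue might look like: a fixed-size ring of
finished segment numbers that backends enqueue and the walwriter drains.
All names and the capacity are invented for illustration; the real
bookkeeping in xlog.c would be considerably more involved:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PENDING_SYNC_MAX 8      /* illustrative fixed capacity */

/* Ring buffer of finished WAL segment numbers awaiting fsync, so a
 * backend could hand off a completed segment and move on without
 * waiting for the sync.  Locking is omitted for brevity. */
typedef struct PendingSyncQueue
{
    uint64_t    segs[PENDING_SYNC_MAX];
    int         head;           /* next slot to enqueue into */
    int         tail;           /* next segment the walwriter syncs */
    int         count;
} PendingSyncQueue;

static bool
enqueue_segment(PendingSyncQueue *q, uint64_t segno)
{
    if (q->count == PENDING_SYNC_MAX)
        return false;           /* queue full: backend must wait after all */
    q->segs[q->head] = segno;
    q->head = (q->head + 1) % PENDING_SYNC_MAX;
    q->count++;
    return true;
}

static bool
dequeue_segment(PendingSyncQueue *q, uint64_t *segno)
{
    if (q->count == 0)
        return false;           /* nothing pending */
    *segno = q->segs[q->tail];
    q->tail = (q->tail + 1) % PENDING_SYNC_MAX;
    q->count--;
    return true;
}
```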

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] subscription worker signalling wal writer too much

2017-07-03 Thread Jeff Janes
On Sat, Jun 24, 2017 at 5:09 PM, Andres Freund  wrote:

> Hi,
>
> On 2017-06-15 15:06:43 -0700, Jeff Janes wrote:
>
> > It looks like only limited consolidation was happening, the number of
> kills
> > invoked was more than half of the number of SetLatch.  I think the reason
> > is that when kill(owner_pid, SIGUSR1) is called, the kernel immediately
> > suspends the calling process and gives the signalled process a chance to
> > run in that time slice.  The Wal Writer gets woken up, sees that it only
> > has a partial page to write and decides not to do anything, but resets
> the
> > latch.  It does this fast enough that the subscription worker hasn't
> > migrated CPUs yet.
>
> I think part of the problem here is that latches signal the owning
> process even if the owning process isn't actually sleeping - that's
> obviously quite pointless.  In cases where walwriter is busy, that
> actually causes a lot of superflous interrupted syscalls, because it
> actually never ends up waiting on the latch. There's also a lot of
> superflous context switches.  I think we can avoid doing so quite
> easily, as e.g. with the attached patch.  Could you check how much that
> helps your benchmark?
>

That patch doesn't make any improvement to the pgbench --unlogged-tables
benchmark.  I expected that, because this benchmark exposes the opposite
problem.  The wal writer isn't busy, it is constantly being woken up but
then immediately going back to sleep.



> > The first change made the WALWriter wake up and do a write and flush
> > whenever an async commit occurred and there was a completed WAL page to
> be
> > written.  This way the hint bits could be set while the heap page was
> still
> > hot, because they couldn't get set until the WAL covering the hinted-at
> > transaction commit is flushed.
> >
> > The second change said we can set hint bits before the WAL covering the
> > hinted-at transaction is flushed, as long the page LSN is newer than that
> > (so the wal covering the hinted-at transaction commit must be flushed
> > before the page containing the hint bit can be written).
> >
> > Then the third change makes the wal writer only write the full pages most
> > of the times it is woken up, not flush them.  It only flushes them once
> > every wal_writer_delay or wal_writer_flush_after, regardless of how often
> > it is woken up.
> >
> > It seems like the third change rendered the first one useless.
>
> I don't think so.  Isn't the walwriter writing out the contents of the
> WAL quite important, because it makes room in wal_buffers for new WAL?
>

If we wanted to do that, I think the asynchronous committer should just
issue the write directly (unless wal_sync_method is open_datasync
or open_sync).  Writes are usually extremely fast and it is not worth trying
to offload them to a background process, as your half of the signalling
takes longer than just doing the write yourself.  When that is not true and
writes start blocking due to extreme IO congestion, then everything is
going to start blocking anyway (wal_buffers head will run into the tail,
and the tail can't advance because the writes are blocking) so while
offloading writes to a background process is then good in theory, it isn't
very effective.  If everyone is stuck, it doesn't matter which process is
directly stuck and which ones are indirectly stuck.


>
> I suspect we actually should wake up wal-writer *more* aggressively, to
> offload wal fsyncs from individual backends, even when they're not
> committing.  Right now we fsync whenever a segment is finished - we
> really don't want to do that in backends that could do other useful
> work.  I suspect it'd be a good idea to remove that logic from
> XLogWrite() and replace it with
> if (ProcGlobal->walwriterLatch)
> SetLatch(ProcGlobal->walwriterLatch);
>
>
My understanding is that you can't start on a new file until the old file
is completely synced, because the bookkeeping can currently only handle
one file at a time.  So if you signal the wal writer to do the sync, you
would just end up immediately blocking on it to finish.  Getting around
that would require substantially more changes; we would need to implement a
queue of full log files not yet synced so that we could move on without
waiting.  If we had such a queue, then sure, this would be a good idea to
just wake the wal writer. But currently, it only matters for large
transactions (\copy, insert ... select, bulk updates).  With small
transactions, you can't fill up log files quickly enough for it to make
much of a difference.  So I think this is something to do, but it is
independent from what I am currently going on about.



>
> > Wouldn't it
> > better, and much simpler, just to have reverted the first change once the
> > second one was done?
>
> I think both can actually happen independently, no? It's pretty common
> for the page lsn to be *older* than the corresponding commit.  In fact
> you'll always hit that case unless there are also concurrent writes
> touching said page.

Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-24 Thread Andres Freund
Hi,

On 2017-06-15 15:06:43 -0700, Jeff Janes wrote:
> > Well, wal_writer_delay doesn't work if walwriter is in sleep mode, and
> > this afaics would allow wal writer to go into sleep mode with half a
> > page filled, and it'd not be woken up again until the page is filled.
> > No?
> >
> 
> If it is taking the big sleep, then we always wake it up, since
> acd4c7d58baf09f.
> 
> I didn't change that part.  I only changed what we do if it is not hibernating.

Right, I hadn't read enough of the context.


> It looks like only limited consolidation was happening, the number of kills
> invoked was more than half of the number of SetLatch.  I think the reason
> is that when kill(owner_pid, SIGUSR1) is called, the kernel immediately
> suspends the calling process and gives the signalled process a chance to
> run in that time slice.  The Wal Writer gets woken up, sees that it only
> has a partial page to write and decides not to do anything, but resets the
> latch.  It does this fast enough that the subscription worker hasn't
> migrated CPUs yet.

I think part of the problem here is that latches signal the owning
process even if the owning process isn't actually sleeping - that's
obviously quite pointless.  In cases where walwriter is busy, that
actually causes a lot of superfluous interrupted syscalls, because it
actually never ends up waiting on the latch.  There are also a lot of
superfluous context switches.  I think we can avoid doing so quite
easily, as e.g. with the attached patch.  Could you check how much that
helps your benchmark?
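
The idea, very roughly: have the latch owner advertise that it is about to
sleep, and have SetLatch() skip the kill() when it is not.  The sketch
below is illustrative only; it ignores the memory barriers and race
handling the real latch code needs, and the function reports whether a
signal would be sent rather than actually signalling:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch: only signal the latch owner when it has announced
 * it is sleeping.  The real implementation would call
 * kill(owner_pid, SIGUSR1) where this returns true. */
typedef struct Latch
{
    bool        is_set;
    bool        owner_waiting;  /* owner sets this just before sleeping */
    int         owner_pid;
} Latch;

static bool
set_latch(Latch *l)
{
    l->is_set = true;
    return l->owner_waiting;    /* busy owner: skip the wakeup signal */
}
```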


> The first change made the WALWriter wake up and do a write and flush
> whenever an async commit occurred and there was a completed WAL page to be
> written.  This way the hint bits could be set while the heap page was still
> hot, because they couldn't get set until the WAL covering the hinted-at
> transaction commit is flushed.
> 
> The second change said we can set hint bits before the WAL covering the
> hinted-at transaction is flushed, as long the page LSN is newer than that
> (so the wal covering the hinted-at transaction commit must be flushed
> before the page containing the hint bit can be written).
> 
> Then the third change makes the wal writer only write the full pages most
> of the times it is woken up, not flush them.  It only flushes them once
> every wal_writer_delay or wal_writer_flush_after, regardless of how often
> it is woken up.
> 
> It seems like the third change rendered the first one useless.

I don't think so.  Isn't the walwriter writing out the contents of the
WAL quite important, because it makes room in wal_buffers for new WAL?

I suspect we actually should wake up wal-writer *more* aggressively, to
offload wal fsyncs from individual backends, even when they're not
committing.  Right now we fsync whenever a segment is finished - we
really don't want to do that in backends that could do other useful
work.  I suspect it'd be a good idea to remove that logic from
XLogWrite() and replace it with
if (ProcGlobal->walwriterLatch)
SetLatch(ProcGlobal->walwriterLatch);


> Wouldn't it
> better, and much simpler, just to have reverted the first change once the
> second one was done?

I think both can actually happen independently, no? It's pretty common
for the page lsn to be *older* than the corresponding commit.  In fact
you'll always hit that case unless there are also concurrent writes
touching said page.


> If that were done, would the third change still be
> needed?  (It did seem to add some other features as well, so I'm going to
> assume we still want those...).

It's still useful, yes.  It avoids flushing the WAL too often
(page-by-page when there's a lot of writes), which can eat up a lot of
IOPS on fast storage.



> But now the first change is even worse than useless, it is positively
> harmful.  The only thing to stop it from waking the WALWriter for every
> async commit is this line:
> 
> /* if we have already flushed that far, we're done */
> if (WriteRqstPtr <= LogwrtResult.Flush)
> return;
> 
> Since LogwrtResult.Flush doesn't advance anymore, this condition doesn't
> become false due to us waking walwriter; it only becomes false once the
> timeout expires (which it would have done anyway with no help from us), or
> once wal_writer_flush_after is exceeded.  So now every async commit is
> waking the walwriter.  Most of those wake up do nothing (there is not a
> completely new patch to write), some write one completed page but don't
> flush anything, and very few do a flush, and that one would have been done
> anyway.

s/completely new patch/completely new page/?

In my opinion we actually *do* want to write (but not flush!) out such
pages, so I'm not sure I agree with that logic.  Have to think about
this some more...


Greetings,

Andres Freund
>From 50a3e62b16b752837f5a388e3259fee2bca6c2ab Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 24 Jun 2017 16:46:11 

Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-20 Thread Jeff Janes
On Thu, Jun 15, 2017 at 3:06 PM, Jeff Janes  wrote:

>
> This new patch is simpler than the previous one, and more effective at
> speeding up replication. I assume it would speed up pgbench with
> synchronous_commit turned off (or against unlogged tables) as well, but I
> don't think I have the hardware needed to test that.
>

If I use the 'tpcb-func' script embodied in the attached patch to pgbench,
then I can see the performance difference against unlogged tables using 8
clients on a 8 CPU virtual machine. The normal tpcb-like script has too
much communication overhead, bouncing from pgbench to the postgres backend
7 times per transaction, to see the difference. I also had to
set autovacuum_vacuum_cost_delay=0, otherwise auto-analyze holds a
snapshot long enough to bloat the HOT chains which injects a great deal of
variability into the timings.

Commit 7975c5e0a992ae9 in the 9.6 branch causes a regression of about 10%,
and my patch from the previous email redeems that regression.  It also
gives the same improvement against 10dev HEAD.

Cheers,

Jeff


pgbench_function.patch
Description: Binary data



Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-15 Thread Jeff Janes
On Wed, Jun 14, 2017 at 4:29 PM, Andres Freund  wrote:

> On 2017-06-14 16:24:27 -0700, Jeff Janes wrote:
> > On Wed, Jun 14, 2017 at 3:20 PM, Andres Freund 
> wrote:
> >
> > > On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> > > > On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes 
> > > wrote:
> > > >
> > > > > If I publish a pgbench workload and subscribe to it, the
> subscription
> > > > > worker is signalling the wal writer thousands of times a second,
> once
> > > for
> > > > > every async commit.  This has a noticeable performance cost.
> > > > >
> > > >
> > > > I've used a local variable to avoid waking up the wal writer more
> than
> > > once
> > > > for the same page boundary.  This reduces the number of wake-ups by
> about
> > > > 7/8.
> > >
> > > Maybe I'm missing something here, but isn't that going to reduce our
> > > guarantees about when asynchronously committed xacts are flushed out?
> > > You can easily fit a number of commits into the same page...   As this
> > > isn't specific to logical-rep, I don't think that's ok.
> > >
> >
> > The guarantee is based on wal_writer_delay not on SIGUSR1, so I don't
> think
> > this changes that. (Also, it isn't really a guarantee, the fsync can take
> > many seconds to complete once we do initiate it, and there is absolutely
> > nothing we can do about that, other than do the fsync synchronously in
> the
> > first place).
>
> Well, wal_writer_delay doesn't work if walwriter is in sleep mode, and
> this afaics would allow wal writer to go into sleep mode with half a
> page filled, and it'd not be woken up again until the page is filled.
> No?
>

If it is taking the big sleep, then we always wake it up, since
acd4c7d58baf09f.

I didn't change that part.  I only changed what we do if it is not hibernating.


> > > Have you chased down why there's that many wakeups?  Normally I'd have
> > > expected that a number of the SetLatch() calls get consolidated
> > > together, but I guess walwriter is "too quick" in waking up and
> > > resetting the latch?
>
> > I'll have to dig into that some more.  The 7/8 reduction I cited was just
> > in calls to SetLatch from that part of the code, I didn't measure whether
> > the SetLatch actually called kill(owner_pid, SIGUSR1) or not when I
> > determined that reduction, so it wasn't truly wake ups I measured.
> Actual
> > wake ups were measured only indirectly via the impact on performance.
> I'll
> > need to figure out how to instrument that without distorting the
> > performance too much in the process.
>
> I'd suspect that just measuring the number of kill() calls should be
> doable, if measured via perf or something like that.
>


It looks like only limited consolidation was happening, the number of kills
invoked was more than half of the number of SetLatch.  I think the reason
is that when kill(owner_pid, SIGUSR1) is called, the kernel immediately
suspends the calling process and gives the signalled process a chance to
run in that time slice.  The Wal Writer gets woken up, sees that it only
has a partial page to write and decides not to do anything, but resets the
latch.  It does this fast enough that the subscription worker hasn't
migrated CPUs yet.

But, if the WAL Writer sees it has already flushed up to the highest
eligible page boundary, why doesn't the SetAsync code see the same thing?
Because it isn't flushed that high, but only written that high.  It looks
like a story of three git commits:

commit 4de82f7d7c50a81ec8e70e2cb0ab413ab9134c0b
Author: Simon Riggs 
Date:   Sun Nov 13 09:00:57 2011 +

Wakeup WALWriter as needed for asynchronous commit performance.


commit db76b1efbbab2441428a9ef21f7ac9ba43c52482
Author: Andres Freund 
Date:   Mon Feb 15 22:15:35 2016 +0100

Allow SetHintBits() to succeed if the buffer's LSN is new enough.


commit 7975c5e0a992ae9a45e03d145e0d37e2b5a707f5
Author: Andres Freund 
Date:   Mon Feb 15 23:52:38 2016 +0100

Allow the WAL writer to flush WAL at a reduced rate.


The first change made the WALWriter wake up and do a write and flush
whenever an async commit occurred and there was a completed WAL page to be
written.  This way the hint bits could be set while the heap page was still
hot, because they couldn't get set until the WAL covering the hinted-at
transaction commit is flushed.

The second change said we can set hint bits before the WAL covering the
hinted-at transaction is flushed, as long the page LSN is newer than that
(so the wal covering the hinted-at transaction commit must be flushed
before the page containing the hint bit can be written).

Then the third change makes the wal writer only write the full pages most
of the times it is woken up, not flush them.  It only flushes them once
every wal_writer_delay or wal_writer_flush_after, regardless of how often
it is woken up.

It seems like the third change rendered the first one useless.

Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-14 Thread Andres Freund
On 2017-06-14 16:24:27 -0700, Jeff Janes wrote:
> On Wed, Jun 14, 2017 at 3:20 PM, Andres Freund  wrote:
> 
> > On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> > > On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes 
> > wrote:
> > >
> > > > If I publish a pgbench workload and subscribe to it, the subscription
> > > > worker is signalling the wal writer thousands of times a second, once
> > for
> > > > every async commit.  This has a noticeable performance cost.
> > > >
> > >
> > > I've used a local variable to avoid waking up the wal writer more than
> > once
> > > for the same page boundary.  This reduces the number of wake-ups by about
> > > 7/8.
> >
> > Maybe I'm missing something here, but isn't that going to reduce our
> > guarantees about when asynchronously committed xacts are flushed out?
> > You can easily fit a number of commits into the same page...   As this
> > isn't specific to logical-rep, I don't think that's ok.
> >
> 
> The guarantee is based on wal_writer_delay not on SIGUSR1, so I don't think
> this changes that. (Also, it isn't really a guarantee, the fsync can take
> many seconds to complete once we do initiate it, and there is absolutely
> nothing we can do about that, other than do the fsync synchronously in the
> first place).

Well, wal_writer_delay doesn't work if walwriter is in sleep mode, and
this afaics would allow wal writer to go into sleep mode with half a
page filled, and it'd not be woken up again until the page is filled.
No?


> > Have you chased down why there's that many wakeups?  Normally I'd have
> > expected that a number of the SetLatch() calls get consolidated
> > together, but I guess walwriter is "too quick" in waking up and
> > resetting the latch?

> I'll have to dig into that some more.  The 7/8 reduction I cited was just
> in calls to SetLatch from that part of the code, I didn't measure whether
> the SetLatch actually called kill(owner_pid, SIGUSR1) or not when I
> determined that reduction, so it wasn't truly wake ups I measured.  Actual
> wake ups were measured only indirectly via the impact on performance.  I'll
> need to figure out how to instrument that without distorting the
> performance too much in the process.

I'd suspect that just measuring the number of kill() calls should be
doable, if measured via perf or something like that.


Greetings,

Andres Freund




Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-14 Thread Jeff Janes
On Wed, Jun 14, 2017 at 3:20 PM, Andres Freund  wrote:

> On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> > On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes 
> wrote:
> >
> > > If I publish a pgbench workload and subscribe to it, the subscription
> > > worker is signalling the wal writer thousands of times a second, once
> for
> > > every async commit.  This has a noticeable performance cost.
> > >
> >
> > I've used a local variable to avoid waking up the wal writer more than
> once
> > for the same page boundary.  This reduces the number of wake-ups by about
> > 7/8.
>
> Maybe I'm missing something here, but isn't that going to reduce our
> guarantees about when asynchronously committed xacts are flushed out?
> You can easily fit a number of commits into the same page...   As this
> isn't specific to logical-rep, I don't think that's ok.
>

The guarantee is based on wal_writer_delay not on SIGUSR1, so I don't think
this changes that. (Also, it isn't really a guarantee, the fsync can take
many seconds to complete once we do initiate it, and there is absolutely
nothing we can do about that, other than do the fsync synchronously in the
first place).

The reason for kicking the wal writer at page boundaries is so that hint
bits can get set earlier than they otherwise could. But I don't think
kicking it multiple times per page boundary can help in that effort.


>
> Have you chased down why there's that many wakeups?  Normally I'd have
> expected that a number of the SetLatch() calls get consolidated
> together, but I guess walwriter is "too quick" in waking up and
> resetting the latch?
>

I'll have to dig into that some more.  The 7/8 reduction I cited was just
in calls to SetLatch from that part of the code, I didn't measure whether
the SetLatch actually called kill(owner_pid, SIGUSR1) or not when I
determined that reduction, so it wasn't truly wake ups I measured.  Actual
wake ups were measured only indirectly via the impact on performance.  I'll
need to figure out how to instrument that without distorting the
performance too much in the process.

Cheers,

Jeff


Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-14 Thread Andres Freund
On 2017-06-14 15:08:49 -0700, Jeff Janes wrote:
> On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes  wrote:
> 
> > If I publish a pgbench workload and subscribe to it, the subscription
> > worker is signalling the wal writer thousands of times a second, once for
> > every async commit.  This has a noticeable performance cost.
> >
> 
> I've used a local variable to avoid waking up the wal writer more than once
> for the same page boundary.  This reduces the number of wake-ups by about
> 7/8.

Maybe I'm missing something here, but isn't that going to reduce our
guarantees about when asynchronously committed xacts are flushed out?
You can easily fit a number of commits into the same page...   As this
isn't specific to logical-rep, I don't think that's ok.


Have you chased down why there's that many wakeups?  Normally I'd have
expected that a number of the SetLatch() calls get consolidated
together, but I guess walwriter is "too quick" in waking up and
resetting the latch?


Greetings,

Andres Freund




Re: [HACKERS] subscription worker signalling wal writer too much

2017-06-14 Thread Jeff Janes
On Wed, Jun 14, 2017 at 11:55 AM, Jeff Janes  wrote:

> If I publish a pgbench workload and subscribe to it, the subscription
> worker is signalling the wal writer thousands of times a second, once for
> every async commit.  This has a noticeable performance cost.
>

I've used a local variable to avoid waking up the wal writer more than once
for the same page boundary.  This reduces the number of wake-ups by about
7/8.

I'm testing it by doing 1e6 transactions over 8 clients while replication
is in effect, then waiting for the logical replica to catch up.  This cycle
takes 183.1 seconds in HEAD, and 162.4 seconds with the attached patch.
N=14, p-value for difference of the means 6e-17.

If I suppress all wake-ups just to see what would happen, it further
reduces the runtime to 153.7.

Cheers,

Jeff


wake_wal_writer_less_aggressively.patch
Description: Binary data



[HACKERS] subscription worker signalling wal writer too much

2017-06-14 Thread Jeff Janes
If I publish a pgbench workload and subscribe to it, the subscription
worker is signalling the wal writer thousands of times a second, once for
every async commit.  This has a noticeable performance cost.

I don't think it is ever necessary to signal the wal writer here, unless
wal writer is taking the long sleep for power saving purposes.  If up
to wal_writer_delay of "committed" transactions get lost on a crash, it
doesn't really matter because they will be replayed again once replication
resumes. However, might delaying the advance of the hot_standby_feedback by
up to the amount of wal_writer_delay be a problem?  I would think any
set-up which depends on the standby never being a few dozen milliseconds
behind is already doomed.

If I want to suppress signalling, what would be the right way to
communicate to XLogSetAsyncXactLSN that it is being called in a
subscription worker?

Another approach would be to make XLogSetAsyncXactLSN signalling less
aggressive for everyone, not just subscription workers.  There is no point
in signalling it more than once for a given WriteRqstPtr page boundary.  So
each backend could locally remember the last boundary for which it
signalled wal writer, and not signal again for the same boundary.  This
would be especially effective for a subscription worker, as it should be
pretty common for almost all the async commits to be coming from the same
process.  Or, the last boundary could be kept in shared memory (XLogCtl) so
all backends can share it, at the expense of additional work needed to be
done under a spin lock.
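
The per-backend variant might look roughly like this, with invented names
(XLOG_BLCKSZ is PostgreSQL's 8 kB default WAL page size; the static local
stands in for the "last boundary" each backend would remember):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for the real typedef */
#define XLOG_BLCKSZ 8192        /* default WAL page size */

/* Sketch of the per-backend idea above: remember the last completed-page
 * boundary we signalled the walwriter for, and skip the signal when the
 * same boundary comes up again.  Names are invented for illustration. */
static bool
should_signal_walwriter(XLogRecPtr async_commit_lsn)
{
    static XLogRecPtr last_signalled = 0;
    XLogRecPtr  boundary = async_commit_lsn - (async_commit_lsn % XLOG_BLCKSZ);

    if (boundary <= last_signalled)
        return false;           /* already signalled for this page */
    last_signalled = boundary;
    return true;
}
```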

Other ideas?

Thanks,

Jeff