Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-07-11 Thread Heikki Linnakangas

On 11/07/18 04:16, Thomas Munro wrote:

On Tue, Jul 10, 2018 at 11:39 PM, Heikki Linnakangas  wrote:

I don't have a FreeBSD machine at hand, so I didn't try fixing that
patch.


I updated the FreeBSD version to use the header test approach you
showed, and pushed that too.  FWIW the build farm has some FreeBSD
animals with and without PROC_PDEATHSIG_CTL.


Thanks!


I suppose it's possibly that we might want to reconsider the choice of
signal in the future (SIGINFO or SIGPWR).


We could reuse SIGUSR1 for this. If we set the flag in SIGUSR1 handler, 
then some PostmasterIsAlive() calls would take the slow path 
unnecessarily, but it would probably be OK. The slow path isn't that 
slow. But using SIGINFO/SIGPWR seems fine.


- Heikki



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-07-10 Thread Thomas Munro
On Tue, Jul 10, 2018 at 11:39 PM, Heikki Linnakangas  wrote:
> The 'postmaster_possibly_dead' flag is not reset anywhere. So if a process
> receives a spurious death signal, even though postmaster is still alive,
> PostmasterIsAlive() will continue to use the slow path.

+1

> postmaster_possibly_dead needs to be marked as 'volatile', no?

+1

> The autoconf check for PR_SET_PDEATHSIG seems slightly misplaced. And I
> think we can simplify it with AC_CHECK_HEADER(). I'd also like to avoid
> adding code to c.h for this, that seems too global.

+1, much nicer, thanks.

> After some kibitzing, I ended up with the attached. It fixes the
> postmaster_possible_dead issues mentioned above, and moves around the
> autoconf and #ifdef logic a bit to make it a bit nicer, at least in my
> opinion.

Thanks, that looks good to me.  I added your name as co-author and
pushed to master.

I also made a couple of minor cosmetic changes in
PostmasterDeathSignalInit() to make the follow-up patch prettier (#if
defined() instead of #ifdef, and a signum variable because I later
need its address).

> I don't have a FreeBSD machine at hand, so I didn't try fixing that
> patch.

I updated the FreeBSD version to use the header test approach you
showed, and pushed that too.  FWIW the build farm has some FreeBSD
animals with and without PROC_PDEATHSIG_CTL.

I suppose it's possibly that we might want to reconsider the choice of
signal in the future (SIGINFO or SIGPWR).

(Random archeological note: TIL that Linux stole  from
Irix (RIP), but it had PR_TERMCHILD instead of PR_SET_PRDEATHSIG.)

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-07-10 Thread Heikki Linnakangas

On 27/06/18 08:26, Thomas Munro wrote:

On Wed, Apr 25, 2018 at 6:23 PM, Thomas Munro
 wrote:

On Tue, Apr 24, 2018 at 7:37 PM, Michael Paquier  wrote:

Thomas, trying to understand here...  Why this place for the signal
initialization?  Wouldn't InitPostmasterChild() be a more logical place
as we'd want to have this logic caught by all other processes?


Yeah, you're right.  Here's one like that.


Amazingly, due to the way the project schedules fell and thanks to the
help of a couple of very supportive FreeBSD committers and reviewers,
the new PROC_PDEATHSIG_CTL kernel facility[1] landed in a production
release today, beating the Commitfest by several days.

My question is whether this patch set is enough, or people (Andres?) want
to go further and actually kill the postmaster death pipe completely.
My kqueue patch would almost completely kill it on BSDs as it happens,
but for Linux there are a number of races and complications to plug to
do that IIUC.  I don't immediately see what there is to gain by doing
that work (assuming we reuse WaitEventSet objects in enough places),
and we'll always need to maintain the pipe code for other OSes anyway.
So I'm -0.5 on doing that, even though the technical puzzle is
appealing...

[1] 
https://www.freebsd.org/cgi/man.cgi?query=procctl=0=2=FreeBSD+11.2-RELEASE=default=html


Yeah, I don't think we can kill the pipe completely. As long as we still 
need it for the other OSes, I don't see much point in eliminating it on 
Linux and BSDs either. I'd rather keep the code similar across platforms.


Looking at the patch:

The 'postmaster_possibly_dead' flag is not reset anywhere. So if a 
process receives a spurious death signal, even though postmaster is 
still alive, PostmasterIsAlive() will continue to use the slow path.


postmaster_possibly_dead needs to be marked as 'volatile', no?

The autoconf check for PR_SET_PDEATHSIG seems slightly misplaced. And I 
think we can simplify it with AC_CHECK_HEADER(). I'd also like to avoid 
adding code to c.h for this, that seems too global.


After some kibitzing, I ended up with the attached. It fixes the 
postmaster_possible_dead issues mentioned above, and moves around the 
autoconf and #ifdef logic a bit to make it a bit nicer, at least in my 
opinion. I don't have a FreeBSD machine at hand, so I didn't try fixing 
that patch.


- Heikki
>From 9418ed472d113a80bf0fbb209efb0835767a5c50 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 10 Jul 2018 14:31:48 +0300
Subject: [PATCH 1/1] Use signals for postmaster death on Linux.

Linux provides a way to ask for a signal when your parent process dies.
Use that to make PostmasterIsAlive() very cheap.

Author: Thomas Munro, based on a suggestion from Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/7261eb39-0369-f2f4-1bb5-62f3b6083b5e%40iki.fi
Discussion: https://postgr.es/m/20180411002643.6buofht4ranhei7k%40alap3.anarazel.de
---
 configure  |   2 +-
 configure.in   |   2 +-
 src/backend/storage/ipc/latch.c|   6 +-
 src/backend/storage/ipc/pmsignal.c | 123 +
 src/backend/utils/init/miscinit.c  |   4 ++
 src/include/pg_config.h.in |   3 +
 src/include/pg_config.h.win32  |   3 +
 src/include/storage/pmsignal.h |  33 +-
 8 files changed, 157 insertions(+), 19 deletions(-)

diff --git a/configure b/configure
index 1bc465b942..41e0e1cf34 100755
--- a/configure
+++ b/configure
@@ -12494,7 +12494,7 @@ $as_echo "#define HAVE_STDBOOL_H 1" >>confdefs.h
 fi
 
 
-for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/ipc.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
+for ac_header in atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/ipc.h sys/prctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
 do :
   as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default"
diff --git a/configure.in b/configure.in
index a6b3b88cfa..1e76c9ee46 100644
--- a/configure.in
+++ b/configure.in
@@ -1260,7 +1260,7 @@ AC_SUBST(UUID_LIBS)
 
 AC_HEADER_STDBOOL
 
-AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/ipc.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
+AC_CHECK_HEADERS([atomic.h crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/ipc.h sys/prctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/un.h termios.h ucred.h utime.h 

Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-06-26 Thread Thomas Munro
On Wed, Apr 25, 2018 at 6:23 PM, Thomas Munro
 wrote:
> On Tue, Apr 24, 2018 at 7:37 PM, Michael Paquier  wrote:
>> Thomas, trying to understand here...  Why this place for the signal
>> initialization?  Wouldn't InitPostmasterChild() be a more logical place
>> as we'd want to have this logic caught by all other processes?
>
> Yeah, you're right.  Here's one like that.

Amazingly, due to the way the project schedules fell and thanks to the
help of a couple of very supportive FreeBSD committers and reviewers,
the new PROC_PDEATHSIG_CTL kernel facility[1] landed in a production
release today, beating the Commitfest by several days.

My question is whether this patch set is enough, or people (Andres?) want
to go further and actually kill the postmaster death pipe completely.
My kqueue patch would almost completely kill it on BSDs as it happens,
but for Linux there are a number of races and complications to plug to
do that IIUC.  I don't immediately see what there is to gain by doing
that work (assuming we reuse WaitEventSet objects in enough places),
and we'll always need to maintain the pipe code for other OSes anyway.
So I'm -0.5 on doing that, even though the technical puzzle is
appealing...

[1] 
https://www.freebsd.org/cgi/man.cgi?query=procctl=0=2=FreeBSD+11.2-RELEASE=default=html

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-24 Thread Michael Paquier
On Sat, Apr 21, 2018 at 12:25:27PM +1200, Thomas Munro wrote:
> Here's a new version, because FreeBSD's new interface changed slightly.

I have been looking at the proposed set for Linux, and the numbers are
here.  By replaying 1GB worth of WAL after a pgbench run with the data
folder on a tmpfs the recovery time goes from 33s to 28s, so that's a
nice gain.

Do you have numbers with FreeBSD?  I get that this would be more
difficult to set up without a GA release perhaps...

I can also see the difference in profiles by looking for
HandleStartupProcInterrupts which gets close 10% of the attention when
unpatched, and down to 0.1% when patched.

@@ -2484,6 +2484,8 @@ ClosePostmasterPorts(bool am_syslogger)
 if (bonjour_sdref)
 close(DNSServiceRefSockFD(bonjour_sdref));
 #endif
+
+PostmasterDeathInit();

Thomas, trying to understand here...  Why this place for the signal
initialization?  Wouldn't InitPostmasterChild() be a more logical place
as we'd want to have this logic caught by all other processes?
--
Michael


signature.asc
Description: PGP signature


Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-20 Thread Thomas Munro
Here's a new version, because FreeBSD's new interface changed slightly.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Use-signals-for-postmaster-death-on-Linux-v3.patch
Description: Binary data


0002-Use-signals-for-postmaster-death-on-FreeBSD-v3.patch
Description: Binary data


Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-19 Thread Thomas Munro
On Thu, Apr 19, 2018 at 6:20 PM, Andres Freund  wrote:
> On April 18, 2018 8:05:50 PM PDT, Thomas Munro 
>  wrote:
>>By the way, these patches only use the death signal to make
>>PostmasterIsAlive() fast, for use by busy loops like recovery.  The
>>postmaster pipe is still used for IO/timeout loops to detect
>>postmaster death.  In theory you could get rid of the postmaster pipe
>>completely when USE_POSTMASTER_DEATH_SIGNAL is defined and make it
>>like the latch code, using the same self-pipe.  I'm not sure if there
>>is anything to be gained by that (that wasn't already gained by using
>>epoll/kqueue) so I'm not proposing it.
>
> In my local prototype patch I'd done so. And I think it makes sense, because 
> it's s somewhat contended object, even when using epoll/kqueue. On the other 
> hand, it makes the chide changed a bit harder, making it pretty was were I 
> suspended the work for a bit...

Hmm.  I thought the whole idea of these interfaces was "don't call us,
we'll call you" inside the kernel, so you can add thousands of pipes
if you like, but epoll/kevent will only check the queue and then wait
for notification, rather than interacting with the referenced objects?

If that is true, and given that we need to maintain the pipe-based
code anyway for platforms that need it, then what would we gain by
adding a different code path just because we can?  Obviously
WaitLatch() (the thing that creates, adds pipe, waits, then destroys
every time) could save time by not having to deal with a postmaster
pipe, but the solution to that is another patch that switches more
things to reusable WES objects.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-19 Thread Andres Freund


On April 18, 2018 8:05:50 PM PDT, Thomas Munro  
wrote:
>On Wed, Apr 18, 2018 at 5:04 PM, Thomas Munro
> wrote:
>> On Wed, Apr 11, 2018 at 10:22 PM, Heikki Linnakangas
> wrote:
 On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund
>
 wrote:
> That person said he'd work on adding an equivalent of linux'
> prctl(PR_SET_PDEATHSIG) to FreeBSD.
>>
>> Here is an implementation of Andres's idea for Linux, and also for
>> patched FreeBSD (for later if/when that lands).  Do you think this
>> makes sense Heikki?  I am planning to add this to the next CF.
>
>Here's a new version with a stupid bug fixed (I accidentally posted a
>testing version that returned false instead of true, as cfbot quickly
>pointed out -- d'oh).
>
>By the way, these patches only use the death signal to make
>PostmasterIsAlive() fast, for use by busy loops like recovery.  The
>postmaster pipe is still used for IO/timeout loops to detect
>postmaster death.  In theory you could get rid of the postmaster pipe
>completely when USE_POSTMASTER_DEATH_SIGNAL is defined and make it
>like the latch code, using the same self-pipe.  I'm not sure if there
>is anything to be gained by that (that wasn't already gained by using
>epoll/kqueue) so I'm not proposing it.

In my local prototype patch I'd done so. And I think it makes sense, because 
it's s somewhat contended object, even when using epoll/kqueue. On the other 
hand, it makes the chide changed a bit harder, making it pretty was were I 
suspended the work for a bit...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-18 Thread Thomas Munro
On Wed, Apr 18, 2018 at 5:04 PM, Thomas Munro
 wrote:
> On Wed, Apr 11, 2018 at 10:22 PM, Heikki Linnakangas  wrote:
>>> On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund 
>>> wrote:
 That person said he'd work on adding an equivalent of linux'
 prctl(PR_SET_PDEATHSIG) to FreeBSD.
>
> Here is an implementation of Andres's idea for Linux, and also for
> patched FreeBSD (for later if/when that lands).  Do you think this
> makes sense Heikki?  I am planning to add this to the next CF.

Here's a new version with a stupid bug fixed (I accidentally posted a
testing version that returned false instead of true, as cfbot quickly
pointed out -- d'oh).

By the way, these patches only use the death signal to make
PostmasterIsAlive() fast, for use by busy loops like recovery.  The
postmaster pipe is still used for IO/timeout loops to detect
postmaster death.  In theory you could get rid of the postmaster pipe
completely when USE_POSTMASTER_DEATH_SIGNAL is defined and make it
like the latch code, using the same self-pipe.  I'm not sure if there
is anything to be gained by that (that wasn't already gained by using
epoll/kqueue) so I'm not proposing it.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Use-signals-for-postmaster-death-on-Linux-v2.patch
Description: Binary data


0002-Use-signals-for-postmaster-death-on-FreeBSD-v2.patch
Description: Binary data


Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-11 Thread Thomas Munro
On Wed, Apr 11, 2018 at 10:22 PM, Heikki Linnakangas  wrote:
> On 10/04/18 04:36, Thomas Munro wrote:
>> Just an idea, not tested: what about a reusable WaitEventSet with zero
>> timeout?  Using the kqueue patch, that'd call kevent() which'd return
>> immediately and tell you if any postmaster death notifications had
>> arrive on your queue since last time you asked.  It doesn't even touch
>> the pipe, or any other kernel objects apart from your own queue IIUC.
>
> Hmm. In PostmasterIsAlive(), you'd still need to call kevent() to check if
> postmaster has died. It would just replace the current read() syscall on the
> pipe with the kevent() syscall. Is it faster?

It should be (based on the report of read() being slow here because of
contention on the pipe itself, I guess because of frequent poll() in
WaitLatch() elsewhere?).

But as I said over on another thread[1] (sorry, it got tangled up with
that other conversation about a related topic), maybe testing
getppid() would be simpler and about as fast as possible given you
have to make a syscall (all processes should normally be children of
postmaster, right?).  And only check every nth time through the loop,
as you said, to avoid high frequency syscalls.  I think I might have
been guilty of having a solution looking for a problem, there ;-)

[1] 
https://www.postgresql.org/message-id/CAEepm%3D298omvRS2C8WO%3DCxp%2BWcM-Vn8V3x4_QhxURhKTRCSfYg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-11 Thread Heikki Linnakangas

On 10/04/18 04:36, Thomas Munro wrote:

On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund  wrote:

I coincidentally got pinged about our current approach causing
performance problems on FreeBSD and started writing a patch.  The
problem there appears to be that constantly attaching events to the read
pipe end, from multiple processes, causes significant contention inside
the kernel. Which isn't that surprising.   That's distinct from the
problem netbsd/openbsd reported a while back (superflous wakeups).

That person said he'd work on adding an equivalent of linux'
prctl(PR_SET_PDEATHSIG) to FreeBSD.


Just an idea, not tested: what about a reusable WaitEventSet with zero
timeout?  Using the kqueue patch, that'd call kevent() which'd return
immediately and tell you if any postmaster death notifications had
arrive on your queue since last time you asked.  It doesn't even touch
the pipe, or any other kernel objects apart from your own queue IIUC.


Hmm. In PostmasterIsAlive(), you'd still need to call kevent() to check 
if postmaster has died. It would just replace the current read() syscall 
on the pipe with the kevent() syscall. Is it faster?


- Heikki



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Andres Freund


On April 9, 2018 6:57:23 PM PDT, Alvaro Herrera  wrote:
>Andres Freund wrote:
>> 
>> On April 9, 2018 6:31:07 PM PDT, Alvaro Herrera
> wrote:
>
>> >Would it work to use this second pipe, to which each child writes a
>> >byte that postmaster never reads, and then rely on SIGPIPE when
>> >postmaster dies?  Then we never need to do a syscall.
>> 
>> I'm not following, could you expand on what you're suggesting?  Note
>> that you do not get SIGPIPE for already buffered writes.  Which
>> syscall can we avoid?
>
>Ah.  I was thinking we'd get SIGPIPE from the byte sent at the start,
>as
>soon as the kernel saw that postmaster abandoned the fd by dying.
>Scratch that then.

Had the same idea, but unfortunately reality, in the form of a test program, 
cured me of my hope ;)
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Alvaro Herrera
Andres Freund wrote:
> 
> On April 9, 2018 6:31:07 PM PDT, Alvaro Herrera  
> wrote:

> >Would it work to use this second pipe, to which each child writes a
> >byte that postmaster never reads, and then rely on SIGPIPE when
> >postmaster dies?  Then we never need to do a syscall.
> 
> I'm not following, could you expand on what you're suggesting?  Note
> that you do not get SIGPIPE for already buffered writes.  Which
> syscall can we avoid?

Ah.  I was thinking we'd get SIGPIPE from the byte sent at the start, as
soon as the kernel saw that postmaster abandoned the fd by dying.
Scratch that then.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Andres Freund


On April 9, 2018 6:36:19 PM PDT, Thomas Munro  
wrote:
>On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund 
>wrote:
>> I coincidentally got pinged about our current approach causing
>> performance problems on FreeBSD and started writing a patch.  The
>> problem there appears to be that constantly attaching events to the
>read
>> pipe end, from multiple processes, causes significant contention
>inside
>> the kernel. Which isn't that surprising.   That's distinct from the
>> problem netbsd/openbsd reported a while back (superflous wakeups).
>>
>> That person said he'd work on adding an equivalent of linux'
>> prctl(PR_SET_PDEATHSIG) to FreeBSD.
>
>Just an idea, not tested: what about a reusable WaitEventSet with zero
>timeout?  Using the kqueue patch, that'd call kevent() which'd return
>immediately and tell you if any postmaster death notifications had
>arrive on your queue since last time you asked.  It doesn't even touch
>the pipe, or any other kernel objects apart from your own queue IIUC.

We still create a lot of WES adhoc in a number of places. I don't think this 
would solve that?  The problem isn't that IsAlive is expensive, it's that 
polling is expensive. It's possible that using kqueue would fix that, because 
the highest frequency cases use a persistent WES.

Andres

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Thomas Munro
On Tue, Apr 10, 2018 at 12:53 PM, Andres Freund  wrote:
> I coincidentally got pinged about our current approach causing
> performance problems on FreeBSD and started writing a patch.  The
> problem there appears to be that constantly attaching events to the read
> pipe end, from multiple processes, causes significant contention inside
> the kernel. Which isn't that surprising.   That's distinct from the
> problem netbsd/openbsd reported a while back (superflous wakeups).
>
> That person said he'd work on adding an equivalent of linux'
> prctl(PR_SET_PDEATHSIG) to FreeBSD.

Just an idea, not tested: what about a reusable WaitEventSet with zero
timeout?  Using the kqueue patch, that'd call kevent() which'd return
immediately and tell you if any postmaster death notifications had
arrive on your queue since last time you asked.  It doesn't even touch
the pipe, or any other kernel objects apart from your own queue IIUC.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Andres Freund


On April 9, 2018 6:31:07 PM PDT, Alvaro Herrera  wrote:
>Andres Freund wrote:
>
>> Another approach, that's simpler to implement, is to simply have a
>> second selfpipe, just for WL_POSTMASTER_DEATH.
>
>Would it work to use this second pipe, to which each child writes a
>byte
>that postmaster never reads, and then rely on SIGPIPE when postmaster
>dies?  Then we never need to do a syscall.

I'm not following, could you expand on what you're suggesting?  Note that you 
do not get SIGPIPE for already buffered writes.  Which syscall can we avoid?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Alvaro Herrera
Andres Freund wrote:

> Another approach, that's simpler to implement, is to simply have a
> second selfpipe, just for WL_POSTMASTER_DEATH.

Would it work to use this second pipe, to which each child writes a byte
that postmaster never reads, and then rely on SIGPIPE when postmaster
dies?  Then we never need to do a syscall.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-09 Thread Andres Freund
Hi,

On 2018-04-05 12:20:38 -0700, Andres Freund wrote:
> > While it's not POSIX, at least some platforms are capable of delivering
> > a separate signal on parent process death.  Perhaps using that where
> > available would be enough of an answer.
> 
> Yea, that'd work on linux. Which is probably the platform 80-95% of
> performance critical PG workloads run on.  There's
> JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE on windows, which might also work,
> but I'm not sure it provides enough opportunity for cleanup.

I coincidentally got pinged about our current approach causing
performance problems on FreeBSD and started writing a patch.  The
problem there appears to be that constantly attaching events to the read
pipe end, from multiple processes, causes significant contention inside
the kernel. Which isn't that surprising.   That's distinct from the
problem netbsd/openbsd reported a while back (superflous wakeups).

That person said he'd work on adding an equivalent of linux'
prctl(PR_SET_PDEATHSIG) to FreeBSD.

It's not particularly hard to whip something up that kind of
works. Getting some of the details right isn't entirely as clear
however:

It's trivial to make PostmasterIsAlive() cheap. In my prototype I've
setup PR_SET_PDEATHSIG to deliver SIGINFO to postmaster children. That
sets parent_potentially_died to true. PostmasterIsAlive() only runs the
main body when it's true, making it very cheap in the common case.

What I'm however not quite as clear on is how to best change
WL_POSTMASTER_DEATH.

One appraoch approach I can come up with is to use the self-pipe for
both WL_POSTMASTER_DEATH and WL_LATCH_SET. As we can cheaply check if
either set->latch->is_set or parent_potentially_died, re-using the
selfpipe works well enough.  What I'm however not quite sure about is
how to best do so with epoll() - there we explicitly do not iterate over
all registered events, but use epoll_event->data to get just the wait
event associated with a readyness event.  Therefor we've to keep track
of whether a WaitEventSet has both WL_POSTMASTER_DEATH and WL_LATCH_SET
registered, and check in both WL_POSTMASTER_DEATH/WL_LATCH_SET whether
the event is being waited for.

Another approach, that's simpler to implement, is to simply have a
second selfpipe, just for WL_POSTMASTER_DEATH.


A third question is whether we want to keep postmaster_alive_fds if we
have PR_SET_PDEATHSIG. I'd want to continue re-checking that the parent
is actually dead, but we could also do so by kill()ing postmaster like
we used to do before 89fd72cbf26f5d2e3d86ab19c1ead73ab8fac0fe.  It's not
bad to get rid of unnecessary filedescriptors. Would imply a semantic
change however.

Greetings,

Andres Freund



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-06 Thread Stephen Frost
Greetings,

* Heikki Linnakangas (hlinn...@iki.fi) wrote:
> On 06/04/18 19:39, Andres Freund wrote:
> >On 2018-04-06 07:39:28 -0400, Stephen Frost wrote:
> >>While I tend to agree that it'd be nice to just make it cheaper, that
> >>doesn't seem like something that we'd be likely to back-patch and I tend
> >>to share Heikki's feelings that this is a performance regression we
> >>should be considering fixing in released versions.
> 
> To be clear, this isn't a performance *regression*. It's always been bad.

Oh, I see, apologies for the confusion, my initial read was that this
was due to some patch that had gone in previously, hence it was an
actual regression.  I suppose I tend to view performance issues as
either "regression" or "opportunity for improvement" and when you said
"bug" it made me think it was a regression. :)

> I'm not sure if I'd backpatch this. Maybe after it's been in 'master' for a
> while and we've gotten some field testing of it.

If it's not a regression then there's I definitely think the bar is much
higher to consider this something to back-patch.  I wouldn't typically
argue for back-patching a performance improvement unless it's to address
a specific regression.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-06 Thread Heikki Linnakangas

On 06/04/18 19:39, Andres Freund wrote:

On 2018-04-06 07:39:28 -0400, Stephen Frost wrote:

While I tend to agree that it'd be nice to just make it cheaper, that
doesn't seem like something that we'd be likely to back-patch and I tend
to share Heikki's feelings that this is a performance regression we
should be considering fixing in released versions.


To be clear, this isn't a performance *regression*. It's always been bad.

I'm not sure if I'd backpatch this. Maybe after it's been in 'master' 
for a while and we've gotten some field testing of it.



I'm doubtful about fairly characterizing this as a performance bug. It's
not like we've O(n^2) behaviour on our hand, and if your replay isn't of
a toy workload normally that one syscall isn't going to make a huge
difference because you've actual IO and such going on.


If all the data fits in the buffer cache, then there would be no I/O. 
Think of a smallish database that's heavily updated. There are a lot of 
real applications like that.



I'm also doubtful that it's sane to just check every 32 records. There's
records that can take a good chunk of time, and just continuing for
another 31 records seems like a bad idea.


It's pretty arbitrary, I admit. It's the best I could come with, though. 
If we could get a signal on postmaster death, that'd be best, but that's 
a much bigger patch, and I'm worried that it would bring new portability 
and reliability issues.


I'm not too worried about 32 records being too long an interval. True, 
replaying 32 CREATE DATABASE records would take a long time. But pretty 
much all other WAL records are fast enough to apply. We could make it 
every 8 records rather than 32, if that makes you feel better. Or add 
some extra conditions, like always check it when stepping to a new WAL 
segment. In any case, the fundamental difference would be though to not 
check it between every record.


- Heikki



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-06 Thread Andres Freund
Hi,

On 2018-04-06 07:39:28 -0400, Stephen Frost wrote:
> While I tend to agree that it'd be nice to just make it cheaper, that
> doesn't seem like something that we'd be likely to back-patch and I tend
> to share Heikki's feelings that this is a performance regression we
> should be considering fixing in released versions.

I'm doubtful about fairly characterizing this as a performance bug. It's
not like we've O(n^2) behaviour on our hand, and if your replay isn't of
a toy workload normally that one syscall isn't going to make a huge
difference because you've actual IO and such going on.

I'm also doubtful that it's sane to just check every 32 records. There's
records that can take a good chunk of time, and just continuing for
another 31 records seems like a bad idea.

Greetings,

Andres Freund



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-06 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-04-05 14:39:27 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > ISTM the better approach would be to try to reduce the cost of
> > > PostmasterIsAlive() on common platforms - it should be nearly free if
> > > done right.
> > 
> > +1 if it's doable.
[...]
> > While it's not POSIX, at least some platforms are capable of delivering
> > a separate signal on parent process death.  Perhaps using that where
> > available would be enough of an answer.
> 
> Yea, that'd work on linux. Which is probably the platform 80-95% of
> performance critical PG workloads run on.  There's
> JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE on windows, which might also work,
> but I'm not sure it provides enough opportunity for cleanup.

While I tend to agree that it'd be nice to just make it cheaper, that
doesn't seem like something that we'd be likely to back-patch and I tend
to share Heikki's feelings that this is a performance regression we
should be considering fixing in released versions.

What Alvaro posted up-thread seems like it might be a small enough
change to still be reasonable to back-patch and we can still think about
ways to make PostmasterIsAlive() cheaper in the future.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Michael Paquier
On Thu, Apr 05, 2018 at 02:39:27PM -0400, Tom Lane wrote:
> While it's not POSIX, at least some platforms are capable of delivering
> a separate signal on parent process death.  Perhaps using that where
> available would be enough of an answer.

Are you referring to prctl here?

+1 on improving performance of PostmasterIsAlive() if possible instead
of checking for it every N records.  You barely see records creating a
database from a large template close to each other but that could hurt
the response time of the redo process.
--
Michael


signature.asc
Description: PGP signature


Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Andres Freund
On 2018-04-05 14:39:27 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > ISTM the better approach would be to try to reduce the cost of
> > PostmasterIsAlive() on common platforms - it should be nearly free if
> > done right.
> 
> +1 if it's doable.
> 
> > One way to achieve that would e.g. to stop ignoring SIGPIPE and instead
> > check for postmaster death inside the handler, without reacting to
> > it. Then the the actual PostmasterIsAlive() checks are just a check of a
> > single sig_atomic_t.
> 
> AFAIR, we do not get SIGPIPE on the postmaster pipe, because nobody
> ever writes to it.  So this sketch seems off to me, even assuming that
> not-ignoring SIGPIPE causes no problems elsewhere.

Yea, you're probably right. I'm mostly brainstorming here.

(FWIW, I don't think not ignoring SIGPIPE would be a huge issue if we
don't immediately take any action on its account)


> While it's not POSIX, at least some platforms are capable of delivering
> a separate signal on parent process death.  Perhaps using that where
> available would be enough of an answer.

Yea, that'd work on linux. Which is probably the platform 80-95% of
performance critical PG workloads run on.  There's
JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE on windows, which might also work,
but I'm not sure it provides enough opportunity for cleanup.

Greetings,

Andres Freund



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Tom Lane
Andres Freund  writes:
> ISTM the better approach would be to try to reduce the cost of
> PostmasterIsAlive() on common platforms - it should be nearly free if
> done right.

+1 if it's doable.

> One way to achieve that would e.g. to stop ignoring SIGPIPE and instead
> check for postmaster death inside the handler, without reacting to
> it. Then the the actual PostmasterIsAlive() checks are just a check of a
> single sig_atomic_t.

AFAIR, we do not get SIGPIPE on the postmaster pipe, because nobody
ever writes to it.  So this sketch seems off to me, even assuming that
not-ignoring SIGPIPE causes no problems elsewhere.

While it's not POSIX, at least some platforms are capable of delivering
a separate signal on parent process death.  Perhaps using that where
available would be enough of an answer.

regards, tom lane



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Andres Freund
Hi,

On 2018-04-05 10:23:43 +0300, Heikki Linnakangas wrote:
> Profiling that, without any patches applied, I noticed that a lot of time
> was spent in read()s on the postmaster-death pipe, i.e. in
> PostmasterIsAlive(). We call that between *every* WAL record.

> That seems like an utter waste of time. I'm almost inclined to call that a
> performance bug. As a straightforward fix, I'd suggest that we call
> HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but
> only e.g. every 32 records.

I agree this is a performance problem. I do however not like the fix.
ISTM the better approach would be to try to reduce the cost of
PostmasterIsAlive() on common platforms - it should be nearly free if
done right.

One way to achieve that would e.g. to stop ignoring SIGPIPE and instead
check for postmaster death inside the handler, without reacting to
it. Then the the actual PostmasterIsAlive() checks are just a check of a
single sig_atomic_t.

Greetings,

Andres Freund



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Simon Riggs
On 5 April 2018 at 08:23, Heikki Linnakangas  wrote:

> That seems like an utter waste of time. I'm almost inclined to call that a
> performance bug. As a straightforward fix, I'd suggest that we call
> HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but
> only e.g. every 32 records. That would make the main redo loop less
> responsive to shutdown, SIGHUP, or postmaster death, but that seems OK.
> There are also calls to HandleStartupProcInterrupts() in the various other
> loops, that wait for new WAL to arrive or recovery delay, so this would only
> affect the case where we're actively replaying records.

+1

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Alvaro Herrera
Heikki Linnakangas wrote:

> That seems like an utter waste of time. I'm almost inclined to call that a
> performance bug. As a straightforward fix, I'd suggest that we call
> HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but
> only e.g. every 32 records. That would make the main redo loop less
> responsive to shutdown, SIGHUP, or postmaster death, but that seems OK.
> There are also calls to HandleStartupProcInterrupts() in the various other
> loops, that wait for new WAL to arrive or recovery delay, so this would only
> affect the case where we're actively replaying records.

I think calling PostmasterIsAlive only every 32 records is sensible, but
I'm not so sure about the other two things happening in
HandleStartupProcInterrupts (which are much cheaper anyway, since they
only read a local flag); if records are large or otherwise slow to
replay, I'd rather not wait for 32 of them before the process honoring
my shutdown request.  Why not split the function in two, or maybe add a
flag "check for postmaster alive too", passed as true every 32 records?

The lesson seems to be that PostmasterIsAlive is moderately expensive
(one syscall).  It's also used in WaitLatchOrSocket when
WL_POSTMASTER_DEATH is given.  I didn't audit its callers terribly
carefully but I think these uses are not as performance-sensitive.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Excessive PostmasterIsAlive calls slow down WAL redo

2018-04-05 Thread Heikki Linnakangas
I started looking at the "Improve compactify_tuples and 
PageRepairFragmentation" patch, and set up a little performance test of 
WAL replay. I ran pgbench, scale 5, to generate about 1 GB of WAL, and 
timed how long it takes to replay that WAL. To focus purely on CPU 
overhead, I kept the data directory in /dev/shm/.


Profiling that, without any patches applied, I noticed that a lot of 
time was spent in read()s on the postmaster-death pipe, i.e. in 
PostmasterIsAlive(). We call that between *every* WAL record.


As a quick test to see how much that matters, I commented out the 
PostmasterIsAlive() call from HandleStartupProcInterrupts(). On 
unpatched master, replaying that 1 GB of WAL takes about 20 seconds on 
my laptop. Without the PostmasterIsAlive() call, 17 seconds.


That seems like an utter waste of time. I'm almost inclined to call that 
a performance bug. As a straightforward fix, I'd suggest that we call 
HandleStartupProcInterrupts() in the WAL redo loop, not on every record, 
but only e.g. every 32 records. That would make the main redo loop less 
responsive to shutdown, SIGHUP, or postmaster death, but that seems OK. 
There are also calls to HandleStartupProcInterrupts() in the various 
other loops, that wait for new WAL to arrive or recovery delay, so this 
would only affect the case where we're actively replaying records.


- Heikki
>From 596d3804c784b06f2125aa7727b82a265b08ccfb Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Thu, 5 Apr 2018 10:18:01 +0300
Subject: [PATCH 1/1] Call HandleStartupProcInterrupts() less frequently in WAL
 redo.

HandleStartupProcInterrupts() calls PostmasterIsAlive(), which calls read()
on the postmaster death watch pipe. That's relatively expensive, when we do
it between every WAL record.
---
 src/backend/access/transam/xlog.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b4fd8395b7..3406c58c9b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7072,6 +7072,7 @@ StartupXLOG(void)
 		{
 			ErrorContextCallback errcallback;
 			TimestampTz xtime;
+			uint32		records_replayed = 0;
 
 			InRedo = true;
 
@@ -7105,8 +7106,13 @@ StartupXLOG(void)
 }
 #endif
 
-/* Handle interrupt signals of startup process */
-HandleStartupProcInterrupts();
+/*
+ * Handle interrupt signals of startup process. This includes
+ * a call to PostmasterIsAlive(), which isn't totally free, so
+ * don't do this on every record, to avoid the overhead.
+ */
+if (records_replayed % 32 == 0)
+	HandleStartupProcInterrupts();
 
 /*
  * Pause WAL replay, if requested by a hot-standby session via
@@ -7269,6 +7275,7 @@ StartupXLOG(void)
 
 /* Remember this record as the last-applied one */
 LastRec = ReadRecPtr;
+records_replayed++;
 
 /* Allow read-only connections if we're consistent now */
 CheckRecoveryConsistency();
-- 
2.11.0