Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-19 Thread Andres Freund
On 2016-09-20 11:07:03 +1200, Thomas Munro wrote:
> Yeah, I wondered why that was different than the pattern established
> elsewhere when I was hacking on replication code.  There are actually
> several places where we call PostmasterIsAlive() unconditionally in a
> loop that waits for WL_POSTMASTER_DEATH but ignores the return code:

Note that just changing this implies a behavioural change in at least
some of those: If the loop is busy with work, the WaitLatch might never
be reached.  I think that might be ok in most of those, but it does
require examination.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-19 Thread Thomas Munro
On Sat, Sep 17, 2016 at 6:23 AM, Andres Freund  wrote:
> On 2016-09-16 09:55:48 +0200, Marco Pfatschbacher wrote:
>> On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote:
>> > Yikes, that's a pretty absurd implementation.
>>
>> Not when you take into account that it's been written over 20 years ago ;)
>
> Well, that doesn't mean it can't be fixed ;)
>
>> > I'm not quite sure I understand why this an issue here - there shouldn't
>> > ever be events on this fd, so why is the kernel waking up all processes?
>> > It'd kinda makes sense it'd wake up all processes if there's one
>> > waiting, but ... ?
>>
>> Every read is an event, and that's what PostmasterIsAlive does.
>
> But in most places we only do a PostmasterIsAlive if WaitLatch returns
> WL_POSTMASTER_DEATH.  The only walreceiver related place that doesn't is
> WalRcvWaitForStartPosition(). If that's indeed the cause of your issues
> this quite possibly could be fixed by doing the
> if (!PostmasterIsAlive())
> exit(1);
> check not unconditionally, but only after the WaitLatch at the end of
> the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()?
> That'll be a minor behaviour change for the WALRCV_RESTARTING, but that
> seems fine, we'll just loop once more outside (after a quick glance at
> least).

Yeah, I wondered why that was different than the pattern established
elsewhere when I was hacking on replication code.  There are actually
several places where we call PostmasterIsAlive() unconditionally in a
loop that waits for WL_POSTMASTER_DEATH but ignores the return code:
in pgarch.c, syncrep.c, walsender.c and walreceiver.c.  Should we just
change them all to check the return code and exit/break/ereport/etc as
appropriate?  That would match the code from autovacuum.c,
checkpointer.c, pgstat.c, be-secure.c and bgworker.c.  Something like
the attached.

The code in basebackup.c also waits for WL_POSTMASTER_DEATH but
doesn't check for it in the return value *or* call
PostmasterIsAlive().  I'm not sure what to make of that.  I didn't
test it but it looks like maybe it would continue running after
postmaster death but not honour the throttling rate limit because
WaitLatch would keep returning immediately.

-- 
Thomas Munro
http://www.enterprisedb.com


check-return-code-for-postmaster-death.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-19 Thread Robert Haas
On Fri, Sep 16, 2016 at 2:23 PM, Andres Freund  wrote:
>> Every read is an event, and that's what PostmasterIsAlive does.
>
> But in most places we only do a PostmasterIsAlive if WaitLatch returns
> WL_POSTMASTER_DEATH.  The only walreceiver related place that doesn't is
> WalRcvWaitForStartPosition(). If that's indeed the cause of your issues
> this quite possibly could be fixed by doing the
> if (!PostmasterIsAlive())
> exit(1);
> check not unconditionally, but only after the WaitLatch at the end of
> the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()?
> That'll be a minor behaviour change for the WALRCV_RESTARTING, but that
> seems fine, we'll just loop once more outside (after a quick glance at
> least).

At least some of the latch implementations already check
PostmasterIsAlive() internally to avoid returning spurious events; and
secure_read() at least assumes that the WL_POSTMASTER_DEATH return is
reliable and doesn't need a double-check.

So we can probably just remove the check altogether and instead bail
out if it returns WL_POSTMASTER_DEATH.  That probably saves a system
call per loop iteration even on platforms where the kernel doesn't
exhibit any surprising behavior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-16 Thread Andres Freund
Hi,

On 2016-09-16 09:55:48 +0200, Marco Pfatschbacher wrote:
> On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote:
> > Yikes, that's a pretty absurd implementation.
>
> Not when you take into account that it's been written over 20 years ago ;)

Well, that doesn't mean it can't be fixed ;)

> > I'm not quite sure I understand why this an issue here - there shouldn't
> > ever be events on this fd, so why is the kernel waking up all processes?
> > It'd kinda makes sense it'd wake up all processes if there's one
> > waiting, but ... ?
>
> Every read is an event, and that's what PostmasterIsAlive does.

But in most places we only do a PostmasterIsAlive if WaitLatch returns
WL_POSTMASTER_DEATH.  The only walreceiver related place that doesn't is
WalRcvWaitForStartPosition(). If that's indeed the cause of your issues 
this quite possibly could be fixed by doing the
if (!PostmasterIsAlive())
exit(1);
check not unconditionally, but only after the WaitLatch at the end of
the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()?
That'll be a minor behaviour change for the WALRCV_RESTARTING, but that
seems fine, we'll just loop once more outside (after a quick glance at
least).

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-16 Thread Marco Pfatschbacher
On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2016-09-15 15:57:55 +0200, Marco Pfatschbacher wrote:
> > the current implementation of PostmasterIsAlive() uses a pipe to
> > monitor the existence of the postmaster process.
> > One end of the pipe is held open in the postmaster, while the other end is
> > inherited to all the auxiliary and background processes when they fork.
> > This leads to multiple processes calling select(2), poll(2) and read(2)
> > on the same end of the pipe.
> > While this is technically perfectly ok, it has the unfortunate side
> > effect that it triggers an inefficient behaviour[0] in the select/poll
> > implementation on some operating systems[1]:
> > The kernel can only keep track of one pid per select address and
> > thus has no other choice than to wakeup(9) every process that
> > is waiting on select/poll.
> 
> Yikes, that's a pretty absurd implementation.

Not when you take into account that it's been written over 20 years ago ;) 

> Does openbsd's kqueue implementation have the same issue? There's
> http://archives.postgresql.org/message-id/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com

Looks ok, but your milage may vary. I've seen strange subtle bugs
using kqeue..

struct kevent {
__uintptr_t ident;  /* identifier for this event */
short   filter; /* filter for event */
u_short flags;
u_int   fflags;
quad_t  data;
void*udata; /* opaque user data identifier */
};

> > Attached patch avoids the select contention by using a
> > separate pipe for each auxiliary and background process.
> 
> I'm quite unenthusiastic about forcing that many additional file
> descriptors onto the postmaster...

In our use case it's about 30.

> I'm not quite sure I understand why this an issue here - there shouldn't
> ever be events on this fd, so why is the kernel waking up all processes?
> It'd kinda makes sense it'd wake up all processes if there's one
> waiting, but ... ?

Every read is an event, and that's what PostmasterIsAlive does. 

   Marco


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-16 Thread Marco Pfatschbacher
On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote:
> Thomas Munro  writes:
> > Very interesting.  Perhaps that is why NetBSD shows a speedup with the
> > kqueue patch[1] but FreeBSD doesn't.  I guess that if I could get the
> > kqueue patch to perform better on large FreeBSD systems, it would also
> > be a solution to this problem.
> 
> I just noticed that kqueue appears to offer a solution to this problem,
> ie one of the things you can wait for is exit of another process (named
> by PID, looks like).  If that's portable to all kqueue platforms, then
> integrating a substitute for the postmaster death pipe might push that
> patch over the hump to being a net win.
 
That sounds plausible.
I could give this a try after I get back from my vacation :)

   Marco


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-16 Thread Marco Pfatschbacher
On Thu, Sep 15, 2016 at 04:24:07PM -0400, Tom Lane wrote:
> Marco Pfatschbacher  writes:
> > the current implementation of PostmasterIsAlive() uses a pipe to
> > monitor the existence of the postmaster process.
> > One end of the pipe is held open in the postmaster, while the other end is
> > inherited to all the auxiliary and background processes when they fork.
> > This leads to multiple processes calling select(2), poll(2) and read(2)
> > on the same end of the pipe.
> > While this is technically perfectly ok, it has the unfortunate side
> > effect that it triggers an inefficient behaviour[0] in the select/poll
> > implementation on some operating systems[1]:
> > The kernel can only keep track of one pid per select address and
> > thus has no other choice than to wakeup(9) every process that
> > is waiting on select/poll.
> 
> That seems like a rather bad kernel bug.

It's a limitation that has been there since at least BSD4.4.
But yeah, I wished things were better. 

> > In our case the system had to wakeup ~3000 idle ssh processes
> > every time postgresql did call PostmasterIsAlive.
> 
> Uh, these are processes not even connected to the pipe in question?
> That's *really* a kernel bug.

The kernel does not know if they were selecting on that pipe,
because the only slot to keep that mapping has been already taken.
It's not that bad of a performance hit, If the system doesn't
many processes.

> > Attached patch avoids the select contention by using a
> > separate pipe for each auxiliary and background process.
> 
> I think this would likely move the performance problems somewhere else.
> In particular, it would mean that every postmaster child would inherit
> pipes leading to all the older children. 

I kept them at a minimum. (See ClosePostmasterPorts)

>  We could close 'em again
> I guess, but we have previously found that having to do things that
> way is a rather serious performance drag --- see the problems we had

I think closing a few files doesn't hurt too much, but I see your point.

> with POSIX named semaphores, here for instance:
> https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com
> I really don't want the postmaster to be holding any per-child kernel
> resources.
> 
> It'd certainly be nice if we could find another solution besides
> the pipe-based one, but I don't think "more pipes" is the answer.
> There was some discussion of using Linux's prctl(PR_SET_PDEATHSIG)
> when available; do the BSDen have anything like that?

Not that I know of.
But since the WalReceiver process seemed to be the one calling
PostmasterIsAlive way more often than the rest, maybe we could limit
the performance hit by not calling it on every received wal chunk?

Cheers,
   Marco


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-15 Thread Tom Lane
Thomas Munro  writes:
> Very interesting.  Perhaps that is why NetBSD shows a speedup with the
> kqueue patch[1] but FreeBSD doesn't.  I guess that if I could get the
> kqueue patch to perform better on large FreeBSD systems, it would also
> be a solution to this problem.

I just noticed that kqueue appears to offer a solution to this problem,
ie one of the things you can wait for is exit of another process (named
by PID, looks like).  If that's portable to all kqueue platforms, then
integrating a substitute for the postmaster death pipe might push that
patch over the hump to being a net win.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-15 Thread Tom Lane
Marco Pfatschbacher  writes:
> the current implementation of PostmasterIsAlive() uses a pipe to
> monitor the existence of the postmaster process.
> One end of the pipe is held open in the postmaster, while the other end is
> inherited to all the auxiliary and background processes when they fork.
> This leads to multiple processes calling select(2), poll(2) and read(2)
> on the same end of the pipe.
> While this is technically perfectly ok, it has the unfortunate side
> effect that it triggers an inefficient behaviour[0] in the select/poll
> implementation on some operating systems[1]:
> The kernel can only keep track of one pid per select address and
> thus has no other choice than to wakeup(9) every process that
> is waiting on select/poll.

That seems like a rather bad kernel bug.

> In our case the system had to wakeup ~3000 idle ssh processes
> every time postgresql did call PostmasterIsAlive.

Uh, these are processes not even connected to the pipe in question?
That's *really* a kernel bug.

> Attached patch avoids the select contention by using a
> separate pipe for each auxiliary and background process.

I think this would likely move the performance problems somewhere else.
In particular, it would mean that every postmaster child would inherit
pipes leading to all the older children.  We could close 'em again
I guess, but we have previously found that having to do things that
way is a rather serious performance drag --- see the problems we had
with POSIX named semaphores, here for instance:
https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com
I really don't want the postmaster to be holding any per-child kernel
resources.

It'd certainly be nice if we could find another solution besides
the pipe-based one, but I don't think "more pipes" is the answer.
There was some discussion of using Linux's prctl(PR_SET_PDEATHSIG)
when available; do the BSDen have anything like that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-15 Thread Thomas Munro
On Fri, Sep 16, 2016 at 1:57 AM, Marco Pfatschbacher
 wrote:
> the current implementation of PostmasterIsAlive() uses a pipe to
> monitor the existence of the postmaster process.
> One end of the pipe is held open in the postmaster, while the other end is
> inherited to all the auxiliary and background processes when they fork.
> This leads to multiple processes calling select(2), poll(2) and read(2)
> on the same end of the pipe.
> While this is technically perfectly ok, it has the unfortunate side
> effect that it triggers an inefficient behaviour[0] in the select/poll
> implementation on some operating systems[1]:
> The kernel can only keep track of one pid per select address and
> thus has no other choice than to wakeup(9) every process that
> is waiting on select/poll.
>
> [...]
>
>  BUGS
>  [...]
>  "Internally to the kernel, select() and pselect() work poorly if multiple
>  processes wait on the same file descriptor. Given that, it is rather
>  surprising to see that many daemons are written that way."
>
> [1]
>  At least OpenBSD and NetBSD are affected, FreeBSD rewrote
>  their select implementation in 8.0.

Very interesting.  Perhaps that is why NetBSD shows a speedup with the
kqueue patch[1] but FreeBSD doesn't.  I guess that if I could get the
kqueue patch to perform better on large FreeBSD systems, it would also
be a solution to this problem.

[1] 
https://www.postgresql.org/message-id/flat/CAEepm%3D2i78TOJeV4O0-0meiihiRfVQ29ur7%3DMBHxsUKaPSWeAg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-15 Thread Andres Freund
Hi,

On 2016-09-15 15:57:55 +0200, Marco Pfatschbacher wrote:
> the current implementation of PostmasterIsAlive() uses a pipe to
> monitor the existence of the postmaster process.
> One end of the pipe is held open in the postmaster, while the other end is
> inherited to all the auxiliary and background processes when they fork.
> This leads to multiple processes calling select(2), poll(2) and read(2)
> on the same end of the pipe.
> While this is technically perfectly ok, it has the unfortunate side
> effect that it triggers an inefficient behaviour[0] in the select/poll
> implementation on some operating systems[1]:
> The kernel can only keep track of one pid per select address and
> thus has no other choice than to wakeup(9) every process that
> is waiting on select/poll.

Yikes, that's a pretty absurd implementation.

Does openbsd's kqueue implementation have the same issue? There's
http://archives.postgresql.org/message-id/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com

> Attached patch avoids the select contention by using a
> separate pipe for each auxiliary and background process.

I'm quite unenthusiastic about forcing that many additional file
descriptors onto the postmaster...

I'm not quite sure I understand why this an issue here - there shouldn't
ever be events on this fd, so why is the kernel waking up all processes?
It'd kinda makes sense it'd wake up all processes if there's one
waiting, but ... ?


>  BUGS
>  [...]
>  "Internally to the kernel, select() and pselect() work poorly if multiple
>  processes wait on the same file descriptor. Given that, it is rather
>  surprising to see that many daemons are written that way."

Gee. Maybe it's more surprising that that issue isn't being addressed?


Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers