Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
On 2016-09-20 11:07:03 +1200, Thomas Munro wrote: > Yeah, I wondered why that was different than the pattern established > elsewhere when I was hacking on replication code. There are actually > several places where we call PostmasterIsAlive() unconditionally in a > loop that waits for WL_POSTMASTER_DEATH but ignores the return code: Note that just changing this implies a behavioural change in at least some of those: If the loop is busy with work, the WaitLatch might never be reached. I think that might be ok in most of those, but it does require examination. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
On Sat, Sep 17, 2016 at 6:23 AM, Andres Freund wrote: > On 2016-09-16 09:55:48 +0200, Marco Pfatschbacher wrote: >> On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote: >> > Yikes, that's a pretty absurd implementation. >> >> Not when you take into account that it's been written over 20 years ago ;) > > Well, that doesn't mean it can't be fixed ;) > >> > I'm not quite sure I understand why this an issue here - there shouldn't >> > ever be events on this fd, so why is the kernel waking up all processes? >> > It'd kinda makes sense it'd wake up all processes if there's one >> > waiting, but ... ? >> >> Every read is an event, and that's what PostmasterIsAlive does. > > But in most places we only do a PostmasterIsAlive if WaitLatch returns > WL_POSTMASTER_DEATH. The only walreceiver related place that doesn't is > WalRcvWaitForStartPosition(). If that's indeed the cause of your issues > this quite possibly could be fixed by doing the > if (!PostmasterIsAlive()) > exit(1); > check not unconditionally, but only after the WaitLatch at the end of > the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()? > That'll be a minor behaviour change for the WALRCV_RESTARTING, but that > seems fine, we'll just loop once more outside (after a quick glance at > least). Yeah, I wondered why that was different than the pattern established elsewhere when I was hacking on replication code. There are actually several places where we call PostmasterIsAlive() unconditionally in a loop that waits for WL_POSTMASTER_DEATH but ignores the return code: in pgarch.c, syncrep.c, walsender.c and walreceiver.c. Should we just change them all to check the return code and exit/break/ereport/etc as appropriate? That would match the code from autovacuum.c, checkpointer.c, pgstat.c, be-secure.c and bgworker.c. Something like the attached. The code in basebackup.c also waits for WL_POSTMASTER_DEATH but doesn't check for it in the return value *or* call PostmasterIsAlive(). I'm not sure what to make of that. I didn't test it but it looks like maybe it would continue running after postmaster death but not honour the throttling rate limit because WaitLatch would keep returning immediately. -- Thomas Munro http://www.enterprisedb.com check-return-code-for-postmaster-death.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
On Fri, Sep 16, 2016 at 2:23 PM, Andres Freund wrote: >> Every read is an event, and that's what PostmasterIsAlive does. > > But in most places we only do a PostmasterIsAlive if WaitLatch returns > WL_POSTMASTER_DEATH. The only walreceiver related place that doesn't is > WalRcvWaitForStartPosition(). If that's indeed the cause of your issues > this quite possibly could be fixed by doing the > if (!PostmasterIsAlive()) > exit(1); > check not unconditionally, but only after the WaitLatch at the end of > the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()? > That'll be a minor behaviour change for the WALRCV_RESTARTING, but that > seems fine, we'll just loop once more outside (after a quick glance at > least). At least some of the latch implementations already check PostmasterIsAlive() internally to avoid returning spurious events; and secure_read() at least assumes that the WL_POSTMASTER_DEATH return is reliable and doesn't need a double-check. So we can probably just remove the check altogether and instead bail out if it returns WL_POSTMASTER_DEATH. That probably saves a system call per loop iteration even on platforms where the kernel doesn't exhibit any surprising behavior. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
Hi, On 2016-09-16 09:55:48 +0200, Marco Pfatschbacher wrote: > On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote: > > Yikes, that's a pretty absurd implementation. > > Not when you take into account that it's been written over 20 years ago ;) Well, that doesn't mean it can't be fixed ;) > > I'm not quite sure I understand why this an issue here - there shouldn't > > ever be events on this fd, so why is the kernel waking up all processes? > > It'd kinda makes sense it'd wake up all processes if there's one > > waiting, but ... ? > > Every read is an event, and that's what PostmasterIsAlive does. But in most places we only do a PostmasterIsAlive if WaitLatch returns WL_POSTMASTER_DEATH. The only walreceiver related place that doesn't is WalRcvWaitForStartPosition(). If that's indeed the cause of your issues this quite possibly could be fixed by doing the if (!PostmasterIsAlive()) exit(1); check not unconditionally, but only after the WaitLatch at the end of the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()? That'll be a minor behaviour change for the WALRCV_RESTARTING, but that seems fine, we'll just loop once more outside (after a quick glance at least). Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote: > Hi, > > On 2016-09-15 15:57:55 +0200, Marco Pfatschbacher wrote: > > the current implementation of PostmasterIsAlive() uses a pipe to > > monitor the existence of the postmaster process. > > One end of the pipe is held open in the postmaster, while the other end is > > inherited to all the auxiliary and background processes when they fork. > > This leads to multiple processes calling select(2), poll(2) and read(2) > > on the same end of the pipe. > > While this is technically perfectly ok, it has the unfortunate side > > effect that it triggers an inefficient behaviour[0] in the select/poll > > implementation on some operating systems[1]: > > The kernel can only keep track of one pid per select address and > > thus has no other choice than to wakeup(9) every process that > > is waiting on select/poll. > > Yikes, that's a pretty absurd implementation. Not when you take into account that it's been written over 20 years ago ;) > Does openbsd's kqueue implementation have the same issue? There's > http://archives.postgresql.org/message-id/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com Looks ok, but your milage may vary. I've seen strange subtle bugs using kqeue.. struct kevent { __uintptr_t ident; /* identifier for this event */ short filter; /* filter for event */ u_short flags; u_int fflags; quad_t data; void*udata; /* opaque user data identifier */ }; > > Attached patch avoids the select contention by using a > > separate pipe for each auxiliary and background process. > > I'm quite unenthusiastic about forcing that many additional file > descriptors onto the postmaster... In our use case it's about 30. > I'm not quite sure I understand why this an issue here - there shouldn't > ever be events on this fd, so why is the kernel waking up all processes? > It'd kinda makes sense it'd wake up all processes if there's one > waiting, but ... ? Every read is an event, and that's what PostmasterIsAlive does. Marco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote: > Thomas Munro writes: > > Very interesting. Perhaps that is why NetBSD shows a speedup with the > > kqueue patch[1] but FreeBSD doesn't. I guess that if I could get the > > kqueue patch to perform better on large FreeBSD systems, it would also > > be a solution to this problem. > > I just noticed that kqueue appears to offer a solution to this problem, > ie one of the things you can wait for is exit of another process (named > by PID, looks like). If that's portable to all kqueue platforms, then > integrating a substitute for the postmaster death pipe might push that > patch over the hump to being a net win. That sounds plausible. I could give this a try after I get back from my vacation :) Marco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
On Thu, Sep 15, 2016 at 04:24:07PM -0400, Tom Lane wrote: > Marco Pfatschbacher writes: > > the current implementation of PostmasterIsAlive() uses a pipe to > > monitor the existence of the postmaster process. > > One end of the pipe is held open in the postmaster, while the other end is > > inherited to all the auxiliary and background processes when they fork. > > This leads to multiple processes calling select(2), poll(2) and read(2) > > on the same end of the pipe. > > While this is technically perfectly ok, it has the unfortunate side > > effect that it triggers an inefficient behaviour[0] in the select/poll > > implementation on some operating systems[1]: > > The kernel can only keep track of one pid per select address and > > thus has no other choice than to wakeup(9) every process that > > is waiting on select/poll. > > That seems like a rather bad kernel bug. It's a limitation that has been there since at least BSD4.4. But yeah, I wished things were better. > > In our case the system had to wakeup ~3000 idle ssh processes > > every time postgresql did call PostmasterIsAlive. > > Uh, these are processes not even connected to the pipe in question? > That's *really* a kernel bug. The kernel does not know if they were selecting on that pipe, because the only slot to keep that mapping has been already taken. It's not that bad of a performance hit, If the system doesn't many processes. > > Attached patch avoids the select contention by using a > > separate pipe for each auxiliary and background process. > > I think this would likely move the performance problems somewhere else. > In particular, it would mean that every postmaster child would inherit > pipes leading to all the older children. I kept them at a minimum. (See ClosePostmasterPorts) > We could close 'em again > I guess, but we have previously found that having to do things that > way is a rather serious performance drag --- see the problems we had I think closing a few files doesn't hurt too much, but I see your point. > with POSIX named semaphores, here for instance: > https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com > I really don't want the postmaster to be holding any per-child kernel > resources. > > It'd certainly be nice if we could find another solution besides > the pipe-based one, but I don't think "more pipes" is the answer. > There was some discussion of using Linux's prctl(PR_SET_PDEATHSIG) > when available; do the BSDen have anything like that? Not that I know of. But since the WalReceiver process seemed to be the one calling PostmasterIsAlive way more often than the rest, maybe we could limit the performance hit by not calling it on every received wal chunk? Cheers, Marco -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
Thomas Munro writes: > Very interesting. Perhaps that is why NetBSD shows a speedup with the > kqueue patch[1] but FreeBSD doesn't. I guess that if I could get the > kqueue patch to perform better on large FreeBSD systems, it would also > be a solution to this problem. I just noticed that kqueue appears to offer a solution to this problem, ie one of the things you can wait for is exit of another process (named by PID, looks like). If that's portable to all kqueue platforms, then integrating a substitute for the postmaster death pipe might push that patch over the hump to being a net win. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
Marco Pfatschbacher writes: > the current implementation of PostmasterIsAlive() uses a pipe to > monitor the existence of the postmaster process. > One end of the pipe is held open in the postmaster, while the other end is > inherited to all the auxiliary and background processes when they fork. > This leads to multiple processes calling select(2), poll(2) and read(2) > on the same end of the pipe. > While this is technically perfectly ok, it has the unfortunate side > effect that it triggers an inefficient behaviour[0] in the select/poll > implementation on some operating systems[1]: > The kernel can only keep track of one pid per select address and > thus has no other choice than to wakeup(9) every process that > is waiting on select/poll. That seems like a rather bad kernel bug. > In our case the system had to wakeup ~3000 idle ssh processes > every time postgresql did call PostmasterIsAlive. Uh, these are processes not even connected to the pipe in question? That's *really* a kernel bug. > Attached patch avoids the select contention by using a > separate pipe for each auxiliary and background process. I think this would likely move the performance problems somewhere else. In particular, it would mean that every postmaster child would inherit pipes leading to all the older children. We could close 'em again I guess, but we have previously found that having to do things that way is a rather serious performance drag --- see the problems we had with POSIX named semaphores, here for instance: https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com I really don't want the postmaster to be holding any per-child kernel resources. It'd certainly be nice if we could find another solution besides the pipe-based one, but I don't think "more pipes" is the answer. There was some discussion of using Linux's prctl(PR_SET_PDEATHSIG) when available; do the BSDen have anything like that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
On Fri, Sep 16, 2016 at 1:57 AM, Marco Pfatschbacher wrote: > the current implementation of PostmasterIsAlive() uses a pipe to > monitor the existence of the postmaster process. > One end of the pipe is held open in the postmaster, while the other end is > inherited to all the auxiliary and background processes when they fork. > This leads to multiple processes calling select(2), poll(2) and read(2) > on the same end of the pipe. > While this is technically perfectly ok, it has the unfortunate side > effect that it triggers an inefficient behaviour[0] in the select/poll > implementation on some operating systems[1]: > The kernel can only keep track of one pid per select address and > thus has no other choice than to wakeup(9) every process that > is waiting on select/poll. > > [...] > > BUGS > [...] > "Internally to the kernel, select() and pselect() work poorly if multiple > processes wait on the same file descriptor. Given that, it is rather > surprising to see that many daemons are written that way." > > [1] > At least OpenBSD and NetBSD are affected, FreeBSD rewrote > their select implementation in 8.0. Very interesting. Perhaps that is why NetBSD shows a speedup with the kqueue patch[1] but FreeBSD doesn't. I guess that if I could get the kqueue patch to perform better on large FreeBSD systems, it would also be a solution to this problem. [1] https://www.postgresql.org/message-id/flat/CAEepm%3D2i78TOJeV4O0-0meiihiRfVQ29ur7%3DMBHxsUKaPSWeAg%40mail.gmail.com -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process
Hi, On 2016-09-15 15:57:55 +0200, Marco Pfatschbacher wrote: > the current implementation of PostmasterIsAlive() uses a pipe to > monitor the existence of the postmaster process. > One end of the pipe is held open in the postmaster, while the other end is > inherited to all the auxiliary and background processes when they fork. > This leads to multiple processes calling select(2), poll(2) and read(2) > on the same end of the pipe. > While this is technically perfectly ok, it has the unfortunate side > effect that it triggers an inefficient behaviour[0] in the select/poll > implementation on some operating systems[1]: > The kernel can only keep track of one pid per select address and > thus has no other choice than to wakeup(9) every process that > is waiting on select/poll. Yikes, that's a pretty absurd implementation. Does openbsd's kqueue implementation have the same issue? There's http://archives.postgresql.org/message-id/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com > Attached patch avoids the select contention by using a > separate pipe for each auxiliary and background process. I'm quite unenthusiastic about forcing that many additional file descriptors onto the postmaster... I'm not quite sure I understand why this an issue here - there shouldn't ever be events on this fd, so why is the kernel waking up all processes? It'd kinda makes sense it'd wake up all processes if there's one waiting, but ... ? > BUGS > [...] > "Internally to the kernel, select() and pselect() work poorly if multiple > processes wait on the same file descriptor. Given that, it is rather > surprising to see that many daemons are written that way." Gee. Maybe it's more surprising that that issue isn't being addressed? Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers