Re: [HACKERS] atexit_callback can be a net negative
Hi, On 2014-03-07 09:50:15 -0800, Peter LaDow wrote: > Also, the on_exit_reset() does clear up the issue, and we have > implemented it as suggested in this thread (calling it immediately > after the fork() in the child). And Andres' keen eye noted we were > calling exit() after an exec() failure, rather than _exit(). Thank > you, Andres, for pointing out this error. I actually didn't see any source code of yours ;), just answered Tom's question about what to do after exec() failed. There's several other wierd behaviours if you use exit() instead of _exit() after a fork, most prominently double flushes leading to repeated/corrupted output when you have have stream FILEs open in the parent. This is because the stream will be flushed in both, the parent (at some later write or exit) and the child (at exit). It's simply something that won't work well. > > I don't see any way it's be safe for a pg unaware library to start > > forking around and doing similar random things inside normal > > backends. > > We aren't exactly "forking around." We were trying to spawn an > asynchronous process without any ties to the postmaster. The important bit in the sentence you quote is "pg unaware library". My point is just that there are some special considerations you have to be aware of. fork() and exec() is a way to escape that environment, and should be fine. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Sorry for the bit of top-posting, but I wanted to make some things clear. Also, I wasn't subscribed to pgsql-hackers at the time this thread began, so I apologize for the missing headers that might cause threading issues. I'm the submitter of bug #9464. Here's the background on what we are doing. We are on a limited resource, embedded device. We make use of the database as an event driven system. In the case of an incoming event, such as a settings change, we make use of this fork/exec procedure to spawn an asynchronous process to handle certain events. Some of these external processes are long running, some need to run outside the current transaction context, and some we don't care about the result--we just want it to run. Also, the on_exit_reset() does clear up the issue, and we have implemented it as suggested in this thread (calling it immediately after the fork() in the child). And Andres' keen eye noted we were calling exit() after an exec() failure, rather than _exit(). Thank you, Andres, for pointing out this error. Andres Freund 2ndquadrant.com> writes: > On 2014-03-07 09:49:05 -0500, Tom Lane wrote: > > Andres Freund 2ndquadrant.com> writes: > > > What are you proposing to do in that case? This is only one of the > > > failure cases of forking carelessly, right? Just to be clear, we intended to fork careFULLy, not careLESSly. Hence the reason for the double fork with an eventual exec(). We intended to be clearly separated from the backend and executing in our own clean, unrelated context. > I don't think it's a reasonable pattern run background processes that > aren't managed by postgres with all shared memory still > accessible. You'll have to either also detach from shared memory and In this case we definitely did NOT want to be managed by postgres. Hence the double fork. The intent was that the first level child would NOT be managed by postgres, hence the exit(). > related things, or you have to fork() and exec(). At the very least, not Which is _exactly_ what we were trying to do. The first fork was only so that we could fork again and spawn a subprocess completely detached from the postmaster. And also to have something for the postmaster to reap via waitpid and prevent a zombie from hanging around. The typical daemon stuff. > integrating the child with the postmaster's lifetime will prevent > postgres from restarting because there's still a child attached to the > shared memory. Which is _exactly_ what we were NOT trying to do. We did not want to integrate with postmaster. > I don't see any way it's be safe for a pg unaware library to start > forking around and doing similar random things inside normal > backends. We aren't exactly "forking around." We were trying to spawn an asynchronous process without any ties to the postmaster. This was expected to be well-defined, clean behavior. A fork/exec isn't an unusual thing to do, and we didn't expect that exiting a child would invoke behavior that would cause problems. Thanks, Pete -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Hi, On 2014-03-07 10:24:31 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2014-03-07 09:49:05 -0500, Tom Lane wrote: > >> No, I think it should do nothing. The coding pattern shown in bug #9464 > >> seems perfectly reasonable and I think we should allow it. > > > I don't think it's a reasonable pattern run background processes that > > aren't managed by postgres with all shared memory still > > accessible. You'll have to either also detach from shared memory and > > related things, or you have to fork() and exec(). > > The code in question is trying to do that. And what do you think will > happen if the exec() fails? In code that's written properly, not much. It will use _exit() after the exec() failure. That's pretty established best practice after forking, especially after a exec() failure. > > At the very least, not > > integrating the child with the postmaster's lifetime will prevent > > postgres from restarting because there's still a child attached to the > > shared memory. > > I think you're willfully missing the point. The reason we added > atexit_callback was to try to defend ourselves against third-party code > that did things in a non-Postgres-aware way. Hey, I am not arguing that we should remove protection, I am saying that we should make it more obvious that dangerous stuff happens. That way people can fix stuff during development rather than finding out stuff breaks in some corner cases on busy live systems. > Arguing that such code > should do things in a Postgres-aware way is not helpful for the concerns > here, and it's not relevant to reality either, because people will load > stuff like libperl into backends. Good luck getting a post-fork > on_exit_reset() call into libperl. libperl won't fork on it's own, without the user telling it to do so, and code that forks can very well be careful to do POSIX::_exit(), especially in case a exec fails. I don't have much problem telling people that a fork() without a direct exec() afterwards simply isn't supported from PLs. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Andres Freund writes: > On 2014-03-07 09:49:05 -0500, Tom Lane wrote: >> No, I think it should do nothing. The coding pattern shown in bug #9464 >> seems perfectly reasonable and I think we should allow it. > I don't think it's a reasonable pattern run background processes that > aren't managed by postgres with all shared memory still > accessible. You'll have to either also detach from shared memory and > related things, or you have to fork() and exec(). The code in question is trying to do that. And what do you think will happen if the exec() fails? > At the very least, not > integrating the child with the postmaster's lifetime will prevent > postgres from restarting because there's still a child attached to the > shared memory. I think you're willfully missing the point. The reason we added atexit_callback was to try to defend ourselves against third-party code that did things in a non-Postgres-aware way. Arguing that such code should do things in a Postgres-aware way is not helpful for the concerns here, and it's not relevant to reality either, because people will load stuff like libperl into backends. Good luck getting a post-fork on_exit_reset() call into libperl. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 03/07/2014 04:10 PM, Tom Lane wrote: Florian Weimer writes: On 03/07/2014 03:57 PM, Tom Lane wrote: It's not a reason not to do something about the much larger chance of this happening in a direct child process, which certainly won't have a matching PID. Indeed. Checking getppid() in addition might narrow things down further. I don't think getppid adds much to the party. In particular, what to do if it returns 1? You can't tell if you're an orphaned backend (in which case you should still do normal shutdown) Oh. I didn't know that orphaned backends perform a normal shutdown. -- Florian Weimer / Red Hat Product Security Team -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Florian Weimer writes: > On 03/07/2014 03:57 PM, Tom Lane wrote: >> It's not a reason not to do something about the much larger chance of >> this happening in a direct child process, which certainly won't have a >> matching PID. > Indeed. Checking getppid() in addition might narrow things down further. I don't think getppid adds much to the party. In particular, what to do if it returns 1? You can't tell if you're an orphaned backend (in which case you should still do normal shutdown) or an orphaned grandchild. The standalone-backend case would also complicate matters. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 2014-03-07 09:49:05 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2014-03-07 00:03:48 -0500, Tom Lane wrote: > >> In the bug thread I proposed making atexit_callback check whether getpid() > >> still matches MyProcPid. > > > What are you proposing to do in that case? This is only one of the > > failure cases of forking carelessly, right? > > No, I think it should do nothing. The coding pattern shown in bug #9464 > seems perfectly reasonable and I think we should allow it. No doubt it's > safer if the child process does an on_exit_reset; but right now, if the > child fails to do so, atexit_callback is actively breaking things. And > I don't think we can rely on third-party libraries to call on_exit_reset > after forking. I don't think it's a reasonable pattern run background processes that aren't managed by postgres with all shared memory still accessible. You'll have to either also detach from shared memory and related things, or you have to fork() and exec(). At the very least, not integrating the child with the postmaster's lifetime will prevent postgres from restarting because there's still a child attached to the shared memory. I don't see any way it's be safe for a pg unaware library to start forking around and doing similar random things inside normal backends. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 03/07/2014 03:57 PM, Tom Lane wrote: I think Florian's right that there's a risk there, but it seems pretty remote, and I don't see any reliable way to detect the case anyhow. (Process start time? Where would you get that from portably?) I don't think there's a portable source for that. On Linux, you'd have to use /proc. It's not a reason not to do something about the much larger chance of this happening in a direct child process, which certainly won't have a matching PID. Indeed. Checking getppid() in addition might narrow things down further. On Linux, linking against pthread_atfork currently requires linking against pthread (although this is about to change), and it might incur the pthread-induced overhead on some configurations. -- Florian Weimer / Red Hat Product Security Team -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Heikki Linnakangas writes: > On 03/07/2014 04:23 PM, Florian Weimer wrote: >> There's the PID reuse problem. Forking twice (with a delay) could end >> up with the same PID as MyProcPid. > Not if the parent process is still running. If the original parent backend is *not* still running, then running atexit_callback in the grandchild is just as dangerous if not more so; it could be clobbering shared-memory state belonging to some other session that has recycled the same PGPROC. I think Florian's right that there's a risk there, but it seems pretty remote, and I don't see any reliable way to detect the case anyhow. (Process start time? Where would you get that from portably?) It's not a reason not to do something about the much larger chance of this happening in a direct child process, which certainly won't have a matching PID. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
Andres Freund writes: > On 2014-03-07 00:03:48 -0500, Tom Lane wrote: >> In the bug thread I proposed making atexit_callback check whether getpid() >> still matches MyProcPid. > What are you proposing to do in that case? This is only one of the > failure cases of forking carelessly, right? No, I think it should do nothing. The coding pattern shown in bug #9464 seems perfectly reasonable and I think we should allow it. No doubt it's safer if the child process does an on_exit_reset; but right now, if the child fails to do so, atexit_callback is actively breaking things. And I don't think we can rely on third-party libraries to call on_exit_reset after forking. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 2014-03-07 00:03:48 -0500, Tom Lane wrote: > In the bug thread I proposed making atexit_callback check whether getpid() > still matches MyProcPid. What are you proposing to do in that case? This is only one of the failure cases of forking carelessly, right? I think the only appropriate thing would be to make as much noise as possible in that case, which is probably something like writing a message to stderr, and then abort(). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 03/07/2014 04:23 PM, Florian Weimer wrote: On 03/07/2014 06:03 AM, Tom Lane wrote: In the bug thread I proposed making atexit_callback check whether getpid() still matches MyProcPid. If it doesn't, then presumably we inherited the atexit callback list, along with the value of MyProcPid, from some parent backend process whose elbow we should not joggle. Can anyone see a flaw in that? There's the PID reuse problem. Forking twice (with a delay) could end up with the same PID as MyProcPid. Not if the parent process is still running. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On 03/07/2014 06:03 AM, Tom Lane wrote: In the bug thread I proposed making atexit_callback check whether getpid() still matches MyProcPid. If it doesn't, then presumably we inherited the atexit callback list, along with the value of MyProcPid, from some parent backend process whose elbow we should not joggle. Can anyone see a flaw in that? There's the PID reuse problem. Forking twice (with a delay) could end up with the same PID as MyProcPid. Comparing the process start time would protect against that. Checking getppid() would have the same theoretical problem. -- Florian Weimer / Red Hat Product Security Team -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] atexit_callback can be a net negative
On Fri, Mar 7, 2014 at 2:03 AM, Tom Lane wrote: > In the bug thread I proposed making atexit_callback check whether getpid() > still matches MyProcPid. If it doesn't, then presumably we inherited the > atexit callback list, along with the value of MyProcPid, from some parent > backend process whose elbow we should not joggle. Can anyone see a flaw > in that? While my answer would be "not really" (lots of python libraries do the same to handle forks), there's an optional path: pthread_atfork. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers