Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket
Andres Freundwrites: > Or, probably more robust: Simply _exit(2) without further ado, and rely > on postmaster to output an appropriate error message. Arguably it's not > actually useful to see hundreds of "WARNING: terminating connection because of > crash of another server process" messages in the log anyway. At that point you might as well skip the entire mechanism and go straight to SIGKILL. IMO the only reason quickdie() exists at all is to try to send a helpful message to the client. And it is helpful --- your attempt to claim that it isn't sounds very much like wishful thinking. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket
On 2017-06-22 10:41:41 -0700, Andres Freund wrote: > On 2017-02-02 12:18:22 -0800, Jimmy Yih wrote: > > In the above pull request, Heikki also mentions that a similar scenario can > > happen during palloc() as well... which is similar to what we saw in > > Greenplum a couple years back for a deadlock in a malloc() call where we > > responded by changing exit() to _exit() in quickdie as a fix. That could > > possibly be applicable to latest Postgres as well. > > Isn't the quickdie() issue that we palloc/malloc in the first place, > rather than the exit call? There's some risk for exit() too, but we > reset our own atexit handlers before exiting, so the risk seems fairly > small. > > > In my opinion the fix here would be to do it right and never emit error > messages from signal handlers via elog.c - we've progressed a lot > towards the goal and do a lot less in signal handlers these days - but > quickdie() is one glaring exception to that. I think a reasonable fix > here would be to use write() of a statically allocated message, rather > then elog proper, and not to send the message to the client. Libpq, and > I presume other clients, synthethize a message upon unexpected socket > closure anyway, and it's completely unclear whether we can send a > message anyway. Or, probably more robust: Simply _exit(2) without further ado, and rely on postmaster to output an appropriate error message. Arguably it's not actually useful to see hundreds of "WARNING: terminating connection because of crash of another server process" messages in the log anyway. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket
On 2017-02-02 12:18:22 -0800, Jimmy Yih wrote: > In the above pull request, Heikki also mentions that a similar scenario can > happen during palloc() as well... which is similar to what we saw in > Greenplum a couple years back for a deadlock in a malloc() call where we > responded by changing exit() to _exit() in quickdie as a fix. That could > possibly be applicable to latest Postgres as well. Isn't the quickdie() issue that we palloc/malloc in the first place, rather than the exit call? There's some risk for exit() too, but we reset our own atexit handlers before exiting, so the risk seems fairly small. In my opinion the fix here would be to do it right and never emit error messages from signal handlers via elog.c - we've progressed a lot towards the goal and do a lot less in signal handlers these days - but quickdie() is one glaring exception to that. I think a reasonable fix here would be to use write() of a statically allocated message, rather then elog proper, and not to send the message to the client. Libpq, and I presume other clients, synthethize a message upon unexpected socket closure anyway, and it's completely unclear whether we can send a message anyway. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket
On Thu, Feb 2, 2017 at 3:18 PM, Jimmy Yihwrote: > In that pull request, we fix the issue by checking for proc_exit_inprogress. > Is there a reason why startup_die should not check for proc_exit_inprogress? startup_die() is just calling proc_exit(), so it seems like it might be better to fix it by putting the check into proc_exit itself. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] possible self-deadlock window after bad ProcessStartupPacket
Hello, There may possibly be a very small window for a double exit() self-deadlock during a forked backend process's ProcessStartupPacket returns and status is not STATUS_OK. The process will go into proc_exit and then a very well timed SIGQUIT will call startup_die for another proc_exit. If timed correctly, the two exit() calls can deadlock since exit() is not re-entrant. It seems extremely hard to hit the deadlock but I believe the opportunity is there. Using gdb, I was able to create the window and get this stacktrace: #0 startup_die (postgres_signal_arg=0) at postmaster.c:5090 #1 #2 proc_exit_prepare (code=0) at ipc.c:158 #3 0x007c4135 in proc_exit (code=0) at ipc.c:102 #4 0x0076b736 in BackendInitialize (port=0x2c13740) at postmaster.c:4207 #5 0x0076b190 in BackendStartup (port=0x2c13740) at postmaster.c:3979 #6 0x00767ad3 in ServerLoop () at postmaster.c:1722 #7 0x007671df in PostmasterMain (argc=3, argv=0x2bebad0) at postmaster.c:1330 #8 0x006b5df6 in main (argc=3, argv=0x2bebad0) at main.c:228 I got the stacktrace by doing the following: 1. gdb attach to postmaster and run set follow-fork child and break postmaster.c:4206(right after ProcessStartupPacket) and continue 2. In another terminal, open a psql session which should trigger the gdb follow 3. In the gdb session, set status=1 and step into proc_exit() 4. In another terminal, kill -s QUIT to send SIGQUIT to the child process. Or run pg_ctl stop -M immediate. 5. In the gdb session, step to process the signal into startup_die and run bt This was discovered while hacking on Greenplum Database (currently based off of Postgres 8.3) where we recently started encountering the self-deadlock intermittently in our testing environment. Here's the pull request discussion: https://github.com/greenplum-db/gpdb/pull/1662 In that pull request, we fix the issue by checking for proc_exit_inprogress. Is there a reason why startup_die should not check for proc_exit_inprogress? In the above pull request, Heikki also mentions that a similar scenario can happen during palloc() as well... which is similar to what we saw in Greenplum a couple years back for a deadlock in a malloc() call where we responded by changing exit() to _exit() in quickdie as a fix. That could possibly be applicable to latest Postgres as well. Regards, Jimmy