Re: [HACKERS] [BUGS] Crash observed during the start of the Postgres process
Hi Tom Lane, After removing our patch to change FATAL to LOG, we are not observing the crash now. Thank you for your support. We were struck with this issue for a while. Regards, Sandhya -Original Message- From: Tom Lane [mailto:t...@sss.pgh.pa.us] Sent: Friday, May 12, 2017 10:08 PM To: K S, Sandhya (Nokia - IN/Bangalore)Cc: 'Merlin Moncure' ; 'pgsql-hackers@postgresql.org' ; 'pgsql-b...@postgresql.org' ; Itnal, Prakash (Nokia - IN/Bangalore) ; T, Rasna (Nokia - IN/Bangalore) Subject: Re: [BUGS] Crash observed during the start of the Postgres process "K S, Sandhya (Nokia - IN/Bangalore)" writes: > I have filtered the logs based on PID (19825) to see if this helps to > debug the issue further. Is this really a stock Postgres build? The proximate cause of the PANIC is that the startup process is seeing other processes active even though it hasn't reachedConsistency. This is bad on any number of levels, quite aside from that particular PANIC, because those other processes are presumably seeing non-consistent database state. Looking elsewhere in the log, we see that indeed there seem to be several backend processes happily executing commands. For instance, here's the trace of one of them starting up: [19810-58f473ff.4d62-187] 2017-04-17 07:51:28.783 GMT < > DEBUG: 0: forked new backend, pid=19850 socket=10 [19810-58f473ff.4d62-188] 2017-04-17 07:51:28.783 GMT < > LOCATION: BackendStartup, postmaster.c:3884 [19850-58f47400.4d8a-1] 2017-04-17 07:51:28.783 GMT < > LOG: 57P03: the database system is starting up [19850-58f47400.4d8a-2] 2017-04-17 07:51:28.783 GMT < > LOCATION: ProcessStartupPacket, postmaster.c:2143 [19850-58f47400.4d8a-3] 2017-04-17 07:51:28.784 GMT < authentication> DEBUG: 0: postgres child[19850]: starting with ( Now, that LOG message proves that this backend has observed that the database is not ready to allow connections. So why did it only emit the message as LOG and keep going? The code for this in 9.3 looks like /* * If we're going to reject the connection due to database state, say so * now instead of wasting cycles on an authentication exchange. (This also * allows a pg_ping utility to be written.) */ switch (port->canAcceptConnections) { case CAC_STARTUP: ereport(FATAL, (errcode(ERRCODE_CANNOT_CONNECT_NOW), errmsg("the database system is starting up"))); break; ... I can't draw any other conclusion but that you've hacked something to make FATAL act like LOG. Which is a fatal mistake. Errors that are marked FATAL are generally ones where allowing the process to keep going is not an acceptable outcome. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Crash observed during the start of the Postgres process
On Fri, May 12, 2017 at 12:38 PM, Tom Lanewrote: > I can't draw any other conclusion but that you've hacked something > to make FATAL act like LOG. Which is a fatal mistake. I see what you did there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Crash observed during the start of the Postgres process
"K S, Sandhya (Nokia - IN/Bangalore)"writes: > I have filtered the logs based on PID (19825) to see if this helps to > debug the issue further. Is this really a stock Postgres build? The proximate cause of the PANIC is that the startup process is seeing other processes active even though it hasn't reachedConsistency. This is bad on any number of levels, quite aside from that particular PANIC, because those other processes are presumably seeing non-consistent database state. Looking elsewhere in the log, we see that indeed there seem to be several backend processes happily executing commands. For instance, here's the trace of one of them starting up: [19810-58f473ff.4d62-187] 2017-04-17 07:51:28.783 GMT < > DEBUG: 0: forked new backend, pid=19850 socket=10 [19810-58f473ff.4d62-188] 2017-04-17 07:51:28.783 GMT < > LOCATION: BackendStartup, postmaster.c:3884 [19850-58f47400.4d8a-1] 2017-04-17 07:51:28.783 GMT < > LOG: 57P03: the database system is starting up [19850-58f47400.4d8a-2] 2017-04-17 07:51:28.783 GMT < > LOCATION: ProcessStartupPacket, postmaster.c:2143 [19850-58f47400.4d8a-3] 2017-04-17 07:51:28.784 GMT < authentication> DEBUG: 0: postgres child[19850]: starting with ( Now, that LOG message proves that this backend has observed that the database is not ready to allow connections. So why did it only emit the message as LOG and keep going? The code for this in 9.3 looks like /* * If we're going to reject the connection due to database state, say so * now instead of wasting cycles on an authentication exchange. (This also * allows a pg_ping utility to be written.) */ switch (port->canAcceptConnections) { case CAC_STARTUP: ereport(FATAL, (errcode(ERRCODE_CANNOT_CONNECT_NOW), errmsg("the database system is starting up"))); break; ... I can't draw any other conclusion but that you've hacked something to make FATAL act like LOG. Which is a fatal mistake. Errors that are marked FATAL are generally ones where allowing the process to keep going is not an acceptable outcome. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Crash observed during the start of the Postgres process
Hi Merlin, Below is the log captured when the crash was encountered. STATEMENT: select count(1) from pg_ls_dir(current_setting('data_directory')) where pg_ls_dir = 'backup_label' LOG: 0: duration: 4.083 ms LOCATION: exec_simple_query, postgres.c:1145 DEBUG: 0: shmem_exit(0): 7 callbacks to make LOCATION: shmem_exit, ipc.c:212 DEBUG: 0: proc_exit(0): 3 callbacks to make LOCATION: proc_exit_prepare, ipc.c:184 DEBUG: 0: exit(0) LOCATION: proc_exit, ipc.c:135 DEBUG: 0: shmem_exit(-1): 0 callbacks to make LOCATION: shmem_exit, ipc.c:212 DEBUG: 0: proc_exit(-1): 0 callbacks to make LOCATION: proc_exit_prepare, ipc.c:184 DEBUG: 0: reaping dead processes LOCATION: reaper, postmaster.c:2669 DEBUG: 0: server process (PID 11104) exited with exit code 0 LOCATION: LogChildExit, postmaster.c:3385 DEBUG: 0: reaping dead processes LOCATION: reaper, postmaster.c:2669 LOG: 0: startup process (PID 10265) was terminated by signal 6: Aborted LOCATION: LogChildExit, postmaster.c:3407 LOG: 0: terminating any other active server processes LOCATION: HandleChildCrash, postmaster.c:3134 DEBUG: 0: sending SIGQUIT to process 10994 LOCATION: HandleChildCrash, postmaster.c:3233 DEBUG: 0: sending SIGQUIT to process 10269 LOCATION: HandleChildCrash, postmaster.c:3263 DEBUG: 0: sending SIGQUIT to process 10268 LOCATION: HandleChildCrash, postmaster.c:3275 WARNING: 57P02: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. Backtrace of the core generated: (gdb) bt #0 0x005563bcf9c0 in raise () from /lib64/libc.so.6 #1 0x005563bd42bc in abort () from /lib64/libc.so.6 #2 0x00012039e228 in errfinish () #3 0x00012039ef08 in elog_finish () #4 0x00012009eb08 in btree_redo () #5 0x0001200caff8 in StartupXLOG () #6 0x000120259958 in StartupProcessMain () #7 0x0001200d590c in AuxiliaryProcessMain () #8 0x000120253434 in ?? () Let me know if any further clarification/information is needed. Regards, Sandhya -Original Message- From: Merlin Moncure [mailto:mmonc...@gmail.com] Sent: Tuesday, April 25, 2017 8:01 PM To: K S, Sandhya (Nokia - IN/Bangalore)Cc: pgsql-hackers@postgresql.org; t...@sss.pgh.pa.us; pgsql-b...@postgresql.org; Itnal, Prakash (Nokia - IN/Bangalore) ; T, Rasna (Nokia - IN/Bangalore) Subject: Re: [BUGS] Crash observed during the start of the Postgres process On Tue, Apr 25, 2017 at 8:44 AM, K S, Sandhya (Nokia - IN/Bangalore) wrote: > Hello, > > Did you get a chance to take a look into the issue? > > Please consider it with high priority. We will be awaiting your inputs. This email is heavily cross posted, which is obnoxious. Can you paste the relevant log snippet? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Crash observed during the start of the Postgres process
On Tue, Apr 25, 2017 at 8:44 AM, K S, Sandhya (Nokia - IN/Bangalore)wrote: > Hello, > > Did you get a chance to take a look into the issue? > > Please consider it with high priority. We will be awaiting your inputs. This email is heavily cross posted, which is obnoxious. Can you paste the relevant log snippet? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers